{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Code Sample for YouTube-Data-API Software Package\n", "## Megan Brown\n", "[Slides](http://bit.ly/yt-slides) | [YouTube-Data-API Package](http://bit.ly/youtube-data-api) | [NB-Viewer](http://bit.ly/demo-notebook)\n", "\n", "To install the packages for this demo, run:\n", "`pip install -r requirements.txt`\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# These are the packages we will use in the demonstration\n", "import os\n", "import json\n", "import pandas as pd\n", "import datetime\n", "from youtube_api import YoutubeDataApi\n", "\n", "# Read the API key from the YT_KEY environment variable\n", "key = os.environ.get('YT_KEY')\n", "yt = YoutubeDataApi(key)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Pretty-prints JSON-like items for readability\n", "def dump(doc):\n", "    def default_handler(o):\n", "        # datetime objects are not JSON serializable, so emit ISO 8601 strings\n", "        if isinstance(o, datetime.datetime):\n", "            return o.isoformat()\n", "    \n", "    print(json.dumps(doc, sort_keys=True, indent=4, default=default_handler))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A workflow sample\n", "\n", "You may start with a channel name such as 'LastWeekTonight' or 'TheNewYorkTimes'. However, channel data must be requested by channel ID, not by channel name."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Channel IDs\n", "The channel ID can be looked up from a username by running `yt.get_channel_id_from_user(USERNAME)`" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'UC3XTzVzaHQEd30rQbuvCtTQ'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the channel ID for the TV show Last Week Tonight\n", "channel_id = yt.get_channel_id_from_user('LastWeekTonight')\n", "channel_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this channel ID, we can get a wide variety of data including (but not limited to):\n", "* creation date\n", "* country\n", "* description\n", "* keywords\n", "* playlist ID (for uploads)\n", "* playlist ID (for likes)\n", "* number of subscribers\n", "* channel name/title\n", "* channel topic IDs\n", "* number of videos\n", "* total number of views" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"account_creation_date\": \"2014-03-18T17:41:39\",\n", " \"channel_id\": \"UC3XTzVzaHQEd30rQbuvCtTQ\",\n", " \"collection_date\": \"2019-04-12T07:12:21.012418\",\n", " \"country\": null,\n", " \"description\": \"Breaking news on a weekly basis. 
Sundays at 11PM - only on HBO.\\nSubscribe to the Last Week Tonight channel for the latest videos from John Oliver and the LWT team.\",\n", " \"keywords\": null,\n", " \"playlist_id_likes\": \"LL3XTzVzaHQEd30rQbuvCtTQ\",\n", " \"playlist_id_uploads\": \"UU3XTzVzaHQEd30rQbuvCtTQ\",\n", " \"subscription_count\": \"6926188\",\n", " \"title\": \"LastWeekTonight\",\n", " \"topic_ids\": \"https://en.wikipedia.org/wiki/Entertainment|https://en.wikipedia.org/wiki/Humour|https://en.wikipedia.org/wiki/Television_program\",\n", " \"video_count\": \"267\",\n", " \"view_count\": \"1936918852\"\n", "}\n" ] } ], "source": [ "# get channel metadata for the channel ID we pulled for Last Week Tonight\n", "channel_meta = yt.get_channel_metadata(channel_id)\n", "dump(channel_meta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to channel metadata, we can pull relational data for a channel, such as the channels it subscribes to or features." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[\n", " {\n", " \"collection_date\": \"2019-04-12T07:12:22.720954\",\n", " \"subscription_channel_id\": \"UCWPQB43yGKEum3eW0P9N_nQ\",\n", " \"subscription_kind\": \"youtube#channel\",\n", " \"subscription_publish_date\": \"2014-03-20T19:05:54\",\n", " \"subscription_title\": \"HBOBoxing\"\n", " },\n", " {\n", " \"collection_date\": \"2019-04-12T07:12:22.721991\",\n", " \"subscription_channel_id\": \"UCy6kyFxaMqGtpE3pQTflK8A\",\n", " \"subscription_kind\": \"youtube#channel\",\n", " \"subscription_publish_date\": \"2014-12-11T18:55:41\",\n", " \"subscription_title\": \"Real Time with Bill Maher\"\n", " }\n", "]\n" ] } ], "source": [ "# Get the channels that Last Week Tonight subscribes to\n", "subscriptions = yt.get_subscriptions(channel_id)\n", "dump(subscriptions[:2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting uploads by a user\n", "YouTube is constructed such that 
the videos a user uploads are stored in a playlist whose ID is derived from the user's channel ID. We can use this detail to generate the upload playlist ID for a given user and collect all videos they have posted." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'UU3XTzVzaHQEd30rQbuvCtTQ'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import some utility functions from the YouTube package\n", "from youtube_api import youtube_api_utils as utils\n", "\n", "# get the playlist ID for Last Week Tonight's uploads\n", "playlist_id = utils.get_upload_playlist_id(channel_id)\n", "playlist_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using this, we can now pull the uploads for this `playlist_id`\n", "The function `yt.get_videos_from_playlist_id(PLAYLIST_ID)` returns the videos in a playlist, in this case the uploads. Each entry includes the video ID, its channel ID, and its publish date." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | publish_date | video_id | channel_id | collection_date |
|---|---|---|---|---|
| 0 | 2019-04-08 06:30:00 | jCC8fPQOaxU | UC3XTzVzaHQEd30rQbuvCtTQ | 2019-04-12 07:12:28.690187 |
| 1 | 2019-04-01 06:30:01 | m8UQ4O7UiDs | UC3XTzVzaHQEd30rQbuvCtTQ | 2019-04-12 07:12:28.691222 |
| 2 | 2019-03-18 06:30:01 | Yq7Eh6JTKIg | UC3XTzVzaHQEd30rQbuvCtTQ | 2019-04-12 07:12:28.691222 |
| 3 | 2019-03-11 06:30:00 | FO0iG_P0P6M | UC3XTzVzaHQEd30rQbuvCtTQ | 2019-04-12 07:12:28.691222 |
| 4 | 2019-03-04 07:30:01 | _h1ooyyFkF0 | UC3XTzVzaHQEd30rQbuvCtTQ | 2019-04-12 07:12:28.691222 |