{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mining the Social Web\n", "\n", "## Mining Facebook\n", "\n", "This Jupyter Notebook provides an interactive way to follow along with the video lectures. The intent behind this notebook is to reinforce the concepts in a fun, convenient, and effective way." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Facebook API Access\n", "\n", "Facebook implements OAuth 2.0 as its standard authentication mechanism, but provides a convenient way for you to get an _access token_ for development purposes, and we'll opt to take advantage of that convenience in this notebook.\n", "\n", "To get started, log in to your Facebook account and go to https://developers.facebook.com/tools/explorer/ to obtain an ACCESS_TOKEN, and then paste it into the code cell below.\n", "\n", "**N.B.:** Since this chapter was written, Facebook has added more security and privacy measures to its API. Some of the code in this Jupyter Notebook requires somewhat privileged access to run. A sandboxed application registered with Facebook is fairly limited in terms of what information it can access. The use of certain endpoints must be reviewed and approved by Facebook. For more information, please read Facebook's documentation on reviewable features: https://developers.facebook.com/docs/apps/review." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Copy and paste in the value you just got from the inline frame into this variable and execute this cell.\n", "# Keep in mind that you could have just gone to https://developers.facebook.com/tools/access_token/\n", "# and retrieved the \"User Token\" value from the Access Token Tool\n", "\n", "ACCESS_TOKEN = ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1. Making Graph API requests over HTTP" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests # pip install requests\n", "import json\n", "\n", "base_url = 'https://graph.facebook.com/me'\n", "\n", "# Specify which fields to retrieve\n", "fields = 'id,name,likes.limit(10){about}'\n", "\n", "url = '{0}?fields={1}&access_token={2}'.format(base_url, fields, ACCESS_TOKEN)\n", "\n", "# This API is HTTP-based and could be requested in the browser,\n", "# with a command line utlity like curl, or using just about\n", "# any programming language by making a request to the URL.\n", "# Click the hyperlink that appears in your notebook output\n", "# when you execute this code cell to see for yourself...\n", "print(url)\n", "\n", "# Interpret the response as JSON and convert back\n", "# to Python data structures\n", "content = requests.get(url).json()\n", "\n", "# Pretty-print the JSON and display it\n", "print(json.dumps(content, indent=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 2. Querying the Graph API with Python\n", "\n", "Facebook SDK for Python API reference:\n", "http://facebook-sdk.readthedocs.io/en/v2.0.0/api.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import facebook # pip install facebook-sdk\n", "import json\n", "\n", "# A helper function to pretty-print Python objects as JSON\n", "def pp(o):\n", " print(json.dumps(o, indent=1))\n", "\n", "# Create a connection to the Graph API with your access token\n", "g = facebook.GraphAPI(ACCESS_TOKEN, version='2.10')\n", "\n", "# Execute a few example queries:\n", "\n", "# Get my ID\n", "pp(g.get_object('me'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the connections to an ID\n", "# Example connection names: 'feed', 'likes', 'groups', 'posts'\n", "pp(g.get_connections(id='me', connection_name='likes'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Search for a location, may require approved app\n", "pp(g.request(\"search\", {'type': 'place', 'center': '40.749444, -73.968056', 'fields': 'name, location'}))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 3. Querying the Graph API for Mining the Social Web and Counting Fans" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Search for a page's ID by name\n", "pp(g.request(\"search\", {'q': 'Mining the Social Web', 'type': 'page'}))\n", "\n", "# Grab the ID for the book and check the number of fans\n", "mtsw_id = '146803958708175'\n", "pp(g.get_object(id=mtsw_id, fields=['fan_count']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 4. Querying the Graph API for Open Graph objects by their URLs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# MTSW catalog link\n", "pp(g.get_object('http://shop.oreilly.com/product/0636920030195.do'))\n", "\n", "# PCI catalog link\n", "pp(g.get_object('http://shop.oreilly.com/product/9780596529321.do'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 5. Counting total number of page fans" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The following code may require the developer's app be submitted for review and\n", "# approved. See https://developers.facebook.com/docs/apps/review\n", "\n", "# Take, for example, three popular musicians and their page IDs.\n", "taylor_swift_id = '19614945368'\n", "drake_id = '83711079303'\n", "beyonce_id = '28940545600'\n", "\n", "# Declare a helper function for retrieving the total number of fans ('likes') a page has \n", "def get_total_fans(page_id):\n", " return int(g.get_object(id=page_id, fields=['fan_count'])['fan_count'])\n", "\n", "tswift_fans = get_total_fans(taylor_swift_id)\n", "drake_fans = get_total_fans(drake_id)\n", "beyonce_fans = get_total_fans(beyonce_id)\n", "\n", "print('Taylor Swift: {0} fans on Facebook'.format(tswift_fans))\n", "print('Drake: {0} fans on Facebook'.format(drake_fans))\n", "print('Beyoncé: {0} fans on Facebook'.format(beyonce_fans))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 6. Retrieving a page's feed" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Declare a helper function for retrieving the official feed from a given page.\n", "def retrieve_page_feed(page_id, n_posts):\n", " \"\"\"Retrieve the first n_posts from a page's feed in reverse\n", " chronological order.\"\"\"\n", " feed = g.get_connections(page_id, 'posts')\n", " posts = []\n", " posts.extend(feed['data'])\n", "\n", " while len(posts) < n_posts:\n", " try:\n", " feed = requests.get(feed['paging']['next']).json()\n", " posts.extend(feed['data'])\n", " except KeyError:\n", " # When there are no more posts in the feed, break\n", " print('Reached end of feed.')\n", " break\n", " \n", " if len(posts) > n_posts:\n", " posts = posts[:n_posts]\n", "\n", " print('{} items retrieved from feed'.format(len(posts)))\n", " return posts\n", "\n", "# Declare a helper function for returning the message content of a post\n", "def get_post_message(post):\n", " try:\n", " message = post['story']\n", " except KeyError:\n", " # Post may have 'message' instead of 'story'\n", " pass\n", " try:\n", " message = post['message']\n", " except KeyError:\n", " # Post has neither\n", " message = ''\n", " return message.replace('\\n', ' ')\n", "\n", "# Retrieve the last 5 items from their feeds\n", "for artist in [taylor_swift_id, drake_id, beyonce_id]:\n", " print()\n", " feed = retrieve_page_feed(artist, 5)\n", " for i, post in enumerate(feed):\n", " message = get_post_message(post)[:50]\n", " print('{0} - {1}...'.format(i+1, message))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 7. Measuring engagement" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Measure the response to a post in terms of likes, shares, and comments\n", "def measure_response(post_id):\n", " \"\"\"Returns the number of likes, shares, and comments on a \n", " given post as a measure of user engagement.\"\"\"\n", " likes = g.get_object(id=post_id, \n", " fields=['likes.limit(0).summary(true)'])\\\n", " ['likes']['summary']['total_count']\n", " shares = g.get_object(id=post_id, \n", " fields=['shares.limit(0).summary(true)'])\\\n", " ['shares']['count']\n", " comments = g.get_object(id=post_id, \n", " fields=['comments.limit(0).summary(true)'])\\\n", " ['comments']['summary']['total_count']\n", " return likes, shares, comments\n", "\n", "# Measure the relative share of a page's fans engaging with a post\n", "def measure_engagement(post_id, total_fans):\n", " \"\"\"Returns the number of likes, shares, and comments on a \n", " given post as a measure of user engagement.\"\"\"\n", " likes = g.get_object(id=post_id, \n", " fields=['likes.limit(0).summary(true)'])\\\n", " ['likes']['summary']['total_count']\n", " shares = g.get_object(id=post_id, \n", " fields=['shares.limit(0).summary(true)'])\\\n", " ['shares']['count']\n", " comments = g.get_object(id=post_id, \n", " fields=['comments.limit(0).summary(true)'])\\\n", " ['comments']['summary']['total_count']\n", " likes_pct = likes / total_fans * 100.0\n", " shares_pct = shares / total_fans * 100.0\n", " comments_pct = comments / total_fans * 100.0\n", " return likes_pct, shares_pct, comments_pct\n", "\n", "# Retrieve the last 5 items from the artists' feeds, print the\n", "# reaction and the degree of engagement\n", "artist_dict = {'Taylor Swift': taylor_swift_id,\n", " 'Drake': drake_id,\n", " 'Beyoncé': beyonce_id}\n", "for name, page_id in artist_dict.items():\n", " print()\n", " print(name)\n", " print('------------')\n", " feed = retrieve_page_feed(page_id, 5)\n", " total_fans = get_total_fans(page_id)\n", " \n", " for i, post in enumerate(feed):\n", " message = get_post_message(post)[:30]\n", " post_id = post['id']\n", " likes, shares, comments = measure_response(post_id)\n", " likes_pct, shares_pct, comments_pct = measure_engagement(post_id, total_fans)\n", " print('{0} - {1}...'.format(i+1, message))\n", " print(' Likes {0} ({1:7.5f}%)'.format(likes, likes_pct))\n", " print(' Shares {0} ({1:7.5f}%)'.format(shares, shares_pct))\n", " print(' Comments {0} ({1:7.5f}%)'.format(comments, comments_pct))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 8. Storing data in a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd # pip install pandas\n", "\n", "# Create a Pandas DataFrame to contain artist page\n", "# feed information\n", "columns = ['Name',\n", " 'Total Fans',\n", " 'Post Number',\n", " 'Post Date',\n", " 'Headline',\n", " 'Likes',\n", " 'Shares',\n", " 'Comments',\n", " 'Rel. Likes',\n", " 'Rel. Shares',\n", " 'Rel. Comments']\n", "musicians = pd.DataFrame(columns=columns)\n", "\n", "# Build the DataFrame by adding the last 10 posts and their audience\n", "# reaction for each of the artists\n", "for page_id in [taylor_swift_id, drake_id, beyonce_id]:\n", " name = g.get_object(id=page_id)['name']\n", " fans = get_total_fans(page_id)\n", " feed = retrieve_page_feed(page_id, 10)\n", " for i, post in enumerate(feed):\n", " likes, shares, comments = measure_response(post['id'])\n", " likes_pct, shares_pct, comments_pct = measure_engagement(post['id'], fans)\n", " musicians = musicians.append({'Name': name,\n", " 'Total Fans': fans,\n", " 'Post Number': i+1,\n", " 'Post Date': post['created_time'],\n", " 'Headline': get_post_message(post),\n", " 'Likes': likes,\n", " 'Shares': shares,\n", " 'Comments': comments,\n", " 'Rel. Likes': likes_pct,\n", " 'Rel. Shares': shares_pct,\n", " 'Rel. Comments': comments_pct,\n", " }, ignore_index=True)\n", "# Fix the dtype of a few columns\n", "for col in ['Post Number', 'Total Fans', 'Likes', 'Shares', 'Comments']:\n", " musicians[col] = musicians[col].astype(int)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show a preview of the DataFrame\n", "musicians.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 9. Visualizing data stored in a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib # pip install matplotlib\n", "%matplotlib inline\n", "\n", "musicians[musicians['Name'] == 'Drake'].plot(x='Post Number', y='Likes', kind='bar')\n", "musicians[musicians['Name'] == 'Drake'].plot(x='Post Number', y='Shares', kind='bar')\n", "musicians[musicians['Name'] == 'Drake'].plot(x='Post Number', y='Comments', kind='bar')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "musicians[musicians['Name'] == 'Drake'].plot(x='Post Number', y='Rel. Likes', kind='bar')\n", "musicians[musicians['Name'] == 'Drake'].plot(x='Post Number', y='Rel. Shares', kind='bar')\n", "musicians[musicians['Name'] == 'Drake'].plot(x='Post Number', y='Rel. Comments', kind='bar')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 10. Comparing different artists to each other" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Reset the index to a multi-index\n", "musicians = musicians.set_index(['Name','Post Number'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The unstack method pivots the index labels\n", "# and lets you get data columns grouped by artist\n", "musicians.unstack(level=0)['Likes']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the comparative reactions to each artist's last 10 Facebook posts\n", "plot = musicians.unstack(level=0)['Likes'].plot(kind='bar', subplots=False, figsize=(10,5), width=0.8)\n", "plot.set_xlabel('10 Latest Posts')\n", "plot.set_ylabel('Number of Likes Received')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the engagement of each artist's Facebook fan base to the last 10 posts\n", "plot = musicians.unstack(level=0)['Rel. Likes'].plot(kind='bar', subplots=False, figsize=(10,5), width=0.8)\n", "plot.set_xlabel('10 Latest Posts')\n", "plot.set_ylabel('Likes / Total Fans (%)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 11. Calculate average engagement" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Average Likes / Total Fans')\n", "print(musicians.unstack(level=0)['Rel. Likes'].mean())\n", "\n", "print('\\nAverage Shares / Total Fans')\n", "print(musicians.unstack(level=0)['Rel. Shares'].mean())\n", "\n", "print('\\nAverage Comments / Total Fans')\n", "print(musicians.unstack(level=0)['Rel. Comments'].mean())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 1 }