{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Mining the Social Web, 2nd Edition\n", "\n", "##Chapter 2: Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More\n", "\n", "This IPython Notebook provides an interactive way to follow along with and explore the numbered examples from [_Mining the Social Web (2nd Edition)_](http://bit.ly/135dHfs). The intent behind this notebook is to reinforce the concepts from the sample code in a fun, convenient, and effective way. This notebook assumes that you are reading along with the book and have the context of the discussion as you work through these exercises.\n", "\n", "In the somewhat unlikely event that you've somehow stumbled across this notebook outside of its context on GitHub, [you can find the full source code repository here](http://bit.ly/16kGNyb).\n", "\n", "## Copyright and Licensing\n", "\n", "You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Facebook API Access\n", "\n", "Facebook implements OAuth 2.0 as its standard authentication mechanism, but provides a convenient way for you to get an _access token_ for development purposes, and we'll opt to take advantage of that convenience in this notebook. For details on implementing an OAuth flow with Facebook (all from within IPython Notebook), see the \\_AppendixB notebook from the [IPython Notebook Dashboard](/).\n", "\n", "For this first example, login to your Facebook account and go to https://developers.facebook.com/tools/explorer/ to obtain and set permissions for an access token that you will need to define in the code cell defining the ACCESS_TOKEN variable below. \n", "\n", "Be sure to explore the permissions that are available by clicking on the \"Get Access Token\" button that's on the page and exploring all of the tabs available. For example, you will need to set the \"friends_likes\" option under the \"Friends Data Permissions\" since this permission is used by the script below but is not a basic permission and is not enabled by default. \n", "\n", "
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Copy and paste in the value you just got from the inline frame into this variable and execute this cell.\n", "# Keep in mind that you could have just gone to https://developers.facebook.com/tools/access_token/\n", "# and retrieved the \"User Token\" value from the Access Token Tool\n", "\n", "ACCESS_TOKEN = ''" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 1. Making Graph API requests over HTTP" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import requests # pip install requests\n", "import json\n", "\n", "base_url = 'https://graph.facebook.com/me'\n", "\n", "# Get 10 likes for 10 friends\n", "fields = 'id,name,friends.limit(10).fields(likes.limit(10))'\n", "\n", "url = '%s?fields=%s&access_token=%s' % \\\n", " (base_url, fields, ACCESS_TOKEN,)\n", "\n", "# This API is HTTP-based and could be requested in the browser,\n", "# with a command line utlity like curl, or using just about\n", "# any programming language by making a request to the URL.\n", "# Click the hyperlink that appears in your notebook output\n", "# when you execute this code cell to see for yourself...\n", "print url\n", "\n", "# Interpret the response as JSON and convert back\n", "# to Python data structures\n", "content = requests.get(url).json()\n", "\n", "# Pretty-print the JSON and display it\n", "print json.dumps(content, indent=1)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: If you attempt to run a query for all of your friends' likes and it appears to hang, it is probably because you have a lot of friends who have a lot of likes. If this happens, you may need to add limits and offsets to the fields in the query as described in Facebook's [field expansion](https://developers.facebook.com/docs/reference/api/field_expansion/) documentation. However, the facebook library that we'll use in the next example handles some of these issues, so it's recommended that you hold off and try it out first. This initial example is just to illustrate that Facebook's API is built on top of HTTP.\n", "\n", "A couple of field limit/offset examples that illustrate the possibilities follow:\n", "\n", "\n", "fields = 'id,name,friends.limit(10).fields(likes)' # Get all likes for 10 friends \n", "fields = 'id,name,friends.offset(10).limit(10).fields(likes)' # Get all likes for 10 more friends \n", "fields = 'id,name,friends.fields(likes.limit(10))' # Get 10 likes for all friends \n", "fields = 'id,name,friends.fields(likes.limit(10))' # Get 10 likes for all friends\n", "" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 2. Querying the Graph API with Python" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import facebook # pip install facebook-sdk\n", "import json\n", "\n", "# A helper function to pretty-print Python objects as JSON\n", "\n", "def pp(o): \n", " print json.dumps(o, indent=1)\n", "\n", "# Create a connection to the Graph API with your access token\n", "\n", "g = facebook.GraphAPI(ACCESS_TOKEN)\n", "\n", "# Execute a few sample queries\n", "\n", "print '---------------'\n", "print 'Me'\n", "print '---------------'\n", "pp(g.get_object('me'))\n", "print\n", "print '---------------'\n", "print 'My Friends'\n", "print '---------------'\n", "pp(g.get_connections('me', 'friends'))\n", "print\n", "print '---------------'\n", "print 'Social Web'\n", "print '---------------'\n", "pp(g.request(\"search\", {'q' : 'social web', 'type' : 'page'}))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 3. Results for a Graph API query for Mining the Social Web" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Get an instance of Mining the Social Web\n", "# Using the page name also works if you know it.\n", "# e.g. 'MiningTheSocialWeb' or 'CrossFit'\n", "mtsw_id = '146803958708175'\n", "pp(g.get_object(mtsw_id))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 4. Querying the Graph API for Open Graph objects by their URLs" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# MTSW catalog link\n", "pp(g.get_object('http://shop.oreilly.com/product/0636920030195.do'))\n", "\n", "# PCI catalog link\n", "pp(g.get_object('http://shop.oreilly.com/product/9780596529321.do'))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 5. Comparing likes between Coke and Pepsi fan pages" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Find Pepsi and Coke in search results\n", "\n", "pp(g.request('search', {'q' : 'pepsi', 'type' : 'page', 'limit' : 5}))\n", "pp(g.request('search', {'q' : 'coke', 'type' : 'page', 'limit' : 5}))\n", "\n", "# Use the ids to query for likes\n", "\n", "pepsi_id = '56381779049' # Could also use 'PepsiUS'\n", "coke_id = '40796308305' # Could also use 'CocaCola'\n", "\n", "# A quick way to format integers with commas every 3 digits\n", "def int_format(n): return \"{:,}\".format(n)\n", "\n", "print \"Pepsi likes:\", int_format(g.get_object(pepsi_id)['likes'])\n", "print \"Coke likes:\", int_format(g.get_object(coke_id)['likes'])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 6. Querying a page for its \"feed\" and \"links\" connections" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pp(g.get_connections(pepsi_id, 'feed'))\n", "pp(g.get_connections(pepsi_id, 'links'))\n", "\n", "pp(g.get_connections(coke_id, 'feed'))\n", "pp(g.get_connections(coke_id, 'links'))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 7. Querying for all of your friends' likes" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# First, let's query for all of the likes in your social\n", "# network and store them in a slightly more convenient\n", "# data structure as a dictionary keyed on each friend's\n", "# name. We'll use a dictionary comprehension to iterate\n", "# over the friends and build up the likes in an intuitive\n", "# way, although the new \"field expansion\" feature could \n", "# technically do the job in one fell swoop as follows:\n", "#\n", "# g.get_object('me', fields='id,name,friends.fields(id,name,likes)')\n", "#\n", "# See Appendix C for more information on Python tips such as\n", "# dictionary comprehensions\n", "\n", "friends = g.get_connections(\"me\", \"friends\")['data']\n", "\n", "likes = { friend['name'] : g.get_connections(friend['id'], \"likes\")['data'] \n", " for friend in friends }\n", "\n", "print likes" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 8. Calculating the most popular likes among your friends" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Analyze all likes from friendships for frequency\n", "\n", "# pip install prettytable\n", "from prettytable import PrettyTable\n", "from collections import Counter\n", "friends_likes = Counter([like['name']\n", " for friend in likes \n", " for like in likes[friend]\n", " if like.get('name')])\n", "\n", "pt = PrettyTable(field_names=['Name', 'Freq'])\n", "pt.align['Name'], pt.align['Freq'] = 'l', 'r'\n", "[ pt.add_row(fl) for fl in friends_likes.most_common(10) ]\n", "\n", "print 'Top 10 likes amongst friends'\n", "print pt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 9. Calculating the most popular categories for likes among your friends" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Analyze all like categories by frequency\n", "\n", "friends_likes_categories = Counter([like['category'] \n", " for friend in likes \n", " for like in likes[friend]])\n", "\n", "pt = PrettyTable(field_names=['Category', 'Freq'])\n", "pt.align['Category'], pt.align['Freq'] = 'l', 'r'\n", "[ pt.add_row(flc) for flc in friends_likes_categories.most_common(10) ]\n", "\n", "print \"Top 10 like categories for friends\"\n", "print pt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 10. Calculating the number of likes for each friend and sorting by frequency" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Build a frequency distribution of number of likes by \n", "# friend with a dictionary comprehension and sort it in \n", "# descending order\n", "\n", "from operator import itemgetter\n", "\n", "num_likes_by_friend = { friend : len(likes[friend]) \n", " for friend in likes }\n", "\n", "\n", "pt = PrettyTable(field_names=['Friend', 'Num Likes'])\n", "pt.align['Friend'], pt.align['Num Likes'] = 'l', 'r'\n", "[ pt.add_row(nlbf) \n", " for nlbf in sorted(num_likes_by_friend.items(), \n", " key=itemgetter(1), \n", " reverse=True) ]\n", "\n", "print \"Number of likes per friend\"\n", "print pt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 11. Finding common likes between an ego and its friendships in a social network" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Which of your likes are in common with which friends?\n", "my_likes = [ like['name'] \n", " for like in g.get_connections(\"me\", \"likes\")['data'] ]\n", "\n", "pt = PrettyTable(field_names=[\"Name\"])\n", "pt.align = 'l'\n", "[ pt.add_row((ml,)) for ml in my_likes ]\n", "print \"My likes\"\n", "print pt\n", "\n", "# Use the set intersection as represented by the ampersand\n", "# operator to find common likes.\n", "\n", "common_likes = list(set(my_likes) & set(friends_likes))\n", "\n", "pt = PrettyTable(field_names=[\"Name\"])\n", "pt.align = 'l'\n", "[ pt.add_row((cl,)) for cl in common_likes ]\n", "print\n", "print \"My common likes with friends\"\n", "print pt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 12. Calculating the friends most similar to an ego in a social network" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Which of your friends like things that you like?\n", "\n", "similar_friends = [ (friend, friend_like['name']) \n", " for friend, friend_likes in likes.items()\n", " for friend_like in friend_likes\n", " if friend_like.get('name') in common_likes ]\n", "\n", "\n", "# Filter out any possible duplicates that could occur\n", "\n", "ranked_friends = Counter([ friend for (friend, like) in list(set(similar_friends)) ])\n", "\n", "\n", "pt = PrettyTable(field_names=[\"Friend\", \"Common Likes\"])\n", "pt.align[\"Friend\"], pt.align[\"Common Likes\"] = 'l', 'r'\n", "[ pt.add_row(rf) \n", " for rf in sorted(ranked_friends.items(), \n", " key=itemgetter(1), \n", " reverse=True) ]\n", "print \"My similar friends (ranked)\"\n", "print pt\n", "\n", "# Also keep in mind that you have the full range of plotting\n", "# capabilities available to you. A quick histogram that shows\n", "# how many friends.\n", "\n", "plt.hist(ranked_friends.values())\n", "plt.xlabel('Bins (number of friends with shared likes)')\n", "plt.ylabel('Number of shared likes in each bin')\n", "\n", "# Keep in mind that you can customize the binning\n", "# as desired. See http://matplotlib.org/api/pyplot_api.html\n", "\n", "# For example...\n", "\n", "plt.figure() # Display the previous plot\n", "plt.hist(ranked_friends.values(),\n", " bins=arange(1,max(ranked_friends.values()),1))\n", "plt.xlabel('Bins (number of friends with shared likes)')\n", "plt.ylabel('Number of shared likes in each bin')\n", "plt.figure() # Display the working plot" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 13. Constructing a graph of mutual friendships" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import networkx as nx # pip install networkx\n", "import requests # pip install requests\n", "\n", "friends = [ (friend['id'], friend['name'],)\n", " for friend in g.get_connections('me', 'friends')['data'] ]\n", "\n", "url = 'https://graph.facebook.com/me/mutualfriends/%s?access_token=%s'\n", "\n", "mutual_friends = {} \n", "\n", "# This loop spawns a separate request for each iteration, so\n", "# it may take a while. Optimization with a thread pool or similar\n", "# technique would be possible.\n", "for friend_id, friend_name in friends:\n", " r = requests.get(url % (friend_id, ACCESS_TOKEN,) )\n", " response_data = json.loads(r.content)['data']\n", " mutual_friends[friend_name] = [ data['name'] \n", " for data in response_data ]\n", " \n", "nxg = nx.Graph()\n", "\n", "[ nxg.add_edge('me', mf) for mf in mutual_friends ]\n", "\n", "[ nxg.add_edge(f1, f2) \n", " for f1 in mutual_friends \n", " for f2 in mutual_friends[f1] ]\n", "\n", "# Explore what's possible to do with the graph by \n", "# typing nxg. or executing a new cell with \n", "# the following value in it to see some pydoc on nxg\n", "print nxg" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 14. Finding and analyzing cliques in a graph of mutual friendships" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Finding cliques is a hard problem, so this could\n", "# take a while for large graphs.\n", "# See http://en.wikipedia.org/wiki/NP-complete and \n", "# http://en.wikipedia.org/wiki/Clique_problem.\n", "\n", "cliques = [c for c in nx.find_cliques(nxg)]\n", "\n", "num_cliques = len(cliques)\n", "\n", "clique_sizes = [len(c) for c in cliques]\n", "max_clique_size = max(clique_sizes)\n", "avg_clique_size = sum(clique_sizes) / num_cliques\n", "\n", "max_cliques = [c for c in cliques if len(c) == max_clique_size]\n", "\n", "num_max_cliques = len(max_cliques)\n", "\n", "max_clique_sets = [set(c) for c in max_cliques]\n", "friends_in_all_max_cliques = list(reduce(lambda x, y: x.intersection(y),\n", " max_clique_sets))\n", "\n", "print 'Num cliques:', num_cliques\n", "print 'Avg clique size:', avg_clique_size\n", "print 'Max clique size:', max_clique_size\n", "print 'Num max cliques:', num_max_cliques\n", "print\n", "print 'Friends in all max cliques:'\n", "print json.dumps(friends_in_all_max_cliques, indent=1)\n", "print\n", "print 'Max cliques:'\n", "print json.dumps(max_cliques, indent=1)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 15. Serializing a NetworkX graph to a file for consumption by D3" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from networkx.readwrite import json_graph\n", "\n", "nld = json_graph.node_link_data(nxg)\n", "\n", "json.dump(nld, open('resources/ch02-facebook/viz/force.json','w'))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: You may need to implement some filtering on the NetworkX graph before writing it out to a file for display in D3, and for more than dozens of nodes, it may not be reasonable to render a meaningful visualization without some JavaScript hacking on its parameters. View the JavaScript source in [force.html](files/resources/ch02-facebook/viz/force.html) for some of the details." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example 16. Visualizing a mutual friendship graph with D3" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import IFrame\n", "from IPython.core.display import display\n", "\n", "# IPython Notebook can serve files and display them into\n", "# inline frames. Prepend the path with the 'files' prefix.\n", "\n", "viz_file = 'files/resources/ch02-facebook/viz/force.html'\n", "\n", "display(IFrame(viz_file, '100%', '600px'))" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }