{ "metadata": { "name": "", "signature": "sha256:bc08c788e4e289c47515de4056acfde2ebe970dfacf2efdf266ca7c748a35e5a" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to NetworkX\n", "\n", "### Adding & editing graph nodes\n", "\n", "We'll first take a look at creating a graph, and adding/editing nodes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import networkx as nx\n", "\n", "'''use g = nx.Graph() to create a graph'''\n", "\n", "g = nx.Graph()\n", "\n", "'''Lesson: use .add_node(1) to add a single node'''\n", "\n", "# TODO: add a node\n", "\n", "'''Lesson: use .add_nodes_from([2, 3, 'four', 5]) to add in bulk'''\n", "\n", "# TODO: add multiple nodes\n", "\n", "\n", "g.nodes() # run g.nodes() to view the graph" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "'''Note that NetworkX won't complain if we re-add pre-existing nodes'''\n", "\n", "# TODO: try re-adding nodes to see what happens\n", "\n", "\n", "g.nodes() # display nodes" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "'''Lesson: remove syntax is similar to adding, eg:\n", " .remove_node()\n", " .remove_nodes_from()\n", "'''\n", "\n", "# TODO: try removing both 1) single nodes, 2) nodes in bulk\n", "\n", "\n", "\n", "g.nodes() # display nodes" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding & editing edges" ] }, { "cell_type": "code", "collapsed": false, "input": [ "h = nx.Graph() # let's create a 2nd graph to play with edges\n", "\n", "'''Lesson: to create an edge, just specify the 2 nodes that define it: \n", " .add_edge('a','b')\n", " Note that those nodes also get added (no need to make them beforehand!)\n", "'''\n", "\n", "# TODO: create an edge\n", "\n", "\n", "\n", "print 'edges:', h.edges() # see 
your new edge\n", "print 'nodes:', h.nodes() # verify that new nodes were also added" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "'''Lesson: adding multiple edges is similar to adding multiple nodes:\n", " .add_edges_from([('x','y'), ('y','z')])\n", "'''\n", "\n", "# TODO: create multiple new edges\n", "\n", "\n", "\n", "print 'edges:', h.edges() # see your new edges\n", "print 'nodes:', h.nodes() # verify that new nodes were also added" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing graphs" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# we need this 'magic' command to draw graphs inline\n", "%matplotlib inline \n", "import matplotlib.pyplot as plt\n", "\n", "GREEN = \"#77DD77\"\n", "BLUE = \"#99CCFF\"\n", "\n", "# give each graph its own figure; otherwise both are drawn on the same axes\n", "plt.figure()\n", "nx.draw(g, node_color=GREEN, with_labels=True)\n", "plt.figure()\n", "nx.draw(h, node_color=BLUE, with_labels=True)\n", "\n", "# TODO: nothing to write, just run this :)\n" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mini Quiz!"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### How would you create the following graph?\n", "\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "g = nx.Graph()\n", "\n", "# TODO: create the graph illustrated above\n", "\n", "nx.draw(g, node_color=BLUE, with_labels=True)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Directed Graphs" ] }, { "cell_type": "code", "collapsed": false, "input": [ "'''Lesson: use nx.DiGraph() to create a new directed graph\n", "'''\n", "\n", "# TODO: create a directed graph\n", "dg = \n", "\n", "\n", "dg.add_edges_from([(1,2), (2,3)])\n", "\n", "# TODO: run this cell; you should see 2 directed edges\n", "\n", "print 'directed edges:', dg.edges()\n", "nx.draw(dg, node_color=GREEN, with_labels=True)\n" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "'''We can make directed graphs from existing graphs, eg:\n", " nx.DiGraph(g)\n", "'''\n", "\n", "# TODO: create a directed graph from g\n", "dg = nx.DiGraph(g)\n", "\n", "\n", "nx.draw(dg, node_color=BLUE, with_labels=True)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "''' Notice that nodes A and B have TWO directed edges:\n", " A -> B\n", " B -> A\n", "'''\n", "\n", "# TODO: run dg.edges() to confirm that each node pair has TWO directed edges\n", "dg.edges()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding attributes to nodes and edges\n", "\n", "Sometimes you may want to attach attributes to either the nodes or edges:\n", "\n", "* Perhaps you want to save node properties that will be helpful for future analysis\n", "* Perhaps you want to attach visual descriptions, such as node size, edge width or node color" ] }, { "cell_type": "code", "collapsed": false, "input": [ "cities = 
nx.Graph()\n", "\n", "cities.add_edge('San Diego', 'Los Angeles', { 'distance': 0.4})\n", "cities.add_edge('New York', 'Nashville', { 'distance': 5.6})\n", "cities.add_edge('Boston', 'D.C.', { 'distance': 0.8})\n", "\n", "nx.draw(cities)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Describing a Network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Degree Distribution:\n", "\n", "\n", "\n", "\n", "\n", "- 1 node with 4 edges\n", "- 1 node with 2 edges\n", "- 4 nodes with 1 edge\n", "\n", "Distribution (number of nodes : degree):\n", "\n", " [(1:4), (1:2), (4:1)]\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Degree Distribution for all nodes\n", "print 'Degree Distribution:', g.degree()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Degree Distribution: {'A': 1, 'C': 1, 'B': 2, 'E': 1, 'D': 4, 'F': 1}\n" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Generate the graph above\n", "paths = nx.Graph()\n", "paths.add_edges_from([\n", " ('A','B'), ('B','D'), ('B','C'), ('D','E'), ('D','C'),\n", " ('C','1'), ('1','2'), ('1','3'), ('2','3'), \n", " ('E','2'), ('E','4')])\n", "\n", "# Display shortest path details\n", "print 'Shortest path from A to E is', nx.shortest_path_length(paths, 'A','E'), 'hops:'\n", "print nx.shortest_path(paths, 'A','E')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Shortest path from A to E is 3 hops:\n", "['A', 'B', 'D', 'E']\n" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Network Centrality (higher values mean a more central node)\n", "\n", "* **Degree: number of edges** for node X\n", "* **Betweenness: number of shortest paths** between other pairs of nodes that pass through node X\n", "* **Closeness: inverse of the average of the shortest paths** between X and all other nodes\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "''' To calculate the degree of every node, use:\n", " g.degree() for non-normalized values,\n", " nx.degree_centrality(g) for normalized values\n", "'''\n", "\n", "# TODO degree of each node, non-normalized\n", "\n", "# TODO degree of each node, normalized\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Degree: \n", "\t{'A': 1, 'C': 1, 'B': 2, 'E': 1, 'D': 4, 'F': 1}\n", "Degree centrality (normalized): \n", "\t{'A': 0.2, 'C': 0.2, 'B': 0.4, 'E': 0.2, 'D': 0.8, 'F': 0.2}\n", "Betweenness centrality: \n", "\t{'A': 0.0, 'C': 0.0, 'B': 4.0, 'E': 0.0, 'D': 9.0, 'F': 0.0}\n", "Betweenness centrality -- normalized: \n", "\t{'A': 0.0, 'C': 0.0, 'B': 0.4, 'E': 0.0, 'D': 0.9, 'F': 0.0}\n", "Closeness centrality: \n", "\t{'A': 0.4166666666666667, 'C': 0.5, 'B': 0.625, 'E': 0.5, 'D': 0.8333333333333334, 'F': 0.5}\n" ] } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "''' To calculate betweenness centrality, use:\n", " nx.betweenness_centrality(g, normalized=True/False) default is True\n", "'''\n", "\n", "# TODO find betweenness centrality (both normalized and non-normalized)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "''' To calculate closeness centrality, use:\n", " nx.closeness_centrality(g)\n", "'''\n", "\n", "# TODO find closeness centrality" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modeling Networks\n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "# Intro to the Twitter API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to use the Twitter API, you'll need:\n", "\n", 
"* The oauth2 library (`pip install oauth2`)\n", "* A Twitter account\n", "* Twitter Consumer/Access tokens\n", "\n", "\n", "## Creating your Twitter Consumer/Access Tokens\n", "\n", "1) Go to https://apps.twitter.com/ and click **\"Create New App\"**\n", "\n", "* Twitter assumes you're making tokens for an app, so let's make a dummy app.\n", " \n", "2) Fill out **Name, Description and Website**:\n", "\n", "* For **Website**, I just put my GitHub/Twitter link\n", "* I left **Callback URL** empty\n", "* Click **\"Create your Twitter Application\"**\n", "\n", "3) Under **Application Settings**, set **Access level** to Read-only:\n", "\n", "* You don't _have_ to do this, but it's good practice.\n", "\n", "4) Notice that we're in the **Details** tab. Click on the **Keys and Access Tokens** tab:\n", "\n", "* You'll see **Consumer Key (API Key)** and **Consumer Secret (API Secret)**. We'll copy those in a second.\n", "\n", "5) Scroll to the bottom of the page and click the **\"Create my access token\"** button (under **Your Access Token > Token Actions**)\n", "\n", "Keep this page open; we'll paste these values into a config file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding your Twitter API tokens into config.json\n", "\n", "Using a text editor, open **networkx-tutorial/materials/config.json** and paste your keys into the following fields:\n", "\n", "* **CONSUMER_KEY** - replace **\"[Consumer Key (API Key)]\"** with your value for **\"Consumer Key (API Key)\"**\n", "* **CONSUMER_SECRET** - your value for **\"Consumer Secret (API Secret)\"**\n", "* **ACCESS_TOKEN** - the access token created in step 5\n", "* **ACCESS_SECRET** - the access token secret created in step 5\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Connecting to the Twitter API\n", "\n", "Now we're ready to use the Twitter API!"
] }, { "cell_type": "code", "collapsed": false, "input": [ "import oauth2 as oauth\n", "import json\n", "\n", "with open('../materials/tutorial/config.json') as f:\n", " tokens = json.loads(f.read())\n", "\n", "consumer = oauth.Consumer(key=tokens['CONSUMER_KEY'], secret=tokens['CONSUMER_SECRET'])\n", "token = oauth.Token(key=tokens['ACCESS_TOKEN'], secret=tokens['ACCESS_SECRET'])\n", "\n", "client = oauth.Client(consumer, token)\n", "\n", "# TODO: run this... you should get a client object for calling Twitter's API\n", "client" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Twitter's REST APIs\n", "\n", "Twitter has a rich set of API calls (the full list is at https://dev.twitter.com/rest/public). Today we'll be using these:\n", "\n", "* [GET friends/list](https://dev.twitter.com/rest/reference/get/friends/list) - who is user X following?\n", "* [GET followers/list](https://dev.twitter.com/rest/reference/get/followers/list) - who follows user X?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### GET followers/list: let's find out who follows you!\n", "\n", "You'll see from the [GET followers/list](https://dev.twitter.com/rest/reference/get/followers/list) documentation that the URL to get the list of followers is:\n", "\n", " https://api.twitter.com/1.1/followers/list.json?screen_name=[screen_name]\n", " \n", "The request returns:\n", "\n", "1) A response body\n", "\n", "* JSON representing the data we requested\n", "\n", "2) A response header\n", "\n", "* There's a lot of stuff here, but one field to note is the **HTTP response code**, which tells you whether the request succeeded and, if not, why. The codes to watch for are:\n", "\n", "* **200** - **STATUS_OKAY** - Success :). This is what you want.\n", "* **429** - **RATE_LIMIT_EXCEEDED**. Uh-oh, slow it down :/. Twitter limits how frequently you can make requests, and you've exceeded it.\n", "* **401** - **UNAUTHORIZED_USER**. Twitter isn't accepting your Consumer/Access tokens. Verify that the tokens were pasted correctly, or try generating new tokens.\n", "\n", " \n", " \n", "Now that we know what to expect, let's try it!" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import json\n", "\n", "FOLLOWERS_URL = 'https://api.twitter.com/1.1/followers/list.json'\n", "\n", "# TODO: put your Twitter handle here\n", "screen_name = 'my_twitter_handle'\n", "\n", "\n", "url = FOLLOWERS_URL + '?screen_name=' + screen_name\n", "header, response = client.request(url, method='GET')\n", "\n", "# let's save the whole response so you can take a look at it\n", "with open('../materials/tutorial/my_followers.json', 'w') as f:\n", " json.dump(json.loads(response), f, indent=2)\n", " \n", "print 'status:', header['status'] # should be 200 (STATUS_OKAY)\n", "print response[:200] # a lot of data!" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Extracting data from JSON result\n", "\n", "'my_followers.json' will look like the example below. Let's extract the values marked with comments:\n", "\n", "
\n",
      "{\n",
      "  \"previous_cursor\": 0, \n",
      "  \"previous_cursor_str\": \"0\", \n",
      "  \"next_cursor\": 1496386282559075381,  # use next_cursor to get the next page of results\n",
      "  \"users\": [\n",
      "    {\n",
      "      ...\n",
      "      \"screen_name\": \"celiala\",  # follower 1\n",
      "      ...\n",
      "    }, \n",
      "    {\n",
      "      ...\n",
      "      \"screen_name\": \"sarah_guido\",  # follower 2\n",
      "      ...\n",
      "    }\n",
      "  ],\n",
      "  ...\n",
      "}\n",
      "
\n", "\n", "Let's extract **next_cursor** and the list of **followers**:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = json.loads(response) # convert JSON string into a dictionary object\n", "\n", "next_cursor = data['next_cursor']\n", "followers = [u['screen_name'] for u in data['users']]\n", "\n", "# TODO: run this block to see what's in next_cursor and followers:\n", "print 'next_cursor:', next_cursor\n", "print len(followers), 'followers so far:', followers" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generating the subsequent Twitter API call\n", "\n", "To get the next page of results, simply pass next_cursor as your next cursor value:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "\n", "# use next_cursor to get next 20 results\n", "url = FOLLOWERS_URL + '?screen_name=' + screen_name + '&cursor=' + str(next_cursor)\n", "header, response = client.request(url, method='GET')\n", "\n", "if header['status'] == '200': # STATUS_OKAY\n", " \n", " data = json.loads(response) # convert JSON to dictionary object\n", "\n", " next_cursor = data['next_cursor']\n", " new_followers = [u['screen_name'] for u in data['users']]\n", " followers.extend(new_followers)\n", "\n", " # save raw JSON\n", " with open('../materials/tutorial/my_followers.' 
+ str(next_cursor) + '.json', 'w') as f:\n", " json.dump(data, f, indent=2)\n", "\n", " # save followers so far\n", " with open('../materials/tutorial/my_followers.txt', 'w') as f:\n", " f.write('\\n'.join(followers))\n", "\n", " print 'next_cursor:', next_cursor\n", " print len(new_followers), 'new followers:', new_followers\n", "\n", "else:\n", " print header, response" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can keep passing screen_name and the latest next_cursor until next_cursor comes back as 0, which means there are no more pages.\n", "\n", "**Beware of Rate Limiting!** GET followers/list only allows **15 calls in a 15-min window**, so you may want to sleep between calls (`time.sleep(seconds_to_sleep)`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Twitter Interactive Console\n", "\n", "To explore the other API endpoints, Twitter has a great interactive console where you can tweak the inputs and see the outputs:\n", "\n", "https://dev.twitter.com/rest/tools/console\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# At this point, switch to [lesson.ipynb](lesson.ipynb)!!!\n", "\n", "Go to the lesson IPython notebook: [/notebooks/notebooks/lesson.ipynb](lesson.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# After the lesson:\n", "\n", "# Visualizations" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import networkx as nx\n", "\n", "# we need this 'magic' command to draw graphs inline\n", "%matplotlib inline \n", "\n", "g = nx.Graph()\n", "\n", "# let's attach a size attribute to each node to describe how big we want the node to be\n", "g.add_node(1, {'size': 800})\n", "g.add_node(2, {'size': 200})\n", "g.add_node(3, {'size': 200})\n", "g.add_node(4, {'size': 200})\n", "g.add_node(5, {'size': 200})\n", "\n", "g.add_edge(1,2, { 'thickness': 20})\n", "g.add_edge(1,3, { 'thickness': 20})\n", "g.add_edge(1,4, { 'thickness': 20})\n", 
"g.add_edge(1,5, { 'thickness': 20})\n", "g.add_edge(2,3, { 'thickness': 5})\n", "g.add_edge(3,4, { 'thickness': 5})\n", "g.add_edge(4,5, { 'thickness': 5})\n", "g.add_edge(5,2, { 'thickness': 5})\n", "\n", "# let's iterate through the nodes and edges and extract the list of node & edge sizes\n", "node_size = [attribs['size'] for (node, attribs) in g.nodes(data=True)]\n", "edge_thickness = [attribs['thickness'] for (v_from, v_to, attribs) in g.edges(data=True)]\n", "\n", "LIGHT_BLUE = '#A0CBE2'\n", "\n", "nx.draw(g, \n", " node_size = node_size, # node_size can either take a single value (where all nodes will be size N),\n", " # or a list of values, where Nth list value will be the size for the Nth node\n", " width = edge_thickness, # similarly, the Nth value corresponds to the width for edge N\n", " node_color = LIGHT_BLUE,\n", " edge_color = LIGHT_BLUE,\n", " font_size = 15,\n", " with_labels = True\n", ")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "import networkx as nx\n", "import matplotlib.pyplot as plt\n", "\n", "edgelist_txt = '../../data/retweets.txt'\n", "G = nx.read_edgelist(edgelist_txt, create_using=nx.DiGraph())" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "LAYOUTS = {\n", " 'circular': nx.circular_layout,\n", " 'fr': nx.fruchterman_reingold_layout,\n", " 'random': nx.random_layout,\n", " 'shell': nx.shell_layout,\n", " 'spectral': nx.spectral_layout,\n", " 'spring': nx.spring_layout\n", "}\n", "\n", "def save_layout(G, layout_name):\n", " elarge=[(u,v) for (u,v,d) in G.edges(data=True) if d['weight'] >1.5]\n", " esmall=[(u,v) for (u,v,d) in G.edges(data=True) if d['weight'] <=1.5]\n", " nlarge=[n for n in G.nodes() if n in ['PyTennessee']]\n", " pos=LAYOUTS[layout_name](G) # positions for all nodes\n", "\n", " print nlarge\n", " # nodes\n", " nx.draw_networkx_nodes(G,pos,nodelist=nlarge,node_size=1)\n", "\n", " # 
edges\n", " nx.draw_networkx_edges(G,pos,edgelist=elarge, width=1)\n", " nx.draw_networkx_edges(G,pos,edgelist=esmall, width=1,alpha=0.5,edge_color='#cccccc')\n", "\n", " # labels\n", " labels={}\n", " labels['PyTennessee']='PyTennessee'\n", " nx.draw_networkx_labels(G,pos,labels,font_size=6)\n", " #nx.draw_networkx_labels(G,pos,nodelist=nlarge,font_size=6,font_family='sans-serif')\n", "\n", " plt.axis('off')\n", " plt.savefig(layout_name + '.png', dpi=500)\n", "\n", "save_layout(G, 'spring')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Spring\n", "\n", "Below is the same graph from above, just bigger:\n", "\n", "\n", "## Other NetworkX Graphing Layouts\n", "\n", "### Circular\n", "\n", "### Fruchterman-Reingold\n", "\n", "### Random\n", "\n", "### Shell\n", "\n", "### Spectral\n", "" ] } ], "metadata": {} } ] }