{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import networkx as nx\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "from custom import load_data as cf\n", "from itertools import combinations\n", "warnings.filterwarnings('ignore')\n", "\n", "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "%config InlineBackend.figure_format = 'retina'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Load Data\n", "\n", "As usual, let's start by loading some network data. This time round, we have a [physician trust](http://konect.uni-koblenz.de/networks/moreno_innovation) network, but slightly modified such that it is undirected rather than directed.\n", "\n", "> This directed network captures innovation spread among 246 physicians in for towns in Illinois, Peoria, Bloomington, Quincy and Galesburg. The data was collected in 1966. A node represents a physician and an edge between two physicians shows that the left physician told that the righ physician is his friend or that he turns to the right physician if he needs advice or is interested in a discussion. There always only exists one edge between two nodes even if more than one of the listed conditions are true." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Load the network.\n", "G = cf.load_physicians_network()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Make a Circos plot of the graph\n", "from nxviz import CircosPlot\n", "\n", "c = CircosPlot(G)\n", "c.draw()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Question\n", "\n", "What can you infer about the structure of the graph from the Circos plot?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Structures in a Graph\n", "\n", "We can leverage what we have learned in the previous notebook to identify special structures in a graph. \n", "\n", "In a network, cliques are one of these special structures." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Cliques\n", "\n", "In a social network, cliques are groups of people in which everybody knows everybody. \n", "\n", "**Questions:**\n", "1. What is the simplest clique?\n", "1. What is the simplest complex clique?\n", "\n", "Let's try implementing a simple algorithm that finds out whether a node is present in a simple complex clique." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Example code.\n", "def in_triangle(G, node):\n", " \"\"\"\n", " Returns whether a given node is present in a triangle relationship or not.\n", " \"\"\"\n", " # We first assume that the node is not present in a triangle.\n", " is_in_triangle = False\n", " \n", " # Then, iterate over every pair of the node's neighbors.\n", " for nbr1, nbr2 in combinations(G.neighbors(node), 2):\n", " # Check to see if there is an edge between the node's neighbors.\n", " # If there is an edge, then the given node is present in a triangle.\n", " if G.has_edge(nbr1, nbr2):\n", " is_in_triangle = True\n", " # We break because any triangle that is present automatically \n", " # satisfies the problem requirements.\n", " break\n", " return is_in_triangle\n", "\n", "in_triangle(G, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In reality, NetworkX already has a function that *counts* the number of triangles that any given node is involved in. This is probably more useful than knowing whether a node is present in a triangle or not, but the above code was simply for practice." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "nx.triangles(G, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise\n", "\n", "Can you write a function that takes in one node and its associated graph as an input, and returns a list or set of itself + all other nodes that it is in a triangle relationship with? Do not return the triplets, but the `set`/`list` of nodes. (5 min.)\n", "\n", "**Possible Implementation:** If I check every pair of my neighbors, any pair that are also connected in the graph are in a triangle relationship with me.\n", "\n", "Hint: Python's [`itertools`](https://docs.python.org/3/library/itertools.html) module has a `combinations` function that may be useful.\n", "\n", "Hint: NetworkX graphs have a `.has_edge(node1, node2)` function that checks whether an edge exists between two nodes.\n", "\n", "Verify your answer by drawing out the subgraph composed of those nodes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Possible answer\n", "def get_triangles(G, node):\n", " neighbors = set(G.neighbors(node))\n", " triangle_nodes = set()\n", " \"\"\"\n", " Fill in the rest of the code below.\n", " \"\"\"\n", "\n", " \n", " \n", " \n", " \n", " return triangle_nodes\n", "\n", "# Verify your answer with the following funciton call. Should return something of the form:\n", "# {3, 9, 11, 41, 42, 67}\n", "get_triangles(G, 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Then, draw out those nodes.\n", "nx.draw(G.subgraph(get_triangles(G, 3)), with_labels=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Compare for yourself that those are the only triangles that node 3 is involved in.\n", "neighbors3 = G.neighbors(3)\n", "neighbors3.append(3)\n", "nx.draw(G.subgraph(neighbors3), with_labels=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Friend Recommendation: Open Triangles\n", "\n", "Now that we have some code that identifies closed triangles, we might want to see if we can do some friend recommendations by looking for open triangles.\n", "\n", "Open triangles are like those that we described earlier on - A knows B and B knows C, but C's relationship with A isn't captured in the graph. \n", "\n", "What are the two general scenarios for finding open triangles that a given node is involved in?\n", "\n", "1. The given node is the centre node.\n", "1. The given node is one of the termini nodes." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise\n", "Can you write a function that identifies, for a given node, the other two nodes that it is involved with in an open triangle, if there is one? (5 min.)\n", "\n", "Note: For this exercise, only consider the case when the node of interest is the centre node.\n", "\n", "**Possible Implementation:** Check every pair of my neighbors, and if they are not connected to one another, then we are in an open triangle relationship." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Fill in your code here.\n", "def get_open_triangles(G, node):\n", " \"\"\"\n", " There are many ways to represent this. One may choose to represent only the nodes involved \n", " in an open triangle; this is not the approach taken here.\n", " \n", " Rather, we have a code that explicitly enumrates every open triangle present.\n", " \"\"\"\n", " open_triangle_nodes = []\n", " neighbors = set(G.neighbors(node))\n", " \n", " for n1, n2 in combinations(G.neighbors(node), 2):\n", " ...\n", " \n", " \n", " \n", " \n", " \n", " \n", " return open_triangle_nodes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# # Uncomment the following code if you want to draw out each of the triplets.\n", "# nodes = get_open_triangles(G, 2)\n", "# for i, triplet in enumerate(nodes):\n", "# fig = plt.figure(i)\n", "# nx.draw(G.subgraph(triplet), with_labels=True)\n", "print(get_open_triangles(G, 3))\n", "len(get_open_triangles(G, 3))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Triangle closure is also the core idea behind social networks' friend recommendation systems; of course, it's definitely more complicated than what we've implemented here." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Cliques\n", "\n", "We have figured out how to find triangles. Now, let's find out what **cliques** are present in the network. Recall: what is the definition of a clique?\n", "\n", "- NetworkX has a [clique-finding](https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.clique.find_cliques.html) algorithm implemented.\n", "- This algorithm finds all maximally-sized cliques for a given node.\n", "- Note that maximal cliques of size `n` include all cliques of `size < n`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "list(nx.find_cliques(G))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Exercise\n", "\n", "Try writing a function `maximal_cliques_of_size(size, G)` that implements a search for all maximal cliques of a given size. (3 min.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def maximal_cliqes_of_size(size, G):\n", " return ______________________\n", "\n", "maximal_cliqes_of_size(2, G)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Connected Components\n", "\n", "From [Wikipedia](https://en.wikipedia.org/wiki/Connected_component_%28graph_theory%29):\n", "\n", "> In graph theory, a connected component (or just component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph.\n", "\n", "NetworkX also implements a [function](https://networkx.github.io/documentation/networkx-1.9.1/reference/generated/networkx.algorithms.components.connected.connected_component_subgraphs.html) that identifies connected component subgraphs.\n", "\n", "Remember how based on the Circos plot above, we had this hypothesis that the physician trust network may be divided into subgraphs. Let's check that, and see if we can redraw the Circos visualization." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "ccsubgraphs = list(nx.connected_component_subgraphs(G))\n", "len(ccsubgraphs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise\n", "\n", "Draw a circos plot of the graph, but now colour and order the nodes by their connected component subgraph. (5 min.)\n", "\n", "Recall Circos API:\n", "\n", "```python\n", "c = CircosPlot(G, node_order='...', node_color='...')\n", "c.draw()\n", "plt.show() # or plt.savefig(...)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Start by labelling each node in the master graph G by some number\n", "# that represents the subgraph that contains the node.\n", "for i, g in enumerate(_____________):\n", " # Fill in code below.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "c = CircosPlot(G, _________)\n", "c.draw()\n", "plt.savefig('images/physicians.png', dpi=300)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "And \"admire\" the division of the US congress over the years..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "![Congress Voting Patterns](https://img.washingtonpost.com/wp-apps/imrs.php?src=https://img.washingtonpost.com/blogs/wonkblog/files/2015/04/journal.pone_.0123507.g002.png&w=1484)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "nams", "language": "python", "name": "nams" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" }, "widgets": { "state": {}, "version": "1.1.1" } }, "nbformat": 4, "nbformat_minor": 2 }