{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina'\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import YouTubeVideo\n", "\n", "YouTubeVideo(id=\"JjpbztqP9_0\", width=\"100%\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Graph traversal is akin to walking along the graph, node by node,\n", "constrained by the edges that connect the nodes.\n", "Graph traversal is particularly useful for understanding \n", "the local structure of certain portions of the graph\n", "and for finding paths that connect two nodes in the network.\n", "\n", "In this chapter, we are going to learn how to perform pathfinding in a graph,\n", "specifically by looking for _shortest paths_ via the _breadth-first search_ algorithm." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Breadth-First Search\n", "\n", "The BFS algorithm is a staple of computer science curricula,\n", "and for good reason:\n", "it teaches learners how to \"think on\" a graph,\n", "putting one in the position of \n", "\"the dumb computer\" that can't use a visual cortex to \n", "\"_just know_\" how to trace a path from one node to another.\n", "As a topic, learning how to do BFS\n", "additionally imparts algorithmic thinking to the learner.\n", "\n", "### Exercise: Design the algorithm\n", "\n", "Try out this exercise to get some practice with algorithmic thinking.\n", "\n", "> 1. On a piece of paper, conjure up a graph that has 15-20 nodes. Connect them any way you like.\n", "> 1. Pick two nodes. Pretend that you're standing on one of the nodes, but you can't see any further beyond one neighbor away.\n", "> 1. Work out how you can find _a_ path from the node you're standing on to the other node, given that you can _only_ see nodes that are one neighbor away but have an infinitely good memory.\n", "\n", "If you are successful at designing the algorithm, you should get the answer below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nams import load_data as cf\n", "G = cf.load_sociopatterns_network()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nams.solutions.paths import bfs_algorithm\n", "\n", "# UNCOMMENT NEXT LINE TO GET THE ANSWER.\n", "# bfs_algorithm()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Implement the algorithm\n", "\n", "> Now that you've seen how the algorithm works, try implementing it!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# FILL IN THE BLANKS BELOW\n", "\n", "def path_exists(node1, node2, G):\n", " \"\"\"\n", " This function checks whether a path exists between two nodes (node1, \n", " node2) in graph G.\n", " \"\"\"\n", " visited_nodes = _____\n", " queue = [_____]\n", " \n", " while len(queue) > 0:\n", " node = ___________\n", " neighbors = list(_________________)\n", " if _____ in _________:\n", " # print('Path exists between nodes {0} and {1}'.format(node1, node2))\n", " return True\n", " else:\n", " visited_nodes.___(____)\n", " nbrs = [_ for _ in _________ if _ not in _____________]\n", " queue = ____ + _____\n", " \n", " # print('Path does not exist between nodes {0} and {1}'.format(node1, node2))\n", " return False\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# UNCOMMENT THE FOLLOWING TWO LINES TO SEE THE ANSWER\n", "from nams.solutions.paths import path_exists\n", "# path_exists??" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# CHECK YOUR ANSWER AGAINST THE TEST FUNCTION BELOW\n", "from random import sample\n", "import networkx as nx\n", "\n", "\n", "def test_path_exists(N):\n", " \"\"\"\n", " N: The number of times to spot-check.\n", " \"\"\"\n", " for i in range(N):\n", " n1, n2 = sample(G.nodes(), 2)\n", " assert path_exists(n1, n2, G) == bool(nx.shortest_path(G, n1, n2))\n", " return True\n", " \n", "assert test_path_exists(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing Paths\n", "\n", "One of the objectives of that exercise before was to help you \"think on graphs\".\n", "Now that you've learned how to do so, you might be wondering,\n", "\"How do I visualize that path through the graph?\"\n", "\n", "Well first off, if you inspect the `test_path_exists` function above,\n", "you'll notice that NetworkX provides a `shortest_path()` function\n", "that you can use. Here's what using `nx.shortest_path()` looks like." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = nx.shortest_path(G, 7, 400)\n", "path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, it returns the nodes along the shortest path,\n", "incidentally in the exact order that you would traverse.\n", "\n", "One thing to note, though!\n", "If there are multiple shortest paths from one node to another,\n", "NetworkX will only return one of them.\n", "\n", "So how do you draw those nodes _only_?\n", "\n", "You can use the `G.subgraph(nodes)`\n", "to return a new graph that only has nodes in `nodes`\n", "and only the edges that exist between them.\n", "After that, you can use any plotting library you like.\n", "We will show an example here that uses nxviz's matrix plot.\n", "\n", "Let's see it in action:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import nxviz as nv\n", "g = G.subgraph(path)\n", "nv.matrix(g, sort_by=\"order\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Voila!_ Now we have the subgraph (1) extracted and (2) drawn to screen!\n", "In this case, the matrix plot is a suitable visualization for its compactness.\n", "The off-diagonals also show that each node is a neighbor to the next one.\n", "\n", "You'll also notice that if you try to modify the graph `g`, say by adding a node:\n", "\n", "```python\n", "g.add_node(2048)\n", "```\n", "\n", "you will get an error:\n", "\n", "```python\n", "---------------------------------------------------------------------------\n", "NetworkXError Traceback (most recent call last)\n", " in \n", "----> 1 g.add_node(2048)\n", "\n", "~/anaconda/envs/nams/lib/python3.7/site-packages/networkx/classes/function.py in frozen(*args, **kwargs)\n", " 156 def frozen(*args, **kwargs):\n", " 157 \"\"\"Dummy method for raising errors when trying to modify frozen graphs\"\"\"\n", "--> 158 raise nx.NetworkXError(\"Frozen graph can't be modified\")\n", " 159 \n", " 160 \n", "\n", "NetworkXError: Frozen graph can't be modified\n", "```\n", "\n", "From the perspective of semantics, this makes a ton of sense:\n", "the subgraph `g` is a perfect subset of the larger graph `G`,\n", "and should not be allowed to be modified\n", "unless the larger container graph is modified." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Draw path with neighbors one degree out\n", "\n", "Try out this next exercise:\n", "\n", "> Extend graph drawing with the neighbors of each of those nodes.\n", "> Use any of the nxviz plots (`nv.matrix`, `nv.arc`, `nv.circos`);\n", "> try to see which one helps you tell the best story." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from nams.solutions.paths import plot_path_with_neighbors\n", "\n", "### YOUR SOLUTION BELOW\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "plot_path_with_neighbors(G, 7, 400)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, we opted for an Arc plot because we only have one grouping of nodes but have a logical way to order them.\n", "Because the path follows the order, the edges being highlighted automatically look like hops through the graph." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bottleneck nodes\n", "\n", "We're now going to revisit the concept of an \"important node\",\n", "this time now leveraging what we know about paths.\n", "\n", "In the \"hubs\" chapter, we saw how a node that is \"important\"\n", "could be so because it is connected to many other nodes.\n", "\n", "Paths give us an alternative definition.\n", "If we imagine that we have to pass a message on a graph\n", "from one node to another,\n", "then there may be \"bottleneck\" nodes\n", "for which if they are removed,\n", "then messages have a harder time flowing through the graph.\n", "\n", "One metric that measures this form of importance\n", "is the \"betweenness centrality\" metric.\n", "On a graph through which a generic \"message\" is flowing,\n", "a node with a high betweenness centrality\n", "is one that has a high proportion of shortest paths\n", "flowing through it.\n", "In other words, it behaves like a _bottleneck_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Betweenness centrality in NetworkX\n", "\n", "NetworkX provides a \"betweenness centrality\" function\n", "that behaves consistently with the \"degree centrality\" function,\n", "in that it returns a mapping from node to metric:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "pd.Series(nx.betweenness_centrality(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: compare degree and betweenness centrality\n", "\n", "> Make a scatterplot of degree centrality on the x-axis\n", "> and betweenness centrality on the y-axis.\n", "> Do they correlate with one another?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "# YOUR ANSWER HERE:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from nams.solutions.paths import plot_degree_betweenness\n", "plot_degree_betweenness(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Think about it...\n", "\n", "...does it make sense that degree centrality and betweenness centrality\n", "are not well-correlated?\n", "\n", "Can you think of a scenario where a node has a\n", "\"high\" betweenness centrality\n", "but a \"low\" degree centrality?\n", "Before peeking at the graph below,\n", "think about your answer for a moment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "nx.draw(nx.barbell_graph(5, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Recap\n", "\n", "In this chapter, you learned the following things:\n", "\n", "1. You figured out how to implement the breadth-first-search algorithm to find shortest paths.\n", "1. You learned how to extract subgraphs from a larger graph.\n", "1. You implemented visualizations of subgraphs, which should help you as you communicate with colleagues.\n", "1. You calculated betweenness centrality metrics for a graph, and visualized how they correlated with degree centrality." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solutions\n", "\n", "Here are the solutions to the exercises above." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from nams.solutions import paths\n", "import inspect\n", "\n", "print(inspect.getsource(paths))" ] } ], "metadata": { "kernelspec": { "display_name": "nams", "language": "python", "name": "nams" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }