{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python and NetworkX \n", "\n", "Python is a programming language. It allows us to load, manipulate, analyse, and output data. \n", "We do this by writing a set of instructions, a computer programme. A programme consists of a set of commands that typically act on variables. \n", "\n", "NetworkX is a module that we can add onto regular Python, with network-related commands and types of variables. \n", "Python Notebooks are an ‘interactive Python’ environment that allows us to run bits of Python code at a time. This is useful for learning Python. A similar environment, called ipython, which can be called from the command line (e.g. the ‘Terminal’ application on a Mac), is also useful for correcting programming errors (‘debugging’). \n", "\n", "This is a Python Jupyter notebook: a Python environment which allows text and code to be written in the same document. Each box is called a 'cell'. You 'activate' a cell by clicking anywhere within it, and the code in the active cell can be run using the 'run' button above, or by pressing shift + enter.\n", "\n", "Once the code has finished executing, the result will appear underneath the cell. In some cases, there won't be a printed output, but something stored in memory, to be used later, instead.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A simple command to the computer (click in the cell below and then shift + enter):" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello\n" ] } ], "source": [ "print('Hello')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us define a variable (e.g. 
a character string): " ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "s = 'Hello' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see what is in a variable, we can type: " ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello\n" ] } ], "source": [ "print(s) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or simply:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello'" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables can be called almost anything: " ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "almostanything = 'See?' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can change variables: " ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello World\n" ] } ], "source": [ "s = s + ' World' \n", "print(s) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables can also contain numbers: " ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7\n" ] } ], "source": [ "i = 3 \n", "j = 4 \n", "k = i+j \n", "print(k) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a very useful type of variable called a list: " ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "l = [4,2,5,4] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can add elements to a list: " ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4, 2, 5, 
4, 6]\n" ] } ], "source": [ "l = l + [6] \n", "print(l) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list can also contain strings: " ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "m = ['The','cat','sat','on','the','mat'] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can have lists of various different types of data, even including other lists! " ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "n = ['The',-7,['even','lists',[2,4]],3.14159,'End'] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access individual list elements or ranges of list elements: " ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4\n" ] } ], "source": [ "print(l[0]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that computers start counting at zero, so ```l[0]``` is the first element of ```l```, and ```l[2]``` is the third element: " ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "print(l[2]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also consider ranges. For example, the elements up to and including the third are given by: " ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4, 2, 5]\n" ] } ], "source": [ "print(l[:3]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The elements from the third onwards: " ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 4, 6]\n" ] } ], "source": [ "print(l[2:]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second and third elements: " ] }, { "cell_type": "code", 
"execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 5]\n" ] } ], "source": [ "print(l[1:3]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last element: " ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n" ] } ], "source": [ "print(l[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last element but one: " ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4\n" ] } ], "source": [ "print(l[-2]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also change individual list entries: " ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4, 2, 9, 4, 6]\n" ] } ], "source": [ "l[2] = 9 \n", "print(l)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a very useful in-built list called ```range```. Type: " ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "range(0, 10)\n" ] } ], "source": [ "print(range(10)) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can manipulate variables in many complex ways. Often in Python this is done by taking the variable and appending a function after a period. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, to sort ```l```: " ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 4, 4, 6, 9]\n" ] } ], "source": [ "l.sort() \n", "print(l) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To remove the fourth element of ```l``` (remember, computers start counting at zero): " ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 4, 4, 9]\n" ] } ], "source": [ "l.pop(3) \n", "print(l) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To count how many times a certain value appears in a list: " ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l.count(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A fundamentally important type of function in Python is the ```for``` loop: " ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8\n", "9\n" ] } ], "source": [ "for i in range(10): \n", " print(i) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This executes the indented code for every element ```i``` in a list. Note the colon. 
Let’s create another ```for``` loop to sum up the elements of ```l```: " ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19\n" ] } ], "source": [ "total = 0 \n", "for i in l: \n", "    total = total + i \n", "print(total) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two other variable types you should know about in Python, which we will discuss very briefly. \n", "The first is the set, which is like a list but is unordered and contains each element only once: " ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{9, 2, 4}\n" ] } ], "source": [ "r = set(l) \n", "print(r) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can perform set operations, such as the union and intersection of two sets, which makes sets a powerful tool: " ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{4}\n" ] } ], "source": [ "s = set([3,4]) \n", "t = r.intersection(s) \n", "print(t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The other important data type you should know about is the dictionary. In a dictionary we can store values under lookup keys and retrieve them by key. " ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "d = {} \n", "d['0207293834'] = 'A. N. Other' \n", "d['0207138827'] = 'H. Grant' \n", "d['0207394838'] = 'B. Jones' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s look up an entry in our dictionary: " ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'B. 
Jones'" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['0207394838'] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can consider its keys… " ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['0207293834', '0207138827', '0207394838'])" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.keys() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "…and values: " ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_values(['A. N. Other', 'H. Grant', 'B. Jones'])" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.values() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python and NetworkX" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start analysing networks we need to import the networkx library." 
] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "import networkx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And to start we need to create an empty network: " ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "nt = networkx.Graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s add some edges: " ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "nt.add_edge('Andy','Brenda') \n", "nt.add_edge('Andy','Cecil') \n", "nt.add_edge('Andy','David') \n", "nt.add_edge('Brenda','David') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s look at our network: " ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "EdgeView([('Andy', 'Brenda'), ('Andy', 'Cecil'), ('Andy', 'David'), ('Brenda', 'David')])" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nt.nodes() \n", "nt.edges() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a function called ```len``` that tells us the number of elements in a list. \n", "\n", "So:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(nt.nodes()) \n", "len(nt.edges()) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "tells us the number of nodes and edges. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now start analysing our network. 
Try: " ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Andy', 'David']" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nt.degree('Andy') \n", "list(nt.neighbors('Brenda')) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NetworkX has many powerful network analysis tools. Some of the more complicated measurements look like this: " ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Andy', 'David']\n", "{'Andy': 0.6666666666666666, 'Brenda': 0.0, 'Cecil': 0.0, 'David': 0.0}\n" ] }, { "data": { "text/plain": [ "0.6666666666666666" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sp = networkx.shortest_path(nt,'Andy','David') \n", "print(sp) \n", "bc = networkx.betweenness_centrality(nt) \n", "print(bc) \n", "bc['Andy'] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let’s load some data from a file in our local directory. Download this file: \n", "\n", "http://www.tcm.phy.cam.ac.uk/~sea31/file.txt \n", "\n", "And put it in the same directory as this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data in this file is in two columns that are separated by a comma. 
To open this file we write: " ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [], "source": [ "f = open('file.txt') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then read it by creating a ```for``` loop that looks at every line ```i``` in the file: " ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "for i in f: \n", " a,b = i.strip().split(',') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What this does is to take each line ```i``` in the file, ```strip``` away extraneous characters like carriage returns, and then ```split``` the line into a list where there’s a comma. Since we know that the data has two columns, we can directly assign the first element in that list to ```a``` and the second to ```b```, by writing ```a,b =``` at the beginning of the line. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to turn that data into a network, so we add a network edge for every line by writing ```nt.add_edge(a,b)``` inside the loop. Our complete code for importing the network is therefore: " ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [], "source": [ "f = open('file.txt') \n", "for i in f: \n", " a,b = i.strip().split(',') \n", " nt.add_edge(a,b) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s check our network: " ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Peter Pan', 'Superman']" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nt.edges() \n", "list(nt.neighbors('Bruce Wayne')) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Who is the best connected person in our network? To find out we need to go through all the nodes, look at their degree, and see whether that degree is higher than any degree we’ve encountered so far. 
\n", "We therefore start by creating a variable called ```max``` and set it to a value that is definitely lower than any possible degree, e.g. ```-1```. (Note that using this name hides Python’s built-in ```max``` function for the rest of the session.) " ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "max = -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we look at every node’s degree and compare it to the value of ```max``` so far using an ```if``` statement (another important programming ingredient). If the degree is greater than ```max``` we update ```max``` and record the corresponding node ```i``` in the variable ```bestconn```. \n", "This goes on until we have found the best connected node, after which ```bestconn``` remains unchanged. " ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Andy 3\n", "Superman 4\n" ] } ], "source": [ "for i in nt.nodes(): \n", "    if nt.degree(i) > max: \n", "        bestconn = i \n", "        max = nt.degree(i) \n", "        print(bestconn,max) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What follows is a brief example of fancier Python code, which creates a ranking of the nodes by degree. 
First we import some extra tools: " ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [], "source": [ "import operator " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we collect the degree of every node (a dictionary-like view of node-degree pairs): " ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [], "source": [ "deg = nt.degree() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally we sort the node-degree pairs by degree, with the largest first: " ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "rnk = sorted(deg,key=operator.itemgetter(1),reverse=True) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To look at the top 5: " ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('Superman', 4), ('Andy', 3), ('Brenda', 2), ('David', 2), ('Bruce Wayne', 2)]\n" ] } ], "source": [ "print(rnk[:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's put the code for loading the network and finding the best connected node in one cell:" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bruce Wayne 2\n", "Superman 4\n" ] } ], "source": [ "import networkx \n", "nt = networkx.Graph() \n", "f = open('file.txt') \n", "for i in f: \n", "    a,b = i.strip().split(',') \n", "    nt.add_edge(a,b) \n", "max = -1 \n", "for i in nt.nodes(): \n", "    if nt.degree(i) > max: \n", "        bestconn = i \n", "        max = nt.degree(i) \n", "        print(bestconn,max) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So if we want our programme to also output the degrees of everyone in the network: " ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bruce Wayne 2\n", "Bruce Wayne 2\n", "Superman 4\n", "Superman 4\n", "Superman 4\n", 
"Superman 4\n", "Superman 4\n" ] } ], "source": [ "import networkx \n", "nt = networkx.Graph() \n", "f = open('file.txt') \n", "for i in f: \n", " a,b = i.strip().split(',') \n", " nt.add_edge(a,b) \n", "f.close() \n", "\n", "max = -1 \n", "for i in nt.nodes(): \n", " if nt.degree(i) > max: \n", " bestconn = i \n", " max = nt.degree(i) \n", " print(bestconn,max) \n", "\n", "\n", "#ff = open('output.txt','w') \n", "#for i in nt.nodes(): \n", "# ff.write(str(i)+'\\t'+str(nt.degree(i))+'\\n') \n", "# ff.close() \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook is available as a handout at www.tcm.phy.cam.ac.uk/~sea31/PythonNetworkX_notebooks_handout.pdf" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }