{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "qZYixNXYvPjf" }, "source": [ "# Small world assignment\n", "\n", "
\n", "# YOUR SOLUTION HERE\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How to they look like?\n", "\n", "A useful exercise to build intuition into the adjacency matrix is to imagine how the adjacency matrix of a network would look like if we \"draw\" it. What I mean by that is to draw a square grid of size $N \\times N$, and then fill in the cells according to the adjacency matrix (e.g., black if 1 and white if 0).\n", "\n", "For instance, consider a random network where every edge is randomly placed with a probability $p$. What would the adjacency matrix look like? Think about it for a moment before you move on. Use the cell below to write down your thoughts (not graded). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "# YOUR SOLUTION HERE\n", "" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "
It's just $\\mathbf{A}_{ji}$
You need to go to a node (say $k$) that is connected to $i$ first. And then you need to be able to go from $k$ to $j$. Therefore, it's $\\sum_k \\mathbf{A}_{ik} \\mathbf{A}_{kj}$
\n", "# YOUR SOLUTION HERE\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing shortest paths\n", "\n", "It is interesting to see the link between the adjacency matrix and the number of walks with a certain length. For the shortest paths, it is much easier to think about the problem computationally and algorithmically, especially when the network involves weights and directions. Let's first think about the BFS (breadth-first search) algorithm. \n", "\n", "### BFS algorithm\n", "\n", "When the network is unweighted, we can compute the shortest path and distance between any pair of nodes using the BFS algorithm. The algorithm is simple:\n", "\n", "1. Start from the source node $s$ and set the distance to itself as 0.\n", "2. Set the distance to all other nodes as $\\infty$. Or you can simply keep a set/dictionary of distances and check whether you have visited the node before.\n", "3. Add the source node to the queue.\n", "4. While the queue is not empty, do the following:\n", " 1. Pop the first node from the queue.\n", " 2. For each neighbor of the popped node, if the distance is $\\infty$, set the distance to the distance of the popped node plus 1, and add the neighbor to the queue.\n", "\n", "If it is unclear why this works, I encourage you to try to run the algorithm on a piece of paper with a simple network and watch how the distance is updated. Also there are many good videos on YouTube that explain the BFS algorithm that you can watch as well!\n", "\n", "**Q: Can you write a Python function that computes, for every node, the shortest distance to a given node using the BFS algorithm?**" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{0: 0, 1: 1, 2: 1, 3: 2, 4: 2}\n" ] } ], "source": [ "def simple_bfs(G, source_node) -> dict:\n", " # this function returns a dictionary with the distance from the source node \n", " # to all other nodes. \n", " # e.g., {0: 0, 1: 1, 2: 1, 3: 1} means that the distance from the source node \n", " # to node 0 is 0, to node 1 is 1, and so on. \n", " \n", " distance_from_source = {}\n", " nodes_to_explore = [] # the queue\n", "\n", " # YOUR SOLUTION HERE\n", "\n", " return distance_from_source\n", "\n", "# let's test the function\n", "G = nx.Graph()\n", "G.add_edges_from([(0, 1), (0,2), (1, 3), (3, 4), (1,4)])\n", "print(simple_bfs(G, 0))\n", "\n", "assert simple_bfs(G, 0) == {0: 0, 1: 1, 2: 1, 3: 2, 4: 2} # this is the expected answer\n" ] }, { "cell_type": "markdown", "metadata": { "id": "MwvW4PEQhj33" }, "source": [ "This function lets you compute the shortest path from a given node to all other nodes. We can now use this function to compute all possible shortest paths between every node pair, and then compute the average shortest path length. \n", "\n", "If we have $N$ nodes in our network, we have roughly $N^2$ shortest paths to compute. It is slow and also consumes a lot of memory. There is not a lot of things we can do to drastically reduce the computing time, but we can drastically reduce the memory usage if we only care about the average shortest path length. How would you do that? Think about it before you move on.\n", "\n", "
We can compute the shortest path length from a given node to all other nodes, and then keep only the counts of the shortest path lengths. In other words, instead keeping the dictionary that we had above, we can simply summarize it as: there is one node (0) with distance 0, two nodes (1, 2) with distance 1, and so on.