{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Graph Properties" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install networkx==2.5 \n", "!pip install matplotlib==3.2.2 \n", "!pip install pandas==1.1.3 \n", "!pip install scipy==1.6.2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "import networkx as nx\n", "import networkx.algorithms.community as nx_comm\n", "\n", "%matplotlib inline\n", "\n", "default_edge_color = 'gray'\n", "default_node_color = '#407cc9'\n", "enhanced_node_color = '#f5b042'\n", "enhanced_edge_color = '#cc2f04'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# draw a simple graph\n", "def draw_graph(G, node_names={}, filename=None, node_size=50):\n", " pos_nodes = nx.spring_layout(G)\n", " nx.draw(G, pos_nodes, with_labels=False, node_size=node_size, edge_color='gray')\n", " \n", " pos_attrs = {}\n", " for node, coords in pos_nodes.items():\n", " pos_attrs[node] = (coords[0], coords[1] + 0.08)\n", " \n", " nx.draw_networkx_labels(G, pos_attrs, labels=node_names, font_family='serif')\n", " \n", " plt.axis('off')\n", " axis = plt.gca()\n", " axis.set_xlim([1.2*x for x in axis.get_xlim()])\n", " axis.set_ylim([1.2*y for y in axis.get_ylim()])\n", " \n", " if filename:\n", " plt.savefig(filename, format=\"png\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# draw enhanced path on the graph\n", "def draw_enhanced_path(G, path_to_enhance, node_names={}, filename=None):\n", " path_edges = list(zip(path,path[1:]))\n", " pos_nodes = nx.spring_layout(G)\n", "\n", " plt.figure(figsize=(5,5),dpi=300)\n", " pos_nodes = nx.spring_layout(G)\n", " nx.draw(G, pos_nodes, with_labels=False, node_size=50, edge_color='gray')\n", " \n", " pos_attrs = {}\n", " for node, coords in pos_nodes.items():\n", " pos_attrs[node] = (coords[0], coords[1] + 0.08)\n", " \n", " nx.draw_networkx_labels(G, pos_attrs, labels=node_names, font_family='serif')\n", " nx.draw_networkx_edges(G,pos_nodes,edgelist=path_edges, edge_color='#cc2f04', style='dashed', width=2.0)\n", " \n", " plt.axis('off')\n", " axis = plt.gca()\n", " axis.set_xlim([1.2*x for x in axis.get_xlim()])\n", " axis.set_ylim([1.2*y for y in axis.get_ylim()])\n", " \n", " if filename:\n", " plt.savefig(filename, format=\"png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shortest Path" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "nodes = {1:'Dublin',2:'Paris',3:'Milan',4:'Rome',5:'Naples',6:'Moscow',7:'Tokyo'}\n", "G.add_nodes_from(nodes.keys())\n", "G.add_edges_from([(1,2),(1,3),(2,3),(3,4),(4,5),(5,6),(6,7),(7,5)])\n", "draw_graph(G, node_names=nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = nx.shortest_path(G,source=1,target=7)\n", "' -> '.join([nodes[p] for p in path])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "draw_enhanced_path(G, path, node_names=nodes, filename='shortest_path.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Characteristic path length\n", "\n", "The characteristic path length is defined as the average of all the shortest path lengths between all possible pair of nodes. This is one of the most commonly used measures of how efficiently information is spread across a network. Networks having shorter characteristic path lengths promote the quick transfer of information and reduce costs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.average_shortest_path_length(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, this metric cannot be always defined since it is not possible to compute a path among all the nodes in disconnected graphs. For this reason, network efficiency is also widely used." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Efficiency\n", "\n", "Global efficiency is the average of the inverse shortest path length for all pairs of nodes. Such a metric can be seen as a measure of how efficiently information is exchanged across a network. Efficiency is at a maximum when a graph is fully connected, while it is minimal for completely disconnected graphs. Intuitively, the shorter the path, the lower the measure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(nx.global_efficiency(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The local efficiency of a node can be computed by considering only the neighborhood of the node in the calculation, without the node itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(nx.local_efficiency(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a fully connected graph, each node can be reached from any other node in the graph, and information is exchanged rapidly across the network. However, in a circular graph, several nodes should instead be traversed to reach the target node, making it less efficient." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# higher efficiency\n", "G = nx.complete_graph(n=7)\n", "nodes = {0:'Dublin',1:'Paris',2:'Milan',3:'Rome',4:'Naples',5:'Moscow',6:'Tokyo'}\n", "\n", "ge = round(nx.global_efficiency(G),2)\n", "\n", "# place the text box in axes coords\n", "ax = plt.gca()\n", "ax.text(-.4, -1.3, \"Global Efficiency:{}\".format(ge), fontsize=14, ha='left', va='bottom');\n", "\n", "draw_graph(G,node_names=nodes,filename='efficiency.png')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# lower efficiency\n", "G = nx.cycle_graph(n=7)\n", "nodes = {0:'Dublin',1:'Paris',2:'Milan',3:'Rome',4:'Naples',5:'Moscow',6:'Tokyo'}\n", "\n", "le = round(nx.global_efficiency(G),2)\n", "\n", "# place the text box in axes coords\n", "ax = plt.gca()\n", "ax.text(-.4, -1.3, \"Global Efficiency:{}\".format(le), fontsize=14, ha='left', va='bottom');\n", "\n", "draw_graph(G, node_names=nodes,filename='less_efficiency.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Integration metrics well describe the connection among nodes. However, more information about the presence of groups can be extracted by considering segregation metrics." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Segregation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clustering coefficient" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The clustering coefficient is a measure of how much nodes cluster together. It is defined as the fraction of triangles (complete subgraph of three nodes and three edges) around a node and is equivalent to the fraction of the node's neighbors that are neighbors of each other." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "nodes = {1:'Dublin',2:'Paris',3:'Milan',4:'Rome',5:'Naples',6:'Moscow',7:'Tokyo'}\n", "G.add_nodes_from(nodes.keys())\n", "G.add_edges_from([(1,2),(1,3),(2,3),(3,4),(4,5),(5,6),(6,7),(7,5)])\n", "draw_graph(G, node_names=nodes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A global clustering coefficient is computed in networkx using the following command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.average_clustering(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The local clustering coefficient is computed in networkx using the following command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.clustering(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following graph, two clusters of nodes can be easily identified. By computing the clustering coefficient for each single node, it can be observed that Rome has the lowest value. Tokyo and Moscow, as well as Paris and Dublin, are instead very well connected within their respective groups (notice the size of each node is drawn proportionally to each node's clustering coefficient)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cc = nx.clustering(G)\n", "node_size=[(v + 0.1) * 200 for v in cc.values()]\n", "draw_graph(G, node_names=nodes, node_size=node_size,filename='clustering.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transitivity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A common variant of the clustering coefficient is known as transitivity. This can simply be defined as the ratio between the observed number of closed triplets (complete subgraph with three nodes and two edges) and the maximum possible number of closed triplets in the graph. Transitivity can be computed using networkx, as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.transitivity(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modularity\n", "\n", "Modularity was designed to quantify the division of a network in aggregated sets of highly interconnected nodes, commonly known as modules, communities, groups, or clusters. The main idea is that networks having high modularity will show dense connections within the module and sparse connections between modules.\n", "\n", "Consider a social network such as Reddit: members of communities related to video games tend to interact much more with other users in the same community, talking about recent news, favorite consoles, and so on. However, they will probably interact less with users talking about fashion. Differently from many other graph metrics, modularity is often computed by means of optimization algorithms." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "nodes = {1:'Dublin',2:'Paris',3:'Milan',4:'Rome',5:'Naples',6:'Moscow',7:'Tokyo'}\n", "G.add_nodes_from(nodes.keys())\n", "G.add_edges_from([(1,2),(1,3),(2,3),(3,4),(4,5),(5,6),(6,7),(7,5)])\n", "draw_graph(G, node_names=nodes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modularity in networkx is computed using the modularity function of the networkx.algorithms.community module, as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# partitions can be provided manually\n", "print(nx_comm.modularity(G, communities=[{1,2,3,4},{5,6,7}]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or automatically computed using networkx\n", "print(nx_comm.modularity(G, nx_comm.label_propagation_communities(G)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Centrality" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "nodes = {1:'Dublin',2:'Paris',3:'Milan',4:'Rome',5:'Naples',6:'Moscow',7:'Tokyo'}\n", "G.add_nodes_from(nodes.keys())\n", "G.add_edges_from([(1,2),(1,3),(2,3),(3,4),(4,5),(5,6),(6,7),(7,5)])\n", "draw_graph(G, node_names=nodes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the most common and simple centrality metrics is the degree centrality metric. This is directly connected with the degree of a node, measuring the number of incident edges on a certain node. Intuitively, the more a node is connected to an other node, the more its degree centrality will assume high values. Note that, if a graph is directed, the in-degree centrality and out-degree centrality will be considered for each node, related to the number of incoming and outcoming edges, respectively. Degree centrality is computed in networkx by using the following command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.degree_centrality(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dc = nx.degree_centrality(G)\n", "node_size=[(v + 0.01) * 400 for v in dc.values()]\n", "draw_graph(G, node_names=nodes, node_size=node_size,filename='deg_centr.png')\n", "\n", "df = pd.DataFrame(dc,index=['Degree centrality'])\n", "df.columns = nodes.values()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The closeness centrality metric attempts to quantify how much a node is close (well connected) to other nodes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.closeness_centrality(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dc = nx.closeness_centrality(G)\n", "node_size=[(v + 0.1) * 400 for v in dc.values()]\n", "draw_graph(G, node_names=nodes, node_size=node_size,filename='clos_centr.png')\n", "\n", "df = pd.DataFrame(dc,index=['Closeness centrality'])\n", "df.columns = nodes.values()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The betweenness centrality metric evaluates how much a node acts as a bridge between other nodes. Even if poorly connected, a node can be strategically connected, helping to keep the whole network connected." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.betweenness_centrality(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dc = nx.betweenness_centrality(G)\n", "node_size=[(v + 0.1) * 400 for v in dc.values()]\n", "draw_graph(G, node_names=nodes, node_size=node_size,filename='bet_centrality.png')\n", "\n", "df = pd.DataFrame(dc,index=['Betweenness centrality'])\n", "df.columns = nodes.values()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resiliency\n", "\n", "Resilience metrics enable us to measure the vulnerability of a graph." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Assortativity coefficient\n", "Assortativity is used to quantify the tendency of nodes being connected to similar nodes. There are several ways to measure such correlations. One of the most commonly used methods is the Pearson correlation coefficient between the degrees of directly connected nodes (nodes on two opposite ends of a link). The coefficient assumes positive values when there is a correlation between nodes of a similar degree, while it assumes negative values when there is a correlation between nodes of a different degree. Assortativity using the Pearson correlation coefficient is computed in networkx by using the following command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "nodes = {1:'Dublin',2:'Paris',3:'Milan',4:'Rome',5:'Naples',6:'Moscow',7:'Tokyo'}\n", "G.add_nodes_from(nodes.keys())\n", "G.add_edges_from([(1,2),(1,3),(2,3),(3,4),(4,5),(5,6),(6,7),(7,5)])\n", "\n", "draw_graph(G, node_names=nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.degree_pearson_correlation_coefficient(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "nodes = {1:'user1', 2:'user2', 3:'Football player', 4:'Fahsion blogger', 5:'user3', 6:'user4',\n", " 7:'user5', 8:'user6'}\n", "G.add_nodes_from(nodes.keys())\n", "G.add_edges_from([(1,3),(2,3),(7,3),(3,4),(5,4),(6,4),(8,4)])\n", "\n", "draw_graph(G, node_names=nodes,filename='assortativity.png')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.degree_pearson_correlation_coefficient(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Social networks are mostly assortative. However, the so-called influencers (famous singers, football players, fashion bloggers) tend to be followed (incoming edges) by several standard users, while tending to be connected with each other and showing a disassortative behavior." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to remark that these properties are a subset of all the possible metrics used to describe graphs. A wider set of metrics and algorithms can be found at https://networkx.org/documentation/stable/reference/algorithms/." ] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyMQw8vnTtllM59lFcgu7Fmc", "collapsed_sections": [], "name": "rec-tut-gml-02-graph-properties.ipynb", "provenance": [ { "file_id": "1wqPq1fLY0e0f_4stYE6RGxJV8BNxgDU4", "timestamp": 1627814751704 } ] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }