{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is this?\n", "\n", "This notebook is an almost one to one copy from Marks Needham's [Neo4j Graph Algorithms Jupyter Notebooks](https://github.com/mneedham/graph-algorithms-notebooks).\n", "\n", "The purpose of this notebook is to demonstrate how the original Jupyter Python notebooks culd look like with the Cypher kernel.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Contents\n", "\n", "**Path finding**\n", "\n", " * [All Pairs Shortest Path](#All-Pairs-Shortest-Path)\n", " * [Single Source Shortest Path](#Single-Source-Shortest-Path)\n", "\n", "**Centrality**\n", "\n", " * [Degree Centrality](#Degree-Centrality)\n", " * [Closeness Centrality](#Closeness-Centrality)\n", " * [Betweenness Centrality](#Betweenness-Centrality)\n", "\n", "**Community Detection**\n", "\n", " * [Strongly Connected Components](#Strongly-Connected-Components)\n", " * [Weighted Union Find](#Weighted-Union-Find)\n", " * [Unweighted Union Find](#Unweighted-Union-Find)\n", " * [Label Propagation](#Label-Propagation)\n", " * [Louvain](#Louvain)\n", " * [Triangle Counting](#Triangle-Counting)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The Setup\n", "\n", "If you do not have a Neo4j database running with APOC installed, then you could use the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "%%bash\n", "rm -rf plugins\n", "mkdir plugins\n", "cd plugins\n", "echo \"Downloading APOC plugin\"\n", "wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.3.0.1/apoc-3.3.0.1-all.jar\n", "echo \"Downloading algorithms plugin\"\n", "wget https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases/download/3.3.2.0/graph-algorithms-algo-3.3.2.0.jar\n", "cd .." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "dd8ea088ca50b68f80129a1805e351e1ce9404ad055bc0a58993e08ac62d9f80\r\n" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%bash\n", "docker run \\\n", " -d --name myneo4j \\\n", " --publish=7474:7474 \\\n", " --publish=7687:7687 \\\n", " --volume=$(pwd)/plugins:/plugins \\\n", " --env NEO4J_AUTH=neo4j/class \\\n", " --env=NEO4J_dbms_security_procedures_unrestricted=apoc.\\\\\\*,algo.\\\\\\* \\\n", " neo4j" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# All Pairs Shortest Path\n", "\n", "All Pairs Shortest Path (APSP) calculates the shortest (weighted) path between all pairs of nodes. This algorithm has optimisations that make it quicker than calling the SSSP algorithm for every pair of nodes in the graph." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CREATE (a:Loc{name:'A'}), (b:Loc{name:'B'}), (c:Loc{name:'C'}), \n", " (d:Loc{name:'D'}), (e:Loc{name:'E'}), (f:Loc{name:'F'}),\n", " (a)-[:ROAD {cost:50}]->(b),\n", " (a)-[:ROAD {cost:50}]->(c),\n", " (a)-[:ROAD {cost:100}]->(d),\n", " (a)-[:RAIL {cost:50}]->(d),\n", " (b)-[:ROAD {cost:40}]->(d),\n", " (c)-[:ROAD {cost:40}]->(d),\n", " (c)-[:ROAD {cost:80}]->(e),\n", " (d)-[:ROAD {cost:30}]->(e),\n", " (d)-[:ROAD {cost:80}]->(f),\n", " (e)-[:ROAD {cost:40}]->(f),\n", " (e)-[:RAIL {cost:20}]->(f);" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-----------------+\n", "| labels | count |\n", "+-----------------+\n", "| [\"Loc\"] | 6 |\n", "+-----------------+\n", "\n", "1 row available after 1 ms, consumed after another 0 ms" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n) \n", "RETURN labels(n) AS labels, COUNT(*) AS count\n", "ORDER BY count DESC" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 47, label: \"Loc\", title: \"{name:B,
_id:47}\", color: 'rgba(118,150,34,0.5)'},\n", " { id: 46, label: \"Loc\", title: \"{name:A,
_id:46}\", color: 'rgba(118,150,34,0.5)'},\n", " { id: 51, label: \"Loc\", title: \"{name:F,
_id:51}\", color: 'rgba(118,150,34,0.5)'},\n", " { id: 48, label: \"Loc\", title: \"{name:C,
_id:48}\", color: 'rgba(118,150,34,0.5)'},\n", " { id: 50, label: \"Loc\", title: \"{name:E,
_id:50}\", color: 'rgba(118,150,34,0.5)'},\n", " { id: 49, label: \"Loc\", title: \"{name:D,
_id:49}\", color: 'rgba(118,150,34,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 49,to: 51, arrows:'to', title: \"{cost:80, _id:59,}\" },\n", " { from: 49,to: 50, arrows:'to', title: \"{cost:30, _id:58,}\" },\n", " { from: 48,to: 49, arrows:'to', title: \"{cost:40, _id:56,}\" },\n", " { from: 46,to: 47, arrows:'to', title: \"{cost:50, _id:51,}\" },\n", " { from: 46,to: 48, arrows:'to', title: \"{cost:50, _id:52,}\" },\n", " { from: 46,to: 49, arrows:'to', title: \"{cost:100, _id:53,}\" },\n", " { from: 47,to: 49, arrows:'to', title: \"{cost:40, _id:55,}\" },\n", " { from: 48,to: 50, arrows:'to', title: \"{cost:80, _id:57,}\" },\n", " { from: 50,to: 51, arrows:'to', title: \"{cost:20, _id:61,}\" },\n", " { from: 50,to: 51, arrows:'to', title: \"{cost:40, _id:60,}\" },\n", " { from: 46,to: 49, arrows:'to', title: \"{cost:50, _id:54,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('f3dd7152-5734-4f1e-81a1-631833e95ad6');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+----------------------------------------------------------------------------------------------------+\n", "| n | r | m |\n", "+----------------------------------------------------------------------------------------------------+\n", "| (:Loc {name: \"A\", _id_: 46}) | [:ROAD {_id_: 52, cost: 50}[46>48]] | (:Loc {name: \"C\", _id_: 48}) |\n", "| (:Loc {name: \"A\", _id_: 46}) | [:ROAD {_id_: 53, cost: 100}[46>49]] | (:Loc {name: \"D\", _id_: 49}) |\n", "| (:Loc {name: \"A\", _id_: 46}) | [:RAIL {_id_: 54, cost: 50}[46>49]] | (:Loc {name: \"D\", _id_: 49}) |\n", "| (:Loc {name: \"A\", _id_: 46}) | [:ROAD {_id_: 51, cost: 50}[46>47]] | (:Loc {name: \"B\", _id_: 47}) |\n", "| (:Loc {name: \"B\", _id_: 47}) | [:ROAD {_id_: 55, cost: 40}[47>49]] | (:Loc {name: \"D\", _id_: 49}) |\n", "| (:Loc {name: \"C\", _id_: 48}) | [:ROAD {_id_: 56, cost: 40}[48>49]] | (:Loc {name: \"D\", _id_: 49}) |\n", "| (:Loc {name: \"C\", _id_: 48}) | [:ROAD {_id_: 57, cost: 80}[48>50]] | (:Loc {name: \"E\", _id_: 50}) |\n", "| (:Loc {name: \"D\", _id_: 49}) | [:ROAD {_id_: 58, cost: 30}[49>50]] | (:Loc {name: \"E\", _id_: 50}) |\n", "| (:Loc {name: \"D\", _id_: 49}) | [:ROAD {_id_: 59, cost: 80}[49>51]] | (:Loc {name: \"F\", _id_: 51}) |\n", "| (:Loc {name: \"E\", _id_: 50}) | [:ROAD {_id_: 60, cost: 40}[50>51]] | (:Loc {name: \"F\", _id_: 51}) |\n", "| (:Loc {name: \"E\", _id_: 50}) | [:RAIL {_id_: 61, cost: 20}[50>51]] | (:Loc {name: \"F\", _id_: 51}) |\n", "+----------------------------------------------------------------------------------------------------+\n", "\n", "11 rows available after 26 ms, consumed after another 3 ms" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:Loc)-[r]->(m:Loc)\n", "RETURN n,r,m;" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+----------------------------+\n", "| source | target | distance |\n", "+----------------------------+\n", "| \"A\" | \"F\" | 100.0 |\n", "| \"B\" | \"F\" | 90.0 |\n", "| \"C\" | \"F\" | 90.0 |\n", "| \"A\" | \"E\" | 80.0 |\n", "| \"B\" | \"E\" | 70.0 |\n", "| \"C\" | \"E\" | 70.0 |\n", "| \"A\" | \"B\" | 50.0 |\n", "| \"A\" | \"C\" | 50.0 |\n", "| \"D\" | \"F\" | 50.0 |\n", "| \"A\" | \"D\" | 50.0 |\n", "+----------------------------+\n", "\n", "10 rows available after 14 ms, consumed after another 0 ms" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.allShortestPaths.stream('cost',{nodeQuery:'Loc',defaultValue:1.0})\n", "YIELD sourceNodeId, targetNodeId, distance\n", "WITH sourceNodeId, targetNodeId, distance \n", "WHERE algo.isFinite(distance) = true\n", "\n", "MATCH (source:Loc) WHERE id(source) = sourceNodeId\n", "MATCH (target:Loc) WHERE id(target) = targetNodeId\n", "WITH source, target, distance WHERE source <> target\n", "\n", "RETURN source.name AS source, target.name AS target, distance\n", "ORDER BY distance DESC\n", "LIMIT 10" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ " source target distance\r\n", "0 \"A\" \"F\" 100.0\r\n", "1 \"B\" \"F\" 90.0\r\n", "2 \"C\" \"F\" 90.0\r\n", "3 \"A\" \"E\" 80.0\r\n", "4 \"B\" \"E\" 70.0\r\n", "5 \"C\" \"E\" 70.0\r\n", "6 \"A\" \"B\" 50.0\r\n", "7 \"A\" \"C\" 50.0\r\n", "8 \"D\" \"F\" 50.0\r\n", "9 \"A\" \"D\" 50.0\r\n" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%python\n", "df" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Single Source Shortest Path\n", "The Single Source Shortest Path (SSSP) algorithm calculates the shortest (weighted) path between a pair of nodes. Dijkstra's algorithm is the most well known one in this category. SSSP is a real time graph algorithm - it can be used as part of the normal user flow in a web or mobile application." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+--------------+\n", "| name | cost |\n", "+--------------+\n", "| \"A\" | 0.0 |\n", "| \"D\" | 50.0 |\n", "| \"E\" | 80.0 |\n", "| \"F\" | 100.0 |\n", "+--------------+\n", "\n", "4 rows available after 1 ms, consumed after another 4 ms" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (start:Loc{name:'A'}), (end:Loc{name:'F'})\n", "CALL algo.shortestPath.stream(start, end, 'cost') \n", "YIELD nodeId, cost\n", "MATCH (other:Loc) WHERE id(other) = nodeId\n", "RETURN other.name AS name, cost" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Degree Centrality\n", "Degree Centrality is the simplest of all the centrality algorithms. It measures the number of incoming and outgoing relationships from a node.\n", "The algorithm can help us find popular nodes in a graph." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (nAlice:User {id:'Alice'})\n", "MERGE (nBridget:User {id:'Bridget'})\n", "MERGE (nCharles:User {id:'Charles'})\n", "MERGE (nDoug:User {id:'Doug'})\n", "MERGE (nMark:User {id:'Mark'})\n", "MERGE (nMichael:User {id:'Michael'})\n", "MERGE (nAlice)-[:FOLLOWS]->(nDoug)\n", "MERGE (nAlice)-[:FOLLOWS]->(nBridget)\n", "MERGE (nAlice)-[:FOLLOWS]->(nCharles)\n", "MERGE (nMark)-[:FOLLOWS]->(nDoug)\n", "MERGE (nMark)-[:FOLLOWS]->(nMichael)\n", "MERGE (nBridget)-[:FOLLOWS]->(nDoug)\n", "MERGE (nCharles)-[:FOLLOWS]->(nDoug)\n", "MERGE (nMichael)-[:FOLLOWS]->(nDoug)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 57, label: \"User\", title: \"{id:Michael,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 54, label: \"User\", title: \"{id:Charles,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 52, label: \"User\", title: \"{id:Alice,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{id:Doug,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{id:Bridget,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{id:Mark,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 52,to: 54, arrows:'to', title: \"{_id:64,}\" },\n", " { from: 53,to: 55, arrows:'to', title: \"{_id:67,}\" },\n", " { from: 56,to: 55, arrows:'to', title: \"{_id:65,}\" },\n", " { from: 57,to: 55, arrows:'to', title: \"{_id:69,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{_id:63,}\" },\n", " { from: 56,to: 57, arrows:'to', title: \"{_id:66,}\" },\n", " { from: 52,to: 55, arrows:'to', title: \"{_id:62,}\" },\n", " { from: 54,to: 55, arrows:'to', title: \"{_id:68,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('41fa252f-def3-4fa1-ada0-ddd5875b6bd1');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+------------------------------------------------------------------------------------------------------+\n", "| (:User {_id_: 52, id: \"Alice\"}) | [:FOLLOWS {_id_: 62}[52>55]] | (:User {_id_: 55, id: \"Doug\"}) |\n", "| (:User {_id_: 52, id: \"Alice\"}) | [:FOLLOWS {_id_: 63}[52>53]] | (:User {_id_: 53, id: \"Bridget\"}) |\n", "| (:User {_id_: 52, id: \"Alice\"}) | [:FOLLOWS {_id_: 64}[52>54]] | (:User {_id_: 54, id: \"Charles\"}) |\n", "| (:User {_id_: 53, id: \"Bridget\"}) | [:FOLLOWS {_id_: 67}[53>55]] | (:User {_id_: 55, id: \"Doug\"}) |\n", "| (:User {_id_: 54, id: \"Charles\"}) | [:FOLLOWS {_id_: 68}[54>55]] | (:User {_id_: 55, id: \"Doug\"}) |\n", "| (:User {_id_: 56, id: \"Mark\"}) | [:FOLLOWS {_id_: 66}[56>57]] | (:User {_id_: 57, id: \"Michael\"}) |\n", "| (:User {_id_: 56, id: \"Mark\"}) | [:FOLLOWS {_id_: 65}[56>55]] | (:User {_id_: 55, id: \"Doug\"}) |\n", "| (:User {_id_: 57, id: \"Michael\"}) | [:FOLLOWS {_id_: 69}[57>55]] | (:User {_id_: 55, id: \"Doug\"}) |\n", "+------------------------------------------------------------------------------------------------------+\n", "\n", "8 rows available after 24 ms, consumed after another 4 ms" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:FOLLOWS]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Finally we can run the algorithm by executing the following query:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "+---------------------------------+\n", "| name | follows | followers |\n", "+---------------------------------+\n", "| \"Alice\" | 3 | 0 |\n", "| \"Bridget\" | 1 | 1 |\n", "| \"Charles\" | 1 | 1 |\n", "| \"Doug\" | 0 | 5 |\n", "| \"Mark\" | 2 | 0 |\n", "| \"Michael\" | 1 | 1 |\n", "+---------------------------------+\n", "\n", "6 rows available after 1 ms, consumed after another 1 ms" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (u:User)\n", "RETURN u.id AS name,\n", " size((u)-[:FOLLOWS]->()) AS follows,\n", " size((u)<-[:FOLLOWS]-()) AS followers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can see that Doug is the most popular user in our imaginary Twitter graph with 5 followers - all other users follow him but he doesn't follow anybody back. In the real Twitter network celebrities have very high follower counts but tend to follow very few back people. We could therefore consider Doug a celebrity!\n", "\n", "We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (u:User)\n", "set u.followers = size((u)<-[:FOLLOWS]-())" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 54, label: \"User\", title: \"{followers:1,
id:Charles,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{followers:1,
id:Bridget,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 57, label: \"User\", title: \"{followers:1,
id:Michael,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{followers:5,
id:Doug,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{followers:0,
id:Mark,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 52, label: \"User\", title: \"{followers:0,
id:Alice,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 52,to: 54, arrows:'to', title: \"{_id:64,}\" },\n", " { from: 53,to: 55, arrows:'to', title: \"{_id:67,}\" },\n", " { from: 56,to: 55, arrows:'to', title: \"{_id:65,}\" },\n", " { from: 57,to: 55, arrows:'to', title: \"{_id:69,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{_id:63,}\" },\n", " { from: 56,to: 57, arrows:'to', title: \"{_id:66,}\" },\n", " { from: 52,to: 55, arrows:'to', title: \"{_id:62,}\" },\n", " { from: 54,to: 55, arrows:'to', title: \"{_id:68,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('c1652ff9-f990-4926-bcc9-62e3c608d932');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+----------------------------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+----------------------------------------------------------------------------------------------------------------------------------+\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:FOLLOWS {_id_: 62}[52>55]] | (:User {_id_: 55, followers: 5, id: \"Doug\"}) |\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:FOLLOWS {_id_: 63}[52>53]] | (:User {_id_: 53, followers: 1, id: \"Bridget\"}) |\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:FOLLOWS {_id_: 64}[52>54]] | (:User {_id_: 54, followers: 1, id: \"Charles\"}) |\n", "| (:User {_id_: 53, followers: 1, id: \"Bridget\"}) | [:FOLLOWS {_id_: 67}[53>55]] | (:User {_id_: 55, followers: 5, id: \"Doug\"}) |\n", "| (:User {_id_: 54, followers: 1, id: \"Charles\"}) | [:FOLLOWS {_id_: 68}[54>55]] | (:User {_id_: 55, followers: 5, id: \"Doug\"}) |\n", "| (:User {_id_: 56, followers: 0, id: \"Mark\"}) | [:FOLLOWS {_id_: 66}[56>57]] | (:User {_id_: 57, followers: 1, id: \"Michael\"}) |\n", "| (:User {_id_: 56, followers: 0, id: \"Mark\"}) | [:FOLLOWS {_id_: 65}[56>55]] | (:User {_id_: 55, followers: 5, id: \"Doug\"}) |\n", "| (:User {_id_: 57, followers: 1, id: \"Michael\"}) | [:FOLLOWS {_id_: 69}[57>55]] | (:User {_id_: 55, followers: 5, id: \"Doug\"}) |\n", "+----------------------------------------------------------------------------------------------------------------------------------+\n", "\n", "8 rows available after 1 ms, consumed after another 3 ms" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:FOLLOWS]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Closeness Centrality\n", "\n", "Closeness Centrality is a way of detecting nodes that are able to spread information very efficiently through a graph.\n", "\n", "The Closeness Centrality of a node measures its average distance to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (a:Node{id:\"A\"})\n", "MERGE (b:Node{id:\"B\"})\n", "MERGE (c:Node{id:\"C\"})\n", "MERGE (d:Node{id:\"D\"})\n", "MERGE (e:Node{id:\"E\"})\n", "\n", "MERGE (a)-[:LINK]->(b)\n", "MERGE (b)-[:LINK]->(a)\n", "MERGE (b)-[:LINK]->(c)\n", "MERGE (c)-[:LINK]->(b)\n", "MERGE (c)-[:LINK]->(d)\n", "MERGE (d)-[:LINK]->(c)\n", "MERGE (d)-[:LINK]->(e)\n", "MERGE (e)-[:LINK]->(d)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 62, label: \"Node\", title: \"{id:E,
_id:62}\", color: 'rgba(157,205,140,0.5)'},\n", " { id: 58, label: \"Node\", title: \"{id:A,
_id:58}\", color: 'rgba(157,205,140,0.5)'},\n", " { id: 59, label: \"Node\", title: \"{id:B,
_id:59}\", color: 'rgba(157,205,140,0.5)'},\n", " { id: 61, label: \"Node\", title: \"{id:D,
_id:61}\", color: 'rgba(157,205,140,0.5)'},\n", " { id: 60, label: \"Node\", title: \"{id:C,
_id:60}\", color: 'rgba(157,205,140,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 58,to: 59, arrows:'to', title: \"{_id:70,}\" },\n", " { from: 59,to: 60, arrows:'to', title: \"{_id:72,}\" },\n", " { from: 60,to: 59, arrows:'to', title: \"{_id:73,}\" },\n", " { from: 61,to: 62, arrows:'to', title: \"{_id:76,}\" },\n", " { from: 61,to: 60, arrows:'to', title: \"{_id:75,}\" },\n", " { from: 62,to: 61, arrows:'to', title: \"{_id:77,}\" },\n", " { from: 60,to: 61, arrows:'to', title: \"{_id:74,}\" },\n", " { from: 59,to: 58, arrows:'to', title: \"{_id:71,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('f2b959f3-bc57-4e52-ba96-86331e799a14');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+---------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+---------------------------------------------------------------------------------------+\n", "| (:Node {_id_: 58, id: \"A\"}) | [:LINK {_id_: 70}[58>59]] | (:Node {_id_: 59, id: \"B\"}) |\n", "| (:Node {_id_: 59, id: \"B\"}) | [:LINK {_id_: 72}[59>60]] | (:Node {_id_: 60, id: \"C\"}) |\n", "| (:Node {_id_: 59, id: \"B\"}) | [:LINK {_id_: 71}[59>58]] | (:Node {_id_: 58, id: \"A\"}) |\n", "| (:Node {_id_: 60, id: \"C\"}) | [:LINK {_id_: 74}[60>61]] | (:Node {_id_: 61, id: \"D\"}) |\n", "| (:Node {_id_: 60, id: \"C\"}) | [:LINK {_id_: 73}[60>59]] | (:Node {_id_: 59, id: \"B\"}) |\n", "| (:Node {_id_: 61, id: \"D\"}) | [:LINK {_id_: 75}[61>60]] | (:Node {_id_: 60, id: \"C\"}) |\n", "| (:Node {_id_: 61, id: \"D\"}) | [:LINK {_id_: 76}[61>62]] | (:Node {_id_: 62, id: \"E\"}) |\n", "| (:Node {_id_: 62, id: \"E\"}) | [:LINK {_id_: 77}[62>61]] | (:Node {_id_: 61, id: \"D\"}) |\n", "+---------------------------------------------------------------------------------------+\n", "\n", "8 rows available after 17 ms, consumed after another 2 ms" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:Node)-[f:LINK]->(m:Node)\n", "RETURN n, f, m" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The actual algorithm for computing the node centrality is computed with the following query:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "+---------------------------+\n", "| node | centrality |\n", "+---------------------------+\n", "| \"C\" | 0.6666666666666666 |\n", "| \"B\" | 0.5714285714285714 |\n", "| \"D\" | 0.5714285714285714 |\n", "| \"A\" | 0.4 |\n", "| \"E\" | 0.4 |\n", "+---------------------------+\n", "\n", "5 rows available after 48 ms, consumed after another 0 ms" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.closeness.stream('Node', 'LINK')\n", "YIELD nodeId, centrality\n", "\n", "MATCH (n:Node) WHERE id(n) = nodeId\n", "\n", "RETURN n.id AS node, centrality\n", "ORDER BY centrality DESC\n", "limit 20;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Betweenness Centrality\n", "\n", "Betweenness Centrality is a way of detecting the amount of influence a node has over the flow of information in a graph.\n", "\n", "\n", "It is often used to find nodes that serve as a bridge from one part of a graph to another. In the above example Alice is the main connection in the graph. If Alice is removed all connections in the graph would be cut off. This makes Alice \"important\" because she ensures that no nodes are isolated. " ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (nAlice:User {id:'Alice'})\n", "MERGE (nBridget:User {id:'Bridget'})\n", "MERGE (nCharles:User {id:'Charles'})\n", "MERGE (nDoug:User {id:'Doug'})\n", "MERGE (nMark:User {id:'Mark'})\n", "MERGE (nMichael:User {id:'Michael'})\n", "\n", "MERGE (nAlice)-[:MANAGE]->(nBridget)\n", "MERGE (nAlice)-[:MANAGE]->(nCharles)\n", "MERGE (nAlice)-[:MANAGE]->(nDoug)\n", "MERGE (nMark)-[:MANAGE]->(nAlice)\n", "MERGE (nCharles)-[:MANAGE]->(nMichael)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 54, label: \"User\", title: \"{followers:1,
id:Charles,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{followers:1,
id:Bridget,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 57, label: \"User\", title: \"{followers:1,
id:Michael,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{followers:5,
id:Doug,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{followers:0,
id:Mark,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 52, label: \"User\", title: \"{followers:0,
id:Alice,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 54,to: 57, arrows:'to', title: \"{_id:102,}\" },\n", " { from: 52,to: 54, arrows:'to', title: \"{_id:79,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{_id:78,}\" },\n", " { from: 52,to: 55, arrows:'to', title: \"{_id:100,}\" },\n", " { from: 56,to: 52, arrows:'to', title: \"{_id:101,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('2f85a865-91b7-4c8d-bfbe-4247ca9575f3');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+----------------------------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+----------------------------------------------------------------------------------------------------------------------------------+\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:MANAGE {_id_: 79}[52>54]] | (:User {_id_: 54, followers: 1, id: \"Charles\"}) |\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:MANAGE {_id_: 78}[52>53]] | (:User {_id_: 53, followers: 1, id: \"Bridget\"}) |\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:MANAGE {_id_: 100}[52>55]] | (:User {_id_: 55, followers: 5, id: \"Doug\"}) |\n", "| (:User {_id_: 54, followers: 1, id: \"Charles\"}) | [:MANAGE {_id_: 102}[54>57]] | (:User {_id_: 57, followers: 1, id: \"Michael\"}) |\n", "| (:User {_id_: 56, followers: 0, id: \"Mark\"}) | [:MANAGE {_id_: 101}[56>52]] | (:User {_id_: 52, followers: 0, id: \"Alice\"}) |\n", "+----------------------------------------------------------------------------------------------------------------------------------+\n", "\n", "5 rows available after 18 ms, consumed after another 0 ms" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:MANAGE]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+------------------------+\n", "| user | centrality |\n", "+------------------------+\n", "| \"Alice\" | 4.0 |\n", "| \"Charles\" | 2.0 |\n", "| \"Bridget\" | 0.0 |\n", "| \"Doug\" | 0.0 |\n", "| \"Mark\" | 0.0 |\n", "| \"Michael\" | 0.0 |\n", "+------------------------+\n", "\n", "6 rows available after 7 ms, consumed after another 0 ms" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.betweenness.stream('User','MANAGE',{direction:'out'}) \n", "YIELD nodeId, centrality\n", "\n", "MATCH (user:User) WHERE id(user) = nodeId\n", "\n", "RETURN user.id AS user,centrality\n", "ORDER BY centrality DESC\n", "LIMIT 20;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Strongly Connected Components\n", "\n", "The Strongly Connected Components (SCC) algorithm finds sets of connected nodes in a directed graph where each node is reachable in both directions from any other node in the same set. It is often used early in a graph analysis process to help us get an idea of how our graph is structured." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (nAlice:User {id:'Alice'})\n", "MERGE (nBridget:User {id:'Bridget'})\n", "MERGE (nCharles:User {id:'Charles'})\n", "MERGE (nDoug:User {id:'Doug'})\n", "MERGE (nMark:User {id:'Mark'})\n", "MERGE (nMichael:User {id:'Michael'})\n", "\n", "MERGE (nAlice)-[:FOLLOW]->(nBridget)\n", "MERGE (nAlice)-[:FOLLOW]->(nCharles)\n", "MERGE (nMark)-[:FOLLOW]->(nDoug)\n", "MERGE (nMark)-[:FOLLOW]->(nMichael)\n", "MERGE (nBridget)-[:FOLLOW]->(nMichael)\n", "MERGE (nDoug)-[:FOLLOW]->(nMark)\n", "MERGE (nMichael)-[:FOLLOW]->(nAlice)\n", "MERGE (nAlice)-[:FOLLOW]->(nMichael)\n", "MERGE (nBridget)-[:FOLLOW]->(nAlice)\n", "MERGE (nMichael)-[:FOLLOW]->(nBridget)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 54, label: \"User\", title: \"{followers:1,
id:Charles,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{followers:1,
id:Bridget,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 57, label: \"User\", title: \"{followers:1,
id:Michael,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{followers:5,
id:Doug,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{followers:0,
id:Mark,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 52, label: \"User\", title: \"{followers:0,
id:Alice,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 56,to: 57, arrows:'to', title: \"{_id:106,}\" },\n", " { from: 57,to: 52, arrows:'to', title: \"{_id:109,}\" },\n", " { from: 52,to: 54, arrows:'to', title: \"{_id:104,}\" },\n", " { from: 53,to: 52, arrows:'to', title: \"{_id:111,}\" },\n", " { from: 55,to: 56, arrows:'to', title: \"{_id:108,}\" },\n", " { from: 56,to: 55, arrows:'to', title: \"{_id:105,}\" },\n", " { from: 52,to: 57, arrows:'to', title: \"{_id:110,}\" },\n", " { from: 57,to: 53, arrows:'to', title: \"{_id:112,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{_id:103,}\" },\n", " { from: 53,to: 57, arrows:'to', title: \"{_id:107,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('89136cb0-8c0a-4396-9353-6e4c28652e35');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+----------------------------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+----------------------------------------------------------------------------------------------------------------------------------+\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:FOLLOW {_id_: 104}[52>54]] | (:User {_id_: 54, followers: 1, id: \"Charles\"}) |\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:FOLLOW {_id_: 110}[52>57]] | (:User {_id_: 57, followers: 1, id: \"Michael\"}) |\n", "| (:User {_id_: 52, followers: 0, id: \"Alice\"}) | [:FOLLOW {_id_: 103}[52>53]] | (:User {_id_: 53, followers: 1, id: \"Bridget\"}) |\n", "| (:User {_id_: 53, followers: 1, id: \"Bridget\"}) | [:FOLLOW {_id_: 107}[53>57]] | (:User {_id_: 57, followers: 1, id: \"Michael\"}) |\n", "| (:User {_id_: 53, followers: 1, id: \"Bridget\"}) | [:FOLLOW {_id_: 111}[53>52]] | (:User {_id_: 52, followers: 0, id: \"Alice\"}) |\n", "| (:User {_id_: 55, followers: 5, id: \"Doug\"}) | [:FOLLOW {_id_: 108}[55>56]] | (:User {_id_: 56, followers: 0, id: \"Mark\"}) |\n", "| (:User {_id_: 56, followers: 0, id: \"Mark\"}) | [:FOLLOW {_id_: 105}[56>55]] | (:User {_id_: 55, followers: 5, id: \"Doug\"}) |\n", "| (:User {_id_: 56, followers: 0, id: \"Mark\"}) | [:FOLLOW {_id_: 106}[56>57]] | (:User {_id_: 57, followers: 1, id: \"Michael\"}) |\n", "| (:User {_id_: 57, followers: 1, id: \"Michael\"}) | [:FOLLOW {_id_: 109}[57>52]] | (:User {_id_: 52, followers: 0, id: \"Alice\"}) |\n", "| (:User {_id_: 57, followers: 1, id: \"Michael\"}) | [:FOLLOW {_id_: 112}[57>53]] | (:User {_id_: 53, followers: 1, id: \"Bridget\"}) |\n", "+----------------------------------------------------------------------------------------------------------------------------------+\n", "\n", "10 rows available after 15 ms, consumed after another 2 ms" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:FOLLOW]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-----------------------+\n", "| name | partition |\n", "+-----------------------+\n", "| \"Alice\" | 0 |\n", "| \"Bridget\" | 0 |\n", "| \"Charles\" | 2 |\n", "| \"Doug\" | 3 |\n", "| \"Mark\" | 3 |\n", "| \"Michael\" | 0 |\n", "+-----------------------+\n", "\n", "6 rows available after 24 ms, consumed after another 9 ms" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.scc.stream('User','FOLLOW')\n", "YIELD nodeId, partition\n", "MATCH (u:User) WHERE id(u) = nodeId\n", "RETURN u.id AS name, partition" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ " name partition\r\n", "0 \"Alice\" 0\r\n", "1 \"Bridget\" 0\r\n", "2 \"Charles\" 2\r\n", "3 \"Doug\" 3\r\n", "4 \"Mark\" 3\r\n", "5 \"Michael\" 0\r\n" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%python\n", "df" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We have 3 strongly connected components in our sample graph. The first and biggest component has members Alice, Bridget, and Michael, while the second component has Doug and Mark. Charles ends up in his own component becuase there isn't an outgoing relationship from that node to any of the others.\n", "\n", "We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "+-------------------------------------------------------------------------------+\n", "| loadMillis | computeMillis | writeMillis | setCount | maxSetSize | minSetSize |\n", "+-------------------------------------------------------------------------------+\n", "| 0 | 0 | 19 | 3 | 3 | 1 |\n", "+-------------------------------------------------------------------------------+\n", "\n", "1 row available after 65 ms, consumed after another 5 ms" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.scc('User','FOLLOW', {write:true,partitionProperty:'partition'})\n", "YIELD loadMillis, computeMillis, writeMillis, setCount, maxSetSize, minSetSize;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Weighted Connected Components\n", "\n", "The Connected Components or UnionFind algorithm finds sets of connected nodes in an undirected graph where each node is reachable from any other node in the same set. It differs from the SCC algorithm because it only needs a path to exist between pairs of nodes in one direction, whereas SCC needs a path to exist in both directions. As with SCC, UnionFind is often used early in an analysis to understand a graph's structure." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (nAlice:User {id:'Alice'})\n", "MERGE (nBridget:User {id:'Bridget'})\n", "MERGE (nCharles:User {id:'Charles'})\n", "MERGE (nDoug:User {id:'Doug'})\n", "MERGE (nMark:User {id:'Mark'})\n", "MERGE (nMichael:User {id:'Michael'})\n", "\n", "MERGE (nAlice)-[:FRIEND {weight:0.5}]->(nBridget)\n", "MERGE (nAlice)-[:FRIEND {weight:4}]->(nCharles)\n", "MERGE (nMark)-[:FRIEND {weight:1}]->(nDoug)\n", "MERGE (nMark)-[:FRIEND {weight:2}]->(nMichael);" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 52, label: \"User\", title: \"{followers:0,
partition:0,
id:Alice,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 54, label: \"User\", title: \"{followers:1,
partition:2,
id:Charles,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 57, label: \"User\", title: \"{followers:1,
partition:0,
id:Michael,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{followers:0,
partition:3,
id:Mark,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{followers:5,
partition:3,
id:Doug,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{followers:1,
partition:0,
id:Bridget,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 56,to: 55, arrows:'to', title: \"{weight:1, _id:115,}\" },\n", " { from: 56,to: 57, arrows:'to', title: \"{weight:2, _id:116,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{weight:0.5, _id:113,}\" },\n", " { from: 52,to: 54, arrows:'to', title: \"{weight:4, _id:114,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('57c51a44-b0a0-44fe-959f-760b3161f84e');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| (:User {_id_: 52, followers: 0, partition: 0, id: \"Alice\"}) | [:FRIEND {weight: 0.5, _id_: 113}[52>53]] | (:User {_id_: 53, followers: 1, partition: 0, id: \"Bridget\"}) |\n", "| (:User {_id_: 52, followers: 0, partition: 0, id: \"Alice\"}) | [:FRIEND {weight: 4, _id_: 114}[52>54]] | (:User {_id_: 54, followers: 1, partition: 2, id: \"Charles\"}) |\n", "| (:User {_id_: 56, followers: 0, partition: 3, id: \"Mark\"}) | [:FRIEND {weight: 1, _id_: 115}[56>55]] | (:User {_id_: 55, followers: 5, partition: 3, id: \"Doug\"}) |\n", "| (:User {_id_: 56, followers: 0, partition: 3, id: \"Mark\"}) | [:FRIEND {weight: 2, _id_: 116}[56>57]] | (:User {_id_: 57, followers: 1, partition: 0, id: \"Michael\"}) |\n", "+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "\n", "4 rows available after 14 ms, consumed after another 1 ms" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:FRIEND]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-------------------+\n", "| user | setId |\n", "+-------------------+\n", "| \"Alice\" | 0 |\n", "| \"Bridget\" | 1 |\n", "| \"Charles\" | 0 |\n", "| \"Doug\" | 4 |\n", "| \"Mark\" | 4 |\n", "| \"Michael\" | 4 |\n", "+-------------------+\n", "\n", "6 rows available after 21 ms, consumed after another 10 ms" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.unionFind.stream('User', 'FRIEND', {weightProperty:'weight', defaultValue:0.0, threshold:1.0, concurrency: 1})\n", "YIELD nodeId,setId\n", "\n", "MATCH (u:User) WHERE id(u) = nodeId\n", "\n", "RETURN u.id AS user, setId" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In this case we can see that because the weight of the relationship betwen Bridget and Alice is only 0.5, the relationship is ignored by the algorithm and Bridget ends up in her own component.\n", "\n", "We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-------------------------------------------------------------+\n", "| nodes | setCount | loadMillis | computeMillis | writeMillis |\n", "+-------------------------------------------------------------+\n", "| 6 | 3 | 7 | 0 | 3 |\n", "+-------------------------------------------------------------+\n", "\n", "1 row available after 24 ms, consumed after another 0 ms" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.unionFind('User', 'FRIEND', {write:true, partitionProperty:\"partition\",weightProperty:'weight', defaultValue:0.0, threshold:1.0, concurrency: 1})\n", "YIELD nodes, setCount, loadMillis, computeMillis, writeMillis;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Unweighted Connected Components\n", "\n", "The Connected Components or UnionFind algorithm finds sets of connected nodes in an undirected graph where each node is reachable from any other node in the same set. It differs from the SCC algorithm because it only needs a path to exist between pairs of nodes in one direction, whereas SCC needs a path to exist in both directions. As with SCC, UnionFind is often used early in an analysis to understand a graph's structure.\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (nAlice:User {id:'Alice'})\n", "MERGE (nBridget:User {id:'Bridget'})\n", "MERGE (nCharles:User {id:'Charles'})\n", "MERGE (nDoug:User {id:'Doug'})\n", "MERGE (nMark:User {id:'Mark'})\n", "MERGE (nMichael:User {id:'Michael'})\n", "\n", "MERGE (nAlice)-[:FRIEND {weight:0.5}]->(nBridget)\n", "MERGE (nAlice)-[:FRIEND {weight:4}]->(nCharles)\n", "MERGE (nMark)-[:FRIEND {weight:1}]->(nDoug)\n", "MERGE (nMark)-[:FRIEND {weight:2}]->(nMichael);" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 52, label: \"User\", title: \"{followers:0,
partition:0,
id:Alice,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 57, label: \"User\", title: \"{followers:1,
partition:4,
id:Michael,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{followers:5,
partition:4,
id:Doug,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{followers:0,
partition:4,
id:Mark,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 54, label: \"User\", title: \"{followers:1,
partition:0,
id:Charles,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{followers:1,
partition:1,
id:Bridget,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 56,to: 55, arrows:'to', title: \"{weight:1, _id:115,}\" },\n", " { from: 56,to: 57, arrows:'to', title: \"{weight:2, _id:116,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{weight:0.5, _id:113,}\" },\n", " { from: 52,to: 54, arrows:'to', title: \"{weight:4, _id:114,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('d5d21d32-1fe2-4329-8088-5fb4a19e79d7');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| (:User {_id_: 52, followers: 0, partition: 0, id: \"Alice\"}) | [:FRIEND {weight: 0.5, _id_: 113}[52>53]] | (:User {_id_: 53, followers: 1, partition: 1, id: \"Bridget\"}) |\n", "| (:User {_id_: 52, followers: 0, partition: 0, id: \"Alice\"}) | [:FRIEND {weight: 4, _id_: 114}[52>54]] | (:User {_id_: 54, followers: 1, partition: 0, id: \"Charles\"}) |\n", "| (:User {_id_: 56, followers: 0, partition: 4, id: \"Mark\"}) | [:FRIEND {weight: 1, _id_: 115}[56>55]] | (:User {_id_: 55, followers: 5, partition: 4, id: \"Doug\"}) |\n", "| (:User {_id_: 56, followers: 0, partition: 4, id: \"Mark\"}) | [:FRIEND {weight: 2, _id_: 116}[56>57]] | (:User {_id_: 57, followers: 1, partition: 4, id: \"Michael\"}) |\n", "+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "\n", "4 rows available after 2 ms, consumed after another 3 ms" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:FRIEND]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-------------------+\n", "| user | setId |\n", "+-------------------+\n", "| \"Alice\" | 0 |\n", "| \"Bridget\" | 0 |\n", "| \"Charles\" | 0 |\n", "| \"Doug\" | 4 |\n", "| \"Mark\" | 4 |\n", "| \"Michael\" | 4 |\n", "+-------------------+\n", "\n", "6 rows available after 19 ms, consumed after another 5 ms" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.unionFind.stream('User', 'FRIEND', {}) \n", "YIELD nodeId,setId\n", "\n", "MATCH (u:User) WHERE id(u) = nodeId\n", "\n", "RETURN u.id AS user, setId" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We have two distinct group of users that have no link between them. The first group contains Alice, Charles, and Bridget, while the second group contains Michael, Doug, and Mark.\n", "We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "+-------------------------------------------------------------+\n", "| nodes | setCount | loadMillis | computeMillis | writeMillis |\n", "+-------------------------------------------------------------+\n", "| 6 | 2 | 1 | 0 | 0 |\n", "+-------------------------------------------------------------+\n", "\n", "1 row available after 10 ms, consumed after another 0 ms" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.unionFind('User', 'FRIEND', {write:true, partitionProperty:\"partition\"}) \n", "YIELD nodes, setCount, loadMillis, computeMillis, writeMillis;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Label Propagation\n", "\n", "Label Propagation (LPA) is a fast algorithm for finding communities in a graph. It detects these communities using network structure alone as its guide and doesn't require a pre-defined objective function or prior information about the communities.\n", "\n", "One interesting feature of LPA is that we can give nodes preliminary labels to narrow down the range of solutions generated. This means that it can be used as semi-supervised way of finding communities where we hand-pick some initial communities." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (nAlice:User {id:'Alice'}) SET nAlice.predefined_label=52\n", "MERGE (nBridget:User {id:'Bridget'}) SET nBridget.predefined_label=21\n", "MERGE (nCharles:User {id:'Charles'}) SET nCharles.predefined_label=43\n", "MERGE (nDoug:User {id:'Doug'}) SET nDoug.predefined_label=21\n", "MERGE (nMark:User {id:'Mark'}) SET nMark.predefined_label=19\n", "MERGE (nMichael:User {id:'Michael'}) SET nMichael.predefined_label=52\n", "\n", "MERGE (nAlice)-[:FOLLOW]->(nBridget)\n", "MERGE (nAlice)-[:FOLLOW]->(nCharles)\n", "MERGE (nMark)-[:FOLLOW]->(nDoug)\n", "MERGE (nBridget)-[:FOLLOW]->(nMichael)\n", "MERGE (nDoug)-[:FOLLOW]->(nMark)\n", "MERGE (nMichael)-[:FOLLOW]->(nAlice)\n", "MERGE (nAlice)-[:FOLLOW]->(nMichael)\n", "MERGE (nBridget)-[:FOLLOW]->(nAlice)\n", "MERGE (nMichael)-[:FOLLOW]->(nBridget)\n", "MERGE (nCharles)-[:FOLLOW]->(nDoug)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 57, label: \"User\", title: \"{predefined_label:52,
followers:1,
partition:4,
id:Michael,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{predefined_label:21,
followers:5,
partition:4,
id:Doug,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{predefined_label:19,
followers:0,
partition:4,
id:Mark,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 52, label: \"User\", title: \"{predefined_label:52,
followers:0,
partition:0,
id:Alice,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{predefined_label:21,
followers:1,
partition:0,
id:Bridget,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 54, label: \"User\", title: \"{predefined_label:43,
followers:1,
partition:0,
id:Charles,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 56,to: 57, arrows:'to', title: \"{_id:106,}\" },\n", " { from: 57,to: 52, arrows:'to', title: \"{_id:109,}\" },\n", " { from: 52,to: 54, arrows:'to', title: \"{_id:104,}\" },\n", " { from: 53,to: 52, arrows:'to', title: \"{_id:111,}\" },\n", " { from: 55,to: 56, arrows:'to', title: \"{_id:108,}\" },\n", " { from: 54,to: 55, arrows:'to', title: \"{_id:83,}\" },\n", " { from: 52,to: 57, arrows:'to', title: \"{_id:110,}\" },\n", " { from: 56,to: 55, arrows:'to', title: \"{_id:105,}\" },\n", " { from: 57,to: 53, arrows:'to', title: \"{_id:112,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{_id:103,}\" },\n", " { from: 53,to: 57, arrows:'to', title: \"{_id:107,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('3338793b-66ff-41bc-b51a-41e1db9d3467');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: \"Alice\"}) | [:FOLLOW {_id_: 104}[52>54]] | (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 0, id: \"Charles\"}) |\n", "| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: \"Alice\"}) | [:FOLLOW {_id_: 110}[52>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\"}) |\n", "| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: \"Alice\"}) | [:FOLLOW {_id_: 103}[52>53]] | (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: \"Bridget\"}) |\n", "| (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: \"Bridget\"}) | [:FOLLOW {_id_: 107}[53>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\"}) |\n", "| (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: \"Bridget\"}) | [:FOLLOW {_id_: 111}[53>52]] | (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: \"Alice\"}) |\n", "| (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 0, id: \"Charles\"}) | [:FOLLOW {_id_: 83}[54>55]] | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: \"Doug\"}) |\n", "| (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: \"Doug\"}) | [:FOLLOW {_id_: 108}[55>56]] | (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: \"Mark\"}) |\n", "| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: \"Mark\"}) | [:FOLLOW {_id_: 105}[56>55]] | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: \"Doug\"}) |\n", "| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: \"Mark\"}) | [:FOLLOW {_id_: 106}[56>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\"}) |\n", "| (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\"}) | [:FOLLOW {_id_: 109}[57>52]] | (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 0, id: \"Alice\"}) |\n", "| (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\"}) | [:FOLLOW {_id_: 112}[57>53]] | (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 0, id: \"Bridget\"}) |\n", "+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "\n", "11 rows available after 2 ms, consumed after another 3 ms" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:FOLLOW]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.labelPropagation.stream(\"User\", \"FOLLOW\", {direction: \"OUTGOING\", iterations: 10})" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-------------------------------------------------------------------------------------------+\n", "| nodes | iterations | loadMillis | computeMillis | writeMillis | write | partitionProperty |\n", "+-------------------------------------------------------------------------------------------+\n", "| 6 | 2 | 1 | 3 | 1 | TRUE | \"partition\" |\n", "+-------------------------------------------------------------------------------------------+\n", "\n", "1 row available after 18 ms, consumed after another 2 ms" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.labelPropagation('User', 'FOLLOW','OUTGOING',\n", " {iterations:10,partitionProperty:'partition', write:true})\n", "YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Louvain\n", "\n", "The Louvain method of community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are compared to how connected they would be in a random network.\n", "\n", "The Louvain algorithm is one of the fastest modularity based algorithms and works well with large graphs. It also reveals a hierarchy of communities at different scales, which can be useful for understanding the global functioning of a network." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MERGE (nAlice:User {id:'Alice'})\n", "MERGE (nBridget:User {id:'Bridget'})\n", "MERGE (nCharles:User {id:'Charles'})\n", "MERGE (nDoug:User {id:'Doug'})\n", "MERGE (nMark:User {id:'Mark'})\n", "MERGE (nMichael:User {id:'Michael'})\n", "\n", "MERGE (nAlice)-[:FRIEND]->(nBridget)\n", "MERGE (nAlice)-[:FRIEND]->(nCharles)\n", "MERGE (nMark)-[:FRIEND]->(nDoug)\n", "MERGE (nBridget)-[:FRIEND]->(nMichael)\n", "MERGE (nCharles)-[:FRIEND]->(nMark)\n", "MERGE (nAlice)-[:FRIEND]->(nMichael)\n", "MERGE (nCharles)-[:FRIEND]->(nDoug)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 57, label: \"User\", title: \"{predefined_label:52,
followers:1,
partition:4,
id:Michael,
community:0,
_id:57}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 52, label: \"User\", title: \"{predefined_label:52,
followers:0,
partition:4,
id:Alice,
community:0,
_id:52}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 55, label: \"User\", title: \"{predefined_label:21,
followers:5,
partition:4,
id:Doug,
community:2,
_id:55}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 54, label: \"User\", title: \"{predefined_label:43,
followers:1,
partition:4,
id:Charles,
community:2,
_id:54}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 53, label: \"User\", title: \"{predefined_label:21,
followers:1,
partition:4,
id:Bridget,
community:0,
_id:53}\", color: 'rgba(149,187,174,0.5)'},\n", " { id: 56, label: \"User\", title: \"{predefined_label:19,
followers:0,
partition:4,
id:Mark,
community:2,
_id:56}\", color: 'rgba(149,187,174,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 56,to: 55, arrows:'to', title: \"{weight:1, _id:115,}\" },\n", " { from: 56,to: 57, arrows:'to', title: \"{weight:2, _id:116,}\" },\n", " { from: 53,to: 57, arrows:'to', title: \"{_id:120,}\" },\n", " { from: 52,to: 54, arrows:'to', title: \"{weight:4, _id:114,}\" },\n", " { from: 52,to: 57, arrows:'to', title: \"{_id:122,}\" },\n", " { from: 54,to: 56, arrows:'to', title: \"{_id:121,}\" },\n", " { from: 54,to: 55, arrows:'to', title: \"{_id:123,}\" },\n", " { from: 52,to: 53, arrows:'to', title: \"{weight:0.5, _id:113,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('97e8cf5c-f812-4f6a-8636-84fa6fcd558f');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 4, id: \"Alice\", community: 0}) | [:FRIEND {_id_: 122}[52>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\", community: 0}) |\n", "| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 4, id: \"Alice\", community: 0}) | [:FRIEND {weight: 0.5, _id_: 113}[52>53]] | (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 4, id: \"Bridget\", community: 0}) |\n", "| (:User {predefined_label: 52, _id_: 52, followers: 0, partition: 4, id: \"Alice\", community: 0}) | [:FRIEND {weight: 4, _id_: 114}[52>54]] | (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 4, id: \"Charles\", community: 2}) |\n", "| (:User {predefined_label: 21, _id_: 53, followers: 1, partition: 4, id: \"Bridget\", community: 0}) | [:FRIEND {_id_: 120}[53>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\", community: 0}) |\n", "| (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 4, id: \"Charles\", community: 2}) | [:FRIEND {_id_: 121}[54>56]] | (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: \"Mark\", community: 2}) |\n", "| (:User {predefined_label: 43, _id_: 54, followers: 1, partition: 4, id: \"Charles\", community: 2}) | [:FRIEND {_id_: 123}[54>55]] | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: \"Doug\", community: 2}) |\n", "| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: \"Mark\", community: 2}) | [:FRIEND {weight: 1, _id_: 115}[56>55]] | (:User {predefined_label: 21, _id_: 55, followers: 5, partition: 4, id: \"Doug\", community: 2}) |\n", "| (:User {predefined_label: 19, _id_: 56, followers: 0, partition: 4, id: \"Mark\", community: 2}) | [:FRIEND {weight: 2, _id_: 116}[56>57]] | (:User {predefined_label: 52, _id_: 57, followers: 1, partition: 4, id: \"Michael\", community: 0}) |\n", "+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n", "\n", "8 rows available after 1 ms, consumed after another 1 ms" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:User)-[f:FRIEND]->(m:User)\n", "RETURN n, f, m" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-----------------------+\n", "| user | community |\n", "+-----------------------+\n", "| \"Alice\" | 0 |\n", "| \"Bridget\" | 0 |\n", "| \"Michael\" | 0 |\n", "| \"Charles\" | 2 |\n", "| \"Doug\" | 2 |\n", "| \"Mark\" | 2 |\n", "+-----------------------+\n", "\n", "6 rows available after 59 ms, consumed after another 0 ms" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.louvain.stream('User', 'FRIEND', {})\n", "YIELD nodeId, community\n", "\n", "MATCH (user:User) WHERE id(user) = nodeId\n", "\n", "RETURN user.id AS user, community\n", "ORDER BY community;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our algorithm found two communities with 3 members each. Mark, Doug, and Charles are all friends with each other, as are Bridget, Alice, and Michael. Charles is the only one who has friends in both communities but he has more in community 4 so he fits better in that one.\n", "\n", "We can also call a version of the algorithm that will store the result as a property on a node. This is useful if we want to run future queries that use the result." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "+--------------------------------------------------------------------------------+\n", "| nodes | communityCount | iterations | loadMillis | computeMillis | writeMillis |\n", "+--------------------------------------------------------------------------------+\n", "| 6 | 2 | 1 | 2 | 12 | 4 |\n", "+--------------------------------------------------------------------------------+\n", "\n", "1 row available after 26 ms, consumed after another 1 ms" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.louvain('User', 'FRIEND',\n", " {write:true, writeProperty:'community'})\n", "YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Triangle Counting\n", "\n", "Triangle counting is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes where each node has a relationship to all other nodes." ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CREATE (alice:Person{id:\"Alice\"}),\n", " (michael:Person{id:\"Michael\"}),\n", " (karin:Person{id:\"Karin\"}),\n", " (chris:Person{id:\"Chris\"}),\n", " (will:Person{id:\"Will\"}),\n", " (mark:Person{id:\"Mark\"})\n", "CREATE (michael)-[:KNOWS]->(karin),\n", " (michael)-[:KNOWS]->(chris),\n", " (will)-[:KNOWS]->(michael),\n", " (mark)-[:KNOWS]->(michael),\n", " (mark)-[:KNOWS]->(will),\n", " (alice)-[:KNOWS]->(michael),\n", " (will)-[:KNOWS]->(chris),\n", " (chris)-[:KNOWS]->(karin);" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "require([\"https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js\"], function(vis) {\n", " var nodes = new vis.DataSet([\n", " { id: 100, label: \"Person\", title: \"{id:Alice,
_id:100}\", color: 'rgba(90,78,41,0.5)'},\n", " { id: 102, label: \"Person\", title: \"{id:Karin,
_id:102}\", color: 'rgba(90,78,41,0.5)'},\n", " { id: 101, label: \"Person\", title: \"{id:Michael,
_id:101}\", color: 'rgba(90,78,41,0.5)'},\n", " { id: 105, label: \"Person\", title: \"{id:Mark,
_id:105}\", color: 'rgba(90,78,41,0.5)'},\n", " { id: 104, label: \"Person\", title: \"{id:Will,
_id:104}\", color: 'rgba(90,78,41,0.5)'},\n", " { id: 103, label: \"Person\", title: \"{id:Chris,
_id:103}\", color: 'rgba(90,78,41,0.5)'},\n", " \n", " ]);\n", "\n", " // create an array with edges\n", " var edges = new vis.DataSet([\n", " { from: 104,to: 103, arrows:'to', title: \"{_id:146,}\" },\n", " { from: 101,to: 102, arrows:'to', title: \"{_id:140,}\" },\n", " { from: 100,to: 101, arrows:'to', title: \"{_id:145,}\" },\n", " { from: 101,to: 103, arrows:'to', title: \"{_id:141,}\" },\n", " { from: 105,to: 101, arrows:'to', title: \"{_id:143,}\" },\n", " { from: 104,to: 101, arrows:'to', title: \"{_id:142,}\" },\n", " { from: 103,to: 102, arrows:'to', title: \"{_id:147,}\" },\n", " { from: 105,to: 104, arrows:'to', title: \"{_id:144,}\" },\n", " \n", " ]);\n", "\n", " // create a network\n", " var container = document.getElementById('47aac22a-1d0f-4931-abf9-27907e7f5b6f');\n", " var data = {\n", " nodes: nodes,\n", " edges: edges\n", " };\n", "\n", " var options = {\n", " edges: {\n", " arrows: {\n", " to: {\n", " scaleFactor: 0.5\n", " }\n", " }\n", " },\n", " width: '100%',\n", " height: '500px',\n", " interaction: {hover: true}\n", " };\n", "\n", " var network = new vis.Network(container, data, options);});" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "+-------------------------------------------------------------------------------------------------------------+\n", "| n | f | m |\n", "+-------------------------------------------------------------------------------------------------------------+\n", "| (:Person {_id_: 100, id: \"Alice\"}) | [:KNOWS {_id_: 145}[100>101]] | (:Person {_id_: 101, id: \"Michael\"}) |\n", "| (:Person {_id_: 101, id: \"Michael\"}) | [:KNOWS {_id_: 140}[101>102]] | (:Person {_id_: 102, id: \"Karin\"}) |\n", "| (:Person {_id_: 101, id: \"Michael\"}) | [:KNOWS {_id_: 141}[101>103]] | (:Person {_id_: 103, id: \"Chris\"}) |\n", "| (:Person {_id_: 103, id: \"Chris\"}) | [:KNOWS {_id_: 147}[103>102]] | (:Person {_id_: 102, id: \"Karin\"}) |\n", "| (:Person {_id_: 104, id: \"Will\"}) | [:KNOWS {_id_: 146}[104>103]] | (:Person {_id_: 103, id: \"Chris\"}) |\n", "| (:Person {_id_: 104, id: \"Will\"}) | [:KNOWS {_id_: 142}[104>101]] | (:Person {_id_: 101, id: \"Michael\"}) |\n", "| (:Person {_id_: 105, id: \"Mark\"}) | [:KNOWS {_id_: 144}[105>104]] | (:Person {_id_: 104, id: \"Will\"}) |\n", "| (:Person {_id_: 105, id: \"Mark\"}) | [:KNOWS {_id_: 143}[105>101]] | (:Person {_id_: 101, id: \"Michael\"}) |\n", "+-------------------------------------------------------------------------------------------------------------+\n", "\n", "8 rows available after 12 ms, consumed after another 1 ms" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n:Person)-[f:KNOWS]->(m:Person)\n", "RETURN n, f, m" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "+-------------------------------+\n", "| nodeA | nodeB | nodeC |\n", "+-------------------------------+\n", "| \"Michael\" | \"Chris\" | \"Will\" |\n", "| \"Michael\" | \"Karin\" | \"Chris\" |\n", "| \"Michael\" | \"Will\" | \"Mark\" |\n", "+-------------------------------+\n", "\n", "3 rows available after 31 ms, consumed after another 6 ms" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CALL algo.triangle.stream('Person','KNOWS') \n", "yield nodeA,nodeB,nodeC\n", "\n", "MATCH (a:Person) WHERE id(a) = nodeA\n", "MATCH (b:Person) WHERE id(b) = nodeB\n", "MATCH (c:Person) WHERE id(c) = nodeC\n", "\n", "RETURN a.id AS nodeA, b.id AS nodeB, c.id AS nodeC" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Cleaning up...\n", "\n", "To delete everything in the database run:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MATCH (n) DETACH DELETE n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To throw away the container, which we created above run:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%%bash\n", "docker stop myneo4j\n", "docker rm myneo4j" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Cypher", "language": "cypher", "name": "cypher" }, "language_info": { "file_extension": ".cql", "mimetype": "text/cypher", "name": "cypher" } }, "nbformat": 4, "nbformat_minor": 2 }