{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introducing py2neo\n", "\n", "py2neo is the most popular of the Python drivers used to interact with Neo4j. For simplicity, this example assumes that you've got authentication turned off. \n", "\n", "You can turn authentication off by uncommenting this line in your neo4j.conf file:\n", "\n", "`dbms.security.auth_enabled=false`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll import py2neo and write a simple query to find all the groups that have 'Python' in the name:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from py2neo import Graph\n", "graph = Graph()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " group.name | topics \n", "--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", " Python for Quant Finance | ['Data Mining', 'Computer programming', 'Data Analytics', 'Machine Learning', 'Predictive Analytics', 'Data Visualization', 'Big Data', 'Cloud Computing', 'Trading', 'Finance', 'Python', 'New Technology', 'Open Source']\n", "\n", " group.name | topics \n", "----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", " Python and Django Coding Session | ['Front-end Development', 'HTML', 'Computer programming', 'Website Design', 'Programming Languages', 'Open Source', 'Software Development', 'Web Technology', 'Django', 'Web Development', 'Web Design', 'MySQL', 'Python', 'CSS']\n", "\n", " group.name | topics \n", "------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", " London Python Project Nights | ['New Technology', 'Technology', 'Python', 'Software Development', 'Open Source', 'Open source python', 'Computer programming', 'Projects', 'Python Web Development', 'Getting started with Python']\n", "\n" ] } ], "source": [ "query = \"\"\"\n", "MATCH (group:Group)-[:HAS_TOPIC]->(topic)\n", "WHERE group.name CONTAINS \"Python\" \n", "RETURN group.name, COLLECT(topic.name) AS topics\n", "\"\"\"\n", "\n", "result = graph.cypher.execute(query)\n", "\n", "for row in result:\n", " print(row) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should see a few groups and a list of the topics that they have.\n", "\n", "# Calculating topic similarity\n", "\n", "Now that we've got the hang of executing Neo4j queries from Python let's calculate topic similarity based on common groups so that we can use it in our queries.\n", "\n", "We'll first import the igraph library:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from igraph import Graph as IGraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we'll write a query which finds all pairs of topics and then works out the number of common groups. We'll use that as our 'weight' in the similarity calculation." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ " | topic.name | other.name | weight\n", "----+-----------------------+------------+--------\n", " 1 | Open Source | Python | 13\n", " 2 | Big Data | Python | 12\n", " 3 | Computer programming | Python | 10\n", " 4 | Software Development | Python | 10\n", " 5 | Data Science | Python | 9\n", " 6 | Web Development | Python | 8\n", " 7 | Data Analytics | Python | 7\n", " 8 | Machine Learning | Python | 7\n", " 9 | Data Visualization | Python | 6\n", " 10 | Data Mining | Python | 6\n", " 11 | JavaScript | Python | 6\n", " 12 | Hadoop | Python | 6\n", " 13 | Cloud Computing | Python | 4\n", " 14 | Ruby | Python | 4\n", " 15 | Predictive Analytics | Python | 4\n", " 16 | Mobile Development | Python | 4\n", " 17 | iOS Development | Python | 4\n", " 18 | Programming Languages | Python | 3\n", " 19 | Apache Spark | Python | 3\n", " 20 | nodeJS | Python | 3" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query = \"\"\"\n", "MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)\n", "WHERE ID(topic) < ID(other)\n", "RETURN topic.name, other.name, COUNT(*) AS weight\n", "ORDER BY weight DESC\n", "LIMIT 20\n", "\"\"\"\n", "\n", "graph.cypher.execute(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's run the query again and wrap the output in igraph:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query = \"\"\"\n", "MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)\n", "WHERE ID(topic) < ID(other)\n", "RETURN topic.name, other.name, COUNT(*) AS weight\n", "\"\"\"\n", "\n", "ig = IGraph.TupleList(graph.cypher.execute(query), weights=True)\n", "ig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're now ready to run a community detection algorithm over the graph to see what clusters/communities we have:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "46" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clusters = IGraph.community_walktrap(ig, weights=\"weight\")\n", "clusters = clusters.as_clustering()\n", "len(clusters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a quick look at what we've got:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{'group': 0, 'id': 'Computer programming', 'label': 'Computer programming'},\n", " {'group': 0, 'id': 'Geeks & Nerds', 'label': 'Geeks & Nerds'},\n", " {'group': 1, 'id': 'Data Science', 'label': 'Data Science'},\n", " {'group': 2, 'id': 'Sci-Fi/Fantasy', 'label': 'Sci-Fi/Fantasy'},\n", " {'group': 3, 'id': 'Cloud Computing', 'label': 'Cloud Computing'},\n", " {'group': 4, 'id': 'Social CRM', 'label': 'Social CRM'},\n", " {'group': 5, 'id': 'Hack', 'label': 'Hack'},\n", " {'group': 0,\n", " 'id': 'Go programming language',\n", " 'label': 'Go programming language'},\n", " {'group': 0, 'id': 'Front-end Development', 'label': 'Front-end Development'},\n", " {'group': 0, 'id': 'Finding a New Job', 'label': 'Finding a New Job'}]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nodes = [node[\"name\"] for node in ig.vs]\n", "nodes = [{\"id\": x, \"label\": x} for x in nodes]\n", "nodes[:5]\n", "\n", "for node in nodes:\n", " idx = ig.vs.find(name=node[\"id\"]).index\n", " node[\"group\"] = clusters.membership[idx]\n", " \n", "nodes[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally we're going to write a Cypher query which takes the results of our community detection algorithm and writes the results back into Neo4j:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query = \"\"\"\n", "UNWIND {params} AS p \n", "MATCH (t:Topic {name: p.id}) \n", "MERGE (cluster:Cluster {name: p.group})\n", "MERGE (t)-[:IN_CLUSTER]->(cluster)\n", "\"\"\"\n", "\n", "graph.cypher.execute(query, params = nodes)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.0" } }, "nbformat": 4, "nbformat_minor": 0 }