# Introducing py2neo

py2neo is one of the most widely used Python drivers for Neo4j. To keep things simple, this example assumes that you've got authentication turned off.

You can turn authentication off by uncommenting this line in your neo4j.conf file:

`dbms.security.auth_enabled=false`

Now we'll import py2neo and write a simple query to find all the groups that have 'Python' in the name:

In [1]:
from py2neo import Graph
graph = Graph()

In [2]:
query = """
MATCH (group:Group)-[:HAS_TOPIC]->(topic)
WHERE group.name CONTAINS "Python" 
RETURN group.name, COLLECT(topic.name) AS topics
"""

result = graph.cypher.execute(query)

for row in result:
    print(row)

 group.name | topics 
--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Python for Quant Finance | ['Data Mining', 'Computer programming', 'Data Analytics', 'Machine Learning', 'Predictive Analytics', 'Data Visualization', 'Big Data', 'Cloud Computing', 'Trading', 'Finance', 'Python', 'New Technology', 'Open Source']

 group.name | topics 
----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Python and Django Coding Session | ['Front-end Development', 'HTML', 'Computer programming', 'Website Design', 'Programming Languages', 'Open Source', 'Software Development', 'Web Technology', 

You should see a few groups, each with a list of its topics.

# Calculating topic similarity

Now that we've got the hang of executing Neo4j queries from Python, let's calculate topic similarity based on common groups so that we can use it in our queries.

We'll first import the igraph library:

In [7]:
from igraph import Graph as IGraph

Next we'll write a query that finds all pairs of topics and counts the number of groups each pair has in common. The `ID(topic) < ID(other)` condition makes sure each pair is returned only once. We'll use the count as our 'weight' in the similarity calculation.

In [5]:
query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
ORDER BY weight DESC
LIMIT 20
"""

graph.cypher.execute(query)

    | topic.name            | other.name | weight
----+-----------------------+------------+--------
  1 | Open Source           | Python     |     13
  2 | Big Data              | Python     |     12
  3 | Computer programming  | Python     |     10
  4 | Software Development  | Python     |     10
  5 | Data Science          | Python     |      9
  6 | Web Development       | Python     |      8
  7 | Data Analytics        | Python     |      7
  8 | Machine Learning      | Python     |      7
  9 | Data Visualization    | Python     |      6
 10 | Data Mining           | Python     |      6
 11 | JavaScript            | Python     |      6
 12 | Hadoop                | Python     |      6
 13 | Cloud Computing       | Python     |      4
 14 | Ruby                  | Python     |      4
 15 | Predictive Analytics  | Python     |      4
 16 | Mobile Development    | Python     |      4
 17 | iOS Development       | Python     |      4
 18 | Programming Languages | Python     |      3
 19 | Apache Spark          | Python     |      3
 20 | nodeJS                | Python     |      3
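
To make the weight concrete, here is a minimal pure-Python sketch of the same co-occurrence count. The `group_topics` dictionary and the `topic_pair_weights` helper are made up for illustration and stand in for the `(:Group)-[:HAS_TOPIC]->(:Topic)` data. Sorting each group's topics plays the role of the `ID(topic) < ID(other)` filter, so every pair is counted once.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: group name -> topics it is tagged with.
group_topics = {
    "Group A": ["Python", "Big Data", "Open Source"],
    "Group B": ["Python", "Open Source"],
    "Group C": ["Big Data", "Hadoop"],
}

def topic_pair_weights(group_topics):
    """Count, for every unordered pair of topics, how many groups have both."""
    weights = Counter()
    for topics in group_topics.values():
        # sorted(set(...)) gives each pair one canonical order, so the same
        # two topics are never counted under two different keys.
        for pair in combinations(sorted(set(topics)), 2):
            weights[pair] += 1
    return weights

weights = topic_pair_weights(group_topics)
for (topic, other), weight in weights.most_common(3):
    print(topic, "|", other, "|", weight)
```

In this toy data only 'Open Source' and 'Python' share two groups, so that pair gets the highest weight, just as it does in the real results.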

Now let's run the query again, this time without the ORDER BY and LIMIT, and load the results into igraph:

In [8]:
query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
"""

ig = IGraph.TupleList(graph.cypher.execute(query), weights=True)
ig



We're now ready to run a community detection algorithm over the graph to see what clusters/communities we have:

In [9]:
# Walktrap returns a dendrogram; as_clustering() cuts it into flat clusters
dendrogram = ig.community_walktrap(weights="weight")
clusters = dendrogram.as_clustering()
len(clusters)

46

Let's have a quick look at what we've got:

In [11]:
# One dict per topic, keyed by the topic name
nodes = [{"id": name, "label": name} for name in ig.vs["name"]]

# Look up each topic's vertex and record the cluster it belongs to
for node in nodes:
    idx = ig.vs.find(name=node["id"]).index
    node["group"] = clusters.membership[idx]

nodes[:10]

[{'group': 0, 'id': 'Computer programming', 'label': 'Computer programming'},
 {'group': 0, 'id': 'Geeks & Nerds', 'label': 'Geeks & Nerds'},
 {'group': 1, 'id': 'Data Science', 'label': 'Data Science'},
 {'group': 2, 'id': 'Sci-Fi/Fantasy', 'label': 'Sci-Fi/Fantasy'},
 {'group': 3, 'id': 'Cloud Computing', 'label': 'Cloud Computing'},
 {'group': 4, 'id': 'Social CRM', 'label': 'Social CRM'},
 {'group': 5, 'id': 'Hack', 'label': 'Hack'},
 {'group': 0,
 'id': 'Go programming language',
 'label': 'Go programming language'},
 {'group': 0, 'id': 'Front-end Development', 'label': 'Front-end Development'},
 {'group': 0, 'id': 'Finding a New Job', 'label': 'Finding a New Job'}]
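
A flat list makes it hard to see which topics ended up together. As a minimal sketch, the helper below inverts a list shaped like `nodes` into a mapping from cluster number to topic names. `topics_by_cluster` and `sample_nodes` are names introduced here for illustration, not part of igraph, and the sample entries are copied from the output above.

```python
from collections import defaultdict

# A few entries in the same shape as `nodes`, copied from the output above.
sample_nodes = [
    {"group": 0, "id": "Computer programming", "label": "Computer programming"},
    {"group": 0, "id": "Geeks & Nerds", "label": "Geeks & Nerds"},
    {"group": 1, "id": "Data Science", "label": "Data Science"},
    {"group": 3, "id": "Cloud Computing", "label": "Cloud Computing"},
]

def topics_by_cluster(nodes):
    """Map each cluster number to the list of topic names assigned to it."""
    grouped = defaultdict(list)
    for node in nodes:
        grouped[node["group"]].append(node["id"])
    return dict(grouped)

for cluster, topics in sorted(topics_by_cluster(sample_nodes).items()):
    print(cluster, topics)
```

Calling `topics_by_cluster(nodes)` on the full list should give one entry per cluster.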

And finally, we'll write a Cypher query that takes the output of our community detection algorithm and writes it back into Neo4j:

In [12]:
query = """
UNWIND {params} AS p 
MATCH (t:Topic {name: p.id}) 
MERGE (cluster:Cluster {name: p.group})
MERGE (t)-[:IN_CLUSTER]->(cluster)
"""

graph.cypher.execute(query, params=nodes)
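
If the combination of UNWIND, MATCH and MERGE is new to you, the toy model below may help. It is only a sketch of the query's effect using plain Python sets, not how Neo4j executes it. The `write_clusters` function and the sample data are hypothetical, and the model assumes topic names are unique.

```python
def write_clusters(params, topics, clusters, in_cluster):
    """Apply the query's clauses to plain Python sets, row by row."""
    for p in params:                             # UNWIND {params} AS p
        if p["id"] not in topics:                # MATCH (t:Topic {name: p.id})
            continue                             # no match: the row is dropped
        clusters.add(p["group"])                 # MERGE (cluster:Cluster {name: p.group})
        in_cluster.add((p["id"], p["group"]))    # MERGE (t)-[:IN_CLUSTER]->(cluster)

# Hypothetical stand-ins for the database contents.
topics = {"Data Science", "Cloud Computing"}
clusters, in_cluster = set(), set()

params = [
    {"id": "Data Science", "label": "Data Science", "group": 1},
    {"id": "Cloud Computing", "label": "Cloud Computing", "group": 3},
    {"id": "Not A Topic", "label": "Not A Topic", "group": 3},
]

write_clusters(params, topics, clusters, in_cluster)
write_clusters(params, topics, clusters, in_cluster)  # second run adds nothing new
print(sorted(clusters), sorted(in_cluster))
```

Sets are used because adding an existing element changes nothing, which mirrors how MERGE only creates a node or relationship when it doesn't already exist. That is why the query can be run more than once without creating duplicates.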

