Use Case 1

Find the companies where X has worked, and their roles at those companies

What questions would we have to ask of our data?

"Which companies has X worked for, and in what roles?"

Reviewing this question, we can identify several entities, attributes and relationships. We have the concepts of a company, a person (X) and a role. Further, a person has worked for a company.

Company and Person are both entities, which we'll model as vertices with appropriate labels. For now, we'll assume a direct relationship between a person and a company: a person WORKED_FOR a company, which we'll model as an edge. We'll make role an attribute (a property) of this relationship.

Adding in a few properties – firstName and lastName for a person, name for a company – we end up with the following data model:

(Person {firstName, lastName}) -[WORKED_FOR {role}]-> (Company {name})
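As a rough, database-free illustration of this model, the sketch below represents vertices and edges as plain Python dataclasses. The `Vertex` and `Edge` classes are hypothetical stand-ins for Neptune's vertices and edges; only the labels (`Person`, `Company`, `WORKED_FOR`) and the property names come from the model above.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    """Hypothetical stand-in for a labelled graph vertex with properties."""
    id: str
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    """Hypothetical stand-in for a directed, labelled edge with properties."""
    label: str
    out_v: str  # id of the source vertex (the Person)
    in_v: str   # id of the target vertex (the Company)
    properties: dict = field(default_factory=dict)

# One Person, one Company, and the WORKED_FOR relationship between them.
# Note that role lives on the edge, not on either vertex.
martha = Vertex('p-1', 'Person', {'firstName': 'Martha', 'lastName': 'Rivera'})
example_corp = Vertex('c-1', 'Company', {'name': 'Example Corp'})
worked_for = Edge('WORKED_FOR', martha.id, example_corp.id,
                  {'role': 'Principal Analyst'})
```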

Keep it simple

Over the course of this exercise we'll see the place of role in the model change several times. At this stage it's a simple attribute of a relationship; in later steps it will be promoted to a vertex in its own right.

As far as our current use case is concerned, role appears to be a simple value type, much like colour, height or weight. If it were a complex value type with several fields – such as address – or if there were some explicit structural relations between values – as there are in a category hierarchy – we would consider making it a vertex from the outset.
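To make the distinction concrete, the sketch below contrasts the two options using plain dictionaries. The `Address` vertex, its fields and the `LIVES_AT` edge label are hypothetical illustrations of a complex value type; they are not part of this use case's model.

```python
# Option 1: role is a simple value type, so it is stored as a single
# property on the WORKED_FOR edge
worked_for = {
    'label': 'WORKED_FOR', 'out': 'p-3', 'in': 'c-1',
    'properties': {'role': 'Analyst'},
}

# Option 2 (hypothetical): a complex value type with several fields, such as
# an address, becomes a vertex of its own, linked to the person by an edge
address = {
    'id': 'a-1', 'label': 'Address',
    'properties': {'street': '123 Any Street', 'city': 'Anytown', 'country': 'US'},
}
lives_at = {'label': 'LIVES_AT', 'out': 'p-3', 'in': address['id'], 'properties': {}}
```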

Sample dataset

We'll now create a sample dataset in line with our model. We'll include enough data to ensure that our queries have to exclude some portions of the graph in order to return a correct result.

Creating our sample data

In [1]:
%load_ext ipython_unittest
%run '../util/neptune.py'
In [2]:
neptune.clear()
g = neptune.graphTraversal()
clearing data...
clearing property graph data [edge_batch_size=200, edge_count=Unknown]...
clearing property graph data [vertex_batch_size=200, vertex_count=Unknown]...
clearing rdf data...
done
gremlin: ws://xxxxxxxxxxxxxxxx-xxxxxxxxxxxxxx.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com:8182/gremlin
In [3]:
(g.
   # People
   addV('Person').property(id,'p-1').property('firstName','Martha').property('lastName','Rivera').
   addV('Person').property(id,'p-2').property('firstName','Richard').property('lastName','Roe').
   addV('Person').property(id,'p-3').property('firstName','Li').property('lastName','Juan').
   addV('Person').property(id,'p-4').property('firstName','John').property('lastName','Stiles').
   addV('Person').property(id,'p-5').property('firstName','Saanvi').property('lastName','Sarkar').
   # Companies
   addV('Company').property(id,'c-1').property('name','Example Corp').
   addV('Company').property(id,'c-2').property('name','AnyCompany').
   # WORKED_FOR edges, each carrying the role as an edge property
   V('p-1').addE('WORKED_FOR').to(V('c-1')).property('role','Principal Analyst').
   V('p-2').addE('WORKED_FOR').to(V('c-1')).property('role','Senior Analyst').
   V('p-3').addE('WORKED_FOR').to(V('c-1')).property('role','Analyst').
   V('p-4').addE('WORKED_FOR').to(V('c-1')).property('role','Analyst').
   V('p-5').addE('WORKED_FOR').to(V('c-2')).property('role','Manager').
   V('p-3').addE('WORKED_FOR').to(V('c-2')).property('role','Associate Analyst').
   toList())
Out[3]:
[e[96b3e41e-7075-e611-8e17-b1af4d1cf807][p-3-WORKED_FOR->c-2]]
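The following plain-Python mirror of the six WORKED_FOR edges created above (no database required; the tuple layout is an assumption made for illustration) shows how the dataset forces a query to discard most of the graph: only two edges start at Li's vertex, and Li's Analyst role at Example Corp is shared with John Stiles, so filtering on role alone would not work.

```python
# Plain-Python mirror of the WORKED_FOR edges created in the cell above:
# (person id, company id, role)
worked_for_edges = [
    ('p-1', 'c-1', 'Principal Analyst'),
    ('p-2', 'c-1', 'Senior Analyst'),
    ('p-3', 'c-1', 'Analyst'),
    ('p-4', 'c-1', 'Analyst'),
    ('p-5', 'c-2', 'Manager'),
    ('p-3', 'c-2', 'Associate Analyst'),
]

# Edges that belong to Li (p-3) versus edges a correct query must ignore
li_edges = [e for e in worked_for_edges if e[0] == 'p-3']
other_edges = [e for e in worked_for_edges if e[0] != 'p-3']

# John (p-4) shares Li's 'Analyst' role at Example Corp, so selecting edges
# by role alone would not isolate Li's employment history
analysts = [e for e in worked_for_edges if e[2] == 'Analyst']
```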

Querying the data

Query 1 – Which companies has Li worked for, and in what roles?

To answer this question, we'll have to perform the following steps:

  1. Start at the Person vertex representing Li
  2. Follow WORKED_FOR edges to find each Company for which Li has worked
  3. Select the company's name, and the role property of each WORKED_FOR edge
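Before touching Gremlin, the three steps can be sketched against a small in-memory copy of part of the sample data. The dictionaries, the edge tuple layout and the `companies_and_roles` function are hypothetical; the sketch only illustrates the traversal logic, not how Neptune evaluates it.

```python
# Hypothetical in-memory copy of part of the sample data (no Neptune needed)
people = {'p-3': {'firstName': 'Li', 'lastName': 'Juan'},
          'p-4': {'firstName': 'John', 'lastName': 'Stiles'}}
companies = {'c-1': {'name': 'Example Corp'},
             'c-2': {'name': 'AnyCompany'}}
# Each edge is (label, out vertex id, in vertex id, properties)
edges = [
    ('WORKED_FOR', 'p-3', 'c-1', {'role': 'Analyst'}),
    ('WORKED_FOR', 'p-4', 'c-1', {'role': 'Analyst'}),
    ('WORKED_FOR', 'p-3', 'c-2', {'role': 'Associate Analyst'}),
]

def companies_and_roles(person_id):
    """Return one {'company', 'role'} dict per WORKED_FOR edge leaving person_id."""
    # 1. Start at the Person vertex
    if person_id not in people:
        raise KeyError(f'unknown person: {person_id}')
    results = []
    # 2. Follow outgoing WORKED_FOR edges to each Company
    for label, out_v, in_v, props in edges:
        if label == 'WORKED_FOR' and out_v == person_id:
            # 3. Select the company name and the role stored on the edge
            results.append({'company': companies[in_v]['name'],
                            'role': props['role']})
    return results

print(companies_and_roles('p-3'))
# [{'company': 'Example Corp', 'role': 'Analyst'}, {'company': 'AnyCompany', 'role': 'Associate Analyst'}]
```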

Write a failing unit test

In [4]:
%%unittest

results = None # TODO

assert results == [{'company': 'Example Corp', 'role': 'Analyst'}, 
                   {'company': 'AnyCompany', 'role': 'Associate Analyst'}]

Fail
F
======================================================================
FAIL: test_1 (__main__.JupyterTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Cell Tests", line 5, in test_1
AssertionError: None != [{'company': 'Example Corp', 'role': 'Ana[58 chars]st'}]

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)
Out[4]:
<unittest.runner.TextTestResult run=1 errors=0 failures=1>

And write the query to make it pass

In [5]:
%%unittest

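results = (g.V('p-3').                       # start at Li's Person vertex
             outE('WORKED_FOR').as_('e').    # follow WORKED_FOR edges, labelling each one 'e'
             otherV().                       # move to the Company at the other end of the edge
             project('company', 'role').     # build one map per company
             by('name').                     # company <- the Company's name property
             by(select('e').values('role')). # role <- the role property of the labelled edge
             toList())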

assert results == [{'company': 'Example Corp', 'role': 'Analyst'}, 
                   {'company': 'AnyCompany', 'role': 'Associate Analyst'}]

Success
.
----------------------------------------------------------------------
Ran 1 test in 0.012s

OK
Out[5]:
<unittest.runner.TextTestResult run=1 errors=0 failures=0>
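The assertion above compares lists, so it also depends on the order in which the two WORKED_FOR edges are returned. Unless the traversal imposes an order (for example with an `order()` step), it can be safer to compare the results as an unordered collection. The helper below, `same_rows`, is a hypothetical sketch of such a check using only the standard library.

```python
from collections import Counter

def same_rows(actual, expected):
    """Compare two lists of flat dicts, ignoring row order but not duplicates."""
    # dicts are unhashable, so turn each row into a sorted tuple of its items
    def as_key(row):
        return tuple(sorted(row.items()))
    return Counter(map(as_key, actual)) == Counter(map(as_key, expected))

expected = [{'company': 'Example Corp', 'role': 'Analyst'},
            {'company': 'AnyCompany', 'role': 'Associate Analyst'}]
# The same rows in the opposite order still match
reordered = list(reversed(expected))
```

In the notebook, `assert same_rows(results, [...])` could stand in for the list equality check inside the `%%unittest` cell.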