# Querying

This notebook demonstrates Nexus Forge data [querying features](https://nexus-forge.readthedocs.io/en/latest/interaction.html#querying).

In [1]:
from kgforge.core import KnowledgeGraphForge

A configuration file is needed in order to create a KnowledgeGraphForge session. A configuration can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb).

In [2]:
forge = KnowledgeGraphForge("../../configurations/forge.yml")

## Imports

In [3]:
from kgforge.core import Resource
from kgforge.specializations.resources import Dataset
from kgforge.core.wrappings.paths import Filter, FilterOperator

## Retrieval

### Latest version

In [4]:
jane = Resource(type="Person", name="Jane Doe", award=["Nobel"])

In [5]:
forge.register(jane)

<action> _register_one
<succeeded> True


In [6]:
resource = forge.retrieve(jane.id)

In [7]:
resource == jane

True

### Specific version

In [8]:
jane = Resource(type="Person", name="Jane Doe", award=["Nobel"])

In [9]:
forge.register(jane)

<action> _register_one
<succeeded> True


In [10]:
forge.tag(jane, "v1")

<action> _tag_one
<succeeded> True


In [11]:
jane.email = ["jane.doe@epfl.ch", "jane.doe@example.org"]

In [12]:
forge.update(jane)

<action> _update_one
<succeeded> True


In [13]:
try:
    # DemoStore exposes the revision as `version`
    print(jane._store_metadata.version)
except AttributeError:
    # BlueBrainNexus exposes the revision as `_rev`
    print(jane._store_metadata._rev)

3


In [14]:
jane_v1 = forge.retrieve(jane.id, version=1)

In [15]:
jane_v1_tag = forge.retrieve(jane.id, version="v1")

In [16]:
jane_v1_rev = forge.retrieve(jane.id+"?rev=1")

In [17]:
jane_v1 == jane_v1_tag

True

In [18]:
jane_v1 == jane_v1_rev

True

In [19]:
jane_v1 != jane

True

In [20]:
try:
    # DemoStore exposes the revision as `version`
    print(jane_v1._store_metadata.version)
except AttributeError:
    # BlueBrainNexus exposes the revision as `_rev`
    print(jane_v1._store_metadata._rev)

1
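
The three retrieval forms used above (an integer revision, a tag name, and a `?rev=` suffix on the identifier) all resolve to the same stored revision. The sketch below models that behaviour with a toy in-memory store so it can run without a configured session. `ToyVersionedStore` and its methods are hypothetical and are not part of Nexus Forge. Unlike the run above, where the revision reaches 3 after register, tag and update, this toy store does not count tagging as a revision.

```python
from typing import Dict, List, Optional, Union


class ToyVersionedStore:
    """Hypothetical in-memory store keeping every revision of a payload."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[dict]] = {}
        self._tags: Dict[str, Dict[str, int]] = {}

    def save(self, id_: str, payload: dict) -> int:
        """Append a new revision and return its 1-based number."""
        history = self._revisions.setdefault(id_, [])
        history.append(dict(payload))
        return len(history)

    def tag(self, id_: str, name: str) -> None:
        """Point `name` at the current latest revision."""
        self._tags.setdefault(id_, {})[name] = len(self._revisions[id_])

    def retrieve(self, id_: str, version: Optional[Union[int, str]] = None) -> dict:
        # The identifier may carry the revision itself, as in "<id>?rev=1".
        if "?rev=" in id_:
            id_, rev = id_.split("?rev=", 1)
            version = int(rev)
        history = self._revisions[id_]
        if version is None:
            return history[-1]  # latest revision
        if isinstance(version, str):
            version = self._tags[id_][version]  # resolve tag name to a revision
        return history[version - 1]


store = ToyVersionedStore()
store.save("jane", {"name": "Jane Doe"})
store.tag("jane", "v1")
store.save("jane", {"name": "Jane Doe", "email": "jane.doe@example.org"})

by_number = store.retrieve("jane", version=1)
by_tag = store.retrieve("jane", version="v1")
by_suffix = store.retrieve("jane?rev=1")
latest = store.retrieve("jane")
```

Here `by_number`, `by_tag` and `by_suffix` are the same payload, while `latest` also carries the email added afterwards.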


### Crossbucket retrieval
It is possible to retrieve resources stored in buckets other than the configured one, provided that the configured store supports it.

In [21]:
resource = forge.retrieve(jane.id, cross_bucket=True) # cross_bucket defaults to False

In [22]:
resource._store_metadata

{'id': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99',
 '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json',
 '_createdAt': '2022-04-12T21:29:14.410Z',
 '_createdBy': 'https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy',
 '_deprecated': False,
 '_incoming': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99/incoming',
 '_outgoing': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99/outgoing',
 '_project': 'https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge',
 '_rev': 3,
 '_schemaProject': 'https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge',
 '_self': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99',
 '_updatedAt': '2022-04-12T21:29:21.465Z',
 '_updatedBy': 'https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy'}

In [23]:
resource._last_action

Action(error=None, message=None, operation='retrieve', succeeded=True)

In [24]:
resource._synchronized

True

### Original source retrieval
When using BlueBrainNexusStore, a resource's payload can be retrieved as it was registered (`retrieve_source=True`, the default), without any changes related to store-added metadata or JSON-LD framing. The call below sets `retrieve_source=False` instead.

In [25]:
resource = forge.retrieve(jane.id, retrieve_source=False) # retrieve_source defaults to True

In [26]:
forge.as_json(resource)

{'id': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99',
 'type': 'Person',
 'award': 'Nobel',
 'email': ['jane.doe@epfl.ch', 'jane.doe@example.org'],
 'name': 'Jane Doe'}

In [27]:
resource._store_metadata

{'id': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99',
 '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json',
 '_createdAt': '2022-04-12T21:29:14.410Z',
 '_createdBy': 'https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy',
 '_deprecated': False,
 '_incoming': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99/incoming',
 '_outgoing': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99/outgoing',
 '_project': 'https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge',
 '_rev': 3,
 '_schemaProject': 'https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge',
 '_self': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/0ee3e0d7-84f5-424d-937a-76dc7a5d7a99',
 '_updatedAt': '2022-04-12T21:29:21.465Z',
 '_updatedBy': 'https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy'}

In [28]:
resource._last_action

Action(error=None, message=None, operation='retrieve', succeeded=True)

In [29]:
resource._synchronized

True

### Error handling

In [4]:
resource = forge.retrieve("123")

<action> retrieve
<error> RetrievalError: 404 Client Error: Not Found for url: https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/%3A%2F%2F123/source



In [5]:
resource is None

True
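
As the output above shows, a failed retrieval reports the error and leaves `resource` as `None` rather than raising. Code that cannot continue without the resource may prefer an explicit exception. The following is a minimal sketch of such a wrapper; `retrieve_or_raise` and `fake_retrieve` are hypothetical helpers, with `fake_retrieve` standing in for `forge.retrieve` so the example runs without a store.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retrieve_or_raise(retrieve: Callable[[str], Optional[T]], id_: str) -> T:
    """Call `retrieve` and turn a None result into an explicit exception."""
    result = retrieve(id_)
    if result is None:
        raise LookupError(f"No resource could be retrieved for id {id_!r}")
    return result


def fake_retrieve(id_: str) -> Optional[dict]:
    """Stand-in for forge.retrieve: returns None for unknown identifiers."""
    known = {"jane": {"name": "Jane Doe"}}
    return known.get(id_)


found = retrieve_or_raise(fake_retrieve, "jane")

try:
    retrieve_or_raise(fake_retrieve, "123")
    missing_raised = False
except LookupError:
    missing_raised = True
```

With a real session, `forge.retrieve` would be passed in place of `fake_retrieve`.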

## Searching

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel. Commented lines are for DemoModel.

In [6]:
jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [7]:
john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [38]:
dataset = Dataset(forge, type="Dataset", contribution=[contribution_jane, contribution_john])
dataset.add_distribution("../../data/associations.tsv")

In [53]:
forge.register(dataset)

<action> _register_one
<succeeded> True


In [54]:
forge.as_json(dataset)

{'id': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/69ff5c8c-098b-4440-8387-367bf9cfa9cc',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'type': 'Person', 'name': 'Jane Doe'}},
  {'type': 'Contribution', 'agent': {'type': 'Person', 'name': 'John Smith'}}],
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault',
    'type': 'DiskStorage',
    '_rev': 1}},
  'contentSize': {'unitCode': 'bytes', 'value': 477},
  'contentUrl': 'https://bbp.epfl.ch/nexus/v1/files/dke/kgforge/af70dd9d-5161-49a4-a6ae-ef247f233694',
  'digest': {'algorithm': 'SHA-256',
   'value': '789aa07948683fe036ac29811814a826b703b562f7d168eb70dee1fabde26859'},
  'encodingFormat': 'text/tab-separated-values',
  'name': 'associations.tsv'}}

### Using resource paths as filters

The `paths` method loads the template or property paths (i.e. the expected properties) for a given type.

Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about templates and types.

In [39]:
p = forge.paths("Dataset")

Autocompletion is enabled on `p` and this can be used to create search filters.

Note: There is a known issue for RdfModel which requires using `p.type.id` instead of `p.type`.

All [python comparison operators](https://www.w3schools.com/python/gloss_python_comparison_operators.asp) are supported.

In [42]:
resources = forge.search(p.type.id=="Person", limit=3)

In [43]:
type(resources)

list

In [44]:
len(resources)

3

In [45]:
forge.as_dataframe(resources)

Unnamed: 0,id,type,email,name,address.type,address.country,address.locality
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,john.smith@epfl.ch,John Smith,,,
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,(missing),Jane Doe,PostalAddress,Switzerland,Geneva
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,john.smith@epfl.ch,John Smith,,,


In [46]:
forge.as_dataframe(resources, store_metadata=True)

Unnamed: 0,id,type,email,name,_constrainedBy,_createdAt,_createdBy,_deprecated,_incoming,_outgoing,_project,_rev,_schemaProject,_self,_updatedAt,_updatedBy,address.type,address.country,address.locality
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,john.smith@epfl.ch,John Smith,https://bluebrain.github.io/nexus/schemas/unco...,2021-05-07T07:46:04.511Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2021-05-07T07:46:04.511Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,,,
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,(missing),Jane Doe,https://bluebrain.github.io/nexus/schemas/unco...,2021-05-07T07:46:04.513Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2021-05-07T07:46:04.513Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,PostalAddress,Switzerland,Geneva
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,john.smith@epfl.ch,John Smith,https://bluebrain.github.io/nexus/schemas/unco...,2021-05-07T07:47:26.453Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2021-05-07T07:47:26.453Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,,,


In [47]:
# Search results are not synchronized
resources[0]._synchronized

False

#### Using nested resource property

Property autocompletion is available on a path `p` even for nested properties like `p.contribution`.

In [48]:
# Search for resources of type Dataset and with text/tab-separated-values as distribution.encodingFormat
resources = forge.search(p.type.id == "Dataset", p.distribution.encodingFormat == "text/tab-separated-values", limit=3)

In [49]:
len(resources)

3

In [50]:
forge.as_dataframe(resources)

Unnamed: 0,id,type,contribution,distribution.type,distribution.atLocation.type,distribution.atLocation.store.id,distribution.contentSize.unitCode,distribution.contentSize.value,distribution.contentUrl,distribution.digest.algorithm,distribution.digest.value,distribution.encodingFormat,distribution.name
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Dataset,"[{'type': 'Contribution', 'agent': {'type': 'P...",DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,506,https://bbp.epfl.ch/nexus/v1/files/dke/kgforge...,SHA-256,9639abc864e91c645779f510ae5c06a1618941d569eb1a...,text/tab-separated-values,associations.tsv
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Dataset,"[{'type': 'Contribution', 'agent': {'type': 'P...",DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,506,https://bbp.epfl.ch/nexus/v1/files/dke/kgforge...,SHA-256,9639abc864e91c645779f510ae5c06a1618941d569eb1a...,text/tab-separated-values,associations.tsv
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Dataset,,DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,506,https://bbp.epfl.ch/nexus/v1/files/dke/kgforge...,SHA-256,9639abc864e91c645779f510ae5c06a1618941d569eb1a...,text/tab-separated-values,associations.tsv


### Using dictionaries as filters
A dictionary can be provided for filters:
* `{'type': {'id': 'Dataset'}}` is equivalent to `p.type.id == "Dataset"`
* only the `==` operator is supported
* nested dictionaries are supported
* the provided properties and values do not have to be defined in the forge model. Results will be retrieved if there is corresponding data in the store.

This feature is not supported when using the DemoStore.
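
To make the equivalence between a nested dictionary and path-based filters concrete, the sketch below flattens a filter dictionary into `(path, "==", value)` triples, one per leaf. It only illustrates the idea and is not how Nexus Forge processes dictionaries internally; `dict_to_path_filters` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, List, Tuple

PathFilter = Tuple[List[str], str, Any]


def dict_to_path_filters(
    filters: Dict[str, Any], prefix: Tuple[str, ...] = ()
) -> Iterator[PathFilter]:
    """Yield one (path, "==", value) triple per leaf of a nested filter dict."""
    for key, value in filters.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Nested dictionaries extend the property path.
            yield from dict_to_path_filters(value, path)
        else:
            # Only equality is expressible with a dictionary filter.
            yield list(path), "==", value


filters = {
    "type": "Dataset",
    "distribution": {"encodingFormat": "text/tab-separated-values"},
}
triples = list(dict_to_path_filters(filters))
```

Each triple has the same shape as the path, operator and value of the filters used in the other search styles in this notebook.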


In [64]:
# Search for resources of type Dataset and with text/tab-separated-values as distribution.encodingFormat
# and created at a given dateTime (by default, dateTime values should be signaled by the suffix "^^xsd:dateTime")
filters = {
           "type": "Dataset", 
           "distribution":{"encodingFormat":"text/tab-separated-values"},
           "_createdAt":dataset._store_metadata._createdAt+"^^xsd:dateTime"
          }
resources = forge.search(filters, limit=3)

In [65]:
type(resources)

list

In [66]:
len(resources)

1

In [67]:
forge.as_dataframe(resources, store_metadata=True)

Unnamed: 0,id,type,contribution,distribution.type,distribution.atLocation.type,distribution.atLocation.store.id,distribution.atLocation.store.type,distribution.atLocation.store._rev,distribution.contentSize.unitCode,distribution.contentSize.value,...,_createdBy,_deprecated,_incoming,_outgoing,_project,_rev,_schemaProject,_self,_updatedAt,_updatedBy
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Dataset,"[{'type': 'Contribution', 'agent': {'type': 'P...",DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,DiskStorage,1,bytes,477,...,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2022-04-12T21:45:50.461Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy


### Using built-in Filter objects

#### Supported filter operators

In [68]:
[f"{op.value} ({op.name})" for op in FilterOperator] # These are equivalent to the Python comparison operators

['__eq__ (EQUAL)',
 '__ne__ (NOT_EQUAL)',
 '__lt__ (LOWER_THAN)',
 '__le__ (LOWER_OR_Equal_Than)',
 '__gt__ (GREATER_Than)',
 '__ge__ (GREATER_OR_Equal_Than)']
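
Since the operator values are the names of Python's comparison methods, a filter of this shape can be evaluated locally with the standard `operator` module. The sketch below is a hypothetical stand-in (`LocalFilter` is not a kgforge class) that checks a filter against a plain dictionary; it is meant to show what the operators mean, not how the store executes a search.

```python
import operator
from dataclasses import dataclass
from typing import Any, List


@dataclass
class LocalFilter:
    """Hypothetical stand-in with the same (operator, path, value) shape as a Filter."""

    op: str          # comparison method name, e.g. "__le__"
    path: List[str]  # property path, e.g. ["distribution", "encodingFormat"]
    value: Any

    def matches(self, document: dict) -> bool:
        """Follow `path` into `document` and compare the value found there."""
        current: Any = document
        for key in self.path:
            if not isinstance(current, dict) or key not in current:
                return False
            current = current[key]
        # "__le__" -> operator.le, "__eq__" -> operator.eq, and so on.
        compare = getattr(operator, self.op.strip("_"))
        return compare(current, self.value)


document = {
    "type": "Dataset",
    "distribution": {"encodingFormat": "text/tab-separated-values"},
    "_createdAt": "2021-03-17T22:07:09.443Z",
}

same_type = LocalFilter("__eq__", ["type"], "Dataset").matches(document)
# ISO 8601 timestamps in the same format compare correctly as strings.
created_before = LocalFilter(
    "__le__", ["_createdAt"], "2022-04-12T21:45:50.461Z"
).matches(document)
missing_path = LocalFilter("__eq__", ["contribution", "agent"], "x").matches(document)
```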

In [73]:
# Search for resources of type Dataset and with text/tab-separated-values as distribution.encodingFormat
# and created at or before a given dateTime (dateTime values should be signaled by the suffix "^^xsd:dateTime")
filter_1 = Filter(operator=FilterOperator.EQUAL, path=["type"], value="Dataset")
filter_2 = Filter(operator=FilterOperator.EQUAL, path=["distribution","encodingFormat"], value="text/tab-separated-values")
filter_3 = Filter(operator=FilterOperator.LOWER_OR_Equal_Than, path=["_createdAt"], value=dataset._store_metadata._createdAt+"^^xsd:dateTime")

resources = forge.search(filter_1, filter_2, filter_3, limit=3)

In [74]:
type(resources)

list

In [75]:
len(resources)

3

In [76]:
forge.as_dataframe(resources, store_metadata=True)

Unnamed: 0,id,type,contribution,distribution.type,distribution.atLocation.type,distribution.atLocation.store.id,distribution.contentSize.unitCode,distribution.contentSize.value,distribution.contentUrl,distribution.digest.algorithm,...,_createdBy,_deprecated,_incoming,_outgoing,_project,_rev,_schemaProject,_self,_updatedAt,_updatedBy
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Dataset,"[{'type': 'Contribution', 'agent': {'type': 'P...",DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,506,https://bbp.epfl.ch/nexus/v1/files/dke/kgforge...,SHA-256,...,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2021-03-17T22:07:09.443Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Dataset,"[{'type': 'Contribution', 'agent': {'type': 'P...",DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,506,https://bbp.epfl.ch/nexus/v1/files/dke/kgforge...,SHA-256,...,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2021-03-17T22:14:28.904Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Dataset,,DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,506,https://bbp.epfl.ch/nexus/v1/files/dke/kgforge...,SHA-256,...,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2021-03-17T22:31:38.741Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy


### Using search endpoints

Two types of search endpoints are supported: 'sparql' (default) for graph queries and 'elastic' for document-oriented queries. The available search endpoints can be configured (see [00-Initialization.ipynb](00%20-%20Initialization.ipynb) for an example of search endpoints config) or set when creating a KnowledgeGraphForge session using the 'searchendpoints' argument.

The search endpoint to hit when calling forge.search(...) is 'sparql' by default but can be specified using the 'search_endpoint' argument.

#### SPARQL Search Endpoint

In [77]:
# Search for resources of type Person
filters = {"type": "Person"}
resources = forge.search(filters, limit=3, search_endpoint='sparql')

In [78]:
type(resources)

list

In [79]:
len(resources)

3

In [80]:
forge.as_dataframe(resources, store_metadata=True)

Unnamed: 0,id,type,birthDate,employer.type,employer.name,name,_constrainedBy,_createdAt,_createdBy,_deprecated,_incoming,_outgoing,_project,_rev,_schemaProject,_self,_updatedAt,_updatedBy,description
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,12.12.1990,Organization,epfl,Peter Kind,https://bluebrain.github.io/nexus/schemas/unco...,2020-03-08T20:00:09.092Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,1,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2020-03-08T20:00:09.092Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,12.12.1990,,,Peter Kind,https://bluebrain.github.io/nexus/schemas/unco...,2020-03-22T21:51:06.830Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,3,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2020-03-22T21:54:37.507Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Person,12.12.1990,,,Peter K.,https://bluebrain.github.io/nexus/schemas/unco...,2020-03-22T21:56:16.084Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,False,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,2,https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,2020-03-22T21:58:41.450Z,https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy,Resource without user provided context


#### ElasticSearch Endpoint

In [81]:
# Search for resources of type Person and retrieve their ids and names.
filters = {"@type": "http://schema.org/Person"}
resources = forge.search(filters, limit=3, 
                         search_endpoint='elastic', 
                         includes=["@id", "@type"]) # fields can also be excluded with 'excludes'

In [82]:
type(resources)

list

In [83]:
len(resources)

3

In [84]:
forge.as_dataframe(resources, store_metadata=True)

Unnamed: 0,id,type
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,http://schema.org/Person
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,http://schema.org/Person
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,http://schema.org/Person


In [90]:
# Search results are not synchronized
resources[0]._synchronized

False

In [91]:
resources[0].id

'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/e353bead-906e-4cd6-b7f4-948fc05c1ef9'

In [92]:
resources[0].type

'http://schema.org/Person'

### Crossbucket search
It is possible to search for resources stored in buckets different than the configured one. The configured store should of course support it.

In [93]:
resources = forge.search(p.type.id == "Association", limit=3, cross_bucket=True)  # cross_bucket defaults to False

In [94]:
type(resources)

list

In [95]:
len(resources)

3

In [96]:
forge.as_dataframe(resources)

Unnamed: 0,id,type,agent.type,agent.name
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe


In [None]:
# It is also possible to filter by bucket when cross_bucket is set to True.
# Setting a bucket value when cross_bucket is False will trigger a not_supported exception.
resources = forge.search(p.type.id == "Person", limit=3, cross_bucket=True, bucket="<bucket>")  # replace "<bucket>" with an actual bucket

In [98]:
type(resources)

list

In [99]:
len(resources)

3

In [100]:
forge.as_dataframe(resources)

Unnamed: 0,id,type,agent.type,agent.name
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe


### Searching original source
When using BlueBrainNexusStore, search results can carry each resource's payload as it was registered (`retrieve_source=True`, the default), without any changes related to store-added metadata or JSON-LD framing. The search below sets `retrieve_source=False` instead.

In [109]:
resources = forge.search(p.type.id == "Association", limit=3, retrieve_source=False)  # retrieve_source defaults to True

In [110]:
type(resources)

list

In [111]:
len(resources)

3

In [112]:
forge.as_dataframe(resources)

Unnamed: 0,id,type,agent.type,agent.name
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Association,Person,Jane Doe


## Graph traversing

SPARQL is used as a query language to perform graph traversing.

Nexus Forge implements a SPARQL query rewriting strategy that leverages a configured RdfModel to let users write SPARQL queries without adding prefix declarations, prefix names or long IRIs. With this strategy, only type and property names need to be provided.

Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about templates.

Note: DemoStore doesn't implement SPARQL operations yet. Please use another store for this section.

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel.

In [113]:
jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [114]:
john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [115]:
association = Resource(type="Dataset", contribution=[contribution_jane, contribution_john])

In [116]:
forge.register(association)

<action> _register_one
<succeeded> True


In [117]:
forge.template("Dataset") # Templates help to know which properties to use when writing a query to search for a given type

{
    id: ""
    type:
    {
        id: ""
    }
    annotation:
    {
        id: ""
        type: Annotation
        citation:
        {
            id: ""
        }
        contribution:
        {
            id: ""
            type: Contribution
        }
        dateCreated: 9999-12-31T00:00:00
        dateModified: ""
        derivation:
        {
            id: ""
            type: Derivation
        }
        description: ""
        distribution:
        {
            id: ""
            type: DataDownload
            contentSize:
            {
                unitCode: ""
                value:
                [
                    0.0
                    0
                ]
            }
            digest:
            {
                algorithm: ""
                value: ""
            }
            encodingFormat: ""
            license: ""
            name: ""
        }
        generation:
        {
            id: ""
            type: Generation
        }
        hasBod

### Prefix and namespace free SPARQL query

When a forge RdfModel is configured, there is no need to provide prefixes and namespaces when writing a SPARQL query. Prefixes and namespaces are automatically inferred from the provided schemas and/or JSON-LD context, and the query is rewritten accordingly.

In [158]:
query = """
    SELECT ?id ?name ?contributor
    WHERE {
        ?id a Dataset ;
        contribution/agent ?contributor.
        ?contributor name ?name.
    }
"""

In [159]:
resources = forge.sparql(query, limit=3)

In [160]:
type(resources)

list

In [161]:
len(resources)

3

In [162]:
print(resources[0])

{
    id: https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/847380a2-4703-42d2-aa8e-64ed52fc594b
    contributor: https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/3f9f7b60-6463-41c8-be68-f512fc2a58fe
    name: John Smith
}


In [163]:
forge.as_dataframe(resources)

Unnamed: 0,id,contributor,name
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,John Smith
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,John Smith
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,t464205,Jane Doe
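
The idea behind prefix-free queries can be illustrated with a small term-expansion sketch: every bare term that is known to a term-to-IRI mapping is replaced by its full IRI, while variables and prefixed names are left alone. This is a simplified illustration and not the rewriting Nexus Forge actually performs; `CONTEXT` and `expand_terms` are hypothetical, the mapping is built by hand from the prefixes used in the full query later in this notebook, and string literals are not handled.

```python
import re

# Hypothetical term-to-IRI mapping, standing in for what a JSON-LD context provides.
CONTEXT = {
    "Dataset": "http://schema.org/Dataset",
    "name": "http://schema.org/name",
    "contribution": "https://neuroshapes.org/contribution",
    "agent": "http://www.w3.org/ns/prov#agent",
}

# A bare term: a word not preceded by '?', '$', ':' or '<' and not followed by ':'.
BARE_TERM = re.compile(r"(?<![?$\w:<])\b[A-Za-z_]\w*\b(?![\w:])")


def expand_terms(query: str, context: dict) -> str:
    """Replace bare terms found in `context` with full IRIs; keep everything else."""

    def replace(match: "re.Match") -> str:
        term = match.group(0)
        return f"<{context[term]}>" if term in context else term

    return BARE_TERM.sub(replace, query)


query = """
SELECT ?id ?name ?contributor
WHERE {
    ?id a Dataset ;
    contribution/agent ?contributor.
    ?contributor name ?name.
}
"""
rewritten = expand_terms(query, CONTEXT)
print(rewritten)
```

Keywords such as `SELECT` and the `a` shorthand are untouched because they are not keys of the mapping, and `?name` stays a variable even though `name` is a known term.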


### Display rewritten SPARQL query

In [165]:
resources = forge.sparql(query, limit=3, debug=True)

Submitted query:
   PREFIX bmc: <https://bbp.epfl.ch/ontologies/core/bmc/>
   PREFIX bmo: <https://bbp.epfl.ch/ontologies/core/bmo/>
   PREFIX commonshapes: <https://neuroshapes.org/commons/>
   PREFIX datashapes: <https://neuroshapes.org/dash/>
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
   PREFIX oa: <http://www.w3.org/ns/oa#>
   PREFIX obo: <http://purl.obolibrary.org/obo/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFI

### Full SPARQL query

Regular SPARQL queries can also be provided. When provided, the limit and offset arguments supersede any LIMIT or OFFSET values written in the query.

In [166]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX vann: <http://purl.org/vocab/vann/>
   PREFIX void: <http://rdfs.org/ns/void#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX : <https://neuroshapes.org/>
   SELECT ?id ?name
   WHERE {
       ?id a schema:Dataset ;
       nsg:contribution/prov:agent ?contributor.
       ?contributor schema:name ?name.
   }
   ORDER BY ?id
   LIMIT 1
   OFFSET 0
"""

In [172]:
# it is recommended to set 'rewrite' to 'False' to prevent the sparql query rewriting when a syntactically correct SPARQL query is provided.
resources = forge.sparql(query, rewrite=False, limit=3, offset=1, debug=True) 

Submitted query:
   
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
      PREFIX dcat: <http://www.w3.org/ns/dcat#>
      PREFIX dcterms: <http://purl.org/dc/terms/>
      PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
      PREFIX nsg: <https://neuroshapes.org/>
      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      PREFIX prov: <http://www.w3.org/ns/prov#>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX schema: <http://schema.org/>
      PREFIX sh: <http://www.w3.org/ns/shacl#>
      PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX vann: <http://purl.org/vocab/vann/>
      PREFIX void: <http://rdfs.org/ns/void#>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX : <https://neuroshapes.org/>
      SELECT ?id ?name
      WHERE {
          ?id a schema:Dataset ;
          nsg:contribution/prov:ag

In [173]:
type(resources)

list

In [174]:
len(resources)

3

In [175]:
type(resources[0])

kgforge.core.resource.Resource

In [176]:
forge.as_dataframe(resources)

Unnamed: 0,id,name
0,t463285,A person
1,t463453,A person
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Jane Doe


## ElasticSearch DSL Query

ElasticSearch DSL can be used as a query language to search for resources, provided that the configured store supports it. The 'BlueBrainNexusStore' supports ElasticSearch.

Note: DemoStore doesn't implement ElasticSearch DSL operations.

In [177]:
jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [178]:
john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [179]:
association = Resource(type="Dataset", contribution=[contribution_jane, contribution_john])

In [180]:
forge.register(association)

<action> _register_one
<succeeded> True


### Plain ElasticSearch DSL

In [181]:
query = """
        {
          "_source": {
            "includes": [
              "@id",
              "name"
            ]
          },
          "query": {
            "term": {
              "@type": "http://schema.org/Dataset"
            }
          }
        }
"""

In [182]:
# limit and offset (when provided in this method call) supersede 'size' and 'from' values provided in the query
resources = forge.elastic(query, limit=3)

In [183]:
type(resources)

list

In [184]:
len(resources)

3

In [185]:
type(resources[0])

kgforge.core.resource.Resource

In [186]:
forge.as_dataframe(resources)

Unnamed: 0,id,name
0,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Interesting Persons
1,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,Interesting Persons
2,https://bbp.epfl.ch/nexus/v1/resources/dke/kgf...,TTest name
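
The precedence rule mentioned above, where `limit` and `offset` supersede `size` and `from`, can be sketched on the query body itself. The helper below is a hypothetical illustration (`apply_paging` is not a kgforge function): it parses the DSL string and overrides the paging keys only when the corresponding argument is given.

```python
import json
from typing import Optional


def apply_paging(query: str, limit: Optional[int] = None, offset: Optional[int] = None) -> dict:
    """Parse an ElasticSearch DSL string and let limit/offset override size/from."""
    body = json.loads(query)
    if limit is not None:
        body["size"] = limit   # 'size' is the ElasticSearch page size
    if offset is not None:
        body["from"] = offset  # 'from' is the ElasticSearch starting offset
    return body


query = """
{
  "size": 10,
  "from": 5,
  "query": {"term": {"@type": "http://schema.org/Dataset"}}
}
"""
paged = apply_paging(query, limit=3)
unchanged = apply_paging(query)
```

In `paged` the page size becomes 3 while the offset written in the query is kept, because no `offset` argument was passed.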


## Downloading

Note: DemoStore doesn't implement file operations yet. Please use another store for this section.

In [187]:
jane = Resource(type="Person", name="Jane Doe")

In [188]:
! ls -p ../../data | egrep -v /$

associations.tsv
my_data.xwz
my_data_derived.txt
persons-with-id.csv
persons.csv
tfidfvectorizer_model_schemaorg_linking


In [189]:
distribution = forge.attach("../../data")

In [190]:
association = Resource(type="Association", agent=jane, distribution=distribution)

In [191]:
forge.register(association)

<action> _register_one
<succeeded> True


In [195]:
# The argument overwrite: bool can be provided to decide whether to overwrite (True) existing files with the same name or
# to create new ones (False) with their names suffixed with a timestamp.
# A cross_bucket argument can be provided to download data from the configured bucket (cross_bucket=False - the default value) 
# or from a bucket different than the configured one (cross_bucket=True). The configured store should support crossing buckets for this to work.
forge.download(association, "distribution.contentUrl", "./downloaded/")

In [196]:
! ls -l ./downloaded/

total 448
-rw-r--r--  1 mfsy  staff     477 Apr 13 00:01 associations.tsv
-rw-r--r--  1 mfsy  staff      16 Apr 13 00:01 my_data.xwz
-rw-r--r--  1 mfsy  staff      24 Apr 13 00:01 my_data_derived.txt
-rw-r--r--  1 mfsy  staff     126 Apr 13 00:01 persons-with-id.csv
-rw-r--r--  1 mfsy  staff      52 Apr 13 00:01 persons.csv
-rw-r--r--  1 mfsy  staff  204848 Apr 13 00:01 tfidfvectorizer_model_schemaorg_linking
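
The `overwrite` behaviour described in the download cell can be pictured as a choice of target file name. The sketch below is only an illustration under assumptions: `target_path` is a hypothetical helper, and the timestamp format and its position after the file name are guesses, not the naming Nexus Forge actually uses.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def target_path(directory: Path, filename: str, overwrite: bool) -> Path:
    """Pick where a downloaded file goes, depending on `overwrite`."""
    candidate = directory / filename
    if overwrite or not candidate.exists():
        # Either replacing is allowed or there is nothing to replace.
        return candidate
    # Assumed format: the original name followed by a timestamp suffix.
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return candidate.with_name(f"{candidate.name}.{stamp}")


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "persons.csv").write_text("name\n")
    replaced = target_path(folder, "persons.csv", overwrite=True)
    kept = target_path(folder, "persons.csv", overwrite=False)
    fresh = target_path(folder, "new.csv", overwrite=False)
```

With `overwrite=True` the existing `persons.csv` is the target, with `overwrite=False` a new timestamped name is chosen, and a file that does not exist yet keeps its plain name in both cases.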


In [194]:
#! rm -R ./downloaded/