# Register Datasets within the Blue Brain Knowledge Graph

This notebook presents a step-by-step approach for registering datasets (any resource with files attached), optionally with metadata and provenance, in a configured project.

## Nexus Forge installation
Installation instructions can be found [here](https://nexus-forge.readthedocs.io/en/latest/#installation).

## Get a token

The [Nexus production deployment](https://bbp.epfl.ch/nexus/web/) can be used to login and get a token.

- Step 1: From the opened web page, click on the login button on the right corner and follow the instructions.

![login-ui](https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/login-ui.png)

- Step 2: At the end you'll see a token button on the right corner. Click on it to copy the token.

![copy-token](https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/copy-token.png)


In [None]:
import getpass
TOKEN = getpass.getpass()
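
A copied token stops working once it expires, so it can help to check its expiry before starting a long session. The sketch below assumes the token is a JSON Web Token (JWT) carrying an `exp` claim; this is an assumption about the deployment, not something stated in this notebook. The helper names (`token_expiry`, `is_expired`, `make_demo_token`) are hypothetical, and the signature is not verified.

```python
import base64
import json
import time
from typing import Optional


def token_expiry(token: str) -> float:
    """Return the `exp` claim (seconds since the epoch) of a JWT, without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding: restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(payload["exp"])


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Tell whether the expiry time of the token has passed."""
    current = time.time() if now is None else now
    return token_expiry(token) <= current


def make_demo_token(exp: int) -> str:
    """Build an unsigned, made-up token for illustration only (no server would accept it)."""
    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    header = encode({"alg": "none"})
    body = encode({"exp": exp})
    return f"{header}.{body}.no-signature"


demo_token = make_demo_token(1700000000)
print(token_expiry(demo_token))
print(is_expired(demo_token, now=1700000001))
```

With a real token, `is_expired(TOKEN)` could be called right after the `getpass` cell.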

## Set the Nexus deployment to work with

In [2]:
nexus_prod_endpoint    = "https://bbp.epfl.ch/nexus/v1"
nexus_staging_endpoint = "https://staging.nexus.ocp.bbp.epfl.ch/v1" # Use staging to try things out. Note that accessing staging requires being connected to the VPN
nexus_endpoint = nexus_staging_endpoint

## Set the Nexus project to work with

### The project already exists?
In production, the existing BBP projects can be found [here](https://bbp.epfl.ch/nexus/web/admin/bbp) and the list of available studios can be found [here](https://bbp.epfl.ch/nexus/web/studios).

In [3]:
ORG = "bbp"
PROJECT = "ncmv3"

### The project needs to be created?

#### In prod: request a project from PMO

#### In staging
Any BBP user can create a project in staging, but it comes with no specific configuration and might not match the state of production projects, which are initialised with GPFS storage, schemas and ontologies.

The [Nexus staging deployment](https://staging.nexus.ocp.bbp.epfl.ch/admin) can be used to create an organisation and a project.

- Step 1: From the opened web page, click on the create organization button and provide an organization name (do not use special characters or spaces).

![create_org](https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/create_org.png)

- Step 2: Then click on the create project button and provide a project name (do not use special characters or spaces).

![create_project](https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/create_project.png)
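
The naming advice above can be checked before creating anything. This is a minimal sketch: the exact set of characters accepted by Nexus is not given here, so the pattern below (letters, digits, hyphen, underscore) is an assumption, and `make_bucket` is a hypothetical helper. It returns the `org/project` string that this notebook later passes as `bucket`.

```python
import re

# Assumed pattern: letters, digits, "-" and "_" only (no spaces, no other special characters).
LABEL_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def make_bucket(org: str, project: str) -> str:
    """Validate both labels and return the 'org/project' bucket string."""
    for label in (org, project):
        if not LABEL_PATTERN.fullmatch(label):
            raise ValueError(f"Invalid label {label!r}: avoid spaces and special characters")
    return f"{org}/{project}"


print(make_bucket("bbp", "ncmv3"))
```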


## Create a KnowledgeGraphForge session

In [4]:
from kgforge.core import KnowledgeGraphForge
from kgforge.core import Resource
from kgforge.specializations.resources import Dataset

import pandas as pd

A KnowledgeGraphForge session is a Python object that exposes all the functions needed to register datasets with metadata, search them and download them. A configuration file is needed to create a KnowledgeGraphForge session; a ready-to-use configuration file is available [here](https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml).

In [5]:
forge = KnowledgeGraphForge("https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml",
                           endpoint=nexus_endpoint,
                           bucket=f"{ORG}/{PROJECT}",
                           token= TOKEN
                           )

You are all set up!

## Create resources with some metadata

A resource is anything that can be identified and that can have metadata associated with it. The following cell creates two resources, one of type Person and one of types Person and Agent, each with a name (plus given and family names) as metadata.


### Using forge Resource object

Any 'property=value' can be given here as metadata. There are properties from the [Person schema](https://bbp-nexus.epfl.ch/datamodels/class-proventity.html) (Note that these schemas may change in the future) that can be reused.

In [6]:
jane = Resource(type="Person", name="Jane Doe", givenName="Jane", familyName="Doe")
john = Resource(type=["Person","Agent"], name="John Smith", givenName="John", familyName="Smith")
persons = [jane, john]

In [7]:
forge.register(persons)

<count> 2
<action> _register_many
<succeeded> True


In [8]:
# A resource can be retrieved by its id
result = forge.retrieve(id= john.id)

In [9]:
# Note that the Blue Brain Nexus has automatically generated ids (id property) for the resources. The generated ids are unique in the selected project.
forge.as_json(result)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
 'type': ['Person', 'Agent'],
 'familyName': 'Smith',
 'givenName': 'John',
 'name': 'John Smith'}

In [10]:
# Add store_metadata=True to see extra metadata added by Blue Brain Nexus (e.g. _rev, _createdBy, _updatedAt, _deprecated, ...)
forge.as_json(result, store_metadata=True)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
 'type': ['Person', 'Agent'],
 'familyName': 'Smith',
 'givenName': 'John',
 'name': 'John Smith',
 '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json',
 '_createdAt': '2021-08-23T10:10:11.752Z',
 '_createdBy': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/realms/bbp/users/sy',
 '_deprecated': False,
 '_incoming': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/ncmv3/_/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469/incoming',
 '_outgoing': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/ncmv3/_/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469/outgoing',
 '_project': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/ncmv3',
 '_rev': 1,
 '_schemaProject': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/ncmv3',
 '_self': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/ncmv3/_/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
 '_updatedAt': '2021-08-23T10:10:11.752Z',
 '_up
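
In the output above, the properties added by the store all start with an underscore. The sketch below separates them from the user-provided metadata in a plain dictionary such as the one returned by `forge.as_json(result, store_metadata=True)`. The helper name `split_store_metadata` is hypothetical, and the underscore convention is inferred from the outputs shown in this notebook rather than from a documented guarantee.

```python
from typing import Any, Dict, Tuple


def split_store_metadata(resource_json: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a resource dictionary into (user metadata, store metadata)."""
    user = {key: value for key, value in resource_json.items() if not key.startswith("_")}
    store = {key: value for key, value in resource_json.items() if key.startswith("_")}
    return user, store


# Excerpt of the JSON shown above, reduced to a few keys for illustration.
example = {
    "type": ["Person", "Agent"],
    "name": "John Smith",
    "_rev": 1,
    "_deprecated": False,
}

user_metadata, store_metadata = split_store_metadata(example)
print(user_metadata)
print(store_metadata)
```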

### Using pandas dataframe

In [11]:
# Load the persons to register from a CSV file into a pandas DataFrame
scientists_df = pd.read_csv("../../data/persons.csv")
scientists_df

Unnamed: 0,type,name
0,Person,Marie Curie
1,Person,Albert Einstein


In [12]:
# Resources can be created from a pandas dataframe
scientists = forge.from_dataframe(scientists_df)

In [13]:
forge.as_json(scientists[0])

{'type': 'Person', 'name': 'Marie Curie'}
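
The row-to-resource mapping shown above (one row becomes one resource, one column becomes one property) can be illustrated without forge or pandas. This is only a sketch of the idea using the standard `csv` module; it is not how `forge.from_dataframe` is implemented, and `rows_to_records` is a hypothetical name. The CSV text mirrors the two columns displayed in the table above.

```python
import csv
import io

CSV_TEXT = """type,name
Person,Marie Curie
Person,Albert Einstein
"""


def rows_to_records(csv_text: str) -> list:
    """Turn each CSV row into a dictionary, dropping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {column: value for column, value in row.items() if value not in ("", None)}
        for row in reader
    ]


records = rows_to_records(CSV_TEXT)
print(records[0])
```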

In [14]:
# Note that registering an existing resource in a given project will throw a 'RegistrationError: resource already exists' error. 
# forge.retrieve(id=...) can be used to fetch the registered resource as shown in next cell.
forge.register(scientists)

<count> 2
<action> _register_many
<succeeded> True


In [15]:
scientists = []
Marie_Curie = forge.retrieve(id= "https://www.wikidata.org/wiki/Q7186") # Retrieve the Marie Curie resource
Albert_Einstein = forge.retrieve(id= "https://www.wikidata.org/wiki/Q937") # Retrieve the Albert Einstein resource
scientists.append(Marie_Curie)
scientists.append(Albert_Einstein)

In [16]:
# Note that the Blue Brain Nexus Store has kept the provided id
forge.as_json(Marie_Curie)

{'id': 'https://www.wikidata.org/wiki/Q7186',
 'type': 'Person',
 'name': 'Marie Curie'}

See the notebook [DataFrame IO.ipynb](https://github.com/BlueBrain/nexus-forge/blob/master/examples/notebooks/getting-started/07%20-%20DataFrame%20IO.ipynb) for more details on converting Pandas DataFrame to forge Resources and the other way around.

Even though any type can be provided for a Resource, there is a set of available types that can be obtained programmatically with the following command or by looking at the [schemas doc](https://bbp-nexus.epfl.ch/datamodels/entities-az.html) (note that these schemas may change in the future).

In [17]:
forge.types()

Managed entity types:
   - AcquisitionAnnotation
   - Activity
   - AffineLinearTransform
   - Agent
   - Analysis
   - AnalysisResult
   - AnnotatedSlice
   - Annotation
   - ApicalAnnotation
   - AtlasConstruction
   - AtlasRelease
   - AtlasSpatialReferenceSystem
   - BatchQualityMeasurementAnnotation
   - BluePyEfeFeatures
   - BoundingBox
   - BoutonDensity
   - BrainAtlasRelease
   - BrainAtlasSpatialReferenceSystem
   - BrainImaging
   - BrainLocation
   - BrainParcellationDataLayer
   - BrainParcellationMesh
   - BrainSlicing
   - BrainTemplateDataLayer
   - Cell
   - CellCounting
   - CellDensity
   - CellDensityDataLayer
   - CellPlacement
   - CellRecordSeries
   - CircuitCellProperties
   - Class
   - Collection
   - Concept
   - ConceptScheme
   - Configuration
   - Contribution
   - DataDownload
   - Dataset
   - DeformableTransform
   - Density
   - Derivation
   - DetailedCircuit
   - EModel
   - EModelBuilding
   - EModelRelease
   - EModelScript
   - ETypeFeatureProto

## Create a dataset from files

This use case is about registering files with metadata in the knowledge graph. A specific type of Resource, called Dataset, will be used. Since a Dataset is also a Resource, everything that applies to Resource also applies to Dataset.

In [18]:
# List the files that will be used and capture the start time for this part
import time
startedAtTime = time.strftime("%Y%m%d%H%M%S")
! ls -p ../../data | egrep -v /$

associations.tsv
my_data.xwz
my_data_derived.txt
persons.csv
tfidfvectorizer_model_schemaorg_linking


Any 'property=value' can be given here as metadata. We recommend using properties from the [Dataset schema](https://bbp-nexus.epfl.ch/datamodels/class-schemadataset.html).
A Dataset is a Resource with a [distribution](https://schema.org/distribution) property to account for where the data (files) are stored and where they can be accessed.

In [19]:
# The file content type can be provided by setting the content_type.
my_data_distribution = forge.attach("../../data/my_data.xwz")
my_dataset = Dataset(forge, type=["Entity", "Dataset", "MyOtherType"], name="Interesting Dataset", distribution=my_data_distribution)

### Register

In [20]:
forge.register(my_dataset)

<action> _register_one
<succeeded> True


In [21]:
# Visualise the metadata. Note the distribution property with file related metadata automatically added (contentSize, digest, encodingFormat, ...)
forge.as_json(my_dataset)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587',
 'type': ['Entity', 'Dataset', 'MyOtherType'],
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault'}},
  'contentSize': {'unitCode': 'bytes', 'value': 16},
  'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/ncmv3/2e9265a0-5ad4-4a54-a883-5c283d04359e',
  'digest': {'algorithm': 'SHA-256',
   'value': 'df03e7e93f870c6731540b3cae26391670da682c7a8dbdd18448cbcfc4fb7981'},
  'encodingFormat': 'application/octet-stream',
  'name': 'my_data.xwz'},
 'name': 'Interesting Dataset'}
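
The `contentSize` and `digest` values in the distribution can be recomputed locally, for instance to check that a local copy matches what was registered. The sketch below uses only the standard library; the helper names (`file_size_and_sha256`, `matches_distribution`) are hypothetical, and the dictionary layout is the one shown in the output above. The demo writes a throwaway file rather than relying on `../../data/my_data.xwz` being present.

```python
import hashlib
import os
import tempfile
from typing import Tuple


def file_size_and_sha256(path: str, chunk_size: int = 65536) -> Tuple[int, str]:
    """Return the size in bytes and the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def matches_distribution(path: str, distribution: dict) -> bool:
    """Compare a local file with the contentSize and digest of a distribution dictionary."""
    size, sha256 = file_size_and_sha256(path)
    return (
        size == distribution["contentSize"]["value"]
        and sha256 == distribution["digest"]["value"]
    )


# Demo on a temporary file containing the three bytes b"abc".
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"abc")
    demo_size, demo_sha256 = file_size_and_sha256(demo_path)
    demo_distribution = {
        "contentSize": {"unitCode": "bytes", "value": demo_size},
        "digest": {"algorithm": "SHA-256", "value": demo_sha256},
    }
    demo_ok = matches_distribution(demo_path, demo_distribution)

print(demo_size, demo_sha256, demo_ok)
```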

### Retrieve by id

In [22]:
result = forge.retrieve(id= my_dataset.id)

In [23]:
forge.as_json(result)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587',
 'type': ['Entity', 'Dataset', 'MyOtherType'],
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault'}},
  'contentSize': {'unitCode': 'bytes', 'value': 16},
  'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/ncmv3/2e9265a0-5ad4-4a54-a883-5c283d04359e',
  'digest': {'algorithm': 'SHA-256',
   'value': 'df03e7e93f870c6731540b3cae26391670da682c7a8dbdd18448cbcfc4fb7981'},
  'encodingFormat': 'application/octet-stream',
  'name': 'my_data.xwz'},
 'name': 'Interesting Dataset'}

### Search by metadata
See the notebook [BBP KG Search and Download.ipynb](https://github.com/BlueBrain/nexus-forge/blob/master/examples/notebooks/use-cases/BBP%20KG%20Search%20and%20Download.ipynb) for more search details and options.

In [24]:
filters = {"type":"Dataset", "name":"Interesting Dataset"}
results = forge.search(filters, limit=3)
print(f"{len(results)} results found")

3 results found


In [25]:
forge.as_json(results[0])

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/37a1ec4a-8a29-4dca-84f8-9f4e373d685c',
 'type': 'Dataset',
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault'}},
  'contentSize': {'unitCode': 'bytes', 'value': 16},
  'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/ncmv3/bb386ca8-ec50-449b-a5c3-703fa1fa74b9',
  'digest': {'algorithm': 'SHA-256',
   'value': 'df03e7e93f870c6731540b3cae26391670da682c7a8dbdd18448cbcfc4fb7981'},
  'encodingFormat': 'application/octet-stream',
  'name': 'my_data.xwz'},
 'name': 'Interesting Dataset'}

In [26]:
# A list of resources can be transformed into a pandas DataFrame
forge.as_dataframe(results)

Unnamed: 0,id,type,distribution.type,distribution.atLocation.type,distribution.atLocation.store.id,distribution.contentSize.unitCode,distribution.contentSize.value,distribution.contentUrl,distribution.digest.algorithm,distribution.digest.value,distribution.encodingFormat,distribution.name,name
0,https://bbp.epfl.ch/neurosciencegraph/data/37a...,Dataset,DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,16,https://staging.nexus.ocp.bbp.epfl.ch/v1/files...,SHA-256,df03e7e93f870c6731540b3cae26391670da682c7a8dbd...,application/octet-stream,my_data.xwz,Interesting Dataset
1,https://bbp.epfl.ch/neurosciencegraph/data/868...,Dataset,DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,16,https://staging.nexus.ocp.bbp.epfl.ch/v1/files...,SHA-256,df03e7e93f870c6731540b3cae26391670da682c7a8dbd...,application/octet-stream,my_data.xwz,Interesting Dataset
2,https://bbp.epfl.ch/neurosciencegraph/data/417...,Dataset,DataDownload,Location,https://bluebrain.github.io/nexus/vocabulary/d...,bytes,16,https://staging.nexus.ocp.bbp.epfl.ch/v1/files...,SHA-256,df03e7e93f870c6731540b3cae26391670da682c7a8dbd...,application/octet-stream,my_data.xwz,Interesting Dataset
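
The column names in the table above, such as `distribution.contentSize.value`, are dotted paths into the nested JSON of each resource. The sketch below shows how such paths can be derived from a nested dictionary. It illustrates the naming scheme only and is not the implementation of `forge.as_dataframe`; the function name `flatten` is hypothetical.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], parent: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level keyed by dotted paths."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Reduced version of a search result shown above.
resource_json = {
    "type": "Dataset",
    "distribution": {
        "type": "DataDownload",
        "contentSize": {"unitCode": "bytes", "value": 16},
    },
    "name": "Interesting Dataset",
}

flat_row = flatten(resource_json)
print(sorted(flat_row))
```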


### Download

In [31]:
# The argument overwrite: bool can be provided to decide whether to overwrite (True) existing files with the same name or
# to create new ones (False) with their names suffixed with a timestamp  
my_dataset.download(path="./downloaded/", source="distributions")

In [32]:
! ls -l ./downloaded

total 8
-rw-r--r--  1 mfsy  staff  16 Aug 23 12:12 my_data.xwz


In [None]:
#! rm -R ./downloaded/

### Get storage path
If the dataset files are stored in an external storage (e.g. GPFS), it is possible to get their location.

In [33]:
forge.as_json(my_dataset.distribution.atLocation)

{'type': 'Location',
 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault'}}

In [None]:
# This will break when in staging as no gpfs storage is used
my_dataset.distribution.atLocation.location

### Add provenance information to the dataset

Provenance is specific metadata accounting for (among other things) data lineage (derivation), who contributed to the generation of the dataset (contribution), how the dataset was generated (generation) and the subject of the dataset, if any (subject).

#### Add derivation (which datasets a given dataset derives from)
Let us consider that the file ../../data/my_data_derived.txt derives from ../../data/my_data.xwz.

In [35]:
# The file content type can be provided by setting the content_type.
my_derived_data_distribution = forge.attach("../../data/my_data_derived.txt", content_type="application/txt")

my_derived_dataset = Dataset(forge, name="Derived Dataset from my_dataset", distribution=my_derived_data_distribution)

In [36]:
forge.register(my_derived_dataset)

<action> _register_one
<succeeded> True


In [37]:
result = forge.retrieve(id=my_derived_dataset.id)

In [38]:
# Note the added distribution property
forge.as_json(result)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault'}},
  'contentSize': {'unitCode': 'bytes', 'value': 24},
  'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/ncmv3/d0b43417-84d8-4f2f-8b7e-7fb26ff4b473',
  'digest': {'algorithm': 'SHA-256',
   'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'},
  'encodingFormat': 'application/txt',
  'name': 'my_data_derived.txt'},
 'name': 'Derived Dataset from my_dataset'}

In [39]:
# my_derived_dataset derived from my_dataset
my_derived_dataset.add_derivation(my_dataset)

In [40]:
# Since my_derived_dataset is already registered, it can be updated to store its derivation information. If no change occurred (i.e. there is nothing to update),
# then forge.update(...) will throw an "UpdatingError: resource should not be synchronized" error.
forge.update(my_derived_dataset)

<action> _update_one
<succeeded> True


In [41]:
# Note the increased _rev number because of the update
forge.as_json(my_derived_dataset, store_metadata=True)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault'}},
  'contentSize': {'unitCode': 'bytes', 'value': 24},
  'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/ncmv3/d0b43417-84d8-4f2f-8b7e-7fb26ff4b473',
  'digest': {'algorithm': 'SHA-256',
   'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'},
  'encodingFormat': 'application/txt',
  'name': 'my_data_derived.txt'},
 'name': 'Derived Dataset from my_dataset',
 '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json',
 '_creat

#### Add contribution (which Person, Organization or Software contributed to the generation of the data)

The contributors to add to the dataset are john, jane and the persons stored in the ../../data/persons.csv file. All persons from the file must be resources in the knowledge graph so that they can be referenced as contributors.

In [42]:
# An id can also be provided to add_contribution(). By default, ids are versioned when referenced to avoid being impacted by further changes and keep the state at which they were when referenced.
for contributor in scientists:
    my_derived_dataset.add_contribution(contributor)
    
my_derived_dataset.add_contribution(john.id, versioned=False)
my_derived_dataset.add_contribution(jane)
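
In the outputs of this notebook, a versioned reference appears as the resource id followed by `?rev=<revision>`, while an unversioned one is the bare id. The sketch below builds and parses such references with the standard library. The helper names are hypothetical, and the `?rev=` form is taken from the outputs shown here rather than from a specification.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def versioned_id(resource_id: str, rev: int) -> str:
    """Append a revision number to a resource id."""
    return f"{resource_id}?rev={rev}"


def split_versioned_id(reference: str) -> Tuple[str, Optional[int]]:
    """Return (bare id, revision), with revision None when the reference is unversioned."""
    parts = urlsplit(reference)
    revisions = parse_qs(parts.query).get("rev")
    bare_id = parts._replace(query="").geturl()
    return bare_id, (int(revisions[0]) if revisions else None)


reference = versioned_id("https://www.wikidata.org/wiki/Q7186", 1)
print(reference)
print(split_versioned_id(reference))
```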

In [43]:
forge.update(my_derived_dataset)

<action> _update_one
<succeeded> True


In [44]:
result = forge.retrieve(id= my_derived_dataset.id)

In [45]:
forge.as_json(result)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
    'type': 'Agent'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0?rev=1',
    'type': 'Person'}}],
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'htt

In [46]:
# By adding store_metadata=True, the revision number of a resource can be introspected
forge.as_json(result, store_metadata=True)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
    'type': 'Agent'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0?rev=1',
    'type': 'Person'}}],
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'htt

#### Add generation (which activity led to the generation of my_derived_dataset)
An activity uses some entities to generate new ones and can potentially follow a Protocol. It has a start and an end time and is associated with some agents (Person, Organization and/or SoftwareAgent).

In [47]:
# Was a protocol followed?
protocol = Resource(type="Protocol", name="Protocol used to generate the dataset", description="Description of the protocol")
                    
activity = Resource(type=["Activity", "MyCustomActivity"], 
                    description= "Activity",
                    used=Resource(id=my_dataset.id,type = my_dataset.type), # the value here can be an array of any dataset or entity (e.g. config files) that was used to generate my_derived_dataset 
                    hadProtocol=protocol,
                    startedAtTime=startedAtTime, 
                    endedAtTime=time.strftime("%Y%m%d%H%M%S"),
                    wasAssociatedWith= Resource(id = jane.id,type = jane.type) # the value here can be an array of any agents
                   )

In [48]:
forge.register(activity)

<action> _register_one
<succeeded> True


In [49]:
forge.as_json(activity)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/e0ac3c0e-1e60-480a-b900-d31eda8d5709',
 'type': ['Activity', 'MyCustomActivity'],
 'description': 'Activity',
 'endedAtTime': '20210823121249',
 'hadProtocol': {'type': 'Protocol',
  'description': 'Description of the protocol',
  'name': 'Protocol used to generate the dataset'},
 'startedAtTime': '20210823121052',
 'used': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587',
  'type': ['Entity', 'Dataset', 'MyOtherType']},
 'wasAssociatedWith': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0',
  'type': 'Person'}}
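
The start and end times above are stored in the compact `%Y%m%d%H%M%S` form. PROV-O defines `startedAtTime` and `endedAtTime` with an `xsd:dateTime` range, so an ISO 8601 rendering may be preferable; whether the schemas of a given project require it is not stated in this notebook. The sketch below converts the compact form and computes the activity duration. The helper names are hypothetical.

```python
from datetime import datetime

COMPACT_FORMAT = "%Y%m%d%H%M%S"  # the format used with time.strftime in this notebook


def compact_to_iso(stamp: str) -> str:
    """Convert a compact timestamp such as '20210823121052' to ISO 8601."""
    return datetime.strptime(stamp, COMPACT_FORMAT).isoformat()


def duration_seconds(started: str, ended: str) -> float:
    """Return the number of seconds between two compact timestamps."""
    start = datetime.strptime(started, COMPACT_FORMAT)
    end = datetime.strptime(ended, COMPACT_FORMAT)
    return (end - start).total_seconds()


# Values taken from the activity output above.
started_iso = compact_to_iso("20210823121052")
elapsed = duration_seconds("20210823121052", "20210823121249")
print(started_iso, elapsed)
```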

In [50]:
my_derived_dataset.add_generation(activity)

In [51]:
forge.update(my_derived_dataset)

<action> _update_one
<succeeded> True


In [52]:
forge.as_json(my_derived_dataset)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
    'type': 'Agent'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0?rev=1',
    'type': 'Person'}}],
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'htt

#### Add Subject

The subject on which the study was performed can be added, if any. See the [subject schema](https://bbp-nexus.epfl.ch/datamodels/class-subject.html) for more information.

In [53]:
# Note that Resource can be used as value of a property

my_derived_dataset.subject = Resource(type=["Subject","Entity"],
                                      name="P14-12 Rattus norvegicus Wistar Han",
                                      species= Resource(id="http://purl.obolibrary.org/obo/NCBITaxon_10116", label="Rattus norvegicus"),
                                      strain = Resource(id="http://purl.obolibrary.org/obo/RS_0001833", label="Wistar Han"),
                                      age    = Resource(period="Post-natal", value=14, unitCode="days"),
                                      sex    = Resource(id="http://purl.obolibrary.org/obo/PATO_0000384", label="male")
                                     )



In [54]:
forge.update(my_derived_dataset)

<action> _update_one
<succeeded> True


In [55]:
forge.as_json(my_derived_dataset)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
    'type': 'Agent'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0?rev=1',
    'type': 'Person'}}],
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'htt

#### Add license

In [56]:
my_derived_dataset.license = Resource(id="https://creativecommons.org/licenses/by/4.0", label="CC BY 4.0", description="You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.") # this is just an example

In [57]:
forge.update(my_derived_dataset)

<action> _update_one
<succeeded> True


In [58]:
forge.as_json(my_derived_dataset)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
    'type': 'Agent'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0?rev=1',
    'type': 'Person'}}],
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'htt

### Tag the dataset
Tagging a dataset is equivalent to a git tag. It allows a dataset to be versioned.

In [59]:
forge.tag(my_derived_dataset, value="releaseV112")

<action> _tag_one
<succeeded> True


In [60]:
my_derived_dataset.description="Derived Dataset description"

In [61]:
forge.update(my_derived_dataset)

<action> _update_one
<succeeded> True


In [62]:
forge.as_json(my_derived_dataset)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
    'type': 'Agent'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0?rev=1',
    'type': 'Person'}}],
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'description': 'Derived Dataset description',
 'distribution': {'type': 'DataDownload',
  'atLocation

In [65]:
# The version argument can be specified to retrieve the dataset at a given tag.
result = forge.retrieve(id=my_derived_dataset.id, version="releaseV112")

In [66]:
# Note that description is not retrieved as it was added after the tag
forge.as_json(result)

{'id': 'https://bbp.epfl.ch/neurosciencegraph/data/817f7c26-e58c-4b83-8e38-9261bec4abd3',
 'type': 'Dataset',
 'contribution': [{'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1',
    'type': 'Person'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/f7be52ed-75aa-4ac4-bfd5-ad7b75e36469',
    'type': 'Agent'}},
  {'type': 'Contribution',
   'agent': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/cbef0a1b-497e-4e4e-90d5-50c66bd6f1c0?rev=1',
    'type': 'Person'}}],
 'derivation': {'type': 'Derivation',
  'entity': {'id': 'https://bbp.epfl.ch/neurosciencegraph/data/8efdf9e0-bb78-417b-948f-b5bc137b3587?rev=1',
   'type': ['Entity', 'Dataset', 'MyOtherType'],
   'name': 'Interesting Dataset'}},
 'distribution': {'type': 'DataDownload',
  'atLocation': {'type': 'Location',
   'store': {'id': 'htt
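
The behaviour seen in this section (a description added after tagging is absent when retrieving at the tag) follows from a tag pointing at a fixed revision. The toy in-memory model below illustrates that idea only. `VersionedRecord` is a hypothetical class and says nothing about how Nexus stores revisions.

```python
from typing import Any, Dict, Optional


class VersionedRecord:
    """Toy record that keeps every revision and lets tags point at one of them."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._revisions = [dict(data)]  # revision 1 is stored at index 0
        self._tags: Dict[str, int] = {}

    @property
    def rev(self) -> int:
        return len(self._revisions)

    def update(self, **changes: Any) -> None:
        """Create a new revision from the latest one plus the given changes."""
        self._revisions.append({**self._revisions[-1], **changes})

    def tag(self, name: str) -> None:
        """Make the tag point at the current revision."""
        self._tags[name] = self.rev

    def retrieve(self, version: Optional[str] = None) -> Dict[str, Any]:
        """Return the latest revision, or the one a tag points at."""
        rev = self.rev if version is None else self._tags[version]
        return dict(self._revisions[rev - 1])


record = VersionedRecord({"name": "Derived Dataset from my_dataset"})
record.tag("releaseV112")
record.update(description="Derived Dataset description")

tagged = record.retrieve(version="releaseV112")
latest = record.retrieve()
print("description" in tagged, "description" in latest)
```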