# Searching and Downloading Data from the Blue Brain Knowledge Graph using the Knowledge Graph Forge

## Initialize and configure

### Get an authentication token

For now, the [Nexus web application](https://bbp.epfl.ch/nexus/web) can be used to get a token. We are looking into simpler alternatives.

- Step 1: On the opened web page, click the login button in the top-right corner and follow the instructions.

![login-ui](./login-ui.png)

- Step 2: Once you are logged in, a token button appears in the top-right corner. Click it to copy the token.

![copy-token](./copy-token.png)

Once you have a token, paste it into the prompt below.

In [1]:
import getpass

In [None]:
TOKEN = getpass.getpass()
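If you would rather not paste the token in every session, you could read it from an environment variable and fall back to the prompt only when the variable is unset. This is a minimal sketch: the variable name `NEXUS_TOKEN` and the helper `get_token` are hypothetical, and the forge does not read this variable by itself.

```python
import getpass
import os


def get_token(env_var="NEXUS_TOKEN"):
    """Return the Nexus token from an environment variable, or prompt for it.

    NEXUS_TOKEN is a hypothetical variable name chosen for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to the same interactive prompt as the cell above
    return getpass.getpass("Nexus token: ")
```

Usage: `TOKEN = get_token()` in place of `TOKEN = getpass.getpass()`.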

### Configure a client (forge) to access the knowledge graph 

In [3]:
from kgforge.core import KnowledgeGraphForge

In [4]:
# Target the SSCx dissemination project in Nexus
ORG = "public"
PROJECT = "sscx"

In [None]:
forge = KnowledgeGraphForge("prod-forge-nexus.yml", bucket=f"{ORG}/{PROJECT}", token=TOKEN)

## Search and Download

In [None]:
forge.types()

### Ontologies

#### Set filters

In [7]:
from kgforge.core.commons.strategies import ResolvingStrategy

# Supported filters for the time being are:
text = "somatosensory"
limit = 10

In [8]:
# Other resolving strategies: ResolvingStrategy.BEST_MATCH, ResolvingStrategy.EXACT_MATCH
brain_region = forge.resolve(text, scope="ontology", target="terms", strategy=ResolvingStrategy.ALL_MATCHES, limit=limit)

In [None]:
forge.as_dataframe(brain_region).head(100)
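The resolving strategies differ mainly in how many candidates come back: roughly, a single best candidate, a single exact match, or a list of all matches up to `limit`. The toy resolver below mimics that difference over a plain list of labels. Its matching rule (case-insensitive substring, shorter labels ranked first) is an assumption made for illustration only; it is not how kgforge scores ontology terms, and `Strategy` is a stand-in, not `ResolvingStrategy`.

```python
from enum import Enum


class Strategy(Enum):
    # Toy stand-in for kgforge's ResolvingStrategy, for illustration only
    EXACT_MATCH = "exact"
    BEST_MATCH = "best"
    ALL_MATCHES = "all"


def toy_resolve(text, labels, strategy, limit=10):
    """Match `text` against `labels` using a toy case-insensitive rule."""
    query = text.lower()
    if strategy is Strategy.EXACT_MATCH:
        exact = [label for label in labels if label.lower() == query]
        return exact[0] if exact else None
    matches = [label for label in labels if query in label.lower()]
    # Toy ranking: shorter labels containing the query count as closer matches
    matches.sort(key=len)
    if strategy is Strategy.BEST_MATCH:
        return matches[0] if matches else None
    return matches[:limit]
```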

### Neuron Morphologies 

#### Set filters

In [10]:
# Supported filters for the time being are:
_type = "ReconstructedCell"
classification_type = "nsg:MType"
mType = "L4_NBC"
brainRegion = "primary somatosensory cortex"
layer = "layer 4"
encodingFormat = "application/swc"
limit = 2

In [None]:
forge.template("Dataset")

#### Run Query

In [None]:
path = forge.paths("Dataset") # to have autocompletion on the properties

In [None]:
data = forge.search(path.type.id == _type,
                    # Known issue: if you get "AttributeError: 'PathWrapper' object has no attribute '_path'",
                    # filter on path.annotation.hasBody.type.id as done here
                    path.annotation.hasBody.type.id == classification_type,
                    path.annotation.hasBody.label == mType,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.brainLocation.layer.label == layer,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(f"{len(data)} dataset(s) of type '{_type}' found.")
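All filters passed to `forge.search` must hold for a resource to be returned, and a filter on a property holding a list (such as `distribution`) is satisfied when any element matches. The in-memory sketch below models those semantics over plain dictionaries, which can help when reasoning about why a query returns nothing. It is a conceptual model only; the store evaluates the real query server-side, and the helper names and sample record are hypothetical.

```python
def get_values(record, dotted_path):
    """Collect every value reachable by a dotted path, descending into lists."""
    values = [record]
    for key in dotted_path.split("."):
        next_values = []
        for value in values:
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    # Flatten a trailing list so a filter also matches one element of a list
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def matches(record, criteria):
    """True if every (dotted_path, expected) pair holds (logical AND)."""
    return all(expected in get_values(record, path)
               for path, expected in criteria.items())
```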

#### Display the results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id", "name", "subject",
                                          "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
                                          "brainLocation.layer.id", "brainLocation.layer.label", "contribution",
                                          "distribution.name", "distribution.contentUrl", "distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])
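`forge.reshape` keeps only the listed properties, where a dotted path such as `brainLocation.layer.label` selects a nested property. The toy function below reproduces that idea on plain dictionaries (descending into lists such as `distribution`, and tolerating duplicate paths) to make the effect of the `keep` list concrete. It is an illustrative model under those assumptions, not kgforge's implementation; `toy_reshape` is a hypothetical name.

```python
def toy_reshape(value, keep):
    """Restrict nested dicts (and lists of dicts) to the dotted paths in `keep`."""
    if isinstance(value, list):
        return [toy_reshape(item, keep) for item in value]
    if not isinstance(value, dict):
        return value
    whole = set()   # keys kept in full
    grouped = {}    # key -> remaining sub-paths to keep under it
    for dotted in keep:
        head, _, rest = dotted.partition(".")
        if rest:
            grouped.setdefault(head, []).append(rest)
        else:
            whole.add(head)
    result = {}
    for key in value:
        if key in whole:
            result[key] = value[key]
        elif key in grouped:
            result[key] = toy_reshape(value[key], grouped[key])
    return result
```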

#### Download

In [16]:
dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)
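After downloading, a quick way to check what arrived is to count the files under `dirpath` by extension. A sketch using only the standard library; it searches recursively because the exact layout `forge.download` produces under `dirpath` is not assumed here, and `summarize_downloads` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def summarize_downloads(dirpath):
    """Count the files under `dirpath` (recursively) by lower-cased extension."""
    root = Path(dirpath)
    if not root.is_dir():
        return Counter()
    return Counter(p.suffix.lower() or "<none>"
                   for p in root.rglob("*") if p.is_file())
```

Usage: `summarize_downloads("./downloaded/")`.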

#### Get storage path
You can also get the location of each file and the storage that holds it (e.g. Blue Brain Nexus Store or GPFS).

In [None]:
forge.as_json(data[0].distribution[0].atLocation)

In [None]:
data[0].distribution[0].atLocation.location
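When a file lives on a shared file system, its `atLocation.location` can be turned into a local path so the file can be opened without downloading it. The sketch below assumes the location is a `file://` URL; that format is an assumption, so check a real value first. `location_to_path` is a hypothetical helper, and it returns `None` for any other scheme.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def location_to_path(location):
    """Convert a file:// URL (assumed format of atLocation.location) to a POSIX path.

    Returns None for non-file URLs, which cannot be opened as local paths.
    """
    parsed = urlparse(location)
    if parsed.scheme != "file":
        return None
    # Decode percent-escapes such as %20 in file names
    return PurePosixPath(unquote(parsed.path))
```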

### Electrophysiology Traces

#### Set filters

In [19]:
# Supported filters for the time being are:
_type = "Trace"
classification_type = "nsg:EType"
eType = "cADpyr"
brainRegion = "primary somatosensory cortex"
layer = "layer 5"
encodingFormat = "application/nwb"
limit = 10

#### Run Query

In [None]:
path = forge.paths("Dataset")  # to have autocompletion on the properties
data = forge.search(path.type.id == _type,
                    path.annotation.hasBody.type.id == classification_type,
                    path.annotation.hasBody.label == eType,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.brainLocation.layer.label == layer,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(f"{len(data)} data of type '{_type}' found.")
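The morphology, trace, layer-thickness and density queries differ only in their criteria. A small helper could turn a dictionary of dotted property paths into the `path.x.y == value` filters that `forge.search` accepts as positional arguments. It resolves each segment with `getattr`, which is the same attribute access the cells above perform on `forge.paths("Dataset")`. The helper name and the dictionary layout are hypothetical.

```python
from functools import reduce


def build_filters(path, criteria):
    """Turn {"dotted.property.path": value} into a list of `path... == value` filters."""
    filters = []
    for dotted, value in criteria.items():
        # e.g. "brainLocation.layer.label" -> path.brainLocation.layer.label
        node = reduce(getattr, dotted.split("."), path)
        filters.append(node == value)
    return filters


# Hypothetical usage with the forge configured above:
# criteria = {"type.id": "Trace", "brainLocation.layer.label": "layer 5"}
# data = forge.search(*build_filters(forge.paths("Dataset"), criteria), limit=10)
```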

#### Display the results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id", "name", "subject",
                                          "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
                                          "brainLocation.layer.id", "brainLocation.layer.label", "contribution",
                                          "distribution.name", "distribution.contentUrl", "distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

#### Download

In [22]:
dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)

### Layer Thickness

#### Set filters

In [23]:
# Supported filters for the time being are:
_type = "LayerThickness"
brainRegion = "primary somatosensory cortex"
layer = "layer 2"
encodingFormat = "application/xlsx"
limit = 10

#### Run query

In [None]:
path = forge.paths("Dataset")  # to have autocompletion on the properties
data = forge.search(path.type.id == _type,
                    path.brainLocation.layer.label == layer,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(f"{len(data)} data of type '{_type}' found.")

#### Display Results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id", "name", "subject",
                                          "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
                                          "brainLocation.layer.id", "brainLocation.layer.label", "contribution",
                                          "distribution.name", "distribution.contentUrl", "distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

#### Download

In [26]:
dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)

### Neuron Density

#### Set filters

In [27]:
# Supported filters for the time being are:
_type = "NeuronDensity"
brainRegion = "primary somatosensory cortex"
layer = "layer 2"
encodingFormat = "application/xlsx"
limit = 10

#### Run query

In [29]:
path = forge.paths("Dataset")  # to have autocompletion on the properties
data = forge.search(path.type.id == _type,
                    path.brainLocation.layer.label == layer,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(f"{len(data)} data of type '{_type}' found.")

1 data of type 'NeuronDensity' found.


#### Display Results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id", "name", "subject",
                                          "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
                                          "brainLocation.layer.id", "brainLocation.layer.label", "contribution",
                                          "distribution.name", "distribution.contentUrl", "distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

#### Download

In [31]:
dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)

### Atlas Release

In [None]:
# Target the bbp/atlas project in Nexus

forge_atlas = KnowledgeGraphForge("prod-forge-nexus.yml", bucket="bbp/atlas", token=TOKEN)

Atlas-related types:

- AtlasRelease
- BrainParcellationDataLayer
- CellDensityDataLayer
- GeneExpressionVolumetricDataLayer
- GliaCellDensity
- NISSLImageDataLayer
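To see which of these types actually have resources in `bbp/atlas` before picking one, you could loop over them with the same `forge.search` call used below. A sketch: `count_by_type` is a hypothetical helper, counts are capped at `limit` because the search returns at most that many results, and the `or []` guards against an empty or missing result.

```python
def count_by_type(forge, type_names, limit=10):
    """Return {type name: number of resources found (at most `limit`)}."""
    path = forge.paths("Dataset")
    counts = {}
    for type_name in type_names:
        # Treat a missing result the same as an empty one
        results = forge.search(path.type.id == type_name, limit=limit) or []
        counts[type_name] = len(results)
    return counts
```

Usage: `count_by_type(forge_atlas, ["AtlasRelease", "BrainParcellationDataLayer", "CellDensityDataLayer"])`.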

#### Set filters

In [35]:
# Supported filters for the time being are:
_type = "BrainParcellationDataLayer"
limit = 10

#### Run query

In [None]:
path = forge_atlas.paths("Dataset")  # to have autocompletion on the properties
data = forge_atlas.search(path.type.id == _type, limit=limit)

print(f"{len(data)} data of type '{_type}' found.")

#### Display Results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge_atlas.reshape(data, keep=["id","name","brainLocation.brainRegion.id","brainLocation.brainRegion.label", "contribution","distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge_atlas.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

#### Download

In [38]:
dirpath = "./downloaded/"
forge_atlas.download(data, "distribution.contentUrl", dirpath)

### Data at a given tag
Tagged data are resources with immutable identifiers: retrieving a resource at a given tag guarantees that you get its state at the time the tag was created. A tag here is similar to a git tag.

#### Choose a bucket (or project) to query

In [39]:
bucket = "bbp/lnmce"

In [None]:
forge_tag = KnowledgeGraphForge("prod-forge-nexus.yml", bucket=bucket, token=TOKEN)

#### Set tag value

In [41]:
tag = "LNMCE2020"

#### Set filters

In [46]:
# Search for electrophysiology traces
_type = "Trace"
classification_type = "EType"
eType = "bIR"
brainRegion = "primary somatosensory cortex"
encodingFormat = "application/nwb"
limit = 10

#### Run Query

In [47]:
path = forge_tag.paths("Dataset")  # to have autocompletion on the properties
data = forge_tag.search(path.type.id == _type,
                        # Known issue: if you get "AttributeError: 'PathWrapper' object has no attribute '_path'",
                        # filter on path.annotation.hasBody.type.id as done here
                        path.annotation.hasBody.type.id == classification_type,
                        path.annotation.hasBody.label == eType,
                        path.brainLocation.brainRegion.label == brainRegion,
                        path.distribution.encodingFormat == encodingFormat,
                        limit=limit)

print(f"{len(data)} data of type '{_type}' found.")

10 data of type 'Trace' found.


#### Retrieve results at the set tag

In [None]:
results = [forge_tag.retrieve(d.id, version=tag) for d in data]
print(f"{len(results)} data of type '{_type}' at tag {tag} found.")
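Not every resource returned by the search is necessarily tagged with `tag`. The sketch below separates the resources retrieved at the tag from the identifiers that could not be, assuming `retrieve` returns `None` when a resource cannot be fetched at the requested tag; verify that behaviour on your kgforge version. `retrieve_at_tag` is a hypothetical helper.

```python
def retrieve_at_tag(forge, resource_ids, tag):
    """Retrieve each resource at `tag`; return (found_resources, missing_ids).

    Assumes forge.retrieve returns None when the resource cannot be
    retrieved at the requested tag.
    """
    found, missing = [], []
    for resource_id in resource_ids:
        resource = forge.retrieve(resource_id, version=tag)
        if resource is None:
            missing.append(resource_id)
        else:
            found.append(resource)
    return found, missing
```

Usage: `results, missing = retrieve_at_tag(forge_tag, [d.id for d in data], tag)`, then pass `results` to `reshape` and `download` as below.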

#### Display the results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge_tag.reshape(results, keep=["id", "name", "subject",
                                                 "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
                                                 "brainLocation.layer.id", "brainLocation.layer.label", "contribution",
                                                 "distribution.name", "distribution.contentUrl", "distribution.encodingFormat"])

forge_tag.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

#### Download

In [50]:
dirpath = "./downloaded/"
forge_tag.download(results, "distribution.contentUrl", dirpath)

### Data in a given view
A view exposes a subset of the data for querying and access through a dedicated index (SPARQL or Elasticsearch).

In [51]:
# Example of a view URL
view_url = "https://bluebrain.github.io/nexus/vocabulary/lnmce2020SparqlIndex"

In [None]:
# Point the forge's SPARQL search endpoint at the view defined above
searchendpoints = {"sparql": {"endpoint": view_url}}
forge_view = KnowledgeGraphForge("prod-forge-nexus.yml", bucket="bbp/lnmce", token=TOKEN, searchendpoints=searchendpoints)

#### Set filters

In [64]:
# Search for electrophysiology traces
_type = "Trace"
classification_type = ":EType"
eType = "bIR"
brainRegion = "primary somatosensory cortex"
encodingFormat = "application/nwb"
limit = 10

#### Run Query

In [66]:
path = forge_view.paths("Dataset")  # to have autocompletion on the properties
data = forge_view.search(path.type.id == _type,
                         # Known issue: if you get "AttributeError: 'PathWrapper' object has no attribute '_path'",
                         # filter on path.annotation.hasBody.type.id as done here
                         path.annotation.hasBody.type.id == classification_type,
                         path.annotation.hasBody.label == eType,
                         path.brainLocation.brainRegion.label == brainRegion,
                         path.distribution.encodingFormat == encodingFormat,
                         limit=limit)

print(f"{len(data)} data of type '{_type}' found.")

10 data of type 'Trace' found.


#### Display the results

In [None]:
DISPLAY_LIMIT = 10
reshaped_data = forge_view.reshape(data, keep=["id", "name", "subject",
                                               "brainLocation.brainRegion.id", "brainLocation.brainRegion.label",
                                               "brainLocation.layer.id", "brainLocation.layer.label", "contribution",
                                               "distribution.name", "distribution.contentUrl", "distribution.encodingFormat"])

forge_view.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

#### Download

In [68]:
dirpath = "./downloaded/"
forge_view.download(data, "distribution.contentUrl", dirpath)