# Allen dataset integration 

This notebook demonstrates how to integrate Allen datasets into the Blue Brain Knowledge Graph.

The tasks to be demonstrated are the following:

1. Configuration
2. Retrieve human neuron morphologies from the Allen Cell Types Database
3. Load the complete metadata of the neuron morphologies from Allen
4. Load the transformation mappings
5. Map the neuron morphologies from Allen to the Neuroshapes Models
6. Register the created entities from Allen to Nexus
7. Retrieve the created entities

## 1. Session configuration

In [1]:
import getpass

import allensdk

from kgforge.core import KnowledgeGraphForge
from kgforge.version import __version__

Check versions

In [2]:
print("Allensdk is", allensdk.__version__, ", and Nexus Forge is", __version__)

Allensdk is 1.7.1 , and Nexus Forge is 0.2.1.dev82+g0b66b09


Please enter your BBP token:

In [3]:
token = getpass.getpass()

Note: Initializing the forge may take a few seconds if the source is a directory.

In [4]:
forge = KnowledgeGraphForge("../../configurations/demo-forge-nexus-neuroshapes.yml", token=token)

## 2. Retrieve human neuron morphologies from the Allen Cell Types Database

In [5]:
from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.api.queries.cell_types_api import CellTypesApi

### 2.A Downloaded files: specify a directory where the Allen files will be downloaded

In [6]:
ALLEN_DIR = "allen_cell_types_database"
ctc = CellTypesCache(manifest_file=f"{ALLEN_DIR}/manifest.json")

In [7]:
human_cells = ctc.get_cells(species=[CellTypesApi.HUMAN], require_reconstruction=True)

In [8]:
!ls allen_cell_types_database

cells.json    manifest.json


### 2.B Pick a subset of cells to integrate

In [9]:
len(human_cells)

152

In [13]:
FROM = 8
TO = 10
human_cell_ids = [x["id"] for x in human_cells][FROM:TO]

In [14]:
human_cell_ids

[527942865, 529807751]

### 2.C Check that the cells have not already been integrated by trying to fetch them with the forge

For the picked cells, check that they are not already integrated. If they are, pick another pair of cells by changing `FROM` and `TO` in step 2.B.

Note that the `forge.format` method is used to build the identifier of each patched cell, which is then used to try to retrieve it from Nexus.
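A preconfigured string format can be thought of as a named template with positional placeholders. The sketch below is plain Python, not kgforge code: the `FORMATTERS` table and `format_string` function are hypothetical, and the URL pattern is only inferred from the identifiers printed in this notebook (the real templates are declared in the forge configuration file).

```python
# Hypothetical template table: the real formatters are declared in the forge
# configuration file; this URL pattern is inferred from the notebook's output.
FORMATTERS = {
    "identifier": "https://bbp.epfl.ch/neurosciencegraph/data/{}/{}",
}


def format_string(name: str, *args: object) -> str:
    """Fill the positional placeholders of the named template with args."""
    return FORMATTERS[name].format(*args)


print(format_string("identifier", "patchedcells", 527942865))
# https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865
```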

In [15]:
for id_ in human_cell_ids:
    kg_id = forge.format("identifier", "patchedcells", id_)
    print(kg_id) 
    resource = forge.retrieve(kg_id)
    if resource:
        print("> already integrated")

https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865
<action> retrieve
<error> RetrievalError: resource 'https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865' not found for schema 'https://bluebrain.github.io/nexus/schemas/unconstrained.json'

https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/529807751
<action> retrieve
<error> RetrievalError: resource 'https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/529807751' not found for schema 'https://bluebrain.github.io/nexus/schemas/unconstrained.json'

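Instead of editing `FROM` and `TO` by hand until an unused pair is found, the check can be looped over the candidates. The helper below is a hypothetical sketch: `pick_unintegrated` is not part of kgforge, and the membership test stands in for a Nexus lookup.

```python
from typing import Callable, Iterable, List


def pick_unintegrated(
    candidate_ids: Iterable[int],
    is_integrated: Callable[[int], bool],
    n: int = 2,
) -> List[int]:
    """Return the first n candidate IDs for which is_integrated() is False."""
    picked: List[int] = []
    for cell_id in candidate_ids:
        if not is_integrated(cell_id):
            picked.append(cell_id)
            if len(picked) == n:
                break  # stop early to avoid unnecessary lookups
    return picked


# Placeholder IDs; pretend 1 and 2 are already registered.
already_registered = {1, 2}
print(pick_unintegrated([1, 2, 3, 4, 5], already_registered.__contains__))
# [3, 4]
```

In the notebook, `candidate_ids` could be `[x["id"] for x in human_cells]` and `is_integrated` could wrap `bool(forge.retrieve(forge.format("identifier", "patchedcells", i)))`, mirroring the `if resource:` check above; each miss would still print a retrieval error like the ones shown.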


In [16]:
human_cell_reconstructions = [ctc.get_reconstruction(x) for x in human_cell_ids]

2020-06-02 23:31:21,701 allensdk.api.api.retrieve_file_over_http INFO     Downloading URL: http://api.brain-map.org/api/v2/well_known_file_download/601947673
2020-06-02 23:31:23,344 allensdk.api.api.retrieve_file_over_http INFO     Downloading URL: http://api.brain-map.org/api/v2/well_known_file_download/667320244


## 3. Load the complete metadata of the neuron morphologies from Allen

In [17]:
import json

In [18]:
with open(f"{ALLEN_DIR}/cells.json") as f:
    allen_cell_types_metadata = json.load(f)
human_cell_metadata = [x for x in allen_cell_types_metadata if x["specimen__id"] in human_cell_ids]
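The list comprehension above keeps the order of `cells.json`, which need not match the order of `human_cell_ids` (and therefore of `human_cell_reconstructions`). If the two lists must line up, one option is to index the metadata by ID first. A minimal sketch, with toy records standing in for `cells.json`:

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def select_in_order(
    records: Sequence[Record], wanted_ids: Sequence[int], key: str = "specimen__id"
) -> List[Record]:
    """Return the records whose `key` is in wanted_ids, in the order of wanted_ids."""
    by_id = {record[key]: record for record in records}  # one pass over the metadata
    return [by_id[i] for i in wanted_ids if i in by_id]  # IDs absent from the metadata are skipped


# Toy records standing in for the contents of cells.json.
records = [{"specimen__id": 3}, {"specimen__id": 1}, {"specimen__id": 2}]
print(select_in_order(records, [1, 3]))
# [{'specimen__id': 1}, {'specimen__id': 3}]
```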

Have a look at a single record:

In [19]:
human_cell_metadata[0]

{'cell_reporter_status': None,
 'csl__normalized_depth': 0.694465802153644,
 'csl__x': 67.0,
 'csl__y': 256.0,
 'csl__z': 110.0,
 'donor__age': '24 yrs',
 'donor__disease_state': 'epilepsy',
 'donor__id': 527747035,
 'donor__name': 'H16.06.008',
 'donor__race': 'Hispanic',
 'donor__sex': 'Female',
 'donor__species': 'Homo Sapiens',
 'donor__years_of_seizure_history': '7',
 'ef__adaptation': 0.975858514620517,
 'ef__avg_firing_rate': 2.71252644713286,
 'ef__avg_isi': 368.66,
 'ef__f_i_curve_slope': 0.0785947712418301,
 'ef__fast_trough_v_long_square': -53.1875,
 'ef__peak_t_ramp': 4.21818666666667,
 'ef__ri': 136.718824505806,
 'ef__tau': 26.8857603438525,
 'ef__threshold_i_long_square': 70.0,
 'ef__upstroke_downstroke_ratio_long_square': 3.90618458032267,
 'ef__vrest': -67.7251434326172,
 'ephys_inst_thresh_thumb_path': '/api/v2/well_known_file_download/529903821',
 'ephys_thumb_path': '/api/v2/well_known_file_download/529903819',
 'erwkf__id': 616980497,
 'line_name': '',
 'm__biophys

## 4. Load the transformation mappings

The forge provides a Dictionary Mapper, which uses mapping files to transform one dictionary into another. These Dictionary Mappings are HJSON files containing the required transformations. This notebook uses three mappings: Subject, PatchedCell and NeuronMorphology.
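Conceptually, a dictionary mapping pairs each target key with a rule evaluated against the source record `x`. The sketch below is plain Python and not kgforge's `DictionaryMapping` implementation; `SUBJECT_MAPPING` and `apply_mapping` are hypothetical names. It mirrors the `id`, `type`, `identifier` and `name` entries of the Subject mapping and omits the `forge.resolve` calls for sex and species.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
# Hypothetical mapping: each target key is paired with a rule computing it from the source record x.
Mapping = Dict[str, Callable[[Record], Any]]

SUBJECT_MAPPING: Mapping = {
    "id": lambda x: f"https://bbp.epfl.ch/neurosciencegraph/data/subjects/{x['donor__id']}",
    "type": lambda x: "Subject",
    "identifier": lambda x: x["donor__id"],
    "name": lambda x: x["donor__name"],
}


def apply_mapping(record: Record, mapping: Mapping) -> Record:
    """Build a new dictionary by evaluating every rule against the source record."""
    return {key: rule(record) for key, rule in mapping.items()}


allen_record = {"donor__id": 527747035, "donor__name": "H16.06.008", "donor__sex": "Female"}
print(apply_mapping(allen_record, SUBJECT_MAPPING))
```

Source keys without a rule (here `donor__sex`) are simply not copied to the target dictionary.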

In [20]:
!ls -l ../../mappings/allen-database-mappings

total 24
-rw-r--r--  1 agarcia  INTRANET\Domain Users  2059 Jun  2 23:23 NeuronMorphology.hjson
-rw-r--r--  1 agarcia  INTRANET\Domain Users  1043 Jun  2 23:23 PatchedCell.hjson
-rw-r--r--  1 agarcia  INTRANET\Domain Users   277 Jun  2 23:23 Subject.hjson


In [21]:
DIR = "../../mappings/allen-database-mappings"
subject_mapping_file = f"{DIR}/Subject.hjson"
patchedcell_mapping_file = f"{DIR}/PatchedCell.hjson"
neuronmorphology_mapping_file = f"{DIR}/NeuronMorphology.hjson"

In [22]:
from kgforge.specializations.mappings import DictionaryMapping

In [23]:
subject_mapping = DictionaryMapping.load(subject_mapping_file)
patchedcell_mapping = DictionaryMapping.load(patchedcell_mapping_file)
neuronmorphology_mapping = DictionaryMapping.load(neuronmorphology_mapping_file)

The content of one of the mapping files is shown next:

In [24]:
print(subject_mapping)

{
    id: forge.format("identifier", "subjects", x.donor__id)
    type: Subject
    identifier: x.donor__id
    name: x.donor__name
    sex: forge.resolve(x.donor__sex, scope="terms", target="sex")
    species: forge.resolve(x.donor__species, scope="terms", target="species")
}


Inside mapping files, it is possible to use methods from the forge, such as:

    - forge.format: used to format a string using a preconfigured string format (used previously in 2.C)
    - forge.resolve: used to retrieve identifiers using a string that is part of the name of the desired resource
    
An example of the resolver is shown next:

In [25]:
from kgforge.core.commons.strategies import ResolvingStrategy

Check available resolvers

In [26]:
forge.resolvers()

Available scopes:
 -  agent :
     - resolver:  AgentResolver
         - targets:  agents
 -  ontology :
     - resolver:  OntologyResolver
         - targets:  terms
 -  terms :
     - resolver:  DemoResolver
         - targets:  species,sex,structure-layer


Resolve the identifier for `male` in the `terms` scope, with `sex` as the target:

In [27]:
print(forge.resolve("male", scope="terms", target="sex"))

{
    id: http://purl.obolibrary.org/obo/PATO_0000384
    label: male
}
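The kind of lookup `forge.resolve` performs can be approximated with a small in-memory vocabulary. This is a hypothetical sketch, not the `DemoResolver` used above: `SEX_TERMS` and `resolve_term` are illustrative names, the male ID comes from the output above, and the female ID is the standard PATO term for female. Exact matches are tried before partial ones because "male" is a substring of "female", and the comparison is case-insensitive because the Allen metadata stores values such as `'Female'`.

```python
from typing import Dict, List, Optional

Term = Dict[str, str]

# Hypothetical in-memory vocabulary for the "sex" target.
SEX_TERMS: List[Term] = [
    {"id": "http://purl.obolibrary.org/obo/PATO_0000384", "label": "male"},
    {"id": "http://purl.obolibrary.org/obo/PATO_0000383", "label": "female"},
]


def resolve_term(text: str, terms: List[Term]) -> Optional[Term]:
    """Case-insensitive lookup: exact label match first, then the first partial match."""
    needle = text.strip().lower()
    for term in terms:
        if term["label"].lower() == needle:
            return term
    partial = [term for term in terms if needle in term["label"].lower()]
    return partial[0] if partial else None


print(resolve_term("Female", SEX_TERMS))
```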


## 5. Map the neuron morphologies from Allen to the Neuroshapes Models

It is possible to provide a list of mappings to be applied to a single dataset.

In [30]:
mappings = [subject_mapping, patchedcell_mapping, neuronmorphology_mapping]

In [31]:
resources = forge.map(human_cell_metadata, mappings)

Check the created resources

In [32]:
len(resources)

6
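The count follows from applying each of the 3 mappings to each of the 2 selected records. The sketch below assumes record-major ordering (all mappings for the first record, then the second), which is consistent with the resources printed in this notebook (index 1 is a PatchedCell and index 2 a NeuronMorphology for cell 527942865); it is not a description of `forge.map` internals, and `map_all` and `make_mapping` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]
MappingFn = Callable[[Record], Record]


def map_all(records: Sequence[Record], mappings: Sequence[MappingFn]) -> List[Record]:
    """Apply every mapping to every record, record by record."""
    return [mapping(record) for record in records for mapping in mappings]


def make_mapping(type_name: str) -> MappingFn:
    # Toy mapping that only copies the specimen ID and stamps a type.
    return lambda x: {"type": type_name, "identifier": x["specimen__id"]}


mappings = [make_mapping(t) for t in ("Subject", "PatchedCell", "NeuronMorphology")]
records = [{"specimen__id": 527942865}, {"specimen__id": 529807751}]
resources = map_all(records, mappings)
print(len(resources))  # 6
print([r["type"] for r in resources[:3]])  # ['Subject', 'PatchedCell', 'NeuronMorphology']
```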

In [33]:
print(resources[2])

{
    id: https://bbp.epfl.ch/neurosciencegraph/data/neuronmorphologies/527942865
    type: NeuronMorphology
    apicalDendrite: truncated
    brainLocation:
    {
        type: BrainLocation
        brainRegion:
        {
            id: http://api.brain-map.org/api/v2/data/Structure/12141
            label: MTG
        }
        coordinatesInBrainAtlas:
        {
            valueX: 67.0
            valueY: 256.0
            valueZ: 110.0
        }
        layer:
        {
            id: http://purl.obolibrary.org/obo/UBERON_0005394
            label: layer 5
        }
    }
    contribution:
    {
        type: Contribution
        agent:
        {
            id: https://www.grid.ac/institutes/grid.417881.3
            type: Organization
        }
    }
    derivation:
    [
        {
            type: Derivation
            entity:
            {
                id: https://bbp.epfl.ch/neurosciencegraph/data/subjects/527747035
                type: Subject
            }
        }


## 6. Register the created resources from Allen to Nexus

In [34]:
forge.register(resources)

<count> 6
<action> _register_many
<succeeded> True


In [35]:
print(resources[1])

{
    context: https://bbp.neuroshapes.org
    id: https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865
    type: PatchedCell
    brainLocation:
    {
        type: BrainLocation
        brainRegion:
        {
            id: http://api.brain-map.org/api/v2/data/Structure/12141
            label: MTG
        }
    }
    contribution:
    {
        type: Contribution
        agent:
        {
            id: https://www.grid.ac/institutes/grid.417881.3
            type: Organization
        }
    }
    derivation:
    {
        type: Derivation
        entity:
        {
            id: https://bbp.epfl.ch/neurosciencegraph/data/subjects/527747035
            type: Subject
        }
    }
    identifier: 527942865
    name: H16.06.008.01.26.04
    subject:
    {
        id: https://bbp.epfl.ch/neurosciencegraph/data/subjects/527747035
        type: Subject
    }
}


## 7. Retrieve the created entities

If you know the exact ID, you can retrieve the resource as done in 2.C; otherwise, use the `search()` method. To search for resources, start by picking one of the available types.

To create a search based on the `PatchedCell` structure, use the `paths()` method, which loads the structure of the given type into a Python object whose fields can be accessed using auto-completion.
Next, the `p` object will hold the properties of `PatchedCell` and can be used to create a search.
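An expression such as `p.type == "PatchedCell"` can produce a filter instead of a boolean through attribute access and operator overloading. The sketch below only illustrates that Python technique; `PathBuilder` and `Filter` are hypothetical classes and not the objects returned by `forge.paths()`.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Filter:
    """A single search condition: a property path, an operator and a value."""
    path: Tuple[str, ...]
    operator: str
    value: Any


class PathBuilder:
    """Records attribute accesses so that p.a.b == v becomes Filter(("a", "b"), "==", v)."""

    def __init__(self, path: Tuple[str, ...] = ()) -> None:
        self._path = path

    def __getattr__(self, name: str) -> "PathBuilder":
        # Only called for attributes not found normally; private names are refused
        # so that internal lookups (e.g. by copy or pickle) fail cleanly.
        if name.startswith("_"):
            raise AttributeError(name)
        return PathBuilder(self._path + (name,))

    def __eq__(self, value: Any) -> Filter:  # type: ignore[override]
        return Filter(self._path, "==", value)

    def __ne__(self, value: Any) -> Filter:  # type: ignore[override]
        return Filter(self._path, "!=", value)


p = PathBuilder()
print(p.type == "PatchedCell")
print(p.brainLocation.brainRegion.label == "MTG")
```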

In [36]:
p = forge.paths("PatchedCell")

In [37]:
results = forge.search(p.type == "PatchedCell")

In [38]:
len(results)

10

In [39]:
DISPLAY_LIMIT = 25

In [40]:
forge.as_dataframe(results[:DISPLAY_LIMIT])

|  | id | type | brainLocation.type | brainLocation.brainRegion.id | brainLocation.brainRegion.label | contribution.type | contribution.agent.id | contribution.agent.type | derivation.type | derivation.entity.id | identifier | name | subject.id | subject.type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 596020931 | H17.06.009.11.04.02 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 1 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 519832676 | H16.03.001.01.09.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 2 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | AnG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 569095789 | H17.06.004.11.05.04 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 3 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 528706755 | H16.06.009.01.01.15.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 4 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 616647103 | H17.03.005.11.09.02 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 5 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | FroL | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 531520637 | H16.06.007.01.05.03 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 6 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 542143598 | H16.03.008.11.11.03 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 7 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | FroL | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 531520401 | H16.06.007.01.07.02 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 8 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 527942865 | H16.06.008.01.26.04 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
| 9 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 529807751 | H16.06.010.01.03.04.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
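The dotted column names in the dataframe (for example `brainLocation.brainRegion.label`) are what you get by flattening each nested resource into a single level. The recursive helper below is a hypothetical sketch of that idea, not the implementation of `forge.as_dataframe`; `pandas.json_normalize` applies a similar dotted-key flattening to lists of dictionaries.

```python
from typing import Any, Dict


def flatten(resource: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into dotted keys; lists and scalars are kept as values."""
    flat: Dict[str, Any] = {}
    for key, value in resource.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


patched_cell = {
    "type": "PatchedCell",
    "brainLocation": {
        "type": "BrainLocation",
        "brainRegion": {"id": "http://api.brain-map.org/api/v2/data/Structure/12141", "label": "MTG"},
    },
    "identifier": 527942865,
}
print(flatten(patched_cell))
```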


In [41]:
! rm -R allen_cell_types_database