{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Searching and Downloading Data from the Blue Brain Knowledge Graph using the Knowledge Graph Forge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize and configure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get an authentication token" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For now, a token can be obtained from the [Nexus web application](https://bbp.epfl.ch/nexus/web). Simpler alternatives are being investigated.\n", "\n", "- Step 1: On the opened web page, click the login button in the right corner and follow the instructions.\n", "\n", "![login-ui](./login-ui.png)\n", "\n", "- Step 2: Once you are logged in, a token button appears in the right corner. Click it to copy the token.\n", "\n", "![copy-token](./copy-token.png)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have a token, paste it into the prompt below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import getpass" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TOKEN = getpass.getpass()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure a client (forge) to access the knowledge graph" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from kgforge.core import KnowledgeGraphForge" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Target the SSCx dissemination project in Nexus\n", "ORG = \"public\"\n", "PROJECT = \"sscx\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge = KnowledgeGraphForge(\"prod-forge-nexus.yml\", bucket=f\"{ORG}/{PROJECT}\", token=TOKEN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Search and Download" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.types()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ontologies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Supported filters for the time being are:\n", "from kgforge.core.commons.strategies import ResolvingStrategy\n", "text = \"somatosensory\"\n", "limit=10" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# other Search strategy can be ResolvingStrategy.BEST_MATCH, ResolvingStrategy.EXACT_MATCH\n", "brain_region = forge.resolve(text, scope=\"ontology\", target=\"terms\", strategy=ResolvingStrategy.ALL_MATCHES, limit=limit)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.as_dataframe(brain_region).head(100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Neuron Morphologies " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Supported filters for the time being are:\n", "_type = \"ReconstructedCell\"\n", "classification_type=\"nsg:MType\"\n", "mType=\"L4_NBC\"\n", "brainRegion = \"primary somatosensory cortex\"\n", "layer = \"layer 4\"\n", "encodingFormat=\"application/swc\"\n", "limit=2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.template(\"Dataset\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run Query" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = forge.paths(\"Dataset\") # to have autocompletion on the properties" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = forge.search(path.type.id == _type,\n", 
" path.annotation.hasBody.type.id ==classification_type, # Known issue: use path.annotation.hasBody.type.id in case of error: AttributeError: 'PathWrapper' object has no attribute '_path'\n", " path.annotation.hasBody.label ==mType,\n", " path.brainLocation.brainRegion.label == brainRegion,\n", " path.brainLocation.layer.label == layer,\n", " path.distribution.encodingFormat == encodingFormat,\n", " limit=limit)\n", "\n", "print(str(len(data))+\" dataset of type '\"+_type+\"' found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display the results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DISPLAY_LIMIT = 10\n", "reshaped_data = forge.reshape(data, keep=[\"id\",\"name\",\"subject\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\",\n", " \"brainLocation.layer.id\",\"brainLocation.layer.label\", \"contribution\",\n", " \"brainLocation.layer.id\",\"brainLocation.layer.label\",\"distribution.name\",\n", " \"distribution.contentUrl\",\"distribution.encodingFormat\"])\n", "\n", "forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dowload" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge.download(data, \"distribution.contentUrl\", dirpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Get storage path\n", "It is possible to get files locations and storages (e.g. Blue Brain Nexus Store or GPFS, ...)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.as_json(data[0].distribution[0].atLocation)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data[0].distribution[0].atLocation.location" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Electrophysiology Traces" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Supported filters for now:\n", "_type = \"Trace\"\n", "classification_type = \"nsg:EType\"\n", "eType = \"cADpyr\"\n", "brainRegion = \"primary somatosensory cortex\"\n", "layer = \"layer 5\"\n", "encodingFormat = \"application/nwb\"\n", "limit = 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run Query" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = forge.paths(\"Dataset\")  # to have autocompletion on the properties\n", "data = forge.search(path.type.id == _type,\n", " path.annotation.hasBody.type.id == classification_type,\n", " path.annotation.hasBody.label == eType,\n", " path.brainLocation.brainRegion.label == brainRegion,\n", " path.brainLocation.layer.label == layer,\n", " path.distribution.encodingFormat == encodingFormat,\n", " limit=limit)\n", "\n", "print(f\"{len(data)} data of type '{_type}' found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display the results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DISPLAY_LIMIT = 10\n", "reshaped_data = forge.reshape(data, keep=[\"id\",\"name\",\"subject\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\",\n", " \"brainLocation.layer.id\",\"brainLocation.layer.label\", \"contribution\",\n", 
\"distribution.name\",\"distribution.contentUrl\",\"distribution.encodingFormat\"])\n", "\n", "forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dowload" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge.download(data, \"distribution.contentUrl\", dirpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### LayerThickness " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Supported filters for the time being are:\n", "_type = \"LayerThickness\"\n", "brainRegion = \"primary somatosensory cortex\"\n", "layer = \"layer 2\"\n", "encodingFormat=\"application/xlsx\"\n", "limit=10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run query" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = forge.paths(\"Dataset\") # to have autocompletion on the properties\n", "data = forge.search(path.type.id == _type,\n", " path.brainLocation.layer.label == layer,\n", " path.brainLocation.brainRegion.label == brainRegion,\n", " path.distribution.encodingFormat == encodingFormat,\n", " limit=limit)\n", "\n", "print(str(len(data))+\" data of type '\"+_type+\"' found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display Results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DISPLAY_LIMIT = 10\n", "reshaped_data = forge.reshape(data, keep=[\"id\",\"name\",\"subject\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\",\n", " \"brainLocation.layer.id\",\"brainLocation.layer.label\", \"contribution\",\n", " \"brainLocation.layer.id\",\"brainLocation.layer.label\",\"distribution.name\",\n", " 
\"distribution.contentUrl\",\"distribution.encodingFormat\"])\n", "\n", "forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dowload" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge.download(data, \"distribution.contentUrl\", dirpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Neuron Density" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# Supported filters for the time being are:\n", "_type = \"NeuronDensity\"\n", "brainRegion = \"primary somatosensory cortex\"\n", "layer = \"layer 2\"\n", "encodingFormat=\"application/xlsx\"\n", "limit=10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run query" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 data of type 'NeuronDensity' found.\n" ] } ], "source": [ "path = forge.paths(\"Dataset\") # to have autocompletion on the properties\n", "data = forge.search(path.type.id == _type,\n", " path.brainLocation.layer.label == layer,\n", " path.brainLocation.brainRegion.label == brainRegion,\n", " path.distribution.encodingFormat == encodingFormat,\n", " limit=limit)\n", "\n", "print(str(len(data))+\" data of type '\"+_type+\"' found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display Results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DISPLAY_LIMIT = 10\n", "reshaped_data = forge.reshape(data, keep=[\"id\",\"name\",\"subject\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\",\"brainLocation.layer.id\",\"brainLocation.layer.label\", 
\"contribution\",\"brainLocation.layer.id\",\"brainLocation.layer.label\",\"distribution.name\",\"distribution.contentUrl\",\"distribution.encodingFormat\"])\n", "\n", "forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dowload" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge.download(data, \"distribution.contentUrl\", dirpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Atlas Release" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let target the bbp/atlas project in Nexus\n", "\n", "forge_atlas = KnowledgeGraphForge(\"prod-forge-nexus.yml\", bucket=\"bbp/atlas\", token=TOKEN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Atlas related types:\n", " AtlasRelease\n", " BrainParcellationDataLayer\n", " CellDensityDataLayer\n", " GeneExpressionVolumetricDataLayer\n", " GliaCellDensity\n", " NISSLImageDataLayer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# Supported filters for the time being are:\n", "_type = \"BrainParcellationDataLayer\"\n", "limit=10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run query" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#path = forge_atlas.paths(\"Dataset\") # to have autocompletion on the properties\n", "data = forge_atlas.search(path.type.id == _type,\n", " limit=limit)\n", "\n", "print(str(len(data))+\" data of type '\"+_type+\"' found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display Results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DISPLAY_LIMIT = 10\n", "reshaped_data = forge_atlas.reshape(data, 
keep=[\"id\",\"name\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\", \"contribution\",\"distribution.name\",\"distribution.contentUrl\",\"distribution.encodingFormat\"])\n", "\n", "forge_atlas.as_dataframe(reshaped_data[:DISPLAY_LIMIT])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Download" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge_atlas.download(data, \"distribution.contentUrl\", dirpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data at a given tag\n", "Tagged data are identified by immutable tags. Retrieving data at a given tag is guaranteed to return the state the data were in when the tag was created. Tags here are similar to git tags." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Choose a bucket (or project) to query" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "bucket = \"bbp/lnmce\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge_tag = KnowledgeGraphForge(\"prod-forge-nexus.yml\", bucket=bucket, token=TOKEN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set tag value" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "tag = \"LNMCE2020\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "# Search for electrophysiology traces\n", "_type = \"Trace\"\n", "classification_type = \"EType\"\n", "eType = \"bIR\"\n", "brainRegion = \"primary somatosensory cortex\"\n", "encodingFormat = \"application/nwb\"\n", "limit = 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run Query" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 data of type 'Trace' found.\n" ] } ], "source": 
[ "path = forge_tag.paths(\"Dataset\")  # to have autocompletion on the properties\n", "data = forge_tag.search(path.type.id == _type,\n", " path.annotation.hasBody.type.id == classification_type,  # Known issue: filter on hasBody.type.id (not hasBody.type) to avoid AttributeError: 'PathWrapper' object has no attribute '_path'\n", " path.annotation.hasBody.label == eType,\n", " path.brainLocation.brainRegion.label == brainRegion,\n", " path.distribution.encodingFormat == encodingFormat,\n", " limit=limit)\n", "\n", "print(f\"{len(data)} data of type '{_type}' found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Retrieve the results at the set tag" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results = [forge_tag.retrieve(d.id, version=tag) for d in data]\n", "# Drop resources that could not be retrieved at this tag (retrieve returns None for them)\n", "results = [r for r in results if r is not None]\n", "print(f\"{len(results)} data of type '{_type}' at tag {tag} found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display the results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DISPLAY_LIMIT = 10\n", "reshaped_data = forge_tag.reshape(results, keep=[\"id\",\"name\",\"subject\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\",\"brainLocation.layer.id\",\"brainLocation.layer.label\", \"contribution\",\"distribution.name\",\"distribution.contentUrl\",\"distribution.encodingFormat\"])\n", "\n", "forge_tag.as_dataframe(reshaped_data[:DISPLAY_LIMIT])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Download" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge_tag.download(results, \"distribution.contentUrl\", dirpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data in a given view\n", "A view exposes a subset of data for query and access in specialised indices 
(SPARQL, ElasticSearch)." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# Identifier of the view to query (here, a SPARQL view)\n", "view_url = \"https://bluebrain.github.io/nexus/vocabulary/lnmce2020SparqlIndex\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Point the forge's SPARQL search endpoint at the view\n", "searchendpoints = {\"sparql\": {\"endpoint\": view_url}}\n", "forge_view = KnowledgeGraphForge(\"prod-forge-nexus.yml\", bucket=\"bbp/lnmce\", token=TOKEN, searchendpoints=searchendpoints)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set filters" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "# Search for electrophysiology traces\n", "_type = \"Trace\"\n", "classification_type = \":EType\"\n", "eType = \"bIR\"\n", "brainRegion = \"primary somatosensory cortex\"\n", "encodingFormat = \"application/nwb\"\n", "limit = 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run Query" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 data of type 'Trace' found.\n" ] } ], "source": [ "path = forge_view.paths(\"Dataset\")  # to have autocompletion on the properties\n", "data = forge_view.search(path.type.id == _type,\n", " path.annotation.hasBody.type.id == classification_type,  # Known issue: filter on hasBody.type.id (not hasBody.type) to avoid AttributeError: 'PathWrapper' object has no attribute '_path'\n", " path.annotation.hasBody.label == eType,\n", " path.brainLocation.brainRegion.label == brainRegion,\n", " path.distribution.encodingFormat == encodingFormat,\n", " limit=limit)\n", "\n", "print(f\"{len(data)} data of type '{_type}' found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Display the results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 
"outputs": [], "source": [ "DISPLAY_LIMIT = 10\n", "reshaped_data = forge_tag.reshape(data, keep=[\"id\",\"name\",\"subject\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\",\"brainLocation.layer.id\",\"brainLocation.layer.label\", \"contribution\",\"brainLocation.layer.id\",\"brainLocation.layer.label\",\"distribution.name\",\"distribution.contentUrl\",\"distribution.encodingFormat\"])\n", "\n", "forge_view.as_dataframe(reshaped_data[:DISPLAY_LIMIT])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dowload" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge_view.download(data, \"distribution.contentUrl\", dirpath)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.7 (nexusforgelatest)", "language": "python", "name": "nexusforgelatest" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }