{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "jcyB6NFmQReL" }, "source": [ "# Tutorial: Integrate Neuroscience Datasets from Multiple Sources using MINDS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize and configure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install nexusforge==0.6.2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install allensdk" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install neurom[plotly]==3.0.1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade nest-asyncio==1.5.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get an authentication token" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [Nexus sandbox application](https://sandbox.bluebrainnexus.io/web) can be used to get a token:\n", "\n", "- Step 1: From the [web page](https://sandbox.bluebrainnexus.io/web), click on the login button in the top right corner and follow the instructions on screen.\n", "\n", "- Step 2: You will then see a `Copy token` button in the top right corner. Click on it to copy the token to the clipboard.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have a token, paste it at the prompt below; it becomes the value of the `TOKEN` variable.\n", "\n", "__Important__: A Nexus token is valid for 8 hours. If your working session stays open longer than that, you may need to refresh the token and reinitialize the forge client in the _'Configure a forge client to store, manage and access datasets'_ section below." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import getpass" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TOKEN = getpass.getpass()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure a forge client to store, manage and access datasets" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import uuid\n", "import base64\n", "import requests\n", "import json\n", "from pathlib import Path\n", "\n", "from kgforge.core import KnowledgeGraphForge\n", "from kgforge.specializations.mappings import DictionaryMapping\n", "\n", "from allensdk.api.queries.cell_types_api import CellTypesApi\n", "from allensdk.core.cell_types_cache import CellTypesCache" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r = requests.get('https://raw.githubusercontent.com/BlueBrain/nexus/ef830192d4e7bb95f9351c4bdab7b0114c27e2f0/docs/src/main/paradox/docs/getting-started/notebooks/rdfmodel/jsonldcontext.json')\n", "dirpath = './rdfmodel'\n", "Path(dirpath).mkdir(parents=True, exist_ok=True)\n", "with open(f'{dirpath}/jsonldcontext.json', 'w') as outfile:\n", " json.dump(r.json(), outfile)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ORG = \"github-users\"\n", "PROJECT = \"\" # Provide here the automatically created project name created when you logged into the Nexus sandbox instance." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge = KnowledgeGraphForge(\"https://raw.githubusercontent.com/BlueBrain/nexus/ef830192d4e7bb95f9351c4bdab7b0114c27e2f0/docs/src/main/paradox/docs/getting-started/notebooks/forge.yml\",\n", " bucket=f\"{ORG}/{PROJECT}\",\n", " endpoint=\"https://sandbox.bluebrainnexus.io/v1\",\n", " token=TOKEN)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9W5M5Ck9Tq7q" }, "source": [ "## Download datasets from Allen Cell Types Database and MouseLight" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download mouse neuron morphologies from the Allen Cell Types Database" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will download mouse neuron morphology data from the [Allen Cell Types Database](https://celltypes.brain-map.org/) using the [AllenSDK](https://allensdk.readthedocs.io/en/latest/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ALLEN_DIR = \"allen_cell_types_database\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ctc = CellTypesCache(manifest_file=f\"{ALLEN_DIR}/manifest.json\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MAX_CELLS = 10 # Increase to include more cells\n", "SPECIES = CellTypesApi.MOUSE" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nm_allen_identifiers = [cell[\"id\"] for cell in ctc.get_cells(species=[SPECIES], require_reconstruction=True)][:MAX_CELLS]\n", "print(f\"Selected {len(nm_allen_identifiers)} mouse neuron(s) with identifiers: {nm_allen_identifiers}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(f\"{ALLEN_DIR}/cells.json\") as f:\n", " 
allen_cell_types_metadata = json.load(f)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nm_allen_metadata = [neuron for neuron in allen_cell_types_metadata if neuron[\"specimen__id\"] in nm_allen_identifiers]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download reconstruction files" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for identifier in nm_allen_identifiers:\n", " ctc.get_reconstruction(identifier)" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "### Download mouse neuron electrophysiology recordings from the Allen Cell Types Database" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download Electrophysiology recordings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for identifier in nm_allen_identifiers:\n", " ctc.get_ephys_data(identifier)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download mouse neuron morphologies from MouseLight project" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will be downloading mouse neuron morphology data from the [MouseLight project](https://www.janelia.org/project-team/mouselight)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "URL_GRAPHQL = \"http://ml-neuronbrowser.janelia.org/graphql/\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "URL_JSON = \"http://ml-neuronbrowser.janelia.org/json/\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "URL_SWC = \"http://ml-neuronbrowser.janelia.org/swc/\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nm_request = requests.post(URL_GRAPHQL, json={\"operationName\":\"SearchNeurons\",\n", " \"variables\":{\n", " \"context\":{\n", " \"scope\":6,\n", " \"nonce\":\"cjyo7xu7k00033h5yrj9jfpoy\",\n", " \"predicates\":[{\n", " \"predicateType\":3,\n", " \"tracingIdsOrDOIs\":[\"1\"],\n", " \"tracingIdsOrDOIsExactMatch\":False,\n", " \"tracingStructureIds\":[\"68e76074-1777-42b6-bbf9-93a6a5f02fa4\"],\n", " \"nodeStructureIds\":[\"c37953e1-a1e9-4b9a-847e-08d9566ced65\"],\n", " \"operatorId\":None,\n", " \"amount\":0,\n", " \"brainAreaIds\":[],\n", " \"arbCenter\":{\n", " \"x\":None,\n", " \"y\":None,\n", " \"z\":None},\n", " \"arbSize\":None,\n", " \"invert\":False,\n", " \"composition\":3\n", " }]\n", " }\n", " },\n", " \"query\":\"\"\"query SearchNeurons($context: SearchContext) {\\n searchNeurons(context: $context) \n", " {\\n totalCount\\n queryTime\\n nonce\\n \\n neurons {\\n id\\n \n", " idString\\n tracings {\\n id\\n tracingStructure {\\n id\\n \n", " name\\n value\\n __typename\\n }\\n soma {\\n id\\n \n", " x\\n y\\n z\\n radius\\n parentNumber\\n \n", " sampleNumber\\n brainAreaIdCcfV30\\n structureIdentifierId\\n \n", " __typename\\n }\\n __typename\\n }\\n __typename\\n }\\n \n", " __typename\\n }\\n}\\n\"\"\"\n", " })\n", "nm_mouselight_graphql = json.loads(nm_request.text)[\"data\"][\"searchNeurons\"][\"neurons\"]\n", 
"nm_mouselight_names = [x[\"idString\"] for x in nm_mouselight_graphql]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nm_mouselight_metadata = list()\n", "for name in nm_mouselight_names[0:MAX_CELLS]:\n", "    a = requests.post(URL_JSON,\n", "                      json={\"ids\": [name]},\n", "                      headers={\"Accept\": \"*/*\", \"Content-Type\": \"application/json\"})\n", "    c = json.loads(a.content.decode('utf-8'))\n", "    nm_mouselight_metadata.append(c[\"contents\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i, nm in enumerate(nm_mouselight_metadata):\n", "    allenId = nm[\"neurons\"][0][\"soma\"][\"allenId\"]\n", "    allenInfo = nm[\"neurons\"][0][\"allenInformation\"]\n", "    for info in allenInfo:\n", "        if info[\"allenId\"] == allenId:\n", "            allenLabel = info[\"name\"]\n", "    nm_mouselight_metadata[i][\"neurons\"][0][\"allenLabel\"] = allenLabel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download reconstruction files" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for name in nm_mouselight_names[0:MAX_CELLS]:\n", "    a = requests.post(URL_SWC,\n", "                      json={\"ids\": [name]},\n", "                      headers={\"Accept\": \"*/*\", \"Content-Type\": \"application/json\"})\n", "    # Parse the response as JSON; avoid eval() on data fetched from the network\n", "    c = json.loads(a.content.decode('utf-8'))\n", "    base64_message = c[\"contents\"]\n", "    base64_bytes = base64_message.encode('ascii')\n", "    message_bytes = base64.b64decode(base64_bytes)\n", "    dirpath = './mouselight'\n", "    Path(dirpath).mkdir(parents=True, exist_ok=True)\n", "    with open(f\"{dirpath}/{name}.swc\", \"wb\") as f:\n", "        f.write(message_bytes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map the Allen Cell Types Database neuron morphologies to Neuroshapes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "allen_nm_mapping = 
DictionaryMapping.load(\"https://raw.githubusercontent.com/BlueBrain/nexus/ef830192d4e7bb95f9351c4bdab7b0114c27e2f0/docs/src/main/paradox/docs/getting-started/notebooks/mappings/allen_morphology_dataset.hjson\") # TODO\n", "nm_allen_resources = forge.map(nm_allen_metadata, allen_nm_mapping)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map the Allen Cell Types Database neuron electrophysiology recordings to Neuroshapes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "allen_ephys_mapping = DictionaryMapping.load(\"https://raw.githubusercontent.com/BlueBrain/nexus/ef830192d4e7bb95f9351c4bdab7b0114c27e2f0/docs/src/main/paradox/docs/getting-started/notebooks/mappings/allen_ephys_dataset.hjson\") # TODO\n", "nephys_allen_resources = forge.map(nm_allen_metadata, allen_ephys_mapping)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map the MouseLight neuron morphologies to Neuroshapes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mouselight_nm_mapping = DictionaryMapping.load(\"https://raw.githubusercontent.com/BlueBrain/nexus/ef830192d4e7bb95f9351c4bdab7b0114c27e2f0/docs/src/main/paradox/docs/getting-started/notebooks/mappings/mouselight_dataset.hjson\") # TODO\n", "nm_mouselight_resources = forge.map(nm_mouselight_metadata, mouselight_nm_mapping)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Register\n", "\n", "If the registration fails, try refreshing the access token and reinitializing the forge client in the _'Configure a forge client to store, manage and access datasets'_ section." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Register the Allen Cell Types Database neuron morphologies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for resource in nm_allen_resources:\n", " resource.id = forge.format(\"identifier\", \"neuronmorphologies\", str(uuid.uuid4()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.register(nm_allen_resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Register the Allen Cell Types Database neuron electrophysiology recordings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for resource in nephys_allen_resources:\n", " resource.id = forge.format(\"identifier\", \"traces\", str(uuid.uuid4()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.register(nephys_allen_resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Register the MouseLight neuron morphologies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for resource in nm_mouselight_resources:\n", " resource.id = forge.format(\"identifier\", \"neuronmorphologies\", str(uuid.uuid4()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.register(nm_mouselight_resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save the created resources in JSON files" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dirpath = './database'\n", "Path(dirpath).mkdir(parents=True, exist_ok=True)\n", "with open(f\"{dirpath}/mouselight-protocols.json\",\"w\") as f:\n", " json.dump(forge.as_jsonld(nm_mouselight_resources, form=\"expanded\"),f)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with 
open(f\"{dirpath}/allen-morphologies-protocols.json\",\"w\") as f:\n", " json.dump(forge.as_jsonld(nm_allen_resources, form=\"expanded\"), f)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(f\"{dirpath}/allen-ephys-protocols.json\",\"w\") as f:\n", " json.dump(forge.as_jsonld(nephys_allen_resources, form=\"expanded\"), f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Access" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set filters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_type = \"NeuronMorphology\"\n", "\n", "filters = {\"type\": _type}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run Query" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "number_of_results = 10 # Limits the number of results; pass `None` to fetch all results\n", "\n", "data = forge.search(filters, limit=number_of_results)\n", "\n", "print(f\"{len(data)} dataset(s) of type {_type} found\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Display the results as a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "property_to_display = [\"id\",\"name\",\"subject\",\"brainLocation.brainRegion.id\",\"brainLocation.brainRegion.label\",\"brainLocation.layer.id\",\"brainLocation.layer.label\",\"contribution\",\"distribution.name\",\"distribution.contentUrl\",\"distribution.encodingFormat\"]\n", "reshaped_data = forge.reshape(data, keep=property_to_display)\n", "\n", "forge.as_dataframe(reshaped_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dirpath = \"./downloaded/\"\n", "forge.download(data, \"distribution.contentUrl\", dirpath, overwrite=True)" ] }, 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls ./downloaded/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Display a result as 3D Neuron Morphology" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from neurom import load_morphology\n", "from neurom.view.plotly_impl import plot_morph3d\n", "import IPython" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neuron = load_morphology(f\"{dirpath}/{data[0].distribution.name}\")\n", "plot_morph3d(neuron, inline=False)\n", "IPython.display.HTML(filename='./morphology-3D.html')" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "## Version the dataset\n", "Tagging a dataset is analogous to `git tag`: it lets you version a dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.tag(data, value=\"releaseV112\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The version argument can be specified to retrieve the dataset at a given tag.\n", "\n", "tagged_data = forge.retrieve(id=data[0].id, version=\"releaseV112\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.as_dataframe(tagged_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data[0].description = \"Neuron Morphology from Allen\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.update(data[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "non_tagged_data = forge.retrieve(id=data[0].id)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "forge.as_dataframe(non_tagged_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "Step by step jupyter notebook for bringing data to Nexus v1.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python (mooc)", "language": "python", "name": "mooc" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.11" } }, "nbformat": 4, "nbformat_minor": 4 }