{ "cells": [ { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:19.036357Z", "start_time": "2019-09-23T18:50:19.031896Z" } }, "source": [ "# Querying\n", "\n", "This notebook demonstrates Nexus Forge data [querying features](https://nexus-forge.readthedocs.io/en/latest/interaction.html#querying)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.068658Z", "start_time": "2019-09-23T18:50:19.054054Z" } }, "outputs": [], "source": [ "from kgforge.core import KnowledgeGraphForge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A configuration file is needed in order to create a KnowledgeGraphForge session. A configuration can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb)." ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [], "source": [ "forge = KnowledgeGraphForge(\"../../configurations/forge.yml\")" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "from kgforge.core import Resource\n", "from kgforge.specializations.resources import Dataset\n", "from kgforge.core.wrappings.paths import Filter, FilterOperator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieval" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### latest version" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _register_one\n", " True\n" ] } ], "source": [ "forge.register(jane)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "resource = 
forge.retrieve(jane.id)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resource == jane" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### specific version" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _register_one\n", " True\n" ] } ], "source": [ "forge.register(jane)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _tag_one\n", " True\n" ] } ], "source": [ "forge.tag(jane, \"v1\")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "jane.email = \"jane.doe@epfl.ch\"" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _update_one\n", " True\n" ] } ], "source": [ "forge.update(jane)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:21.317601Z", "start_time": "2019-09-23T18:50:21.310418Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "try:\n", " # DemoStore\n", " print(jane._store_metadata.version)\n", "except:\n", " # BlueBrainNexus\n", " print(jane._store_metadata._rev)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:21.332678Z", "start_time": "2019-09-23T18:50:21.322025Z" } }, "outputs": [], "source": [ "jane_v1 = forge.retrieve(jane.id, version=1)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": 
"2019-09-23T18:50:21.370051Z", "start_time": "2019-09-23T18:50:21.363782Z" } }, "outputs": [], "source": [ "jane_v1_tag = forge.retrieve(jane.id, version=\"v1\")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:21.379911Z", "start_time": "2019-09-23T18:50:21.373539Z" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jane_v1 == jane_v1_tag" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### crossbucket retrieval\n", "It is possible to retrieve resources stored in buckets different then the configured one. The configured store should of course support it." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "resource = forge.retrieve(jane.id, cross_bucket=True) # cross_bucket defaults to False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### error handling" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " retrieve\n", " RetrievalError: 404 Client Error: Not Found for url: https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/%3A%2F%2F123\n", "\n" ] } ], "source": [ "resource = forge.retrieve(\"123\")" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resource is None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel. Commented lines are for DemoModel." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\")\n", "contribution_jane = Resource(type=\"Contribution\", agent=jane)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "john = Resource(type=\"Person\", name=\"John Smith\")\n", "contribution_john = Resource(type=\"Contribution\", agent=john)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "dataset = Dataset(forge, type=\"Dataset\", contribution=[contribution_jane, contribution_john])\n", "dataset.add_distribution(\"../../data/associations.tsv\")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _register_one\n", " True\n" ] } ], "source": [ "forge.register(dataset)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.as_json(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Paths as filters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `paths` method loads the template or property paths for a given type.\n", "\n", "Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about templates and types." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "p = forge.paths(\"Dataset\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Autocompletion is available on `p`, which can be used to create search filters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: There is a known issue for RdfModel which requires using `p.type.id` instead of `p.type`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All [Python comparison operators](https://www.w3schools.com/python/gloss_python_comparison_operators.asp) are supported." 
] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "resources = forge.search(p.type.id==\"Person\", limit=3)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "\n", " _schemaProject name \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... Jane Doe \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... John Smith \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... Jane Doe \n", "\n", " distribution.type distribution.atLocation.type \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 DataDownload Location \n", "\n", " distribution.atLocation.store.id \\\n", "0 NaN \n", "1 NaN \n", "2 https://bluebrain.github.io/nexus/vocabulary/d... \n", "\n", " distribution.contentSize.unitCode distribution.contentSize.value \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 bytes 52.0 \n", "\n", " distribution.contentUrl \\\n", "0 NaN \n", "1 NaN \n", "2 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "\n", " distribution.digest.algorithm \\\n", "0 NaN \n", "1 NaN \n", "2 SHA-256 \n", "\n", " distribution.digest.value \\\n", "0 NaN \n", "1 NaN \n", "2 1dacd765946963fda4949753659089c5f532714b418d30... \n", "\n", " distribution.encodingFormat distribution.name \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 text/csv persons.csv " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.as_json(resources[2])" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "\n", " _schemaProject name \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... Jane Doe \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... Jane Doe \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... Jane Doe \n", "\n", " _constrainedBy \\\n", "0 https://bluebrain.github.io/nexus/schemas/unco... \n", "1 https://bluebrain.github.io/nexus/schemas/unco... \n", "2 https://bluebrain.github.io/nexus/schemas/unco... \n", "\n", " _createdAt \\\n", "0 2022-01-06T15:46:40.285Z \n", "1 2022-01-06T15:47:00.719Z \n", "2 2022-01-07T11:26:11.330Z \n", "\n", " _createdBy _deprecated \\\n", "0 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "1 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "2 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "\n", " _incoming \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... \n", "\n", " _outgoing ... distribution.type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... ... NaN \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... ... DataDownload \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... ... NaN \n", "\n", " distribution.atLocation.type \\\n", "0 NaN \n", "1 Location \n", "2 NaN \n", "\n", " distribution.atLocation.store.id \\\n", "0 NaN \n", "1 https://bluebrain.github.io/nexus/vocabulary/d... \n", "2 NaN \n", "\n", " distribution.contentSize.unitCode distribution.contentSize.value \\\n", "0 NaN NaN \n", "1 bytes 52.0 \n", "2 NaN NaN \n", "\n", " distribution.contentUrl \\\n", "0 NaN \n", "1 https://sandbox.bluebrainnexus.io/v1/files/git... 
\n", "2 NaN \n", "\n", " distribution.digest.algorithm \\\n", "0 NaN \n", "1 SHA-256 \n", "2 NaN \n", "\n", " distribution.digest.value \\\n", "0 NaN \n", "1 1dacd765946963fda4949753659089c5f532714b418d30... \n", "2 NaN \n", "\n", " distribution.encodingFormat distribution.name \n", "0 NaN NaN \n", "1 text/csv persons.csv \n", "2 NaN NaN \n", "\n", "[3 rows x 25 columns]" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources, store_metadata=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Nested property querying" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Property autocompletion is available on a path `p` even for nested properties like `p.contribution`." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat\n", "resources = forge.search(p.type.id == \"Person\", p.distribution.encodingFormat == \"text/tab-separated-values\", limit=3)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "\n", " _schemaProject distribution.type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "\n", " distribution.atLocation.type \\\n", "0 Location \n", "1 Location \n", "2 Location \n", "\n", " distribution.atLocation.store.id \\\n", "0 https://bluebrain.github.io/nexus/vocabulary/d... \n", "1 https://bluebrain.github.io/nexus/vocabulary/d... \n", "2 https://bluebrain.github.io/nexus/vocabulary/d... \n", "\n", " distribution.contentSize.unitCode distribution.contentSize.value \\\n", "0 bytes 506 \n", "1 bytes 506 \n", "2 bytes 506 \n", "\n", " distribution.contentUrl \\\n", "0 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "1 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "2 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "\n", " distribution.digest.algorithm \\\n", "0 SHA-256 \n", "1 SHA-256 \n", "2 SHA-256 \n", "\n", " distribution.digest.value \\\n", "0 9639abc864e91c645779f510ae5c06a1618941d569eb1a... \n", "1 9639abc864e91c645779f510ae5c06a1618941d569eb1a... \n", "2 9639abc864e91c645779f510ae5c06a1618941d569eb1a... 
\n", "\n", " distribution.encodingFormat distribution.name name \n", "0 text/tab-separated-values associations.tsv Jane Doe \n", "1 text/tab-separated-values associations.tsv Jane Doe \n", "2 text/tab-separated-values associations.tsv Jane Doe " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dict as filters\n", "A dictionary can be provided for filters:\n", "* `{'type': {'id': 'Dataset'}}` is equivalent to `p.type.id == \"Dataset\"`\n", "* only the `==` operator is supported\n", "* nested dicts are supported\n", "* the provided properties and values do not need to be defined in the forge model; results are returned whenever matching data exist in the store.\n", "\n", "This feature is not supported when using the DemoStore.\n" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat\n", "filters = {\"type\": \"Person\", \"distribution\":{\"encodingFormat\":\"text/tab-separated-values\"}}\n", "resources = forge.search(filters, limit=3)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "\n", " _schemaProject distribution.type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "\n", " distribution.atLocation.type \\\n", "0 Location \n", "1 Location \n", "2 Location \n", "\n", " distribution.atLocation.store.id \\\n", "0 https://bluebrain.github.io/nexus/vocabulary/d... \n", "1 https://bluebrain.github.io/nexus/vocabulary/d... \n", "2 https://bluebrain.github.io/nexus/vocabulary/d... \n", "\n", " distribution.contentSize.unitCode distribution.contentSize.value \\\n", "0 bytes 506 \n", "1 bytes 506 \n", "2 bytes 506 \n", "\n", " distribution.contentUrl \\\n", "0 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "1 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "2 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "\n", " distribution.digest.algorithm ... _createdAt \\\n", "0 SHA-256 ... 2021-08-17T11:00:14.662Z \n", "1 SHA-256 ... 2021-08-23T09:12:24.049Z \n", "2 SHA-256 ... 2021-08-23T09:18:43.327Z \n", "\n", " _createdBy _deprecated \\\n", "0 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "1 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "2 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "\n", " _incoming \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... \n", "\n", " _outgoing \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... 
\n", "\n", " _project _rev \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "\n", " _self \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... \n", "\n", " _updatedAt _updatedBy \n", "0 2021-08-17T11:00:14.662Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "1 2021-08-23T09:12:24.049Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "2 2021-08-23T09:18:43.327Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "\n", "[3 rows x 25 columns]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources, store_metadata=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in Filter objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Supported filter operators" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['__eq__ (EQUAL)',\n", " '__ne__ (NOT_EQUAL)',\n", " '__lt__ (LOWER_THAN)',\n", " '__le__ (LOWER_OR_Equal_Than)',\n", " '__gt__ (GREATER_Than)',\n", " '__ge__ (GREATER_OR_Equal_Than)']" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[f\"{op.value} ({op.name})\" for op in FilterOperator] # These are equivalent to the Python comparison operators" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat\n", "\n", "filter_1 = Filter(operator=\"__eq__\", path=[\"type\"], value=\"Person\")\n", "filter_2 = Filter(operator=\"__eq__\", path=[\"distribution\",\"encodingFormat\"], value=\"text/tab-separated-values\")\n", "resources = 
forge.search(filter_1,filter_2, limit=3)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "\n", " _schemaProject distribution.type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "\n", " distribution.atLocation.type \\\n", "0 Location \n", "1 Location \n", "2 Location \n", "\n", " distribution.atLocation.store.id \\\n", "0 https://bluebrain.github.io/nexus/vocabulary/d... \n", "1 https://bluebrain.github.io/nexus/vocabulary/d... \n", "2 https://bluebrain.github.io/nexus/vocabulary/d... \n", "\n", " distribution.contentSize.unitCode distribution.contentSize.value \\\n", "0 bytes 506 \n", "1 bytes 506 \n", "2 bytes 506 \n", "\n", " distribution.contentUrl \\\n", "0 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "1 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "2 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "\n", " distribution.digest.algorithm ... _createdAt \\\n", "0 SHA-256 ... 2021-08-17T11:00:14.662Z \n", "1 SHA-256 ... 2021-08-23T09:12:24.049Z \n", "2 SHA-256 ... 2021-08-23T09:18:43.327Z \n", "\n", " _createdBy _deprecated \\\n", "0 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "1 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "2 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "\n", " _incoming \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... \n", "\n", " _outgoing \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... 
\n", "\n", " _project _rev \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "\n", " _self \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... \n", "\n", " _updatedAt _updatedBy \n", "0 2021-08-17T11:00:14.662Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "1 2021-08-23T09:12:24.049Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "2 2021-08-23T09:18:43.327Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "\n", "[3 rows x 25 columns]" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources, store_metadata=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Search Endpoints\n", "\n", "Two types of search endpoints are supported: 'sparql' for graph queries and 'elastic' for document-oriented queries. The available search endpoints can be configured (see [00-Initialization.ipynb](00%20-%20Initialization.ipynb) for an example search endpoint configuration) or set when creating a KnowledgeGraphForge session using the 'searchendpoints' argument.\n", "\n", "forge.search(...) queries the 'sparql' endpoint by default; a different endpoint can be selected with the 'search_endpoint' argument." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SPARQL Search Endpoint" ] }, { "cell_type": "code", "execution_count": 184, "metadata": {}, "outputs": [], "source": [ "# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat\n", "filters = {\"type\": \"Person\", \"distribution\":{\"encodingFormat\":\"text/tab-separated-values\"}}\n", "resources = forge.search(filters, limit=3, search_endpoint='sparql')" ] }, { "cell_type": "code", "execution_count": 172, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 173, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 173, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 174, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Person \n", "\n", " _schemaProject distribution.type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... DataDownload \n", "\n", " distribution.atLocation.type \\\n", "0 Location \n", "1 Location \n", "2 Location \n", "\n", " distribution.atLocation.store.id \\\n", "0 https://bluebrain.github.io/nexus/vocabulary/d... \n", "1 https://bluebrain.github.io/nexus/vocabulary/d... \n", "2 https://bluebrain.github.io/nexus/vocabulary/d... \n", "\n", " distribution.contentSize.unitCode distribution.contentSize.value \\\n", "0 bytes 506 \n", "1 bytes 506 \n", "2 bytes 506 \n", "\n", " distribution.contentUrl \\\n", "0 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "1 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "2 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "\n", " distribution.digest.algorithm ... _createdAt \\\n", "0 SHA-256 ... 2021-08-17T11:00:14.662Z \n", "1 SHA-256 ... 2021-08-23T09:12:24.049Z \n", "2 SHA-256 ... 2021-08-23T09:18:43.327Z \n", "\n", " _createdBy _deprecated \\\n", "0 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "1 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "2 https://sandbox.bluebrainnexus.io/v1/realms/gi... False \n", "\n", " _incoming \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... \n", "\n", " _outgoing \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... 
\n", "\n", " _project _rev \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... 1 \n", "\n", " _self \\\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... \n", "\n", " _updatedAt _updatedBy \n", "0 2021-08-17T11:00:14.662Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "1 2021-08-23T09:12:24.049Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "2 2021-08-23T09:18:43.327Z https://sandbox.bluebrainnexus.io/v1/realms/gi... \n", "\n", "[3 rows x 25 columns]" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources, store_metadata=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ElasticSearch Endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Search for resources of type Person and retrieve their ids and names.\n", "\n", "filters = {\"@type\": \"http://schema.org/Person\"}\n", "resources = forge.search(filters, limit=3, search_endpoint='elastic', debug=True, includes=[\"@id\",\"name\"]) # fields can also be excluded with 'excludes'" ] }, { "cell_type": "code", "execution_count": 194, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 195, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 195, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 196, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " @id name\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... Jane Doe\n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Jane Doe\n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Jane Doe" ] }, "execution_count": 196, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources, store_metadata=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cross-bucket search\n", "It is possible to search for resources stored in buckets other than the configured one, provided that the configured store supports it." ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [], "source": [ "resources = forge.search(p.type.id == \"Association\", limit=3, cross_bucket=True) # cross_bucket defaults to False" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type \\\n", "0 https://kg.example.ch/associations/123 Association \n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Association \n", "2 https://sandbox.bluebrainnexus.io/v1/resources... Association \n", "\n", " _schemaProject agent.type \\\n", "0 https://sandbox.bluebrainnexus.io/v1/projects/... Person \n", "1 https://sandbox.bluebrainnexus.io/v1/projects/... Person \n", "2 https://sandbox.bluebrainnexus.io/v1/projects/... Person \n", "\n", " agent.gender.id agent.gender.type \\\n", "0 http://purl.obolibrary.org/obo/PATO_0000383 LabeledOntologyEntity \n", "1 http://purl.obolibrary.org/obo/PATO_0000384 LabeledOntologyEntity \n", "2 NaN NaN \n", "\n", " agent.gender.label agent.name distribution.type \\\n", "0 female Marie Curie DataDownload \n", "1 male Albert Einstein DataDownload \n", "2 NaN Jane Doe NaN \n", "\n", " distribution.atLocation.type \\\n", "0 Location \n", "1 Location \n", "2 NaN \n", "\n", " distribution.atLocation.store.id \\\n", "0 https://bluebrain.github.io/nexus/vocabulary/d... \n", "1 https://bluebrain.github.io/nexus/vocabulary/d... \n", "2 NaN \n", "\n", " distribution.contentSize.unitCode distribution.contentSize.value \\\n", "0 bytes 46.0 \n", "1 bytes 50.0 \n", "2 NaN NaN \n", "\n", " distribution.contentUrl \\\n", "0 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "1 https://sandbox.bluebrainnexus.io/v1/files/git... \n", "2 NaN \n", "\n", " distribution.digest.algorithm \\\n", "0 SHA-256 \n", "1 SHA-256 \n", "2 NaN \n", "\n", " distribution.digest.value \\\n", "0 e0fe65f725bf28fe2b88c7bafb51fb5ef1df0ab14c68a3... \n", "1 91a5ce5c84dc5bead730a4b49d0698b4aaef4bc06ce164... 
\n", "2 NaN \n", "\n", " distribution.encodingFormat distribution.name name \n", "0 text/plain marie_curie.txt Curie Association \n", "1 text/plain albert_einstein.txt Einstein Association \n", "2 NaN NaN NaN " ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Furthermore, it is possible to filter by bucket when cross_bucket is set to True. Setting a bucket value when cross_bucket is False triggers a not_supported exception.\n", "resources = forge.search(p.type.id == \"Person\", limit=3, cross_bucket=True, bucket=) # add a bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.as_dataframe(resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Graph traversing\n", "\n", "SPARQL is used as the query language for graph traversal.\n", "\n", "Nexus Forge implements a SPARQL query rewriting strategy that leverages a configured RdfModel and lets users write SPARQL queries without adding prefix declarations, prefix names or long IRIs. With this strategy, only type and property names need to be provided.\n", "\n", "Please refer to the [Modeling.ipynb](11%20-%20Modeling.ipynb) notebook to learn about templates." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: DemoStore doesn't implement SPARQL operations yet. Please use another store for this section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel." 
] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\")\n", "contribution_jane = Resource(type=\"Contribution\", agent=jane)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "john = Resource(type=\"Person\", name=\"John Smith\")\n", "contribution_john = Resource(type=\"Contribution\", agent=john)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "association = Resource(type=\"Dataset\", contribution=[contribution_jane, contribution_john])" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _register_one\n", " True\n" ] } ], "source": [ "forge.register(association)" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " id: \"\"\n", " type:\n", " {\n", " id: \"\"\n", " }\n", " annotation:\n", " {\n", " id: \"\"\n", " type: Annotation\n", " hasBody:\n", " {\n", " id: \"\"\n", " type:\n", " {\n", " id: \"\"\n", " }\n", " label: \"\"\n", " note: \"\"\n", " }\n", " hasTarget:\n", " {\n", " id: \"\"\n", " type: AnnotationTarget\n", " }\n", " note: \"\"\n", " }\n", " brainLocation:\n", " {\n", " id: \"\"\n", " type: BrainLocation\n", " atlasSpatialReferenceSystem:\n", " {\n", " id: \"\"\n", " type: AtlasSpatialReferenceSystem\n", " }\n", " brainRegion:\n", " {\n", " id: \"\"\n", " label: \"\"\n", " }\n", " coordinatesInBrainAtlas:\n", " {\n", " id: \"\"\n", " valueX: 0.0\n", " valueY: 0.0\n", " valueZ: 0.0\n", " }\n", " coordinatesInSlice:\n", " {\n", " spatialReferenceSystem:\n", " {\n", " id: \"\"\n", " type: SpatialReferenceSystem\n", " }\n", " valueX: 0.0\n", " valueY: 0.0\n", " valueZ: 0.0\n", " }\n", " distanceToBoundary:\n", " {\n", " boundary:\n", " {\n", " id: \"\"\n", " label: \"\"\n", " }\n", 
" distance:\n", " {\n", " unitCode: \"\"\n", " value:\n", " [\n", " 0.0\n", " 0\n", " ]\n", " }\n", " }\n", " layer:\n", " {\n", " id: \"\"\n", " label: \"\"\n", " }\n", " longitudinalAxis:\n", " [\n", " Dorsal\n", " Ventral\n", " ]\n", " positionInLayer:\n", " [\n", " Deep\n", " Superficial\n", " ]\n", " }\n", " contribution:\n", " {\n", " id: \"\"\n", " }\n", " distribution:\n", " {\n", " id: \"\"\n", " type: DataDownload\n", " contentSize:\n", " {\n", " unitCode: \"\"\n", " value:\n", " [\n", " 0.0\n", " 0\n", " ]\n", " }\n", " digest:\n", " {\n", " algorithm: \"\"\n", " value: \"\"\n", " }\n", " encodingFormat: \"\"\n", " license: \"\"\n", " name: \"\"\n", " }\n", " objectOfStudy:\n", " {\n", " id: \"\"\n", " type: ObjectOfStudy\n", " }\n", " releaseDate: 9999-12-31T00:00:00\n", " subject:\n", " {\n", " id: \"\"\n", " type: Subject\n", " }\n", "}\n" ] } ], "source": [ "forge.template(\"Dataset\") # Templates help know which property to use when writing a query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prefix and namespace free SPARQL query\n", "\n", "When a forge RDFModel is configured, then there is no need to provide prefixes and namespaces when writing a SPARQL query. Prefixes and namespaces will be automatically inferred from the provided schemas and/or JSON-LD context and the query rewritten accordingly." 
] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", " SELECT ?id ?name\n", " WHERE {\n", " ?id a Dataset ;\n", " contribution/agent ?contributor.\n", " ?contributor name ?name.\n", " }\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "resources = forge.sparql(query, limit=3)" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "kgforge.core.resource.Resource" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources[0])" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id name\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... John Smith\n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Jane Doe\n", "2 https://sandbox.bluebrainnexus.io/v1/resources... John Smith" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### display rewritten SPARQL query " ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Submitted query:\n", " PREFIX dc: \n", " PREFIX dcat: \n", " PREFIX dcterms: \n", " PREFIX mba: \n", " PREFIX nsg: \n", " PREFIX owl: \n", " PREFIX prov: \n", " PREFIX rdf: \n", " PREFIX rdfs: \n", " PREFIX schema: \n", " PREFIX sh: \n", " PREFIX shsh: \n", " PREFIX skos: \n", " PREFIX vann: \n", " PREFIX void: \n", " PREFIX xsd: \n", " PREFIX : \n", " \n", " SELECT ?id ?name\n", " WHERE {\n", " ?id a schema:Dataset ;\n", " nsg:contribution/prov:agent ?contributor.\n", " ?contributor schema:name ?name.\n", " }\n", "\n" ] } ], "source": [ "resources = forge.sparql(query, limit=3, debug=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Full SPARQL query\n", "\n", "A regular SPARQL query with explicit prefix declarations can also be provided." 
] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "PREFIX dc: \n", " PREFIX dcat: \n", " PREFIX dcterms: \n", " PREFIX mba: \n", " PREFIX nsg: \n", " PREFIX owl: \n", " PREFIX prov: \n", " PREFIX rdf: \n", " PREFIX rdfs: \n", " PREFIX schema: \n", " PREFIX sh: \n", " PREFIX shsh: \n", " PREFIX skos: \n", " PREFIX vann: \n", " PREFIX void: \n", " PREFIX xsd: \n", " PREFIX : \n", " SELECT ?id ?name\n", " WHERE {\n", " ?id a schema:Dataset ;\n", " nsg:contribution/prov:agent ?contributor.\n", " ?contributor schema:name ?name.\n", " }\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "resources = forge.sparql(query, limit=3)" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "kgforge.core.resource.Resource" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources[0])" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id name\n", "0 https://sandbox.bluebrainnexus.io/v1/resources... John Smith\n", "1 https://sandbox.bluebrainnexus.io/v1/resources... Jane Doe\n", "2 https://sandbox.bluebrainnexus.io/v1/resources... John Smith" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ElasticSearch DSL Query\n", "\n", "ElasticSearch DSL can be used as a query language to search for resources, provided that the configured store supports it. The 'BlueBrainNexusStore' supports ElasticSearch." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: DemoStore doesn't implement ElasticSearch DSL operations." ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\")\n", "contribution_jane = Resource(type=\"Contribution\", agent=jane)" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [], "source": [ "john = Resource(type=\"Person\", name=\"John Smith\")\n", "contribution_john = Resource(type=\"Contribution\", agent=john)" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [], "source": [ "association = Resource(type=\"Dataset\", contribution=[contribution_jane, contribution_john])" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _register_one\n", " True\n" ] } ], "source": [ "forge.register(association)" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " id: \"\"\n", " type:\n", " {\n", " id: \"\"\n", " }\n", " annotation:\n", " {\n", " id: \"\"\n", " type: Annotation\n", " hasBody:\n", " {\n", " id: \"\"\n", " type:\n", " {\n", " id: \"\"\n", " }\n", " label: \"\"\n", " note: \"\"\n", " }\n", " 
hasTarget:\n", " {\n", " id: \"\"\n", " type: AnnotationTarget\n", " }\n", " note: \"\"\n", " }\n", " brainLocation:\n", " {\n", " id: \"\"\n", " type: BrainLocation\n", " atlasSpatialReferenceSystem:\n", " {\n", " id: \"\"\n", " type: AtlasSpatialReferenceSystem\n", " }\n", " brainRegion:\n", " {\n", " id: \"\"\n", " label: \"\"\n", " }\n", " coordinatesInBrainAtlas:\n", " {\n", " id: \"\"\n", " valueX: 0.0\n", " valueY: 0.0\n", " valueZ: 0.0\n", " }\n", " coordinatesInSlice:\n", " {\n", " spatialReferenceSystem:\n", " {\n", " id: \"\"\n", " type: SpatialReferenceSystem\n", " }\n", " valueX: 0.0\n", " valueY: 0.0\n", " valueZ: 0.0\n", " }\n", " distanceToBoundary:\n", " {\n", " boundary:\n", " {\n", " id: \"\"\n", " label: \"\"\n", " }\n", " distance:\n", " {\n", " unitCode: \"\"\n", " value:\n", " [\n", " 0.0\n", " 0\n", " ]\n", " }\n", " }\n", " layer:\n", " {\n", " id: \"\"\n", " label: \"\"\n", " }\n", " longitudinalAxis:\n", " [\n", " Dorsal\n", " Ventral\n", " ]\n", " positionInLayer:\n", " [\n", " Deep\n", " Superficial\n", " ]\n", " }\n", " contribution:\n", " {\n", " id: \"\"\n", " }\n", " distribution:\n", " {\n", " id: \"\"\n", " type: DataDownload\n", " contentSize:\n", " {\n", " unitCode: \"\"\n", " value:\n", " [\n", " 0.0\n", " 0\n", " ]\n", " }\n", " digest:\n", " {\n", " algorithm: \"\"\n", " value: \"\"\n", " }\n", " encodingFormat: \"\"\n", " license: \"\"\n", " name: \"\"\n", " }\n", " objectOfStudy:\n", " {\n", " id: \"\"\n", " type: ObjectOfStudy\n", " }\n", " releaseDate: 9999-12-31T00:00:00\n", " subject:\n", " {\n", " id: \"\"\n", " type: Subject\n", " }\n", "}\n" ] } ], "source": [ "forge.template(\"Dataset\") # Templates help identify which properties to use when writing a query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plain ElasticSearch DSL" ] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", " {\n", " \"_source\": {\n", " \"includes\": [\n", " 
\"@id\",\n", " \"name\"\n", " ]\n", " },\n", " \"query\": {\n", " \"term\": {\n", " \"@type\": \"http://schema.org/Dataset\"\n", " }\n", " }\n", " }\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 156, "metadata": {}, "outputs": [], "source": [ "resources = forge.elastic(query, limit=3) # limit and offset (when provided in this method call) supersede the 'size' and 'from' values provided in the query" ] }, { "cell_type": "code", "execution_count": 157, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 157, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": 158, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 158, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(resources)" ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "kgforge.core.resource.Resource" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(resources[0])" ] }, { "cell_type": "code", "execution_count": 160, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
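<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>@id</th>\n", " <th>name</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>https://bbp.epfl.ch/neurosciencegraph/data/neu...</td>\n", " <td>Scnn1a-Tg3-Cre;Ai14-187849.06.01.01</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>https://bbp.epfl.ch/neurosciencegraph/data/neu...</td>\n", " <td>H17.06.004.11.05.04</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>https://bbp.epfl.ch/neurosciencegraph/data/neu...</td>\n", " <td>H16.06.009.01.01.15.01</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>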
" ], "text/plain": [ " @id \\\n", "0 https://bbp.epfl.ch/neurosciencegraph/data/neu... \n", "1 https://bbp.epfl.ch/neurosciencegraph/data/neu... \n", "2 https://bbp.epfl.ch/neurosciencegraph/data/neu... \n", "\n", " name \n", "0 Scnn1a-Tg3-Cre;Ai14-187849.06.01.01 \n", "1 H17.06.004.11.05.04 \n", "2 H16.06.009.01.01.15.01 " ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forge.as_dataframe(resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: DemoStore doesn't implement file operations yet. Please use another store for this section." ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\")" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "associations.tsv\n", "my_data.xwz\n", "my_data_derived.txt\n", "persons.csv\n", "tfidfvectorizer_model_schemaorg_linking\n" ] } ], "source": [ "! 
ls -p ../../data | egrep -v /$" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [], "source": [ "distribution = forge.attach(\"../../data\")" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [], "source": [ "association = Resource(type=\"Association\", agent=jane, distribution=distribution)" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _register_one\n", " True\n" ] } ], "source": [ "forge.register(association)" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [], "source": [ "# The overwrite: bool argument controls what happens when a file with the same name already exists:\n", "# True overwrites the existing file, False creates a new file whose name is suffixed with a timestamp.\n", "# The cross_bucket argument selects the source bucket: cross_bucket=False (the default) downloads from the configured bucket,\n", "# cross_bucket=True downloads from a bucket other than the configured one. The configured store must support crossing buckets for this to work.\n", "forge.download(association, \"distribution.contentUrl\", \"./downloaded/\")" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 440\n", "-rw-r--r-- 1 mfsy staff 477 Jan 7 13:51 associations.tsv\n", "-rw-r--r-- 1 mfsy staff 16 Jan 7 13:51 my_data.xwz\n", "-rw-r--r-- 1 mfsy staff 24 Jan 7 13:51 my_data_derived.txt\n", "-rw-r--r-- 1 mfsy staff 52 Jan 7 13:51 persons.csv\n", "-rw-r--r-- 1 mfsy staff 204848 Jan 7 13:51 tfidfvectorizer_model_schemaorg_linking\n" ] } ], "source": [ "! ls -l ./downloaded/" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [], "source": [ "#! 
rm -R ./downloaded/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.7 (nexusforgelatest)", "language": "python", "name": "nexusforgelatest" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }