{ "cells": [ { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:19.036357Z", "start_time": "2019-09-23T18:50:19.031896Z" } }, "source": [ "# Mapping\n", "\n", "[Mappings](https://nexus-forge.readthedocs.io/en/latest/interaction.html#mapping) are pre-defined, declarative rules that encode how to transform a specific data source into [Resources](https://nexus-forge.readthedocs.io/en/latest/interaction.html#resource) that eventually conform to the targeted schemas supported by the configured [Model](https://nexus-forge.readthedocs.io/en/latest/interaction.html#modeling). \n", "\n", "This notebook specifically demonstrates the `DictionaryMapping`, which is based on a JSON structure that represents the targeted structure, along with Python code that applies the desired transformations to the data source." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.068658Z", "start_time": "2019-09-23T18:50:19.054054Z" } }, "outputs": [], "source": [ "from kgforge.core import KnowledgeGraphForge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A configuration file is needed to create a KnowledgeGraphForge session. A configuration can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb)." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "forge = KnowledgeGraphForge(\"../../configurations/forge.yml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from kgforge.core import Resource" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from kgforge.specializations.mappings import DictionaryMapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "scientists = [\n", " {\n", " \"id\": 123,\n", " \"name\": \"Marie Curie\",\n", " \"gender\": \"female\",\n", " \"middle_name\": \"Salomea\",\n", " },\n", " {\n", " \"id\": 456,\n", " \"name\": \"Albert Einstein\",\n", " \"gender\": \"male\",\n", " \"middle_name\": \"(missing)\",\n", " },\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mapping data to a targeted template" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: DemoModel and RdfModel schemas have not been synchronized yet. The following cell is to be run with DemoModel." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Managed entity types:\n", " - Activity\n", " - Contribution\n", " - Dataset\n", " - Entity\n", " - Ontology\n", " - Person\n" ] } ], "source": [ "forge.types()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " id: \"\"\n", "}\n" ] } ], "source": [ "forge.template(\"Contribution\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "mapping_simple = DictionaryMapping(\"\"\"\n", " type: Contribution\n", " agent:\n", " {\n", " type: Person\n", " name: x.name\n", " }\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "resources_simple = forge.map(scientists, mapping_simple)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " type: Contribution\n", " agent:\n", " {\n", " type: Person\n", " name: Marie Curie\n", " }\n", "}\n" ] } ], "source": [ "print(resources_simple[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### missing values" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "mapping_na = DictionaryMapping(\"\"\"\n", " type: Contribution\n", " agent:\n", " {\n", " type: Person\n", " name: x.name\n", " additionalName: x.middle_name\n", " }\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " type: Contribution\n", " agent:\n", " {\n", " type: Person\n", " additionalName: (missing)\n", " name: Albert Einstein\n", " }\n", "}\n" ] } ], "source": [ "print(forge.map(scientists[1], mapping_na))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "{\n", " type: Contribution\n", " agent:\n", " {\n", " type: Person\n", " name: Albert Einstein\n", " }\n", "}\n" ] } ], "source": [ "print(forge.map(scientists[1], mapping_na, na=\"(missing)\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### multiple mappings" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "mapping_person = DictionaryMapping(\"\"\"\n", " id: forge.format(\"identifier\", \"persons\", x.id)\n", " type: Person\n", " name: x.name\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "mapping_association = DictionaryMapping(\"\"\"\n", " type: Contribution\n", " agent: forge.format(\"identifier\", \"persons\", x.id)\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "resources_graph = forge.map(scientists, [mapping_person, mapping_association])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " id: https://kg.example.ch/persons/123\n", " type: Person\n", " name: Marie Curie\n", "}\n" ] } ], "source": [ "print(resources_graph[0])" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " type: Contribution\n", " agent: https://kg.example.ch/persons/123\n", "}\n" ] } ], "source": [ "print(resources_graph[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### managed mappings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: RdfModel doesn't implement managed mappings operations yet. Please use DemoModel for this section." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.sources()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.mappings(\"scientists-database\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mapping = forge.mapping(\"Contribution\", \"scientists-database\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "resources = forge.map(scientists, mapping, na=\"(missing)\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(resources[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(mapping)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(resources[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Managing mappings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: RdfModel doesn't implement managed mappings operations yet. Please use DemoModel for this section." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "filepath = \"mappings/scientists-database/DictionaryMapping/Contribution.hjson\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### saving" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mapping.save(filepath)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'mappings/scientists-database/DictionaryMapping/Contribution.hjson'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filepath" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### tracking & sharing changes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ! cd mappings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ! git add Contribution.hjson" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ! git commit -m \"Add Association mapping\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ! git push" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### loading" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "loaded = DictionaryMapping.load(filepath)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# loaded == mapping" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ! 
rm -R ./mappings/" ] } ], "metadata": { "kernelspec": { "display_name": "Python (nexusforgelatest)", "language": "python", "name": "nexusforgelatest" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }