{ "cells": [ { "cell_type": "markdown", "id": "20a69830e031dd42", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "# LinkML Metamodel Mappings\n", "\n", "The primary use case of LinkML is to map *data* to *data*. However, because the LinkML metamodel is\n", "expressed as a LinkML schema, it is possible to map to and from the metamodel itself.\n", "\n", "__NOTE__ this workflow is not yet fully matured, and is subject to change." ] }, { "cell_type": "markdown", "id": "926150383baeb0f1", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "## Creation of ad-hoc metamodel\n", "\n", "Let's assume that we have data represented using models conforming to an ad-hoc metamodel with core constructs: `Schema`, `Table`, and `Column`. We can create a LinkML schema that represents this metamodel as follows:" ] }, { "cell_type": "code", "execution_count": 1, "id": "569394bf0a0aa7c4", "metadata": { "ExecuteTime": { "end_time": "2024-04-11T02:15:26.923272Z", "start_time": "2024-04-11T02:15:26.737620Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "name: test-schema\n", "id: http://example.org/test-schema\n", "default_prefix: http://example.org/test-schema/\n", "classes:\n", " Schema:\n", " attributes:\n", " id:\n", " description: The name of the schema\n", " required: true\n", " tables:\n", " multivalued: true\n", " range: Table\n", " inlined_as_list: true\n", " tree_root: true\n", " Table:\n", " attributes:\n", " id:\n", " description: The name of the table\n", " required: true\n", " columns:\n", " multivalued: true\n", " range: Column\n", " inlined_as_list: true\n", " Column:\n", " attributes:\n", " id:\n", " description: The name of the column\n", " required: true\n", " primary_key:\n", " range: boolean\n", " datatype:\n", " range: string\n", "prefixes: {}\n" ] } ], "source": [ "import yaml\n", "from linkml_runtime.linkml_model import SlotDefinition\n", "from linkml.utils.schema_builder import SchemaBuilder\n", "\n", "sb = SchemaBuilder()\n", "sb.add_class(\"Schema\", \n", " tree_root=True,\n", " slots=[\n", " SlotDefinition(\"id\", required=True, description=\"The name of the schema\"),\n", " SlotDefinition(\"tables\", range=\"Table\", multivalued=True, inlined_as_list=True),\n", " ],\n", " use_attributes=True,\n", " )\n", "sb.add_class(\"Table\",\n", " slots=[\n", " SlotDefinition(\"id\", required=True, description=\"The name of the table\"),\n", " SlotDefinition(\"columns\", range=\"Column\", multivalued=True, inlined_as_list=True),\n", " ],\n", " use_attributes=True,\n", " )\n", "sb.add_class(\"Column\",\n", " slots=[\n", " SlotDefinition(\"id\", required=True, description=\"The name of the column\"),\n", " SlotDefinition(\"primary_key\", range=\"boolean\"),\n", " SlotDefinition(\"datatype\", range=\"string\"),\n", " ],\n", " use_attributes=True,\n", " )\n", "\n", "my_metamodel = sb.as_dict()\n", "print(yaml.dump(sb.as_dict(), sort_keys=False))" ] }, { "cell_type": "markdown", "id": "504e5e995a56d9d", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "## Example schema conforming to ad-hoc metamodel\n", "\n", "Now we'll make an example schema that conforms to this metamodel; this will be a fairly boring schema with a single table `Person` with three columns: `id`, `name`, and `description`:" ] }, { "cell_type": "code", "execution_count": 2, "id": "2412c7508be85763", "metadata": { "ExecuteTime": { "end_time": "2024-04-11T02:15:26.927278Z", "start_time": "2024-04-11T02:15:26.925061Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "my_schema = {\n", " \"id\": \"my_schema\",\n", " \"tables\": [\n", " {\n", " \"id\": \"Person\",\n", " \"columns\": [\n", " {\n", " \"id\": \"id\",\n", " \"primary_key\": True,\n", " \"datatype\": \"integer\"\n", " },\n", " {\n", " \"id\": \"name\",\n", " \"datatype\": \"string\"\n", " },\n", " {\n", " \"id\": \"description\",\n", " \"datatype\": \"string\"\n", " }\n", " ]\n", " },\n", " ]\n", "}" ] }, { "cell_type": "markdown", "id": "a97465ad75c14ab9", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "## Mapping to LinkML metamodel\n", "\n", "Now we'll create mappings from the ad-hoc metamodel to the LinkML metamodel, where **Table** maps to a LinkML **ClassDefinition**, **Column** maps to a LinkML **SlotDefinition*." ] }, { "cell_type": "code", "execution_count": 3, "id": "781147edebbfeb9c", "metadata": { "ExecuteTime": { "end_time": "2024-04-11T02:15:26.930857Z", "start_time": "2024-04-11T02:15:26.929165Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "metamap = {\n", " \"class_derivations\": {\n", " \"SchemaDefinition\": {\n", " \"populated_from\": \"Schema\",\n", " \"slot_derivations\": {\n", " \"name\": {\n", " \"populated_from\": \"id\",\n", " },\n", " \"id\": {\n", " \"expr\": \"'https://example.org/' + id\",\n", " },\n", " \"classes\": {\n", " \"populated_from\": \"tables\",\n", " \"dictionary_key\": \"name\",\n", " \"cast_collection_as\": \"MultiValuedDict\",\n", " },\n", " }\n", " },\n", " \"ClassDefinition\": {\n", " \"populated_from\": \"Table\",\n", " \"slot_derivations\": {\n", " \"name\": {\n", " \"populated_from\": \"id\",\n", " },\n", " \"attributes\": {\n", " \"populated_from\": \"columns\",\n", " \"dictionary_key\": \"name\",\n", " \"cast_collection_as\": \"MultiValuedDict\",\n", " },\n", " }\n", " },\n", " \"SlotDefinition\": {\n", " \"populated_from\": \"Column\",\n", " \"slot_derivations\": {\n", " \"name\": {\n", " \"populated_from\": \"id\",\n", " },\n", " \"identifier\": {\n", " \"populated_from\": \"primary_key\",\n", " },\n", " \"range\": {\n", " \"populated_from\": \"datatype\",\n", " },\n", " },\n", " },\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 4, "id": "df33c4db991ce4b3", "metadata": { "ExecuteTime": { "end_time": "2024-04-11T02:15:27.366482Z", "start_time": "2024-04-11T02:15:26.931843Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "from linkml_map.session import Session\n", "\n", "session = Session()" ] }, { "cell_type": "code", "execution_count": 5, "id": "e5edb3e9efa49ed1", "metadata": { "ExecuteTime": { "end_time": "2024-04-11T02:15:27.450810Z", "start_time": "2024-04-11T02:15:27.369239Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "session.set_source_schema(my_metamodel)\n", "session.set_object_transformer(metamap)" ] }, { "cell_type": "code", "execution_count": 6, "id": "5fcae9c463c7146b", "metadata": { "ExecuteTime": { "end_time": "2024-04-11T02:15:27.464442Z", "start_time": "2024-04-11T02:15:27.453655Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:linkml_map.transformer.object_transformer:Unexpected: my_schema for type Schema\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: Person for type Schema\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: id for type Schema\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: True for type boolean\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: integer for type string\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: name for type Schema\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: string for type string\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: description for type Schema\n", "WARNING:linkml_map.transformer.object_transformer:Unexpected: string for type string\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range SlotDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range ClassDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range ClassDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range ClassDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range ClassDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range ClassDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range ClassDefinition\n", "WARNING:linkml_map.transformer.transformer:Unknown target range ClassDefinition\n" ] }, { "data": { "text/plain": [ "{'name': 'my_schema',\n", " 'id': 'https://example.org/my_schema',\n", " 'classes': {'Person': {'attributes': {'id': {'identifier': True,\n", " 'range': 'integer'},\n", " 'name': {'identifier': None, 'range': 'string'},\n", " 'description': {'identifier': None, 'range': 'string'}}}}}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "session.transform(my_schema)" ] }, { "cell_type": "markdown", "id": "917015b3f12eac2f", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "## Customizing the LinkML model\n", "\n", "A different scenario is where you might want to customize the existing LinkML metamodel, in particular, adding additional\n", "constraints. For example:\n", "\n", "- class and names MUST be alphanumeric with no spaces\n", "- every element MUST have a definition\n", "\n", "**TODO**" ] }, { "cell_type": "code", "execution_count": 6, "id": "87c714635c108ea8", "metadata": { "ExecuteTime": { "end_time": "2024-04-11T02:15:27.470074Z", "start_time": "2024-04-11T02:15:27.466371Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 5 }