{ "cells": [ { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:19.036357Z", "start_time": "2019-09-23T18:50:19.031896Z" } }, "source": [ "# Modeling\n", "\n", "The configured `Model` in the forge will provide predefined properties of a set of Types if may contain." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.068658Z", "start_time": "2019-09-23T18:50:19.054054Z" } }, "outputs": [], "source": [ "from kgforge.core import KnowledgeGraphForge" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "forge = KnowledgeGraphForge(\"../../configurations/demo-forge.yml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Imports" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import json" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from kgforge.core import Resource" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from kgforge.specializations.resources import Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prefixes\n", "Prefixes are namespaces that are used to put Resource properties within a context." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.455574Z", "start_time": "2019-09-23T18:50:20.443717Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Used prefixes:\n", " prov http://www.w3.org/ns/prov# \n", " rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#\n", " schema http://schema.org/ \n" ] } ], "source": [ "forge.prefixes()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Types\n", "The `type` property of a Resource can be associated to the available types in the Model. These types have a pre-defined set of properties." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.469871Z", "start_time": "2019-09-23T18:50:20.460504Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Managed entity types:\n", " - Association\n", " - Person\n" ] } ], "source": [ "forge.types()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Templates\n", "The template will provide a set of properties for the givent type that is recomended to be used when creating Resources." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### showing the properties of a type + getting the template of a Mapping for a type" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.482501Z", "start_time": "2019-09-23T18:50:20.473769Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " DemoModel does not distinguish values and constraints in templates for now.\n", " DemoModel does not automatically include nested schemas for now.\n", "{\n", " type: Person\n", " name: hasattr\n", "}\n" ] } ], "source": [ "forge.template(\"Person\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.704970Z", "start_time": "2019-09-23T18:50:20.494904Z" } }, "outputs": [], "source": [ "# forge.template(\"Person\", only_required=True)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# forge.template(\"Association\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### creating (a) Resource instance(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### manually (JSON)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " DemoModel does not distinguish values and constraints in templates for now.\n", " DemoModel does not automatically include nested schemas for now.\n", "{\n", " \"type\": \"Person\",\n", " \"name\": \"hasattr\"\n", "}\n" ] } ], "source": [ "forge.template(\"Person\", output=\"json\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "data = {\n", " \"type\": \"Person\",\n", " \"name\": \"Jane\"\n", "}" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "resource_json = forge.from_json(data)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " type: Person\n", " name: Jane\n", "}\n" ] } ], "source": [ "print(resource_json)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### programmatically (Dict)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " DemoModel does not distinguish values and constraints in templates for now.\n", " DemoModel does not automatically include nested schemas for now.\n" ] } ], "source": [ "template = forge.template(\"Person\", output=\"dict\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "template[\"name\"] = \"Jane\"" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "resource_dict = forge.from_json(template)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " type: Person\n", " name: Jane\n", "}\n" ] } ], "source": [ "print(resource_dict)" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-23T14:43:14.465381Z", "start_time": "2019-09-23T14:43:14.455846Z" } }, "source": [ "## Validation\n", "It is possible to verify that a Resource is compliant with the suggested type schema available in the Model." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "john = Resource(type=\"Person\", name=\"John Smith\")" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "persons = [jane, john]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.716542Z", "start_time": "2019-09-23T18:50:20.708109Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 2\n", " _validate_one\n", " True\n" ] } ], "source": [ "forge.validate(persons)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.734693Z", "start_time": "2019-09-23T18:50:20.718924Z" } }, "outputs": [ { "data": { "text/plain": [ "Action(error=None, message=None, operation='_validate_one', succeeded=True)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jane._last_action" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.749126Z", "start_time": "2019-09-23T18:50:20.739075Z" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jane._validated" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### automatic status update" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.787796Z", "start_time": "2019-09-23T18:50:20.766286Z" } }, "outputs": [], "source": [ "jane.email = \"jane.doe@epfl.ch\"" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.832020Z", "start_time": "2019-09-23T18:50:20.811928Z" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jane._validated" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### lazy actions handling" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "type,name,distribution\n", "Person,Marie Curie,../../data/scientists-database/marie_curie.txt\n", "Person,Albert Einstein,../../data/scientists-database/albert_einstein.txt\n" ] } ], "source": [ "! cat ../../data/persons.csv" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "distribution = forge.attach(\"../../data/persons.csv\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "jane = Resource(type=\"Person\", name=\"Jane Doe\", distribution=distribution)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _validate_one\n", " False\n", " ValidationError: resource has lazy actions which need to be executed before\n" ] } ], "source": [ "forge.validate(jane)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# forge.validate(jane, execute_actions_before=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### error handling" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.917767Z", "start_time": "2019-09-23T18:50:20.902559Z" } }, "outputs": [], "source": [ "mistake = Resource(type=\"Person\")" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "resource = Resource(type=\"Association\", agent=mistake)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " _validate_one\n", " False\n", " ValidationError: name is missing\n" ] } ], "source": [ "forge.validate(resource)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.986359Z", "start_time": "2019-09-23T18:50:20.948705Z" } }, "outputs": [ { "data": { "text/plain": [ "Action(error='ValidationError', message='name is missing', operation='_validate_one', succeeded=False)" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resource._last_action" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:21.016505Z", "start_time": "2019-09-23T18:50:20.998401Z" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resource._validated" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:kgforge37]", "language": "python", "name": "conda-env-kgforge37-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }