{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# JSON-LD 1.0 Context issue\n", "\n", "This notebook demonstrates the basic problem with using JSON-LD 1.0 and the CURIEs used in the [prefixcommons](https://github.com/prefixcommons) library. For the purposes of this demo, we will use the [biocontext](https://github.com/prefixcommons/biocontext) [monarch context](https://raw.githubusercontent.com/prefixcommons/biocontext/master/registry/monarch_context.jsonld). This is a serious problem because a not-insignificant portion of the prefixes in the prefixcommons libraries expand to URIs that end in something other than \"/\" or \"#\".\n", "\n", "This issue exists because of a fix described in https://lists.w3.org/Archives/Public/public-rdf-comments/2018Jan/0002.html . Basically, the solution in JSON-LD 1.0 is, \"if it doesn't look like a prefix, it isn't a prefix\".\n", "\n", "The `@prefix` keyword was added in the [JSON-LD 1.1 specification](https://w3c.github.io/json-ld-syntax/#compact-iris) to allow one to force any string to be treated as a prefix. 
This, however, currently has to be done on a per-prefix basis:\n", "```json\n", "{\n", "    \"@context\" : {\n", "        \"CHEBI\" : {\n", "            \"@id\": \"http://purl.obolibrary.org/obo/CHEBI_\",\n", "            \"@prefix\": true\n", "        }\n", "    }\n", "}\n", "```\n", "\n", "The problem with this approach is that [prefixcommons](https://github.com/prefixcommons) library users use _both_ the raw JSON _and_ the rdflib JSON-LD parser, meaning that, unless the prefixcommons parser is enhanced to recognize the expanded format, the above fix won't work.\n", "\n", "An [issue](https://github.com/w3c/json-ld-syntax/issues/329) has been filed suggesting (now requesting) that `@prefix` be allowed as a default on the entire context:\n", "```json\n", "{\n", "    \"@context\" : {\n", "        \"@prefix\": true,\n", "        \"CHEBI\" : \"http://purl.obolibrary.org/obo/CHEBI_\"\n", "        ...\n", "    }\n", "}\n", "```" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "!pip install -q prefixcommons\n", "!pip install -q rdflib\n", "!pip install -q rdflib-jsonld\n", "!pip install -q jsonasobj" ] },
{ "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Utilities" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from contextlib import closing\n", "from typing import Optional, Dict\n", "import requests\n", "from jsonasobj import loads\n", "from prefixcommons import curie_util\n", "from rdflib import Graph\n", "\n", "def fetch_pc_context(name: str) -> Optional[str]:\n", "    \"\"\"\n", "    Retrieve the prefixcommons JSON-LD entry for name\n", "    :param name: context name\n", "    :return: String representation of JSON-LD context, or None if the fetch fails\n", "    \"\"\"\n", "    url = f\"https://raw.githubusercontent.com/prefixcommons/biocontext/master/registry/{name}.jsonld\"\n", "    with closing(requests.get(url, stream=False)) as resp:\n", "        if resp.status_code == 200:\n", "            return resp.text\n", "        else:\n", "            print(f\"Cannot fetch: {url}\")\n", "            return None\n", "\n", "def prefix_for(prefixes: Dict[str, str], prefix: str) -> str:\n", "    \"\"\"\n", "    Format the prefix entry in prefixes\n", "    :param prefixes: map from prefix to URI\n", "    :param prefix: prefix to map\n", "    :return: Turtle-style @prefix declaration, or a message if prefix is not mapped\n", "    \"\"\"\n", "    if prefix in prefixes:\n", "        return f'@prefix {prefix}: <{prefixes[prefix]}> .'\n", "    else:\n", "        return f'*prefix: {prefix} not mapped'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "When we load the context as a plain JSON-LD object, both the BIOGRID and CHEBI entries are aliases." ] },
{ "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Entry for BIOGRID is http://thebiogrid.org/\n", "Entry for CHEBI is http://purl.obolibrary.org/obo/CHEBI_\n" ] } ], "source": [ "ctxt_str = fetch_pc_context('monarch_context')\n", "ctxt = loads(ctxt_str)\n", "print(f\"Entry for BIOGRID is {ctxt['@context'].BIOGRID}\")\n", "print(f\"Entry for CHEBI is {ctxt['@context'].CHEBI}\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The prefixcommons utility doesn't use the JSON-LD library, so both of the prefixes are represented." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "@prefix BIOGRID: <http://thebiogrid.org/> .\n", "@prefix CHEBI: <http://purl.obolibrary.org/obo/CHEBI_> .\n" ] } ], "source": [ "curie_map = {k: v for k, v in curie_util.read_biocontext('monarch_context').items()}\n", "print(prefix_for(curie_map, 'BIOGRID'))\n", "print(prefix_for(curie_map, 'CHEBI'))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "When we use the JSON-LD library, however, URIs that don't end in \"#\" or \"/\" are _not_ treated as prefixes (!)" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "@prefix BIOGRID: <http://thebiogrid.org/> .\n", "*prefix: CHEBI not mapped\n" ] } ], "source": [ "g = Graph()\n", "g.parse(data=ctxt_str, format=\"json-ld\")\n", "prefixes = {k: v for k, v in g.namespaces()}\n", "print(prefix_for(prefixes, 'BIOGRID'))\n", "print(prefix_for(prefixes, 'CHEBI'))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 1 }