{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# JSON-LD 1.0 Context issue\n",
"\n",
"This notebook demonstrates the basic problem with using JSON-LD 1.0 and the curies used in the [prefixcommons](https://github.com/prefixcommons) library. For the purposes of this demo, we will use the [biocontext](https://github.com/prefixcommons/biocontext) [monarch context](https://raw.githubusercontent.com/prefixcommons/biocontext/master/registry/monarch_context.jsonld). This is a serious problem because a not-insignificant portion of the prefixcommons libraries use prefixes that end in something other than \"/\" or \"#\".\n",
"\n",
"This issue exists because of a fix described in https://lists.w3.org/Archives/Public/public-rdf-comments/2018Jan/0002.html . Basically, the solution in JSON-LD 1.0 is, \"if it doesn't look like a prefix, it isn't a prefix\".\n",
"\n",
"The `@prefix` tag was added in the [JSON-LD 1.1 specification](https://w3c.github.io/json-ld-syntax/#compact-iris) to allow one to force a any string to be treated as a prefix. This, however, currently has to be done on a per-prefix basis:\n",
"```json\n",
"{\n",
" \"@context\" : {\n",
" \"CHEBI\" : {\n",
" \"@id\": \"http://purl.obolibrary.org/obo/CHEBI_\",\n",
" \"@prefix\": true\n",
" }\n",
" }\n",
"}\n",
"```\n",
"\n",
"The problem with this approach is that [prefixcommons](https://github.com/prefixcommons) library users use _both_ the raw json _and_ the rdflib json-ld parser, meaning that, unless the prefixcommons parser is enhanced to recognize the expanded format, the above fix won't work.\n",
"\n",
"An [issue](https://github.com/w3c/json-ld-syntax/issues/329) has been filed suggesting (now requesting) that `@prefix` be allowed as a default on the entire context:\n",
"```json\n",
"{\n",
" \"@context\" : {\n",
" \"@prefix\": true,\n",
" \"CHEBI\" : \"http://purl.obolibrary.org/obo/CHEBI_\"\n",
" ...\n",
" }\n",
"}\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"pycharm": {
"is_executing": true
}
},
"outputs": [],
"source": [
"!pip install -q prefixcommons\n",
"!pip install -q rdflib\n",
"!pip install -q rdflib-jsonld\n",
"!pip install -q jsonasobj"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Utilities"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from contextlib import closing\n",
"from typing import Optional, Dict\n",
"import requests\n",
"from jsonasobj import loads\n",
"from prefixcommons import curie_util\n",
"from rdflib import Graph\n",
"\n",
"def fetch_pc_context(name: str) -> Optional[str]:\n",
" \"\"\"\n",
" Retrive the prefixcommons JSON-LD entry for name\n",
" :param name: context name\n",
" :return: String representation of JSON-LD context\n",
" \"\"\"\n",
" url = f\"https://raw.githubusercontent.com/prefixcommons/biocontext/master/registry/{name}.jsonld\"\n",
" with closing(requests.get(url, stream=False)) as resp:\n",
" if resp.status_code == 200:\n",
" return resp.text\n",
" else:\n",
" print(f\"Cannot fetch: {url}\")\n",
"\n",
"def prefix_for(prefixes: Dict[str, str], prefix: str) -> str:\n",
" \"\"\"\n",
" Format the prefix entry in prefixes\n",
" :param prefixes: map from prefix to URI\n",
" :param prefix: prefix to map\n",
" :return: result\n",
" \"\"\"\n",
" if prefix in prefixes:\n",
" return f'@prefix {prefix}: <{prefixes[prefix]}> .'\n",
" else:\n",
" return f'*prefix: {prefix} not mapped'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we load the context as a plain JSON-LD object, both the BIOGRID and CHEBI contexts are are aliases"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Entry for BIOGRID is http://thebiogrid.org/\n",
"Entry for CHEBI is http://purl.obolibrary.org/obo/CHEBI_\n"
]
}
],
"source": [
"ctxt_str = fetch_pc_context('monarch_context')\n",
"ctxt = loads(ctxt_str)\n",
"print(f\"Entry for BIOGRID is {ctxt['@context'].BIOGRID}\")\n",
"print(f\"Entry for CHEBI is {ctxt['@context'].CHEBI}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The prefix commons utility doesn't use the JSON-LD library, so both of the prefixes are represented"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@prefix BIOGRID: .\n",
"@prefix CHEBI: .\n"
]
}
],
"source": [
"curie_map = {k: v for k, v in curie_util.read_biocontext('monarch_context').items()}\n",
"print(prefix_for(curie_map, 'BIOGRID'))\n",
"print(prefix_for(curie_map, 'CHEBI'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we use the JSON-LD library, however, URI's that don't end in \"#\" or \"/\" are _not_ treated as prefixes (!)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@prefix BIOGRID: .\n",
"*prefix: CHEBI not mapped\n"
]
}
],
"source": [
"g = Graph()\n",
"g.parse(data=ctxt_str, format=\"json-ld\")\n",
"prefixes = {k:v for k, v in g.namespaces()}\n",
"print(prefix_for(prefixes, 'BIOGRID'))\n",
"print(prefix_for(prefixes, 'CHEBI'))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"pycharm": {
"stem_cell": {
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": []
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}