{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# JSON-LD 1.0 Context issue\n",
    "\n",
    "This notebook demonstrates the basic problem with using JSON-LD 1.0 and the curies used in the [prefixcommons](https://github.com/prefixcommons) library. For the purposes of this demo, we will use the [biocontext](https://github.com/prefixcommons/biocontext) [monarch context](https://raw.githubusercontent.com/prefixcommons/biocontext/master/registry/monarch_context.jsonld).  This is a serious problem because a not-insignificant portion of the prefixcommons libraries use prefixes that end in something other than \"/\" or \"#\".\n",
    "\n",
    "This issue exists because of a fix described in https://lists.w3.org/Archives/Public/public-rdf-comments/2018Jan/0002.html .  Basically, the solution in JSON-LD 1.0 is, \"if it doesn't look like a prefix, it isn't a prefix\".\n",
    "\n",
    "The `@prefix` tag was added in the [JSON-LD 1.1 specification](https://w3c.github.io/json-ld-syntax/#compact-iris) to allow one to force a any string to be treated as a prefix.  This, however, currently has to be done on a per-prefix basis:\n",
    "```json\n",
    "{\n",
    "   \"@context\" : {\n",
    "       \"CHEBI\" : {\n",
    "          \"@id\": \"http://purl.obolibrary.org/obo/CHEBI_\",\n",
    "          \"@prefix\": true\n",
    "       }\n",
    "   }\n",
    "}\n",
    "```\n",
    "\n",
    "The problem with this approach is that [prefixcommons](https://github.com/prefixcommons) library users use _both_ the raw json _and_ the rdflib json-ld parser, meaning that, unless the prefixcommons parser is enhanced to recognize the expanded format, the above fix won't work.\n",
    "\n",
    "An [issue](https://github.com/w3c/json-ld-syntax/issues/329) has been filed suggesting (now requesting) that `@prefix` be allowed as a default on the entire context:\n",
    "```json\n",
    "{\n",
    "   \"@context\" : {\n",
    "       \"@prefix\": true,\n",
    "       \"CHEBI\" : \"http://purl.obolibrary.org/obo/CHEBI_\"\n",
    "        ...\n",
    "   }\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "!pip install -q prefixcommons\n",
    "!pip install -q rdflib\n",
    "!pip install -q rdflib-jsonld\n",
    "!pip install -q jsonasobj"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Utilities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from contextlib import closing\n",
    "from typing import Optional, Dict\n",
    "import requests\n",
    "from jsonasobj import loads\n",
    "from prefixcommons import curie_util\n",
    "from rdflib import Graph\n",
    "\n",
    "def fetch_pc_context(name: str) -> Optional[str]:\n",
    "    \"\"\"\n",
    "    Retrive the prefixcommons JSON-LD entry for name\n",
    "    :param name: context name\n",
    "    :return: String representation of JSON-LD context\n",
    "    \"\"\"\n",
    "    url = f\"https://raw.githubusercontent.com/prefixcommons/biocontext/master/registry/{name}.jsonld\"\n",
    "    with closing(requests.get(url, stream=False)) as resp:\n",
    "        if resp.status_code == 200:\n",
    "            return resp.text\n",
    "        else:\n",
    "            print(f\"Cannot fetch: {url}\")\n",
    "\n",
    "def prefix_for(prefixes: Dict[str, str], prefix: str) -> str:\n",
    "    \"\"\"\n",
    "    Format the prefix entry in prefixes\n",
    "    :param prefixes: map from prefix to URI\n",
    "    :param prefix: prefix to map\n",
    "    :return: result\n",
    "    \"\"\"\n",
    "    if prefix in prefixes:\n",
    "        return f'@prefix {prefix}: <{prefixes[prefix]}> .'\n",
    "    else:\n",
    "        return f'*prefix: {prefix} not mapped'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we load the context as a plain JSON-LD object, both the BIOGRID and CHEBI contexts are are aliases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Entry for BIOGRID is http://thebiogrid.org/\n",
      "Entry for CHEBI is http://purl.obolibrary.org/obo/CHEBI_\n"
     ]
    }
   ],
   "source": [
    "ctxt_str = fetch_pc_context('monarch_context')\n",
    "ctxt = loads(ctxt_str)\n",
    "print(f\"Entry for BIOGRID is {ctxt['@context'].BIOGRID}\")\n",
    "print(f\"Entry for CHEBI is {ctxt['@context'].CHEBI}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The prefix commons utility doesn't use the JSON-LD library, so both of the prefixes are represented"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "@prefix BIOGRID: <http://thebiogrid.org/> .\n",
      "@prefix CHEBI: <http://purl.obolibrary.org/obo/CHEBI_> .\n"
     ]
    }
   ],
   "source": [
    "curie_map = {k: v for k, v in curie_util.read_biocontext('monarch_context').items()}\n",
    "print(prefix_for(curie_map, 'BIOGRID'))\n",
    "print(prefix_for(curie_map, 'CHEBI'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we use the JSON-LD library, however, URI's that don't end in \"#\"  or \"/\" are _not_ treated as prefixes (!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "@prefix BIOGRID: <http://thebiogrid.org/> .\n",
      "*prefix: CHEBI not mapped\n"
     ]
    }
   ],
   "source": [
    "g = Graph()\n",
    "g.parse(data=ctxt_str, format=\"json-ld\")\n",
    "prefixes = {k:v for k, v in g.namespaces()}\n",
    "print(prefix_for(prefixes, 'BIOGRID'))\n",
    "print(prefix_for(prefixes, 'CHEBI'))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}