{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CWPK \\#44: Annotation Ingest\n",
    "=======================================\n",
    "\n",
    "More Fields, But Less Complexity\n",
    "--------------------------\n",
    "\n",
    "<div style=\"float: left; width: 305px; margin-right: 10px;\">\n",
    "\n",
    "<img src=\"http://kbpedia.org/cwpk-files/cooking-with-kbpedia-305.png\" title=\"Cooking with KBpedia\" width=\"305\" />\n",
    "\n",
    "</div>\n",
    "\n",
    "We now tackle the ingest of annotations for classes and properties in this installment of the [*Cooking with Python and KBpedia*](https://www.mkbergman.com/cooking-with-python-and-kbpedia/) series. In prior installments we built the structural aspects of [KBpedia](https://kbpedia.org/). We now add the labels, definitions, and other assignments to them.\n",
    "\n",
    "As with the extraction routines, we will split these efforts into class annotations and then property annotations. Our actual load routines are fairly straightforward, and we have no real logic concerns in how these annotations get added. The most complex wrinkle we will need to address are those annotation fields, <code>altLabels</code> and <code>notes</code> in particular, where we have potentially many assignments for a single reference concept (RC) or property. Like we saw with the extraction routines, for these items we will need to set up additional internal loops to segregate and assign the items for loading based on our standard double-pipe ('||') delimiter.\n",
    "\n",
    "The two functions we develop in this installment, <code>class_annot_builder</code> and <code>prop_annot_builder</code> will be added to the <code>build.py</code> module."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Start-up\n",
    "\n",
    "Since we are in an active part of the build cycle, we want to continue with our main knowledge graph in-progress for our load routine, so please make sure that <code>kb_src</code> is set to 'standard' in your <code>config.py</code> configuration. We then invoke our standard start-up:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cowpoke.__main__ import *\n",
    "from cowpoke.config import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading Class Annotations\n",
    "Class annotations consist of potentially the item's <code>prefLabel</code>, <code>altLabels</code>, <code>definition</code>, and <code>editorialNote</code>. The first item is mandatory, the next two should be provided to adhere to best practices. The last is optional. There are, of course, other standard annotations possible. Should your own conventions require or encourage them, you will likely need to modify the procedure below to account for that fact.\n",
    "\n",
    "As with these methods before, we provide a header showing 'typical' configuration settings (in <code>config.py</code>), and then proceed with a method that loops through all of the rows in the input file. Here is the basic class annotation build procedure. There are no new wrinkles in this routine from what has been seen previously:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "### KEY CONFIG SETTINGS (see build_deck in config.py) ###                  \n",
    "# 'kb_src'        : 'standard'                                        \n",
    "# 'loop_list'     : file_dict.values(),                           # see 'in_file'\n",
    "# 'loop'          : 'class_loop',\n",
    "# 'in_file'       : 'C:/1-PythonProjects/kbpedia/v300/build_ins/classes/Generals_annot_out.csv',\n",
    "# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts_test.csv',\n",
    "\n",
    "\n",
    "def class_annot_build(**build_deck):\n",
    "    print('Beginning KBpedia class annotation build . . .')\n",
    "    loop_list = build_deck.get('loop_list')\n",
    "    loop = build_deck.get('loop')\n",
    "    class_loop = build_deck.get('class_loop')\n",
    "#    r_id = ''\n",
    "#    r_pref = ''\n",
    "#    r_def = ''\n",
    "#    r_alt = ''\n",
    "#    r_note = ''\n",
    "    if loop is not 'class_loop':\n",
    "        print(\"Needs to be a 'class_loop'; returning program.\")\n",
    "        return\n",
    "    for loopval in loop_list:\n",
    "        print('   . . . processing', loopval) \n",
    "        in_file = loopval\n",
    "        with open(in_file, 'r', encoding='utf8') as input:\n",
    "            is_first_row = True\n",
    "            reader = csv.DictReader(input, delimiter=',', fieldnames=[C])                 \n",
    "            for row in reader:\n",
    "                r_id_frag = row['id']\n",
    "                id = getattr(rc, r_id_frag)\n",
    "                if id == None:\n",
    "                    print(r_id_frag)\n",
    "                    continue\n",
    "                r_pref = row['prefLabel']\n",
    "                r_alt = row['altLabel']\n",
    "                r_def = row['definition']\n",
    "                r_note = row['editorialNote']\n",
    "                if is_first_row:                                       \n",
    "                    is_first_row = False\n",
    "                    continue      \n",
    "                id.prefLabel.append(r_pref)\n",
    "                id.definition.append(r_def)\n",
    "                i_alt = r_alt.split('||')\n",
    "                if i_alt != ['']: \n",
    "                    for item in i_alt:\n",
    "                        id.altLabel.append(item)\n",
    "                i_note = r_note.split('||')\n",
    "                if i_note != ['']:   \n",
    "                    for item in i_note:\n",
    "                        id.editorialNote.append(item)\n",
    "    print('KBpedia class annotation build is complete.')               \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class_annot_build(**build_deck)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kb.save(file=r'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts_test.owl', format='rdfxml') "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "BTW, when we commit this method to our <code>build.py</code> module, we will add the save routine at the end.\n",
    "\n",
    "### Loading Property Annotations\n",
    "We now turn our attention to annotations of properties:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "### KEY CONFIG SETTINGS (see build_deck in config.py) ###                  \n",
    "# 'kb_src'        : 'standard'                                        \n",
    "# 'loop_list'     : file_dict.values(),                           # see 'in_file'\n",
    "# 'loop'          : 'class_loop',\n",
    "# 'in_file'       : 'C:/1-PythonProjects/kbpedia/v300/build_ins/properties/prop_annot_out.csv',\n",
    "# 'out_file'      : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts_test.csv',\n",
    "\n",
    "def prop_annot_build(**build_deck):\n",
    "    print('Beginning KBpedia property annotation build . . .')\n",
    "    loop_list = build_deck.get('loop_list')\n",
    "    loop = build_deck.get('loop')\n",
    "    out_file = build_deck.get('out_file')\n",
    "    if loop is not 'property_loop':\n",
    "        print(\"Needs to be a 'property_loop'; returning program.\")\n",
    "        return\n",
    "    for loopval in loop_list:\n",
    "        print('   . . . processing', loopval) \n",
    "        in_file = loopval\n",
    "        with open(in_file, 'r', encoding='utf8') as input:\n",
    "            is_first_row = True\n",
    "            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'prefLabel', 'subPropertyOf', 'domain',  \n",
    "                                   'range', 'functional', 'altLabel', 'definition', 'editorialNote'])                 \n",
    "            for row in reader:\n",
    "                r_id = row['id']                \n",
    "                r_pref = row['prefLabel']\n",
    "                r_dom = row['domain']\n",
    "                r_rng = row['range']\n",
    "                r_alt = row['altLabel']\n",
    "                r_def = row['definition']\n",
    "                r_note = row['editorialNote']\n",
    "                r_id = r_id.replace('rc.', '')\n",
    "                id = getattr(rc, r_id)\n",
    "                if id == None:\n",
    "                    print(r_id)\n",
    "                    continue\n",
    "                if is_first_row:                                       \n",
    "                    is_first_row = False\n",
    "                    continue\n",
    "                id.prefLabel.append(r_pref)\n",
    "                i_dom = r_dom.split('||')\n",
    "                if i_dom != ['']: \n",
    "                    for item in i_dom:\n",
    "                        id.domain.append(item)\n",
    "                if 'owl.' in r_rng:\n",
    "                    r_rng = r_rng.replace('owl.', '')\n",
    "                    r_rng = getattr(owl, r_rng)\n",
    "                    id.range.append(r_rng)\n",
    "                elif r_rng == ['']:\n",
    "                    continue\n",
    "                else:\n",
    "#                    id.range.append(r_rng)\n",
    "                i_alt = r_alt.split('||')    \n",
    "                if i_alt != ['']: \n",
    "                    for item in i_alt:\n",
    "                        id.altLabel.append(item)\n",
    "                id.definition.append(r_def)        \n",
    "                i_note = r_note.split('||')\n",
    "                if i_note != ['']:   \n",
    "                    for item in i_note:\n",
    "                        id.editorialNote.append(item)\n",
    "    print('KBpedia property annotation build is complete.') "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prop_annot_build(**build_deck)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hmmm. One of the things we notice in this routine is that our <code>domain</code> and <code>range</code> assignments have not been adequately picked up in our earlier KBpedia version 2.50 build routines (the ones undertaken in [Clojure](https://en.wikipedia.org/wiki/Clojure) before this **CWPK** series). As a result, we can not adequately test <code>range</code> and will need to address this oversight before our series is over.\n",
    "\n",
    "As before, we will add our 'save' routine as well when we commit the method to the <code>build.py</code> module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kb.save(file=r'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts_test.owl', format='rdfxml') "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have all of the building blocks to create our extract-build roundtrip. We summarize the formal steps and configuration settings in **CWPK #47**. But, first, we need to return to cleaning our input files and instituting some unit tests.\n",
    "\n",
    "\n",
    " <div style=\"background-color:#efefff; border:1px dotted #ceceff; vertical-align:middle; margin:15px 60px; padding:8px;\"> \n",
    "  <span style=\"font-weight: bold;\">NOTE:</span> This article is part of the <a href=\"https://www.mkbergman.com/cooking-with-python-and-kbpedia/\" style=\"font-style: italic;\">Cooking with Python and KBpedia</a> series. See the <a href=\"https://www.mkbergman.com/cooking-with-python-and-kbpedia/\"><strong>CWPK</strong> listing</a> for other articles in the series. <a href=\"http://kbpedia.org/\">KBpedia</a> has its own Web site.\n",
    "  </div>\n",
    "\n",
    "<div style=\"background-color:#ebf8e2; border:1px dotted #71c837; vertical-align:middle; margin:15px 60px; padding:8px;\"> \n",
    "\n",
    "<span style=\"font-weight: bold;\">NOTE:</span> This <strong>CWPK \n",
    "installment</strong> is available both as an online interactive\n",
    "file <a href=\"https://mybinder.org/v2/gh/Cognonto/CWPK/master\" ><img src=\"https://mybinder.org/badge_logo.svg\" style=\"display:inline-block; vertical-align: middle;\" /></a> or as a <a href=\"https://github.com/Cognonto/CWPK\" title=\"CWPK notebook\" alt=\"CWPK notebook\">direct download</a> to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the <code>*.ipynb</code> file. It may take a bit of time for the interactive option to load.</div>\n",
    "\n",
    "<div style=\"background-color:#feeedc; border:1px dotted #f7941d; vertical-align:middle; margin:15px 60px; padding:8px;\"> \n",
    "<div style=\"float: left; margin-right: 5px;\"><img src=\"http://kbpedia.org/cwpk-files/warning.png\" title=\"Caution!\" width=\"32\" /></div>I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment -- which is part of the fun of Python -- and to <a href=\"mailto:mike@mkbergman.com\">notify me</a> should you make improvements.    \n",
    "\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}