{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "CWPK \\#44: Annotation Ingest\n", "=======================================\n", "\n", "More Fields, But Less Complexity\n", "--------------------------\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "We now tackle the ingest of annotations for classes and properties in this installment of the [*Cooking with Python and KBpedia*](https://www.mkbergman.com/cooking-with-python-and-kbpedia/) series. In prior installments we built the structural aspects of [KBpedia](https://kbpedia.org/). We now add the labels, definitions, and other assignments to them.\n", "\n", "As with the extraction routines, we will split these efforts into class annotations and then property annotations. Our actual load routines are fairly straightforward, and we have no real logic concerns in how these annotations get added. The most complex wrinkle we will need to address are those annotation fields, altLabels and notes in particular, where we have potentially many assignments for a single reference concept (RC) or property. Like we saw with the extraction routines, for these items we will need to set up additional internal loops to segregate and assign the items for loading based on our standard double-pipe ('||') delimiter.\n", "\n", "The two functions we develop in this installment, class_annot_builder and prop_annot_builder will be added to the build.py module." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Start-up\n", "\n", "Since we are in an active part of the build cycle, we want to continue with our main knowledge graph in-progress for our load routine, so please make sure that kb_src is set to 'standard' in your config.py configuration. We then invoke our standard start-up:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from cowpoke.__main__ import *\n", "from cowpoke.config import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading Class Annotations\n", "Class annotations consist of potentially the item's prefLabel, altLabels, definition, and editorialNote. The first item is mandatory, the next two should be provided to adhere to best practices. The last is optional. There are, of course, other standard annotations possible. Should your own conventions require or encourage them, you will likely need to modify the procedure below to account for that fact.\n", "\n", "As with these methods before, we provide a header showing 'typical' configuration settings (in config.py), and then proceed with a method that loops through all of the rows in the input file. Here is the basic class annotation build procedure. There are no new wrinkles in this routine from what has been seen previously:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### KEY CONFIG SETTINGS (see build_deck in config.py) ### \n", "# 'kb_src' : 'standard' \n", "# 'loop_list' : file_dict.values(), # see 'in_file'\n", "# 'loop' : 'class_loop',\n", "# 'in_file' : 'C:/1-PythonProjects/kbpedia/v300/build_ins/classes/Generals_annot_out.csv',\n", "# 'out_file' : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts_test.csv',\n", "\n", "\n", "def class_annot_build(**build_deck):\n", " print('Beginning KBpedia class annotation build . . .')\n", " loop_list = build_deck.get('loop_list')\n", " loop = build_deck.get('loop')\n", " class_loop = build_deck.get('class_loop')\n", "# r_id = ''\n", "# r_pref = ''\n", "# r_def = ''\n", "# r_alt = ''\n", "# r_note = ''\n", " if loop is not 'class_loop':\n", " print(\"Needs to be a 'class_loop'; returning program.\")\n", " return\n", " for loopval in loop_list:\n", " print(' . . . processing', loopval) \n", " in_file = loopval\n", " with open(in_file, 'r', encoding='utf8') as input:\n", " is_first_row = True\n", " reader = csv.DictReader(input, delimiter=',', fieldnames=[C]) \n", " for row in reader:\n", " r_id_frag = row['id']\n", " id = getattr(rc, r_id_frag)\n", " if id == None:\n", " print(r_id_frag)\n", " continue\n", " r_pref = row['prefLabel']\n", " r_alt = row['altLabel']\n", " r_def = row['definition']\n", " r_note = row['editorialNote']\n", " if is_first_row: \n", " is_first_row = False\n", " continue \n", " id.prefLabel.append(r_pref)\n", " id.definition.append(r_def)\n", " i_alt = r_alt.split('||')\n", " if i_alt != ['']: \n", " for item in i_alt:\n", " id.altLabel.append(item)\n", " i_note = r_note.split('||')\n", " if i_note != ['']: \n", " for item in i_note:\n", " id.editorialNote.append(item)\n", " print('KBpedia class annotation build is complete.') \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class_annot_build(**build_deck)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kb.save(file=r'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts_test.owl', format='rdfxml') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BTW, when we commit this method to our build.py module, we will add the save routine at the end.\n", "\n", "### Loading Property Annotations\n", "We now turn our attention to annotations of properties:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### KEY CONFIG SETTINGS (see build_deck in config.py) ### \n", "# 'kb_src' : 'standard' \n", "# 'loop_list' : file_dict.values(), # see 'in_file'\n", "# 'loop' : 'class_loop',\n", "# 'in_file' : 'C:/1-PythonProjects/kbpedia/v300/build_ins/properties/prop_annot_out.csv',\n", "# 'out_file' : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts_test.csv',\n", "\n", "def prop_annot_build(**build_deck):\n", " print('Beginning KBpedia property annotation build . . .')\n", " loop_list = build_deck.get('loop_list')\n", " loop = build_deck.get('loop')\n", " out_file = build_deck.get('out_file')\n", " if loop is not 'property_loop':\n", " print(\"Needs to be a 'property_loop'; returning program.\")\n", " return\n", " for loopval in loop_list:\n", " print(' . . . processing', loopval) \n", " in_file = loopval\n", " with open(in_file, 'r', encoding='utf8') as input:\n", " is_first_row = True\n", " reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'prefLabel', 'subPropertyOf', 'domain', \n", " 'range', 'functional', 'altLabel', 'definition', 'editorialNote']) \n", " for row in reader:\n", " r_id = row['id'] \n", " r_pref = row['prefLabel']\n", " r_dom = row['domain']\n", " r_rng = row['range']\n", " r_alt = row['altLabel']\n", " r_def = row['definition']\n", " r_note = row['editorialNote']\n", " r_id = r_id.replace('rc.', '')\n", " id = getattr(rc, r_id)\n", " if id == None:\n", " print(r_id)\n", " continue\n", " if is_first_row: \n", " is_first_row = False\n", " continue\n", " id.prefLabel.append(r_pref)\n", " i_dom = r_dom.split('||')\n", " if i_dom != ['']: \n", " for item in i_dom:\n", " id.domain.append(item)\n", " if 'owl.' in r_rng:\n", " r_rng = r_rng.replace('owl.', '')\n", " r_rng = getattr(owl, r_rng)\n", " id.range.append(r_rng)\n", " elif r_rng == ['']:\n", " continue\n", " else:\n", "# id.range.append(r_rng)\n", " i_alt = r_alt.split('||') \n", " if i_alt != ['']: \n", " for item in i_alt:\n", " id.altLabel.append(item)\n", " id.definition.append(r_def) \n", " i_note = r_note.split('||')\n", " if i_note != ['']: \n", " for item in i_note:\n", " id.editorialNote.append(item)\n", " print('KBpedia property annotation build is complete.') " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prop_annot_build(**build_deck)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm. One of the things we notice in this routine is that our domain and range assignments have not been adequately picked up in our earlier KBpedia version 2.50 build routines (the ones undertaken in [Clojure](https://en.wikipedia.org/wiki/Clojure) before this **CWPK** series). As a result, we can not adequately test range and will need to address this oversight before our series is over.\n", "\n", "As before, we will add our 'save' routine as well when we commit the method to the build.py module." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kb.save(file=r'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts_test.owl', format='rdfxml') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have all of the building blocks to create our extract-build roundtrip. We summarize the formal steps and configuration settings in **CWPK #47**. But, first, we need to return to cleaning our input files and instituting some unit tests.\n", "\n", "\n", "
\n", " NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.\n", "
\n", "\n", "
\n", "\n", "NOTE: This CWPK \n", "installment is available both as an online interactive\n", "file or as a direct download to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
\n", "\n", "
\n", "
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment -- which is part of the fun of Python -- and to notify me should you make improvements. \n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }