{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "CWPK \\#27: A 'Roundtrip' Philosophy \n", "===================================\n", "\n", "Making the Transition to Methods and Modules\n", "--------------------------------------------\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "With this installment, we transition to our third major part in our\n", "[*Cooking with Python and KBpedia*](https://www.mkbergman.com/cooking-with-python-and-kbpedia/) series.\n", "We have evaluated and decided upon our alternatives, then installed and\n", "configured them while gaining some exposure, and now are transitioning\n", "to applying those tools to developing our first methods. This transition\n", "will culminate with us packaging our first module in the [KBpedia](https://kbpedia.org/) system,\n", "in the process beginning to undertake bulk modifications. These bulk\n", "capabilities are at the heart of adopting and then extending KBpedia for\n", "your own domain purposes. Of course, in still later installments, we will\n", "probe more advanced methods and capabilities, but this current part will\n", "help us move in that direction by setting the [Python](https://en.wikipedia.org/wiki/Python_(programming_language)) groundwork. Besides\n", "this intro article, this third major part is almost entirely devoted to\n", "Python code and code management.\n", "\n", "When I begin posting that code, you will note that I change the standard blue message box at the conclusion of each installment. Yes, I'm a newbie,\n", "though with some exposure to programming best practices, but I am still\n", "most decidedly an amateur. One of the fun things in working with Python\n", "is the multiplicity of packages or modules or styles available to you as\n", "a programmer (amateur or not). There are great books and great online\n", "resources, which I often cite as we move forward, but I have found\n", "interactive coding to be an absolute blast with [Jupyter Notebook](https://en.wikipedia.org/wiki/Project_Jupyter). One\n", "can literally search and find Python coding options immediately on the\n", "Web and then test them directly in the notebook. I love the immediate testing, the tactile sense of interacting with\n", "the important code blocks. 
Knowing this, it is helpful to always bring\n", "forward the same environment and domain each time I work with the\n", "system. That means I am always working with information of relevance and\n", "testing routines of importance. I also like the ability to really do\n", "[Knuth's](https://en.wikipedia.org/wiki/Donald_Knuth) [literate programming](https://en.wikipedia.org/wiki/Literate_programming) with the interspersing of comment and\n", "context.\n", "\n", "So, as we kick off this new part, I wanted to start with a largely narrative\n", "introduction. I know where I want to go with this series, but since I am\n", "documenting as I go, I really don't know for sure the path to get to\n", "those objectives. I thought, therefore, that how I think about things and\n", "problems could be a logic trace for your own way to think about things.\n", "I think thinking in programmatic terms is more dynamic than report\n", "writing or project planning, my two main activities for decades. Coding\n", "is a faster, more consuming sport.\n", "\n", "### Why the Idea of 'Roundtripping'?\n", "\n", "Our experience over the past decade has brought three main lessons to\n", "the fore. First, knowledge -- and, therefore, [knowledge graphs](https://en.wikipedia.org/wiki/Ontology_(information_science)) to represent\n", "it -- is dynamic and must be updated and maintained on a constant basis. A\n", "static knowledge graph is probably better than none at all, but is a far\n", "cry from the usefulness that can be gained from having knowledge\n", "currency as an objective of a knowledge graph (and its supporting knowledge\n", "bases).\n", "\n", "Second, while in expression they may be complex, knowledge systems are\n", "fundamentally simple and understandable. The complexity of a knowledge\n", "system arises from the emergence of simple rules, interacting in\n", "exponentially large ways. 
Implications are [deductive](https://en.wikipedia.org/wiki/Deductive_reasoning), predictions are [inductive](https://en.wikipedia.org/wiki/Inductive_reasoning), and new knowledge arises from [abductive](https://en.wikipedia.org/wiki/Abductive_reasoning) ways to interact with these systems. We should be able to break down our knowledge systems into fundamentally simple structures, modify those simple structures,\n", "and then build them back up again to their non-linear dynamics.\n", "\n", "And, third, we have multiple ways we need to interact with knowledge\n", "graphs and [bases](https://en.wikipedia.org/wiki/Knowledge_base), and multiple best-of-breed tools to do so. Sometimes we\n", "want to build and maintain a knowledge structure as something unto\n", "itself, with logic and integrity checks and tests and expansions of\n", "capabilities. Other actions might have us staging data to third-party\n", "applications for [AI](https://en.wikipedia.org/wiki/Artificial_intelligence) or [machine learning](https://en.wikipedia.org/wiki/Machine_learning). We also may need to make bulk\n", "modifications for specific application purposes or to tailor it\n", "specifically to our current domain. The different tools that might\n", "support these and other activities are best served when something akin\n", "to a common data interchange format is found. In our case, that is [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) in [UTF-8](https://en.wikipedia.org/wiki/UTF-8) format, often expressed as [N3](https://en.wikipedia.org/wiki/Notation3) [semantic triples](https://en.wikipedia.org/wiki/Semantic_triple).\n", "\n", "Once the idea of a common exchange framework emerges, the sense of\n", "'[roundtripping](https://en.wikipedia.org/wiki/Round-trip_engineering)' becomes obvious. 
We use one tool for one purpose, export its information in a format with semantics sufficient for another tool\n", "to ingest it, make changes to it, and export it back again in a format\n", "with semantics readable by the first tool. Actually, in practice, good\n", "roundtripping more resembles a hub-and-spoke design, with the common\n", "representation framework at the hub and shared by the spokes. \n", "\n", "In our design of the KBpedia processing system moving forward, then, we\n", "will want to break down, or 'decompose' working, fully-specified and\n", "logically tested knowledge graphs into component parts that we can work\n", "with and modify offline, so to speak. We may work with and modify these\n", "component parts quite extensively in this 'offline' mode. We could, for\n", "example, swap out entire modules for specific domains with our own\n", "favored representations of that domain. We may also want to isolate all\n", "of our language strings to translate the knowledge graph to other\n", "languages. Or we may want to prune areas while we expand the specificity\n", "in others. We may even make changes in big chunks to the grounded upper\n", "structure of our knowledge graph because our design is inherently\n", "malleable. A huge source of ossification in knowledge graphs is this\n", "inability to be decomposed into re-processible building blocks. \n", "\n", "### A Mindset of Patterns\n", "\n", "These big design considerations have a complementary part at the\n", "implementation level of the code. The same drivers of hierarchy and\n", "generalizability that govern a modular architecture also govern code\n", "design, or so it seems to me. Maybe it is because of this pattern of\n", "break-down and build-up of the specification components of KBpedia that\n", "I also see repeatability in code steps. We start with a file and its\n", "context. 
We process that file to extract its mapped semantics and data.\n", "We manipulate that storehouse of assertions to support many\n", "applications. We are continually learning and adding to the storehouse.\n", "We make bulk moves and bulk changes to our underlying data. We are\n", "constantly opening and writing to files, and representing our\n", "information as two-dimensional arrays of records (rows) and\n", "characteristics (columns). We need to monitor changes and log\n", "errors and events to file while processing. We need to find our stored\n", "procedures and save stuff so we may readily find it again.\n", "\n", "The idea of patterns and the power it brings to questions of scoping,\n", "abstraction, and design is substantial. I agree with the dictum that if\n", "you do something three times you should generalize and code it. My guess\n", "is that the search for the better algorithm and design is a key\n", "motivator to the professional programmer. For my purposes, however, this\n", "mindset is really just one of trying to think through generic activities\n", "that a given code block is intended to address, and then assess if more\n", "than three applications of this block (or parts of it) are likely across\n", "the intended code base. Once so stated, it is pretty obvious that\n", "'generalizability' is very much a function of current use and context,\n", "so one dynamic aspect of programming is the continual refactoring of\n", "prior code to make it generalizable. When stated in words that way, it\n", "sounds perhaps a little crazy. But, in practice, generalizability of\n", "code leads to further simplicity, maintainability, and (hopefully)\n", "efficiency.\n", "\n", "Python has many wonderful features to support patterns. One may, for\n", "example, adopt a ['functional' programming](https://en.wikipedia.org/wiki/Functional_programming) style in working with Python despite the language not being initially designed to be so. 
Extensions\n", "of functionality occur in any existing programming style with Python.\n", "\n", "Any information passed to those routines should also be abstracted to\n", "logical names within input records. Automation only occurs through\n", "generalization. Like the simplicity argument made above, simple machines\n", "like [automatons](https://en.wikipedia.org/wiki/Cellular_automaton) are easier to orchestrate and manage, even if their\n", "outcomes appear chaotic. So, what I think we would like to do in the\n", "totally abstract is have a limited number of functional method\n", "primitives to which we pass generic instructions and information using a\n", "relatively small subset of named objects. Again, this is one of the key\n", "strengths of Python: the objectification of the language linked to\n", "nameable spaces. \n", "\n", "### High-level Build Overview\n", "\n", "In its most general terms, we build KBpedia from three (actually, four,\n", "I cheat, and will explain in a bit) pieces. The first is the structural\n", "scaffolding of concepts and their 'is-a' hierarchical relationships. The\n", "second is the properties of the instances that the concepts represent,\n", "and how we understand, qualify, and quantify those things. The third\n", "piece is the way we label or describe or point to or indicate those\n", "things.\n", "\n", "From these components we can build, and in the process logically test,\n", "the entire KBpedia from scratch. Since that is now working in our internal implementations with [Clojure](https://en.wikipedia.org/wiki/Clojure), that is a de minimis capability\n", "we want to capture in Python. While the build process begins with these\n", "input files and adds to the core starting point (the 'bootstrap' as best\n", "understood) we do not have that Python build code as we start out.\n", "Further, in a strange way, we never did have such a starting point for\n", "KBpedia in Clojure anyway. 
The code base for KBpedia we inherited from\n", "the previous generation [UMBEL](https://en.wikipedia.org/wiki/UMBEL). And, UMBEL had some historical methods\n", "for building its knowledge graph directly from [OpenCyc](https://en.wikipedia.org/wiki/Cyc). The modular\n", "build routines had never been re-factored into the core routines of\n", "either UMBEL or KBpedia!\n", "\n", "Fundamentally, this is not a big deal, since our modular approach and\n", "additions and modifications present no conceptual or implementation\n", "challenges. Still, the fact remains that our Clojure build routines do\n", "not begin at the root build premise. The easier way to bootstrap into a\n", "complete code base for roundtripping, then, is to first extract away the\n", "logical pieces from the coherent full KBpedia, until there is nothing\n", "left but the 'core' of the ontology. This core, of course, is the\n", "KBpedia Knowledge Ontology, or [KKO](https://kbpedia.org/docs/kko-upper-structure/). For the bootstrapping process to\n", "work, we begin with a KKO specified core, and then extract or add\n", "pieces to it. We extract when\n", "we are capturing changes to the ontology graph that might have been made\n", "while in production or development using something like the [Protégé](https://en.wikipedia.org/wiki/Prot%C3%A9g%C3%A9_(software)) [IDE](https://en.wikipedia.org/wiki/Integrated_development_environment).\n", "We build when we are submitting\n", "our modifications to the 'core' and its existing components while\n", "testing for [consistency](https://en.wikipedia.org/wiki/Consistency) and [satisfiability](https://en.wikipedia.org/wiki/Satisfiability). \n", "\n", "Thus, while we may be tackling specific tasks a little backwards by\n", "dealing with extraction first, in the spirit of roundtripping these are\n", "merely questions of where one breaks into the system. 
For this **CWPK**\n", "series, that starts with extraction.\n", "\n", "By the way, what was that reference to the fourth piece? Well, it is\n", "mapping KBpedia to external sources to facilitate retrieval and\n", "integration. We will cover that topic as well toward the end of our\n", "series. We are able to defer this topic since the mapping question is a\n", "bit of a secondary orbit from the central question of building and\n", "modifying KBpedia (or its derivatives). \n", "\n", "### A Caution and Some Basic Intuitions\n", "\n", "My caution is just to reiterate that the Python code to come is one\n", "approach, among certainly many options, most of which I am sure would be\n", "easier to understand or better performing than what I am offering. Yet there is much to be said\n", "about getting 'first twitch' from these Jupyter Notebook installments\n", "and being able to test and extend these notions on your own.\n", "\n", "And, what are these notions? Given the functional richness of the Python\n", "landscape it is only fair that I share some of my prejudices and\n", "intuitions about the specific methods put forth in the remaining code.\n", "Here are a few:\n", "\n", "- I like the idea of 'generators'. Much of what we deal with in these scripts and KBpedia itself can be expressed in the 'generator' style of efficiently looping over specific sets or iterating things\n", "- A 'set' notation is at the heart of [W3C](https://en.wikipedia.org/wiki/World_Wide_Web_Consortium) standards (though sometimes masked as such) and the Python built-in set manipulation methods seem to be a powerful way of manipulating and comparing very large datasets. 
The set notation includes terminology such as intersection, union, difference, disjoint, subset, update, etc.\n", "- And, our view of CSV files as a central standard likely means we need to investigate and compare and choose among multiple CSV options in Python.\n", "\n", "Once we get these basic coding methods in place it is time to turn our efforts into a standard Python module. Our transition will be aided by working the [Spyder](https://en.wikipedia.org/wiki/Spyder_(software)) IDE into our code-development workflow toward the end of this third part.\n", "\n", "
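The three intuitions listed above (generators, set operations, and CSV as the exchange format) can be sketched together in a few lines. This is only an illustration using the Python standard library, not the code used in later installments: the `ex:` triples, the function names `iter_rows`, `to_csv`, and `from_csv`, and the simulated offline edit are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical triples standing in for rows extracted from a knowledge graph.
TRIPLES = [
    ('ex:Mammal', 'rdfs:subClassOf', 'ex:Animal'),
    ('ex:Dog', 'rdfs:subClassOf', 'ex:Mammal'),
    ('ex:Dog', 'skos:prefLabel', 'dog'),
]


def iter_rows(triples):
    # Generator: yields one CSV row at a time rather than building a full list.
    for subject, predicate, obj in triples:
        yield [subject, predicate, obj]


def to_csv(triples):
    # Export step: write the rows to CSV text (in memory here, a file in practice).
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerows(iter_rows(triples))
    return buffer.getvalue()


def from_csv(text):
    # Import step: read the CSV text back into a set of tuples.
    reader = csv.reader(io.StringIO(text, newline=''))
    return {tuple(row) for row in reader}


exported = to_csv(TRIPLES)
original = from_csv(exported)

# Simulate an offline edit made by some other tool: one label is changed.
old_label = ('ex:Dog', 'skos:prefLabel', 'dog')
new_label = ('ex:Dog', 'skos:prefLabel', 'domestic dog')
edited = (original - {old_label}) | {new_label}

# Set operations report what the roundtrip changed.
added = edited - original
removed = original - edited
unchanged = edited & original
```

Because each export is held as a set of tuples, comparing two versions of the same file reduces to set difference and intersection, which is the kind of bulk comparison the set bullet above has in mind.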
\n", " NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site.\n", "
\n", "\n", "
\n", "\n", "NOTE: This CWPK \n", "installment is available both as an online interactive\n", "file and as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
\n", "\n", "
\n", "
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment -- which is part of the fun of Python -- and to notify me should you make improvements. \n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }