{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CWPK \\#25: Querying KBpedia with SPARQL\n",
    "=======================================\n",
    "\n",
    "Now, We Open Up the Power\n",
    "--------------------------\n",
    "\n",
    "<div style=\"float: left; width: 305px; margin-right: 10px;\">\n",
    "\n",
    "<img src=\"http://kbpedia.org/cwpk-files/cooking-with-kbpedia-305.png\" title=\"Cooking with KBpedia\" width=\"305\" />\n",
    "\n",
    "</div>\n",
    "\n",
    "In our recent installments we have been looking at how to search -- ultimately, of course, related to how to extract -- information from our knowledge graph, [KBpedia](https://kbpedia.org/), and the various large-scale knowledge bases to which it maps, such as [Wikipedia](https://en.wikipedia.org/wiki/Main_Page), [DBpedia](https://en.wikipedia.org/wiki/DBpedia), and [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page). We've seen that [owlready2](http://www.lesfleursdunormal.fr/static/informatique/owlready/index_en.html) offers us some native search capabilities, and that we can extend that by indexing additional attributes. What is powerful about knowledge graphs, however, is that all nodes and all edges are structural from the get-go, and we can easily add meaningful structure to our searches by how we represent the pieces (nodes) and by how we relate, or connect, them using the edges.\n",
    "\n",
    "Today's knowledge graphs are explicit in organizing information by structure. The exact scope of this structure varies across representations, and certainly one challenge in getting information to work together from multiple locations and provenances is the diversity of these representations. Those are the questions of semantics, and, fortunately, semantic technologies and parsers give us rich ways to retrieve and relate that structure. So, great, we now have structure galore! What are we going to do with it?\n",
    "\n",
    "Well, this structured information exists, literally, everywhere. We have huge online structured datastores, trillions of semi-structured Web pages and records, and meaningful information and analysis across a rich pastiche of hierarchies and relationships. What is clear in any attempt to solve a meaningful problem is that we need a great deal of external information as well as solid grounding in our internal circumstances. Problem solving cannot be separated from obtaining and integrating meaningful information.\n",
    "\n",
    "Thus, it is essential that we be able to query external information stores on an equivalent basis to our local ones. This equivalence requires that both internal and external sources be structured and queryable in the same way, which is where the [W3C](https://en.wikipedia.org/wiki/World_Wide_Web_Consortium)-enabled standards and [SPARQL](https://en.wikipedia.org/wiki/SPARQL) come in."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3 id=\"The-Role-of-SPARQL\">The Role of SPARQL</h3>\n",
    "<p>I think one can argue that the purpose of semantic technologies like <a href=\"https://en.wikipedia.org/wiki/Resource_Description_Framework\" target=\"_blank\" rel=\"noopener\">RDF</a> and <a href=\"https://en.wikipedia.org/wiki/Web_Ontology_Language\" target=\"_blank\" rel=\"noopener\">OWL</a> is to enable a machine-readable format for human symbolic information. As a result, we now have a rich suite of standards and implementations using those standards.</p>\n",
    "\n",
    "<p>The real purpose, and advantage, of SPARQL is to open all of the structural aspects of a knowledge graph to inspection and query. Because of this intimate relationship with RDF, SPARQL is more often than not the most capable and precise language for extracting information from ontologies or knowledge graphs. SPARQL, pronounced \"sparkle\", is a recursive acronym for <em>SPARQL Protocol and RDF Query Language</em>, and has many syntactical and structural parallels with the <a href=\"https://en.wikipedia.org/wiki/SQL\" target=\"_blank\" rel=\"noopener\">SQL</a> database query language.</p>\n",
    "\n",
    "<p>All explicit assignments of a semantic term in RDF or OWL or their semantic derivatives can be used as a query basis in SPARQL. Thus, SPARQL is the <em>sine qua non</em> option for obtaining information from an ontology or knowledge graph. SPARQL is the most flexible and responsive way to manipulate a semantically structured information store.</p>\n",
    "\n",
    "<p>Let's inspect the general components of a SPARQL query specification:</p>\n",
    "\n",
    "<div style=\"margin: 10px auto; display: table;\">\n",
    "\n",
    "<img src=\"files/sparql-query-parts.png\" title=\"SPARQL Query Specification\" width=\"800\" alt=\"SPARQL Query Specification\" />\n",
    "\n",
    "</div>\n",
    "<div style=\"margin: 10px auto; display: table; font-style: italic;\">\n",
    "\n",
    "Figure 1: SPARQL Query Specification\n",
    "\n",
    "</div>\n",
    "<p>This figure is from Lee Feigenbaum's SPARQL slides, included with other useful links under the <strong>Additional Documentation</strong> below.</p>\n",
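    "\n",
    "<p>As a rough illustration of how the parts in Figure 1 fit together, here is a minimal sketch of a helper that assembles a <code>SELECT</code> query string from prefix declarations, a result clause, a query pattern, and optional solution modifiers. The function <code>build_select</code> and its parameters are hypothetical, for illustration only, and the optional <code>FROM</code> dataset clause is omitted for brevity.</p>\n",
    "\n",
    "```python\n",
    "def build_select(prefixes, variables, patterns, order_by=None, limit=None):\n",
    "    # Hypothetical helper: assemble a SPARQL SELECT query from its parts.\n",
    "    # Prefix declarations map a short prefix to a full namespace IRI.\n",
    "    parts = [f'PREFIX {p}: <{iri}>' for p, iri in prefixes.items()]\n",
    "    # Result clause: the variables to return.\n",
    "    parts.append('SELECT DISTINCT ' + ' '.join(variables))\n",
    "    # Query pattern: triple patterns, each terminated by a period.\n",
    "    parts.append('WHERE { ' + ' '.join(t + ' .' for t in patterns) + ' }')\n",
    "    # Solution modifiers: ordering and paging of the results.\n",
    "    if order_by:\n",
    "        parts.append('ORDER BY ' + order_by)\n",
    "    if limit is not None:\n",
    "        parts.append(f'LIMIT {limit}')\n",
    "    return ' '.join(parts)\n",
    "\n",
    "\n",
    "query = build_select(\n",
    "    {'rc': 'http://kbpedia.org/kko/rc/',\n",
    "     'skos': 'http://www.w3.org/2004/02/skos/core#',\n",
    "     'rdfs': 'http://www.w3.org/2000/01/rdf-schema#'},\n",
    "    ['?x', '?label'],\n",
    "    ['?x rdfs:subClassOf rc:Mammal', '?x skos:prefLabel ?label'],\n",
    "    order_by='?label',\n",
    "    limit=10,\n",
    ")\n",
    "print(query)\n",
    "```\n",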
    "\n",
    "<p>Note that every SPARQL query gets directed to a specific endpoint, where access to the underlying RDF datastore takes place. These endpoints can be either local or accessed via the Web, and examples of both are shown below. In a standalone query processor, the data to be queried can be named with the <code>FROM</code> keyword, which identifies the graph(s) that make up the query's dataset. In our examples using RDFLib via Owlready2, the endpoint is instead a Python object, the <code>graph</code> we create in the startup code.</p>"
   ]
  },
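  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For endpoints accessed via the Web, the query travels over HTTP as set out in the W3C SPARQL 1.1 Protocol. As a minimal sketch using only the Python standard library, a query can be sent as the URL-encoded <code>query</code> parameter of a GET request, with an <code>Accept</code> header naming the desired results format. The helper <code>make_sparql_request</code> is hypothetical, and the request is only built here, not sent; packages such as SPARQLWrapper, used later in this installment, take care of these details for us.\n",
    "\n",
    "```python\n",
    "from urllib.parse import urlencode\n",
    "from urllib.request import Request\n",
    "\n",
    "\n",
    "def make_sparql_request(endpoint, query, accept='application/sparql-results+json'):\n",
    "    # Hypothetical helper for a SPARQL 1.1 Protocol query via GET.\n",
    "    # The query text is URL-encoded into the 'query' parameter.\n",
    "    url = endpoint + '?' + urlencode({'query': query})\n",
    "    # The Accept header asks the endpoint for a particular results format.\n",
    "    # Some public endpoints (Wikidata among them) also expect a descriptive\n",
    "    # User-Agent header, which could be added to this dictionary.\n",
    "    return Request(url, headers={'Accept': accept})\n",
    "\n",
    "\n",
    "req = make_sparql_request(\n",
    "    'https://query.wikidata.org/sparql',\n",
    "    'SELECT ?item WHERE { ?item wdt:P910 ?cat } LIMIT 3',\n",
    ")\n",
    "print(req.get_method(), req.full_url)\n",
    "print(req.get_header('Accept'))\n",
    "\n",
    "# Sending it requires network access, e.g.:\n",
    "# from urllib.request import urlopen\n",
    "# with urlopen(req) as resp:\n",
    "#     print(resp.read()[:200])\n",
    "```"
   ]
  },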
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extended Startup\n",
    "Let's start again with the start-up script we used in the last installment, only now also importing <code>rdflib</code> and exposing the KBpedia <code>world</code> as an RDFLib graph, which we assign to the variable <code>graph</code>.\n",
    "\n",
    "<div style=\"background-color:#eee; border:1px dotted #aaa; vertical-align:middle; margin:15px 60px; padding:8px;\"><strong>Which environment?</strong> The specific load routine you should choose below depends on whether you are using the online MyBinder service (the 'raw' version) or local files. The example below is based on using local files (though replace with your own local directory specification). If loading from MyBinder, replace with the lines that are commented (<code>#</code>) out.</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "main = 'C:/1-PythonProjects/kbpedia/sandbox/kbpedia_reference_concepts.owl'\n",
    "# main = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kbpedia_reference_concepts.owl'\n",
    "skos_file = 'http://www.w3.org/2004/02/skos/core' \n",
    "kko_file = 'C:/1-PythonProjects/kbpedia/sandbox/kko.owl'\n",
    "# kko_file = 'https://raw.githubusercontent.com/Cognonto/CWPK/master/sandbox/builds/ontologies/kko.owl'\n",
    "\n",
    "from owlready2 import *\n",
    "world = World()\n",
    "kb = world.get_ontology(main).load()\n",
    "rc = kb.get_namespace('http://kbpedia.org/kko/rc/')\n",
    "\n",
    "skos = world.get_ontology(skos_file).load()\n",
    "kb.imported_ontologies.append(skos)\n",
    "\n",
    "kko = world.get_ontology(kko_file).load()\n",
    "kb.imported_ontologies.append(kko)\n",
    "\n",
    "import rdflib\n",
    "\n",
    "# Expose the Owlready2 world as an RDFLib graph, so SPARQL queries can be run against it\n",
    "graph = world.as_rdflib_graph()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We could have put the <code>import</code> statement for the RDFLib package at the top, but anywhere before we create <code>graph</code> and run our queries is fine."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We may now work with the knowledge graph in the standard way through the Owlready2 <code>world</code> object, and reach all of the additional functionality of RDFLib through the <code>graph</code> object. This is a great example of the Python ecosystem at work.\n",
    "\n",
    "Further, Owlready2 adds some methods of its own to this RDFLib graph, such as <code>query_owlready()</code>, which returns Owlready2 entities rather than raw RDFLib terms. This makes the syntax and conventions easier when working with both libraries together.\n",
    "\n",
    "### Basic SPARQL Forms\n",
    "In the last installment we presented two ways to express SPARQL queries against the local datastore. The first, which I noted looks closer to the standard SPARQL expression shown in *Figure 1*, is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "* Owlready2 * Warning: ignoring cyclic subclass of/subproperty of, involving:\n",
      "  http://kbpedia.org/kko/rc/Person\n",
      "  http://kbpedia.org/kko/rc/HomoSapiens\n",
      "\n"
     ]
    }
   ],
   "source": [
    "form_1 = list(graph.query_owlready(\"\"\"\n",
    "  PREFIX rc: <http://kbpedia.org/kko/rc/>\n",
    "  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n",
    "  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
    "  SELECT DISTINCT ?x ?label\n",
    "  WHERE\n",
    "  {\n",
    "    ?x rdfs:subClassOf rc:Mammal.\n",
    "    ?x skos:prefLabel  ?label. \n",
    "  }\n",
    "\"\"\"))\n",
    "\n",
    "print(form_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The query above lists all of the direct subclasses of Mammal in KBpedia; the cyclic-subclass warning that Owlready2 prints along the way can be ignored.\n",
    "\n",
    "The last installment also offered a second form, which is the one I will be using hereafter. This form is more repeatable: because the query text is held in its own variable, we can later abstract it into a 'wrapper' that keeps the mechanics of making the SPARQL call separate from the SPARQL specification itself. We will touch on these topics increasingly, but for now this is the format we will take:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]\n"
     ]
    }
   ],
   "source": [
    "form_2 = \"\"\"\n",
    "  PREFIX rc: <http://kbpedia.org/kko/rc/>\n",
    "  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n",
    "  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
    "  SELECT DISTINCT ?x ?label\n",
    "  WHERE\n",
    "  {\n",
    "    ?x rdfs:subClassOf rc:Mammal.\n",
    "    ?x skos:prefLabel  ?label. \n",
    "  }\n",
    "\"\"\"\n",
    "\n",
    "results = list(graph.query_owlready(form_2))\n",
    "print(results)"
   ]
  },
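  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taking the second form one step further, here is a minimal sketch of the kind of 'wrapper' mentioned above: the shared <code>PREFIX</code> header is kept in one place, the query body is passed in as text, and the function makes the call and returns a list. The names <code>KB_PREFIXES</code> and <code>run_query</code> are hypothetical; the only library call assumed is the <code>query_owlready()</code> method of the <code>graph</code> object from the startup code.\n",
    "\n",
    "```python\n",
    "# Hypothetical shared header for KBpedia queries; rdfs is declared explicitly\n",
    "# rather than relying on prefixes the graph may already have bound.\n",
    "KB_PREFIXES = '''\n",
    "  PREFIX rc: <http://kbpedia.org/kko/rc/>\n",
    "  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n",
    "  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
    "'''\n",
    "\n",
    "\n",
    "def run_query(graph, body, prefixes=KB_PREFIXES):\n",
    "    # Hypothetical wrapper: prepend the prefixes, run the query through\n",
    "    # Owlready2's RDFLib graph, and return the result rows as a list.\n",
    "    return list(graph.query_owlready(prefixes + body))\n",
    "\n",
    "\n",
    "mammals = '''\n",
    "  SELECT DISTINCT ?x ?label\n",
    "  WHERE {\n",
    "    ?x rdfs:subClassOf rc:Mammal.\n",
    "    ?x skos:prefLabel ?label.\n",
    "  }\n",
    "'''\n",
    "\n",
    "# With the graph object from the startup code:\n",
    "# results = run_query(graph, mammals)\n",
    "# print(results)\n",
    "```\n",
    "\n",
    "Because the prefixes and the call mechanics live in one place, a new query against the local store only needs its own body text."
   ]
  },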
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These two examples cover how to access the local datastore.\n",
    "\n",
    "### External SPARQL Examples\n",
    "We really like what we have seen with the SPARQL querying of the internal datastore using RDFLib within Owlready2. But what about querying outside sources? (And would it not be cool to be able to mix and match internal and external stuff?)\n",
    "\n",
    "As we try to use RDFLib as is against external SPARQL endpoints, we quickly see that we are not adequately identifying or talking with these sites. Well, we have been here before, but the nature of stuff with Python and packages and dependencies and such often requires another capability.\n",
    "\n",
    "Some quick poking turns up that we are lacking an HTTP-aware 'wrapper' for talking to external sites. We turn up a promising package in <code>sparqlwrapper</code>. We discover it is on <code>conda-forge</code>, so we back out of the system and, at the command line, add the package from that channel:\n",
    "\n",
    "<code>$ conda install -c conda-forge sparqlwrapper</code>\n",
    "\n",
    "We again get the feedback to the screen as the Anaconda configuration manager does its thing. When finally installed and the prompt returns, we again load up Jupyter Notebook and return to this notebook page.\n",
    "\n",
    "We are now ready to try our first external example, this time to Wikidata, after we import <code>SPARQLWrapper</code> and set our endpoint target to Wikidata (<code>https://query.wikidata.org/sparql</code>):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'head': {'vars': ['item', 'itemLabel', 'wikilink', 'itemDescription', 'subClass', 'subClassLabel']}, 'results': {'bindings': [{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q537127'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8667674'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'road bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge that carries road traffic'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Road bridges'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q11755880'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8656043'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'residential building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building mainly used for residential purposes'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Residential buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q16831714'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q6259373'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'government building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building built for and by the government, such as a town hall'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Government buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q24398318'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q5655238'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'religious building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building intended for religious worship or other activities related to a religion; ceremonial structures that are related to or concerned with religion'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Religious buildings and structures'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q25297630'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q7344076'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'international bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge built across a geopolitical boundary'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:International bridges'}}]}}\n"
     ]
    }
   ],
   "source": [
    "from SPARQLWrapper import SPARQLWrapper, JSON\n",
    "\n",
    "# Point the wrapper at the public Wikidata SPARQL endpoint\n",
    "sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n",
    "\n",
    "sparql.setQuery(\"\"\"\n",
    "  PREFIX schema: <http://schema.org/>\n",
    "  SELECT ?item ?itemLabel ?wikilink ?itemDescription ?subClass ?subClassLabel WHERE {\n",
    "  VALUES ?item { wd:Q25297630\n",
    "  wd:Q537127\n",
    "  wd:Q16831714\n",
    "  wd:Q24398318\n",
    "  wd:Q11755880\n",
    "  wd:Q681337\n",
    "}\n",
    "  ?item wdt:P910 ?subClass.\n",
    "\n",
    "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n",
    "}\n",
    "\"\"\")\n",
    "sparql.setReturnFormat(JSON)\n",
    "# convert() parses the JSON response into a Python dictionary\n",
    "results = sparql.query().convert()\n",
    "print(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! It works, and we have our first information retrieval from an external site!\n",
    "\n",
    "Let me point out a couple of things about this format. First, the endpoint already has some built-in prefixes (<code>wd:</code> and <code>wdt:</code>), so we did not need to declare them in the query header. Second, there are some unique query capabilities of the Wikidata site, such as its label service, invoked by the <code>SERVICE</code> designation.\n",
    "\n",
    "When first querying a new site it is perhaps best to stick to vanilla forms of SPARQL, but as one learns more it is possible to tailor queries more specifically. We also see that our setup will allow us to take advantage of what each endpoint gives us.\n",
    "\n",
    "So, let's take another example, this one using the DBpedia endpoint, to show how formats may also differ from endpoint to endpoint:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from SPARQLWrapper import SPARQLWrapper, RDFXML\n",
    "from rdflib import Graph\n",
    "\n",
    "sparql = SPARQLWrapper(\"http://dbpedia.org/sparql\")\n",
    "\n",
    "sparql.setQuery(\"\"\"\n",
    "    PREFIX dbo: <http://dbpedia.org/ontology/>\n",
    "    PREFIX schema: <http://schema.org/>\n",
    "\n",
    "    CONSTRUCT {\n",
    "      ?lang a schema:Language ;\n",
    "      schema:alternateName ?iso6391Code .\n",
    "    }\n",
    "    WHERE {\n",
    "      ?lang a dbo:Language ;\n",
    "      dbo:iso6391Code ?iso6391Code .\n",
    "      FILTER (STRLEN(?iso6391Code)=2) # to filter out non-valid values\n",
    "    }\n",
    "\"\"\")\n",
    "\n",
    "sparql.setReturnFormat(RDFXML)\n",
    "# For a CONSTRUCT query returned as RDF/XML, convert() yields an RDFLib graph\n",
    "results = sparql.query().convert()\n",
    "print(results.serialize(format='xml'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice again how the structure of our query code is pretty patterned. We also see in the two examples how we can specify different serializations (<code>JSON</code> and <code>RDFXML</code> in these examples) for our result sets."
   ]
  },
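  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The <code>JSON</code> results printed for the Wikidata query are hard to read as is. They follow the W3C SPARQL 1.1 Query Results JSON Format: a <code>head</code> that lists the variables, and a <code>results</code> object whose <code>bindings</code> list holds one dictionary per row, mapping each bound variable to its <code>type</code> and <code>value</code>. A minimal sketch for flattening that structure into plain rows follows; the helper <code>bindings_to_rows</code> is hypothetical, and the sample data is abbreviated from the output above. Variables left unbound in a row, such as <code>?wikilink</code> in our query, come back as <code>None</code>.\n",
    "\n",
    "```python\n",
    "def bindings_to_rows(results):\n",
    "    # Hypothetical helper: flatten SPARQL 1.1 JSON results into dicts of plain values.\n",
    "    variables = results['head']['vars']\n",
    "    rows = []\n",
    "    for binding in results['results']['bindings']:\n",
    "        # Unbound variables are absent from a binding, so default to None.\n",
    "        rows.append({v: binding.get(v, {}).get('value') for v in variables})\n",
    "    return rows\n",
    "\n",
    "\n",
    "# Abbreviated from the Wikidata output shown earlier.\n",
    "sample = {\n",
    "    'head': {'vars': ['item', 'itemLabel', 'wikilink']},\n",
    "    'results': {'bindings': [\n",
    "        {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q537127'},\n",
    "         'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'road bridge'}},\n",
    "    ]},\n",
    "}\n",
    "\n",
    "for row in bindings_to_rows(sample):\n",
    "    print(row['itemLabel'], row['item'], row['wikilink'])\n",
    "```\n",
    "\n",
    "The same helper could be applied to the dictionary that <code>convert()</code> returns for any query run with the <code>JSON</code> return format."
   ]
  },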
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Additional Documentation\n",
    "\n",
    "A SPARQL tutorial is outside the defined scope of this CWPK series. But the power of SPARQL is substantial, and it is well worth the time to learn more about this flexible language, which reminds one of SQL in many ways but has its own charms and powers. Here are some great starting links about SPARQL:\n",
    "\n",
    "- Lee Feigenbaum's [SPARQL by Example: The Cheat Sheet](https://www.slideshare.net/LeeFeigenbaum/sparql-cheat-sheet)\n",
    "- [SPARQL in 11 minutes](https://www.youtube.com/watch?v=FvGndkpa4K0) video by Bob DuCharme\n",
    "- [Learning SPARQL](http://www.learningsparql.com/) by Bob DuCharme\n",
    "- [SPARQLWrapper documentation](https://github.com/RDFLib/sparqlwrapper)\n",
    "- [Wikidata SPARQL query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)\n",
    "\n",
    "\n",
    " <div style=\"background-color:#efefff; border:1px dotted #ceceff; vertical-align:middle; margin:15px 60px; padding:8px;\"> \n",
    "  <span style=\"font-weight: bold;\">NOTE:</span> This article is part of the <a href=\"https://www.mkbergman.com/cooking-with-python-and-kbpedia/\" style=\"font-style: italic;\">Cooking with Python and KBpedia</a> series. See the <a href=\"https://www.mkbergman.com/cooking-with-python-and-kbpedia/\"><strong>CWPK</strong> listing</a> for other articles in the series. <a href=\"http://kbpedia.org/\">KBpedia</a> has its own Web site.\n",
    "  </div>\n",
    "\n",
    "<div style=\"background-color:#ebf8e2; border:1px dotted #71c837; vertical-align:middle; margin:15px 60px; padding:8px;\"> \n",
    "\n",
    "<span style=\"font-weight: bold;\">NOTE:</span> This <strong>CWPK \n",
    "installment</strong> is available both as an online interactive\n",
    "file <a href=\"https://mybinder.org/v2/gh/Cognonto/CWPK/master\" ><img src=\"https://mybinder.org/badge_logo.svg\" style=\"display:inline-block; vertical-align: middle;\" /></a> or as a <a href=\"https://github.com/Cognonto/CWPK\" title=\"CWPK notebook\" alt=\"CWPK notebook\">direct download</a> to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the <code>*.ipynb</code> file. It may take a bit of time for the interactive option to load.</div>\n",
    "\n",
    "<div style=\"background-color:#feeedc; border:1px dotted #f7941d; vertical-align:middle; margin:15px 60px; padding:8px;\"> \n",
    "<div style=\"float: left; margin-right: 5px;\"><img src=\"http://kbpedia.org/cwpk-files/warning.png\" title=\"Caution!\" width=\"32\" /></div>I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment -- which is part of the fun of Python -- and to <a href=\"mailto:mike@mkbergman.com\">notify me</a> should you make improvements.    \n",
    "\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}