{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "CWPK \\#25: Querying KBpedia with SPARQL\n", "=======================================\n", "\n", "Now, We Open Up the Power\n", "--------------------------\n", "\n", "
I think one can argue that the purpose of semantic technologies like RDF and OWL is to enable a machine-readable format for human symbolic information. As a result, we now have a rich suite of standards and implementations using those standards.
\n", "\n", "The real purpose, and advantage, of SPARQL is to open all of the structural aspects of a knowledge graph to inspection and query. Because of this intimate relationship, SPARQL is more often than not the most capable and precise language for extracting information from ontologies or knowledge graphs. SPARQL, pronounced \"sparkle\", is a recursive acronym for SPARQL Protocol and RDF Query Language, and has many syntactic and structural parallels with the SQL database query language.
\n", "\n", "All explicit assignments of a semantic term in RDF or OWL or their semantic derivatives can be used as a query basis in SPARQL. Thus, SPARQL is the sine qua non option for obtaining information from an ontology or knowledge graph. SPARQL is the most flexible and responsive way to manipulate a semantically structured information store.
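
To make the idea of querying on explicit assignments a bit more concrete, here is a toy sketch in plain Python. It is not how RDFLib or any SPARQL engine is implemented; the `TRIPLES` list and the `match` helper are hypothetical names of our own, and `rc:Animal` is only an illustrative class name. Each statement is a (subject, predicate, object) triple, and a query is a pattern in which `None` plays the role of a SPARQL variable.

```python
# Toy in-memory triple store: each entry is (subject, predicate, object).
# The names are illustrative and are not read from KBpedia itself.
TRIPLES = [
    ('rc:Mammal', 'rdfs:subClassOf', 'rc:Animal'),
    ('rc:Monotreme', 'rdfs:subClassOf', 'rc:Mammal'),
    ('rc:Marsupial', 'rdfs:subClassOf', 'rc:Mammal'),
    ('rc:Monotreme', 'skos:prefLabel', 'monotreme'),
]


def match(pattern, triples=TRIPLES):
    '''Return every triple that fits the pattern; None acts as a variable.'''
    return [
        triple for triple in triples
        if all(want is None or want == have
               for want, have in zip(pattern, triple))
    ]


# Roughly: SELECT ?s WHERE { ?s rdfs:subClassOf rc:Mammal }
subclasses = [s for s, _, _ in match((None, 'rdfs:subClassOf', 'rc:Mammal'))]
print(subclasses)
```

A real SPARQL engine joins many such patterns and binds named variables across them, but the basic matching idea is the same.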
\n", "\n", "Let's inspect the general components of a SPARQL query specification:
\n", "\n", "This figure is from Lee Feigenbaum's SPARQL slides, included with other useful links under the Additional Documentation below.
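
As a rough textual stand-in for the parts such a figure typically labels, here is a minimal sketch of an annotated query held in a Python string. The `FROM` IRI is hypothetical, `rc:Mammal` is used only for illustration, and the numbered comments are our own labels rather than anything taken from the slides.

```python
# Lines starting with '#' inside the query are SPARQL comments.
ANNOTATED_QUERY = '''
# 1. Prefix declarations: shorthand for long IRIs
PREFIX rc:   <http://kbpedia.org/kko/rc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# 2. Result clause: which variables to return
SELECT ?subclass

# 3. Dataset definition: which graph to query (hypothetical IRI)
FROM <http://example.org/kbpedia>

# 4. Query pattern: triple patterns the data must match
WHERE {
  ?subclass rdfs:subClassOf rc:Mammal .
}

# 5. Query modifiers: ordering, slicing, and so on
ORDER BY ?subclass
LIMIT 10
'''

print(ANNOTATED_QUERY)
```

The dataset definition is optional; when the query runs against a Python graph object, as in our local examples, it is typically left out.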
\n", "\n", "Note that every SPARQL query gets directed to a specific endpoint, where access to the underlying RDF datastore takes place. These endpoints can be either local or accessed via the Web, with both examples shown below. In a standalone situation, the endpoint location is indicated by the `FROM` keyword. In our examples using RDFLib via Owlready2, these locations are set to a Python object.
`rdflib` and relating its namespace `graph` to the `world` namespace of KBpedia.\n",
"\n",
"#
) out.import
statement for the RDFLib package at the top, but anywhere prior to formatting the query is fine."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now may manipulate the knowledge graph as we would in a standard way using (in this case) the namespace world
for owlready2 and access all of the additional functionality available via RDFLib using the (in this case) the graph
namespace. This is a great example of the Python ecosystem at work.\n",
"\n",
"Further, because of even greater integration, there are some native commands in Owlready2 that have been mapped to RDFLib making the syntax and conventions in working with both libraries easier.\n",
"\n",
"### Basic SPARQL Forms\n",
"In the last installment we presented two wrinkles for how to express your SPARQL queries to your local datastore. This form I noted looked closer to a standard SPARQL expression shown in *Figure 1*:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[rc.AbominableSnowman, 'abominable snowman'], [rc.Afroinsectiphilia, 'Afroinsectiphilia'], [rc.Eutheria, 'placental mammal'], [rc.Marsupial, 'pouched mammal'], [rc.Australosphenida, 'Australosphenida'], [rc.Bigfoot, 'Sasquatch'], [rc.Monotreme, 'monotreme'], [rc.Vampire, 'vampire'], [rc.Werewolf, 'werewolf']]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"* Owlready2 * Warning: ignoring cyclic subclass of/subproperty of, involving:\n",
" http://kbpedia.org/kko/rc/Person\n",
" http://kbpedia.org/kko/rc/HomoSapiens\n",
"\n"
]
}
],
"source": [
"form_1 = list(graph.query_owlready(\"\"\"\n",
" PREFIX rc: sparqlwrapper
. We discover it is on `conda-forge`, so we back out of the system and, at the command line, add the package:\n",
"\n",
"$ conda install sparqlwrapper
\n",
"\n",
"We again get the feedback to the screen as the Anaconda configuration manager does its thing. When finally installed and the prompt returns, we again load up Jupyter Notebook and return to this notebook page.\n",
"\n",
"We are now ready to try our first external example, this time to Wikidata, after we import SPARQLwrapper
and set our endpoint target to Wikidata (https://query.wikidata.org/sparql
):"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'head': {'vars': ['item', 'itemLabel', 'wikilink', 'itemDescription', 'subClass', 'subClassLabel']}, 'results': {'bindings': [{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q537127'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8667674'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'road bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge that carries road traffic'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Road bridges'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q11755880'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q8656043'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'residential building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building mainly used for residential purposes'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Residential buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q16831714'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q6259373'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'government building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building built for and by the government, such as a town hall'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:Government buildings'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q24398318'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q5655238'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'religious building'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'building intended for religious worship or other activities related to a religion; ceremonial structures that are related to or concerned with religion'}, 'subClassLabel': {'xml:lang': 'en', 
'type': 'literal', 'value': 'Category:Religious buildings and structures'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q25297630'}, 'subClass': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q7344076'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'international bridge'}, 'itemDescription': {'xml:lang': 'en', 'type': 'literal', 'value': 'bridge built across a geopolitical boundary'}, 'subClassLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Category:International bridges'}}]}}\n"
]
}
],
"source": [
"from SPARQLWrapper import SPARQLWrapper, JSON\n",
"from rdflib import Graph\n",
"\n",
"sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n",
"\n",
"sparql.setQuery(\"\"\"\n",
" PREFIX schema: wd:
and wdt:
) so we did not need to declare them in the query header. Second, there are some unique query capabilities of the Wikidata site, noted by the `SERVICE` designation.\n",
"\n",
"When first querying a new site it is perhaps best to stick to vanilla forms of SPARQL, but as one learns more it is possible to tailor queries more specifically. We also see that our setup will allow us to take advantage of what each endpoint gives us.\n",
"\n",
"So, let's take another example, this one using the DBpedia endpoint, to show how formats may also differ from endpoint to endpoint:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from SPARQLWrapper import SPARQLWrapper, RDFXML\n",
"from rdflib import Graph\n",
"\n",
"sparql = SPARQLWrapper(\"http://dbpedia.org/sparql\")\n",
"\n",
"sparql.setQuery(\"\"\"\n",
" PREFIX dbo: JSON
and RDFXML in these examples) for our result sets."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional Documentation\n",
"\n",
"The idea of a SPARQL tutorial is outside of the defined scope of this CWPK series. But, the power of SPARQL is substantial and it is well worth the time to learn more about this flexible language, that reminds one of SQL in many ways, but has its own charms and powers. Here are some great starting links about SPARQL:\n",
"\n",
"- Lee Feigenbaum's [SPARQL by Example: The Cheat Sheet](https://www.slideshare.net/LeeFeigenbaum/sparql-cheat-sheet)\n",
"- [SPARQL in 11 minutes](https://www.youtube.com/watch?v=FvGndkpa4K0) video by Bob DuCharme\n",
"- [Learning SPARQL](http://www.learningsparql.com/) by Bob DuCharme\n",
"- [sparqlwarpper documentation](https://github.com/RDFLib/sparqlwrapper)\n",
"- [Wikidata SPARQL query examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples).\n",
"\n",
"\n",
" *.ipynb
file. It may take a bit of time for the interactive option to load.