{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "CWPK \\#50: Querying External Sources\n", "=======================================\n", "\n", "The Nearly Infinite Usefulness of SPARQL\n", "--------------------------\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "We are now two-thirds of the way through our **CWPK** series. One reason we have emphasized '[roundtripping](https://en.wikipedia.org/wiki/Round-trip_format_conversion)' in [*Cooking with Python and KBpedia*](https://www.mkbergman.com/cooking-with-python-and-kbpedia/) is to accommodate the incorporation of information from external sources into [KBpedia](https://kbpedia.org/). From hierarchical relationships to annotations like definitions or labels, external sources can be essential. Of course, one can find flat files or spreadsheets or [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) files directly, but often times we need *specific* information that can only come from querying the external source directly. Two of the ones we heavily rely on in particular -- [Wikidata](https://en.wikipedia.org/wiki/Wikidata) and [DBpedia](https://en.wikipedia.org/wiki/DBpedia) -- provide this access through [SPARQL](https://en.wikipedia.org/wiki/SPARQL) queries. We first introduced SPARQL in [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/). \n", "\n", "External SPARQL queries are the basis of getting instance data, values for instance attributes, missing fields like altLabels and skos.definition, existing crosswalks or mappings, longer descriptions, subsumption relations, related links, and interesting joins and intersections across external knowledge base content. Often, one is able to specify the format (serialization) of the desired results.\n", "\n", "The outputs from these external queries can be manipulated as strings, and then written to flat files useful for ingest into the various build routines. Of course, it is important that the format and CSV-nature of the results be maintained in a form that the build routines expect. One may alter the build formats or the extract formats, but to work they need to match on both ends.\n", "\n", "So, what we provide in today's installment are some guidelines and recipes for using SPARQL to obtain information you need and to write them to flat files. Because of their importance, we emphasize Wikidata and DBpedia (also a stand-in for Wikipedia) in our examples. Once populated, you may need to do some intermediate [wrangling](https://en.wikipedia.org/wiki/Data_wrangling) of these files to get them into shape for direct import. We covered that topic in brief in [**CWPK #36**](https://www.mkbergman.com/2374/cwpk-36-bulk-modification-techniques/), but really do not address file wrangling further here. There are way too many varieties to cover the topic in a meaningful way, though we certainly have examples in today's installment and across the entire **CWPK** series that should provide a useful foundation to your own efforts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Choosing Access Method\n", "There are not that many public SPARQL endpoints available, and some are not always up and available. But the endpoints that do exist, with their identification in the **Query Sources** section at the conclusion of today's installment, are often comprehensive and with high value. The two we will be emphasizing today, Wikidata and DBpedia (and, by extension, the [linked open data](https://en.wikipedia.org/wiki/Linked_data#Linked_open_data) (LOD) cloud beyond that), are among the most valuable. (Of course, many endpoints, like ones specific to a particular organization, are private, and can be parts of valuable, distributed information ecosystems.) Another notable endpoint worthy of your attention is the [LOD endpoint maintained](http://lod.openlinksw.com/sparql) by OpenLink Software.\n", "\n", "It is possible to query many of these sources directly online with an HTML interface, often also providing a choice of the output format desired. In some of the examples below, I provide a **Try it!** link that takes you directly to the source site and uses their native SPARQL interface. (Also, inspect the URI links for these **Try it!** options, since it shows how SPARQL gets communicated over the Web.) You may often find this is the fastest and cleanest way to get useful results, and sometimes better formatted than what our home-brewed options below produce. Your mileage may vary. In any case, it is useful to learn how to conduct direct SPARQL capabilities from within *cowpoke*. For that reason, I emphasize our home-brewed examples below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting Up This Installment\n", "Like we have been emphasing of late, we begin today's installment with our standard start-up instructions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cowpoke.__main__ import *\n", "from cowpoke.config import *\n", "from owlready2 import *" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from SPARQLWrapper import SPARQLWrapper, JSON\n", "from rdflib import Graph\n", "\n", "#sparql = SPARQLWrapper('http://dbpedia.org/sparql')\n", "sparql = SPARQLWrapper('https://query.wikidata.org/sparql')\n", "graph = world.as_rdflib_graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, we actually have a very capable query method to our own internal stores:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "form_1 = list(graph.query_owlready(\"\"\"\n", " PREFIX rc: \n", " PREFIX skos: \n", " SELECT DISTINCT ?x ?label\n", " WHERE\n", " {\n", " ?x rdfs:subClassOf rc:Eutheria.\n", " ?x skos:prefLabel ?label. \n", " }\n", "\"\"\"))\n", "\n", "print(form_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wikidata Queries\n", "\n", "For the following Wikidata queries, Run these assignments first:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from SPARQLWrapper import SPARQLWrapper, JSON\n", "from rdflib import Graph\n", "\n", "sparql = SPARQLWrapper('https://query.wikidata.org/sparql', agent='cowpoke 0.1 (github.com/Cognonto/cowpoke)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to assign an 'agent=' because of limits Wikidata occasionally puts on queries. If you do many requests, you may want to consider adding your own agent defintion.\n", "\n", "One of the techniques I use most heavily is the VALUES statement. This construct allows a listing of IDs to be passed to the query source. Depending on various endpoint limits, you may be able to list 1000 or more IDs in such a listing; experience with a given endpoint will dictate. If you use the VALUES construct, just make sure you are using the proper format and prefix (wd: in this instance for a Q item within Wikidata) in front of each value.\n", "\n", "##### Parent Class from Q IDs\n", "The first query is to obtain the parent class from submitted listing of Q items. You may also **[Try it!](https://query.wikidata.org/#PREFIX%20schema%3A%20%3Chttp%3A%2F%2Fschema.org%2F%3E%0ASELECT%20%3Fitem%20%3FitemLabel%20%3Fwikilink%20%3FitemDescription%20%3FsubClass%20%3FsubClassLabel%20WHERE%20%7B%0A%20%20VALUES%20%3Fitem%20%7B%20wd%3AQ25297630%0A%20%20wd%3AQ537127%0A%20%20wd%3AQ16831714%0A%20%20wd%3AQ24398318%0A%20%20wd%3AQ11755880%0A%20%20wd%3AQ681337%0A%7D%0A%20%3Fitem%20wdt%3AP910%20%3FsubClass.%0A%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D)** directly from [Wikidata](https://query.wikidata.org/):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "PREFIX schema: \n", "SELECT ?item ?itemLabel ?wikilink ?itemDescription ?subClass ?subClassLabel WHERE {\n", " VALUES ?item { wd:Q25297630\n", " wd:Q537127\n", " wd:Q16831714\n", " wd:Q24398318\n", " wd:Q11755880\n", " wd:Q681337\n", "}\n", " ?item wdt:P910 ?subClass.\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", "}\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "#for result in results[\"results\"][\"bindings\"]:\n", "# print(result[\"item\"][\"value\"])\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that once we set our SPARQL endpoint and user agent, we are able to cut-and-paste different SPARQL queries between the opening and ending triple quotes (\"\"\"). The bracketing statements around that can be used repeatedly for different queries.\n", "\n", "Go ahead and toggle between the print statements above to see how we can start varying outputs. Chances are you will need to do some string manipulation before your flat files are ready for ingest, but we can vary these specifications to get the initial output closer to our requirements.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### subClass and Instance listings for Q ID\n", "**[Try it!](https://query.wikidata.org/#%23subClass%20and%20Instance%20of%20Q%20ID%0A%0ASELECT%20%3Fsubclass%20%3FsubclassLabel%20%3Finstance%20%3FinstanceLabel%0AWHERE%0A%7B%0A%20%20%3Fsubclass%20wdt%3AP279%20wd%3AQ183366.%0A%20%20%3Finstance%20wdt%3AP31%20wd%3AQ183366.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0AORDER%20BY%20xsd%3Ainteger%28SUBSTR%28STR%28%3Fsubclass%29%2CSTRLEN%28%22http%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ%22%29%2B1%29%29)** as well." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "SELECT ?subclass ?subclassLabel ?instance ?instanceLabel\n", "WHERE\n", "{\n", " ?subclass wdt:P279 wd:Q183366.\n", " ?instance wdt:P31 wd:Q183366.\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", "}\n", "ORDER BY xsd:integer(SUBSTR(STR(?subclass),STRLEN(\"http://www.wikidata.org/entity/Q\")+1))\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "#for result in results[\"results\"][\"bindings\"]:\n", "# print(result[\"item\"][\"value\"])\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Useful Q Item Attributes\n", "\n", "**[Try it!](https://query.wikidata.org/#PREFIX%20schema%3A%20%3Chttp%3A%2F%2Fschema.org%2F%3E%0A%0ASELECT%20%3Fitem%20%3FitemLabel%20%3Fclass%20%3FclassLabel%20%3Fdescription%20%3Farticle%20%3FitemAltLabel%20WHERE%20%7B%0A%20%20VALUES%20%3Fitem%20%7B%20wd%3AQ1%20wd%3AQ2%20wd%3AQ3%20wd%3AQ4%20wd%3AQ5%20%7D%0A%20%20%3Fitem%20wdt%3AP31%20%3Fclass%3B%0A%20%20%20%20%20%20%20%20wdt%3AP5008%20%3Fproject.%0A%23%20%20%3Farticle%20rdfs%3Acomment%20%3Fdescription.%0A%20%20%0A%20%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Farticle%20schema%3Aabout%20%3Fitem.%0A%20%20%20%20%3Farticle%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E.%0A%20%20%7D%0A%20%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "PREFIX schema: \n", "\n", "SELECT ?item ?itemLabel ?class ?classLabel ?description ?article ?itemAltLabel WHERE {\n", " VALUES ?item { wd:Q1 wd:Q2 wd:Q3 wd:Q4 wd:Q5 }\n", " ?item wdt:P31 ?class;\n", " wdt:P5008 ?project.\n", "# ?article rdfs:comment ?description.\n", " \n", " OPTIONAL {\n", " ?article schema:about ?item.\n", " ?article schema:isPartOf .\n", " }\n", " \n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n", "}\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Get English Wikipedia Article Names from Q ID\n", "**[Try it!](https://query.wikidata.org/#%23Names%20of%20Wikipedia%20articles%20in%20multiple%20languages%0ASELECT%20DISTINCT%20%3Flang%20%3Fname%20WHERE%20%7B%0A%20%20VALUES%20%3Fitem%20%7B%20wd%3AQ1%0A%20wd%3AQ2%0A%20wd%3AQ3%0A%20wd%3AQ4%0A%20wd%3AQ5%20%0A%7D%0A%20%20%3Farticle%20schema%3Aabout%20%3Fitem%3B%20schema%3AinLanguage%20%3Flang%3B%20schema%3Aname%20%3Fname%20.%0A%20%20FILTER%28%3Flang%20in%20%28%27en%27%29%29%20.%0A%20%20FILTER%20%28%21CONTAINS%28%3Fname%2C%20%27%3A%27%29%29%20.%0A%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "SELECT DISTINCT ?lang ?item ?name WHERE {\n", " VALUES ?item { wd:Q1\n", "wd:Q2\n", "wd:Q3\n", "wd:Q4\n", "wd:Q5 \n", "}\n", " ?article schema:about ?item; schema:inLanguage ?lang; schema:name ?name .\n", " FILTER(?lang in ('en')) .\n", " FILTER (!CONTAINS(?name, ':')) .\n", "}\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Listing of Q IDs from Property\n", "**[Try it!](https://query.wikidata.org/#SELECT%0A%20%20%3Fitem%20%3FitemLabel%0A%20%20%3Fvalue%20%3FvalueLabel%0A%23%20valueLabel%20is%20only%20useful%20for%20properties%20with%20item-datatype%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP2167%20%3Fvalue%0A%20%20%23%20change%20P1800%20to%20another%20property%20%20%20%20%20%20%20%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", " SELECT\n", " ?item ?itemLabel\n", " ?value ?valueLabel\n", " # valueLabel is only useful for properties with item-datatype\n", " WHERE \n", " {\n", " ?item wdt:P2167 ?value\n", " # change P2167 to desired property \n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n", " }\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### subClass and Instance Listings for Q ID\n", "**[Try it!](https://query.wikidata.org/#%23subClass%20and%20Instance%20of%20Q%20ID%0A%0ASELECT%20%3Fsubclass%20%3FsubclassLabel%20%3Finstance%20%3FinstanceLabel%0AWHERE%0A%7B%0A%20%20%3Fsubclass%20wdt%3AP279%20wd%3AQ183366.%0A%20%20%3Finstance%20wdt%3AP31%20wd%3AQ183366.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0AORDER%20BY%20xsd%3Ainteger%28SUBSTR%28STR%28%3Fsubclass%29%2CSTRLEN%28%22http%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ%22%29%2B1%29%29)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "SELECT ?subclass ?subclassLabel ?instance ?instanceLabel\n", "WHERE\n", "{\n", " ?subclass wdt:P279 wd:Q183366.\n", " ?instance wdt:P31 wd:Q183366.\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", "}\n", "ORDER BY xsd:integer(SUBSTR(STR(?subclass),STRLEN(\"http://www.wikidata.org/entity/Q\")+1))\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Missing Q Data from Wikidata\n", "**[Try it!](https://query.wikidata.org/#PREFIX%20schema%3A%20%3Chttp%3A%2F%2Fschema.org%2F%3E%0APREFIX%20w%3A%20%3Chttps%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%3E%0ASELECT%20%3Fwikipedia%20%3Fitem%20WHERE%20%7B%0AVALUES%20%3Fwikipedia%20%7B%20w%3ATom_Hanks%20%7D%0A%0A%20%20%3Fwikipedia%20schema%3Aabout%20%3Fitem%20.%0A%0A%20%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "PREFIX schema: \n", "PREFIX w: \n", "SELECT ?wikipedia ?item WHERE {\n", "VALUES ?wikipedia { w:Tom_Hanks }\n", "?wikipedia schema:about ?item .\n", "SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n", "}\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Q ID from Wikipedia ID\n", "**[Try it!](https://query.wikidata.org/#PREFIX%20schema%3A%20%3Chttp%3A%2F%2Fschema.org%2F%3E%0APREFIX%20w%3A%20%3Chttps%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%3E%0ASELECT%20%3Fwikipedia%20%3Fitem%20WHERE%20%7B%0AVALUES%20%3Fwikipedia%20%7B%20w%3AEuthanasia%0Aw%3ACommercial_art_gallery%0A%20%7D%0A%0A%20%20%3Fwikipedia%20schema%3Aabout%20%3Fitem%20.%0A%0A%20%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "PREFIX schema: \n", "PREFIX w: \n", "SELECT ?wikipedia ?item WHERE {\n", "VALUES ?wikipedia { w:Euthanasia\n", "w:Commercial_art_gallery\n", "}\n", " ?wikipedia schema:about ?item .\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n", "}\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### schema.org ← → Wikidata Mapping\n", "**[Try it!](https://query.wikidata.org/#%23%20ontology%20mappings%20from%20Wikidata%0ASELECT%20%3Fwd%20%3FwdLabel%20%3Ftype%20%3Furi%20%3Fprefix%20%3FlocalName%20WHERE%20%7B%0A%20%20%7B%0A%20%20%20%20%7B%20%3Fwd%20wdt%3AP1628%20%3Furi%20.%20BIND%28%22equivalent%20property%22%20AS%20%3Ftype%29%20%7D%20UNION%0A%20%20%20%20%7B%20%3Fwd%20wdt%3AP1709%20%3Furi%20.%20BIND%28%22equivalent%20class%22%20AS%20%3Ftype%29%20%7D%20UNION%0A%20%20%20%20%7B%20%3Fwd%20wdt%3AP2888%20%3Furi%20.%20BIND%28%22exact%20match%22%20AS%20%3Ftype%29%20%7D%20UNION%0A%20%20%20%20%7B%20%3Fwd%20wdt%3AP2235%20%3Furi%20.%20BIND%28%22superproperty%22%20AS%20%3Ftype%29%20%7D%20UNION%0A%20%20%20%20%7B%20%3Fwd%20wdt%3AP2236%20%3Furi%20.%20BIND%28%22subproperty%22%20AS%20%3Ftype%29%20%7D%20%0A%20%20%7D%0A%20%20BIND%28%20REPLACE%28STR%28%3Furi%29%2C%27%5B%5E%23%2F%5D%2B%24%27%2C%27%27%29%20AS%20%3Fprefix%29%0A%20%20BIND%28%20REPLACE%28STR%28%3Furi%29%2C%27%5E.%2a%5B%23%2F%5D%27%2C%27%27%29%20AS%20%3FlocalName%29%0A%20%0A%20%20%23%20filter%20by%20ontology%20%28otherwise%20timeout%20expected%29%0A%20%20FILTER%28%3Fprefix%20%3D%20%22http%3A%2F%2Fschema.org%2F%22%29%0A%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D%20ORDER%20BY%20%3Fprefix%20%3FlocalName)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "SELECT ?wd ?wdLabel ?type ?uri ?prefix ?localName WHERE {\n", " {\n", " { ?wd wdt:P1628 ?uri . BIND(\"equivalent property\" AS ?type) } UNION\n", " { ?wd wdt:P1709 ?uri . BIND(\"equivalent class\" AS ?type) } UNION\n", " { ?wd wdt:P2888 ?uri . BIND(\"exact match\" AS ?type) } UNION\n", " { ?wd wdt:P2235 ?uri . BIND(\"superproperty\" AS ?type) } UNION\n", " { ?wd wdt:P2236 ?uri . BIND(\"subproperty\" AS ?type) } \n", " }\n", " BIND( REPLACE(STR(?uri),'[^#/]+$',) AS ?prefix)\n", " BIND( REPLACE(STR(?uri),'^.*[#/]',) AS ?localName)\n", " # filter by ontology (otherwise timeout expected)\n", " FILTER(?prefix = \"http://schema.org/\")\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\" }\n", "} ORDER BY ?prefix ?localName\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Main Topic of Q ID\n", "**[Try it!](https://query.wikidata.org/#PREFIX%20schema%3A%20%3Chttp%3A%2F%2Fschema.org%2F%3E%0A%0ASELECT%20%3Fitem%20%3FitemLabel%20%3FmainTopic%20%3FmainTopicLabel%20WHERE%20%7B%0A%20%20%20VALUES%20%3Fitem%20%7B%20wd%3AQ13307732%0Awd%3AQ8953981%0Awd%3AQ1458376%0Awd%3AQ8953071%0A%7D%0A%20%20%3FmainTopic%20wdt%3AP910%20%3Fitem.%0A%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0A)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "PREFIX schema: \n", " SELECT ?item ?itemLabel ?mainTopic ?mainTopicLabel WHERE {\n", " VALUES ?item { wd:Q13307732\n", " wd:Q8953981\n", " wd:Q1458376\n", " wd:Q8953071\n", " }\n", " ?mainTopic wdt:P910 ?item.\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DBpedia Queries\n", "DBpedia is a bit more tricky to deal with.\n", "\n", "Again, we set up our major call, to be followed by a series of SPARQL queries to DBpedia:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from SPARQLWrapper import SPARQLWrapper, RDFXML\n", "from rdflib import Graph\n", "\n", "sparql = SPARQLWrapper(\"http://dbpedia.org/sparql\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Languages in DBpedia with schema.org Language Code\n", "In this query, we are looking for items that have been already mapped or characterized in a second ontology (schema.org).\n", "\n", "**[Try it!](http://dbpedia.org/snorql/?query=++++PREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0A++++PREFIX+schema%3A+%3Chttp%3A%2F%2Fschema.org%2F%3E%0D%0A%0D%0A++++CONSTRUCT+%7B%0D%0A++++++%3Flang+a+schema%3ALanguage+%3B%0D%0A++++++schema%3AalternateName+%3Fiso6391Code+.%0D%0A++++%7D%0D%0A++++WHERE+%7B%0D%0A++++++%3Flang+a+dbo%3ALanguage+%3B%0D%0A++++++dbo%3Aiso6391Code+%3Fiso6391Code+.%0D%0A++++++FILTER+%28STRLEN%28%3Fiso6391Code%29%3D2%29+%23+to+filter+out+non-valid+values%0D%0A++++%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", " PREFIX dbo: \n", " PREFIX schema: \n", "\n", " CONSTRUCT {\n", " ?lang a schema:Language ;\n", " schema:alternateName ?iso6391Code .\n", " }\n", " WHERE {\n", " ?lang a dbo:Language ;\n", " dbo:iso6391Code ?iso6391Code .\n", " FILTER (STRLEN(?iso6391Code)=2) # to filter out non-valid values\n", " }\n", "\"\"\")\n", "\n", "sparql.setReturnFormat(RDFXML)\n", "results = sparql.query().convert()\n", "print(results.serialize(format='xml'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Missing Definitions\n", "**[Try it!](http://dbpedia.org/snorql/?query=SELECT+%3Fitem%2C+%3Fdescription+WHERE+%7B%0D%0AVALUES+%3Fitem+%7B+%3AChild_prostitution%0D%0A%3AIce_Hockey_World_Championships%0D%0A%3AMajor_League_Soccer%0D%0A%3ATamil_language%0D%0A%3AAcne+%7D%0D%0A%3Fitem+rdfs%3Acomment+%3Fdescription+.%0D%0A%0D%0AFILTER+%28+LANG%28%3Fdescription%29+%3D+%22en%22+%29+%0D%0A%0D%0A%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "PREFIX rdfs: \n", "PREFIX : \n", "SELECT ?item, ?description WHERE {\n", " VALUES ?item { :Child_prostitution\n", " :Ice_Hockey_World_Championships\n", " :Major_League_Soccer\n", " :Tamil_language\n", " :Acne }\n", " ?item rdfs:comment ?description .\n", "\n", " FILTER ( LANG(?description) = \"en\" ) \n", "} \n", "\"\"\")\n", "\n", "sparql.setReturnFormat(RDFXML)\n", "results = sparql.query().convert()\n", "print(results.serialize(format='xml'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Get URIs from Aliases\n", "**[Try it!](http://dbpedia.org/snorql/?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0ASELECT+%3Fx+%3FredirectsTo+WHERE+%7B%0D%0A++VALUES+%3Fwikipedia+%7B+%22Abies%22%40en+%0D%0A++%22Abolitionists%22%40en%7D%0D%0A++%3Fx+rdfs%3Alabel+%3Fwikipedia+.%0D%0A++%3Fx+dbo%3AwikiPageRedirects+%3FredirectsTo%0D%0A%7D)**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(\"\"\"\n", "PREFIX rdfs: \n", "PREFIX dbo: \n", "SELECT ?x ?redirectsTo WHERE {\n", " VALUES ?wikipedia { \"Abies\"@en \n", " \"Abolitionists\"@en}\n", " ?x rdfs:label ?wikipedia .\n", " ?x dbo:wikiPageRedirects ?redirectsTo\n", "}\n", "\"\"\")\n", "\n", "sparql.setReturnFormat(XML)\n", "results = sparql.query().convert()\n", "print(results.serialize(format='xml'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, SPARQL is a language unto itself, and it takes time to become fluent. The examples above are closer to baby-talk than Shakespearean speech. Nonetheless, one begins to gain a feel for the power of the language.\n", "\n", "As we move forward, we will try to leverage SPARQL as the query language to our knowledge graph, since it provides the most powerful and flexible language for doing so. There will obviously be times when direct Python calls are more direct and shorter to implement. But the most flexible filters and intersections will come from our use of SPARQL." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query Resources\n", "A partial, but useful, list of public SPARQL endpoints is provided by:\n", "- https://www.w3.org/wiki/SparqlEndpoints\n", "\n", "An assessment of their current availability is provided by:\n", "- https://sparqles.ai.wu.ac.at/availability\n", "\n", "Here are the top 100 named graphs available with their triple counts:\n", "- https://docs.google.com/spreadsheets/d/1XUtRtWwEXN805pnlrsAg9qW7gxqneeXqrsPOwhVusTw/edit#gid=0\n", "\n", "Wikidata provides its own listing of 100 SPARQL endpoints:\n", "- https://www.wikidata.org/wiki/Wikidata:Lists/SPARQL_endpoints\n", "\n", "There is an excellent (and growing) compilation of useful SPARQL queries to Wikidata available from:\n", "- https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples\n", "\n", "Two smaller, but similarly useful resource for DBpedia queries, are available from:\n", "- https://aifb-ls3-kos.aifb.kit.edu/projects/spartiqulator/examples.htm\n", "- https://www.cambridgesemantics.com/blog/semantic-university/learn-sparql/sparql-nuts-bolts/\n", "\n", "The latter also provides some SPARQL construction tips.\n", "\n", "Example [OpenStreetMap](https://wiki.openstreetmap.org/wiki/Main_Page) SPARQL examples are available from:\n", "- https://wiki.openstreetmap.org/wiki/SPARQL_examples.\n", "\n", "\n", "
\n", " NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site. The cowpoke Python code listing covering the series is also available from GitHub.\n", "
\n", "\n", "
\n", "\n", "NOTE: This CWPK \n", "installment is available both as an online interactive\n", "file or as a direct download to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
\n", "\n", "
\n", "
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment -- which is part of the fun of Python -- and to notify me should you make improvements. \n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }