{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Jupyter SPARQL Fun \n", "*Bob DuCharme*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "First, load the libraries that will be used by code in later cells and define a function to output query results as a nice HTML table. If the contents of this next cell got much longer (while staying as re-usable) I'd move it to a separate library and just import that. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import rdflib\n", "from IPython.core.display import display, HTML\n", "import RDFClosure # install from https://github.com/RDFLib/OWL-RL\n", "\n", "def queryResultToHTMLTable(queryResult):\n", " HTMLResult = ''\n", " # print variable names\n", " for varName in queryResult.vars:\n", " HTMLResult = HTMLResult + ''\n", " HTMLResult = HTMLResult + ''\n", " # print values from each row\n", " for row in queryResult:\n", " HTMLResult = HTMLResult + '' \n", " for column in row:\n", " HTMLResult = HTMLResult + ''\n", " HTMLResult = HTMLResult + ''\n", " HTMLResult = HTMLResult + '
' + varName + '
' + column + '
'\n", " display(HTML(HTMLResult))\n", "\n", "# In a fancier script, you may want to create more than one Graph, but \n", "# I'll create just one and keep it in the setup section for simplicity. \n", "diskFileGraph = rdflib.Graph() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most of the remaining Python you see here is [rdflib](https://github.com/RDFLib/rdflib) code. rdflib and I go [way back](http://www.xml.com/pub/a/2003/02/12/rdflib.html). Here, I load a sample data file from [Learning SPARQL](http://www.learningsparql.com).\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "triples = diskFileGraph.parse(\"lq012.ttl\",format=\"turtle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running a query\n", "This is the heart of using SPARQL with Jupyter: put the query between the pair of triple quotes in the following. (Triple quotes let you create multi-line strings in Python.) That function call performs the query, and the line after that calls the `queryResultToHTMLTable()` function that I defined up above to output the query result as an HTML table.\n", "\n", "I could have combined everything in this cell into one nested function call instead of storing the query result in `queryResult` and then passing that to my function. I could have even put the formatting and the call to `triples.query()` into one function, but I chose not to." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Running a query\n", "This is the heart of using SPARQL with Jupyter: put the query between the pair of triple quotes in the following cell. (Triple quotes let you create multi-line strings in Python.) The `triples.query()` call performs the query, and the line after that calls the `queryResultToHTMLTable()` function that I defined up above to output the query result as an HTML table.\n", "\n", "I could have combined everything in this cell into one nested function call instead of storing the query result in `queryResult` and then passing that to my function; there's an example of that after the query result below. I could have even put the formatting and the call to `triples.query()` into one function, but keeping the query in its own clearly visible string makes it easier to edit and re-run." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<table><tr><th>s</th><th>p</th><th>o</th></tr><tr><td>http://learningsparql.com/ns/data#i8301</td><td>http://learningsparql.com/ns/addressbook#email</td><td>craigellis@yahoo.com</td></tr><tr><td>http://learningsparql.com/ns/data#i8301</td><td>http://learningsparql.com/ns/addressbook#lastName</td><td>Ellis</td></tr><tr><td>http://learningsparql.com/ns/data#i0432</td><td>http://learningsparql.com/ns/addressbook#homeTel</td><td>(229) 276-5135</td></tr><tr><td>http://learningsparql.com/ns/data#i0432</td><td>http://learningsparql.com/ns/addressbook#email</td><td>richard49@hotmail.com</td></tr><tr><td>http://learningsparql.com/ns/data#i9771</td><td>http://learningsparql.com/ns/addressbook#homeTel</td><td>(245) 646-5488</td></tr><tr><td>http://learningsparql.com/ns/data#i8301</td><td>http://learningsparql.com/ns/addressbook#email</td><td>c.ellis@usairwaysgroup.com</td></tr><tr><td>http://learningsparql.com/ns/data#i0432</td><td>http://learningsparql.com/ns/addressbook#lastName</td><td>Mutt</td></tr><tr><td>http://learningsparql.com/ns/data#i0432</td><td>http://learningsparql.com/ns/addressbook#firstName</td><td>Richard</td></tr><tr><td>http://learningsparql.com/ns/data#i9771</td><td>http://learningsparql.com/ns/addressbook#firstName</td><td>Cindy</td></tr><tr><td>http://learningsparql.com/ns/data#i9771</td><td>http://learningsparql.com/ns/addressbook#lastName</td><td>Marshall</td></tr><tr><td>http://learningsparql.com/ns/data#i9771</td><td>http://learningsparql.com/ns/addressbook#email</td><td>cindym@gmail.com</td></tr><tr><td>http://learningsparql.com/ns/data#i8301</td><td>http://learningsparql.com/ns/addressbook#firstName</td><td>Craig</td></tr></table>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "queryResult = triples.query(\"\"\"\n", "SELECT *\n", "  WHERE\n", "  {?s ?p ?o}\n", "\"\"\")\n", "\n", "queryResultToHTMLTable(queryResult)" ] },
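{ "cell_type": "markdown", "metadata": {}, "source": [ "Here's what that combined version looks like: the same query, with the `triples.query()` call nested directly inside the formatting call instead of going through the `queryResult` variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# the previous cell's two steps folded into one nested call\n", "queryResultToHTMLTable(triples.query(\"\"\"\n", "SELECT *\n", "  WHERE\n", "  {?s ?p ?o}\n", "\"\"\"))" ] },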
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "queryResult = triples.query(\"\"\"\n", "SELECT *\n", " WHERE\n", " {?s ?p ?o}\n", "\"\"\")\n", "\n", "queryResultToHTMLTable(queryResult)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding data to the in-memory graph\n", "The disk file graph is quite mutable. Here, we add more data to it, leaving the original data in there. See my blog entry [Trying Out Blazegraph](http://www.snee.com/bobdc.blog/2016/05/trying-out-blazegraph.html) to see all of the triples in the furniture data and how I used it to demo some inferencing." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "triples = diskFileGraph.parse(\"furniture.ttl\",format=\"turtle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next query retrieves a list of the predicates used in the data, and we see the original address book predicates from the lq012.ttl file plus some new ones from the furniture data. Note how the cell is identical to the one above with the SPARQL query, except for the query itself. If you're running this with your own Jupyter server, you can modify the query or copy the cell's contents into new cells and execute those." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
{ "cell_type": "markdown", "metadata": {}, "source": [ "The next query retrieves a list of the predicates used in the data, and we see the original address book predicates from the lq012.ttl file plus some new ones from the furniture data. Note how the cell is identical to the one above with the SPARQL query, except for the query itself. If you're running this with your own Jupyter server, you can modify the query or copy the cell's contents into new cells and execute those; there's an example of that after the query result below." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table><tr><th>p</th></tr><tr><td>http://learningsparql.com/ns/addressbook#lastName</td></tr><tr><td>http://learningsparql.com/ns/addressbook#homeTel</td></tr><tr><td>http://learningsparql.com/ns/demo#locatedIn</td></tr><tr><td>http://www.w3.org/2000/01/rdf-schema#subClassOf</td></tr><tr><td>http://learningsparql.com/ns/addressbook#email</td></tr><tr><td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td></tr><tr><td>http://learningsparql.com/ns/addressbook#firstName</td></tr></table>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "queryResult = triples.query(\"\"\"\n", "SELECT DISTINCT ?p\n", "  WHERE\n", "  {?s ?p ?o}\n", "\"\"\")\n", "\n", "queryResultToHTMLTable(queryResult)" ] },
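{ "cell_type": "markdown", "metadata": {}, "source": [ "For example, here is a copy of that cell with the query modified: a `FILTER` narrows the predicate list to just the address book namespace. (`STRSTARTS` is only one of several SPARQL 1.1 functions that could do this.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# the predicate list again, restricted to the addressbook namespace\n", "queryResult = triples.query(\"\"\"\n", "SELECT DISTINCT ?p\n", "  WHERE\n", "  {?s ?p ?o\n", "   FILTER(STRSTARTS(STR(?p), \"http://learningsparql.com/ns/addressbook#\"))\n", "  }\n", "\"\"\")\n", "\n", "queryResultToHTMLTable(queryResult)" ] },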
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "queryResult = triples.query(\"\"\"\n", "SELECT DISTINCT ?p\n", " WHERE\n", " {?s ?p ?o}\n", "\"\"\")\n", "\n", "queryResultToHTMLTable(queryResult)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SPARQL UPDATE queries\n", "You can run SPARQL UPDATE queries using the diskFileGraph class's `update` method." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "triples.update(\"DELETE {?s ?p ?o} where { ?s ?p ?o }\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
{ "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table><tr><th>p</th></tr></table>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Show all predicates again\n", "queryResult = triples.query(\"\"\"\n", "SELECT DISTINCT ?p\n", "  WHERE\n", "  {?s ?p ?o}\n", "\"\"\")\n", "\n", "queryResultToHTMLTable(queryResult)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This query result shows that there are no predicates, because there are no triples: the previous UPDATE request deleted them all." ] },
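{ "cell_type": "markdown", "metadata": {}, "source": [ "The `update()` method can add triples as well as delete them. Here's a hypothetical example that inserts a made-up `ab:nickname` triple following the address book data's naming pattern:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# INSERT DATA adds the listed triples to the graph; the nickname\n", "# property and its value are invented for illustration\n", "triples.update(\"\"\"\n", "PREFIX ab: <http://learningsparql.com/ns/addressbook#>\n", "PREFIX d: <http://learningsparql.com/ns/data#>\n", "INSERT DATA { d:i8301 ab:nickname \"Craigster\" }\n", "\"\"\")" ] },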
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Show all predicates again\n", "queryResult = triples.query(\"\"\"\n", "SELECT DISTINCT ?p\n", " WHERE\n", " {?s ?p ?o}\n", "\"\"\")\n", "\n", "queryResultToHTMLTable(queryResult)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This query result shows that there are no predicates, because there are no triples. The previous command deleted them all." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inferencing\n", "The RDFClosure library above lets us do inferencing with the in-memory graph. I described how I used it to do data integration on a Hadoop cluster in [Driving Hadoop data integration with standards-based models instead of code](http://www.snee.com/bobdc.blog/2015/02/driving-hadoop-data-integratio.html). Here, we'll start by reading the furniture data back into memory, and then we'll tell the library to do OWL RL inferencing. " ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "triples = diskFileGraph.parse(\"furniture.ttl\",format=\"turtle\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# do OWL RL inferencing\n", "RDFClosure.DeductiveClosure(RDFClosure.OWLRL_Semantics).expand(triples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll query for furniture in building100. As described in [Trying Out Blazegraph](http://www.snee.com/bobdc.blog/2016/05/trying-out-blazegraph.html), the furniture data had no triples saying that there was furniture in that building, but it had triples saying that desks and chairs were furniture, that a certain desk and two chairs were ``locatedIn`` rooms that were ``locatedIn`` that building, and that the ``locatedIn`` property is transitive. (Semantics!) This was enough for the inferencing engine to infer what furniture was in building100:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll query for furniture in building100. As described in [Trying Out Blazegraph](http://www.snee.com/bobdc.blog/2016/05/trying-out-blazegraph.html), the furniture data had no triples saying that there was furniture in that building, but it had triples saying that desks and chairs were furniture, that a certain desk and two chairs were `locatedIn` rooms that were `locatedIn` that building, and that the `locatedIn` property is transitive. (Semantics!) This was enough for the inferencing engine to infer what furniture was in building100:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table><tr><th>furniture</th></tr><tr><td>http://learningsparql.com/ns/data#desk22</td></tr><tr><td>http://learningsparql.com/ns/data#chair15</td></tr><tr><td>http://learningsparql.com/ns/data#chair23</td></tr></table>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "queryResult = triples.query(\"\"\"\n", "PREFIX dm: <http://learningsparql.com/ns/demo#>\n", "PREFIX d: <http://learningsparql.com/ns/data#>\n", "SELECT ?furniture\n", "WHERE\n", "{\n", "  ?furniture a dm:Furniture .\n", "  ?furniture dm:locatedIn d:building100 .\n", "}\n", "\"\"\")\n", "queryResultToHTMLTable(queryResult)" ] },
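{ "cell_type": "markdown", "metadata": {}, "source": [ "Because `dm:locatedIn` is transitive, the same reasoning connects the rooms to the building too. Dropping the `dm:Furniture` class constraint from the query above shows everything that inferencing placed in building100, rooms included:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# like the previous query, but without restricting ?thing to furniture\n", "queryResult = triples.query(\"\"\"\n", "PREFIX dm: <http://learningsparql.com/ns/demo#>\n", "PREFIX d: <http://learningsparql.com/ns/data#>\n", "SELECT ?thing\n", "WHERE\n", "{\n", "  ?thing dm:locatedIn d:building100 .\n", "}\n", "\"\"\")\n", "queryResultToHTMLTable(queryResult)" ] }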
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "queryResult = triples.query(\"\"\"\n", "PREFIX dm: \n", "PREFIX d: \n", "SELECT ?furniture\n", "WHERE \n", "{ \n", " ?furniture a dm:Furniture .\n", " ?furniture dm:locatedIn d:building100 . \n", "}\n", "\"\"\")\n", "queryResultToHTMLTable(queryResult)" ] } ], "metadata": { "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }