{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "CWPK \\#59: Adding a SPARQL Endpoint - Part I\n", "=======================================\n", "\n", "What Should Be Simple Proves Frustratingly Complex\n", "--------------------------\n", "\n", "`vi` editor to be difficult to navigate, since I only use it on occasion. I now use `nano` as my replacement editor, since it displays its key commands at the bottom of the screen, which suits my occasional use, and it is also part of the standard distro.\n",
" `conda` virtual environment that we will name `sparql`:\n",
"\n",
"```bash\n",
"conda create -n sparql python=3\n",
"```\n",
"\n",
"Progress echoes to the screen as the basic `conda` environment is created. Remember, this environment is found in the `/usr/bin/python-projects/miniconda3/envs/sparql` directory. We then activate the environment:\n",
"\n",
"```bash\n",
"conda activate sparql\n",
"```\n",
"\n",
"We install some basic packages and then create our new `sparql` directory and the two standard stub files there:\n",
"\n",
"\n", "```bash\n", "conda install flask\n", "conda install pip\n", "```\n", "\n", "We then create the two files, beginning with `test_sparql.py`:\n",
"\n",
"\n", "```python\n", "from flask import Flask\n", "\n", "app = Flask(__name__)\n", "\n", "@app.route(\"/\")\n", "def hello():\n", "    return \"Hello SPARQL!\"\n", "```\n", "\n", "Then, `wsgi.py`:\n",
"\n",
"\n", "```python\n", "import sys\n", "sys.path.insert(0, \"/var/www/html/sparql/\")\n", "from test_sparql import app as application\n", "```\n", "\n", "We then set up the Apache2 configuration, placing it directly below our prior similar specification in the `000-default.conf` file in the `/etc/apache2/sites-enabled` directory:\n",
"\n",
"\n", "```apache\n", "WSGIDaemonProcess sparql python-path=/usr/bin/python-projects/miniconda3/envs/sparql/lib/python3.8/site-packages\n", "WSGIScriptAlias /sparql /var/www/html/sparql/wsgi.py\n", "<Directory /var/www/html/sparql>\n", "    WSGIProcessGroup sparql\n", "    WSGIApplicationGroup %{GLOBAL}\n", "    Order deny,allow\n", "    Allow from all\n", "</Directory>\n", "```\n", "\n", "We then check that the configuration is valid (for example, with `sudo apache2ctl configtest`) and restart the server. Then we enter this address in a browser:\n", "\n", "`http://54.227.249.140/sparql`\n",
"\n",
"We see that the expected 'Hello SPARQL!' message appears, so our configuration is OK."
]
},
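{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `wsgi.py` file imports the Flask `app` under the name `application` because, by default, `mod_wsgi` looks for a module-level callable with that name in the script referenced by `WSGIScriptAlias`. Any WSGI application is simply a callable that takes `environ` and `start_response`. The following is a minimal sketch of a bare WSGI callable. It uses only the standard library, is not part of the KBpedia setup itself, and can be exercised locally with `wsgiref` instead of Apache. The helper name `call_locally` is hypothetical:\n",
"\n",
"```python\n",
"from wsgiref.util import setup_testing_defaults\n",
"\n",
"\n",
"def application(environ, start_response):\n",
"    # The same (environ, start_response) contract a Flask app object fulfils\n",
"    path = environ.get('PATH_INFO', '/')\n",
"    body = f'Hello SPARQL! You asked for {path}'.encode('utf-8')\n",
"    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),\n",
"                              ('Content-Length', str(len(body)))])\n",
"    return [body]\n",
"\n",
"\n",
"def call_locally(app, path='/'):\n",
"    # Hypothetical helper: invoke a WSGI app without any server\n",
"    environ = {'PATH_INFO': path}\n",
"    setup_testing_defaults(environ)  # fills in the other required WSGI keys\n",
"    captured = {}\n",
"\n",
"    def start_response(status, headers, exc_info=None):\n",
"        captured['status'] = status\n",
"        captured['headers'] = headers\n",
"\n",
"    body = b''.join(app(environ, start_response))\n",
"    return captured['status'], captured['headers'], body\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
"    print(call_locally(application, '/sparql'))\n",
"```\n",
"\n",
"Because Flask's `app` object implements this same interface, `mod_wsgi` can serve it directly once it is bound to the name `application`."
]
},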
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2. Install all needed Python packages**\n",
"\n",
"As you may recall from the last installment, we used the minimal `miniconda3` package installer for our remote Linux (Ubuntu) instance. This minimal footprint installs little more than `conda` and Python, which means we must install all of the additional packages needed for our current application.\n",
" \n",
"We noted the `pip` installer before, but we are better off using the `conda` channels, since `conda` does a better job of checking configuration dependencies. To expand our package availability beyond what is standard in the default `conda` channel, we may need to add some additional channels to our configuration. One of the most useful of these is `conda-forge`. To add it:\n",
" \n",
"```bash\n",
"conda config --add channels conda-forge\n",
"```\n",
" \n",
"It is best to install packages in bulk, since dependencies are checked at install time. One does this by listing all of the packages on the same command line. When doing so, you may encounter messages saying that one or more of the packages were not found. In these cases, go to the search box at https://anaconda.com, search for the package, and note the channel in which the package is found. If that channel is not already part of your configuration, add it.\n",
"\n",
"Many of the needed packages for our SPARQL implementation are found under the `conda-forge` channel. Here is how a bulk install may look:\n",
" \n",
"```bash\n",
"conda install networkx owlready2 rdflib sparqlwrapper pandas --channel conda-forge\n",
"```\n",
"\n",
"We then also need to install *cowpoke* using `pip`, running this command while in the `sparql` virtual environment:\n",
"\n",
"```bash\n",
"pip install cowpoke\n",
"```\n",
"\n",
"Every time we activate the `sparql` virtual environment, these packages will be available. You can inspect them using:\n",
"\n",
"```bash\n",
"conda list\n",
"```\n",
"\n",
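"Besides `conda list`, a quick Python check run inside the activated `sparql` environment can confirm that the packages are importable. This is a minimal sketch that uses only the standard library. Note that import names can differ from package names (`sparqlwrapper` is imported as `SPARQLWrapper`), and the import name `cowpoke` is an assumption here:\n",
"\n",
"```python\n",
"from importlib.util import find_spec\n",
"\n",
"# Import names for the packages installed above (cowpoke's is assumed)\n",
"REQUIRED = ['flask', 'networkx', 'owlready2', 'rdflib', 'SPARQLWrapper', 'pandas', 'cowpoke']\n",
"\n",
"\n",
"def missing_modules(names):\n",
"    # find_spec returns None when a top-level module cannot be located\n",
"    return [name for name in names if find_spec(name) is None]\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
"    missing = missing_modules(REQUIRED)\n",
"    print('All packages found' if not missing else f'Missing: {missing}')\n",
"```\n",
"\n",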
"Also, if you want to share the package configuration of your `conda` environments with others, you can create the standard configuration file using this command:\n",
"\n",
"```bash\n",
"conda env export > environment.yaml\n",
"```\n",
"\n",
"The file will be written to the directory in which you invoke this command."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**3. Install KBpedia 'sandbox' KGs**\n",
"\n",
"Clearly, besides the Python code, we also need the various knowledge graphs used by KBpedia. These graphs are the same `*.owl` (RDF/XML) files that we first discussed in [**CWPK #18**](https://www.mkbergman.com/2348/cwpk-18-basic-terminology-and-load-kbpedia/). We will use the same 'sandbox' files from that installment.\n",
"\n",
"Our first need is to decide where we want to store our KBpedia knowledge graphs. For the same reasons noted above, we choose to create the directory structure `/var/data/kbpedia`. Once we create these directories, we need to set up the ownership and access properties for the files we will place there. So, we navigate to `data`, the parent directory of our target `kbpedia` directory, and issue two commands to set the ownership and access rights for this location:\n",
"\n",
"\n", "```bash\n", "sudo chown -R user-owner:user-group kbpedia\n", "sudo chmod -R 775 kbpedia\n", "```\n", "\n", "The `-R` switch applies our settings recursively to all files and directories within the target directory. The permissions level (775) gives the owning user and the owning group read, write, and execute rights, while all other users may only read and execute (they cannot write).\n",
"\n",
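"To see exactly what 775 grants, Python's standard `stat` module can render the mode the way `ls -l` would. This is a small illustrative sketch. The helper `describe` is hypothetical and not part of any tool used here:\n",
"\n",
"```python\n",
"import stat\n",
"\n",
"# chmod 775 on a directory: S_IFDIR marks the entry type, 0o775 the permission bits\n",
"print(stat.filemode(stat.S_IFDIR | 0o775))  # drwxrwxr-x\n",
"\n",
"\n",
"def describe(octal_digit):\n",
"    # Each octal digit sums read (4), write (2) and execute (1)\n",
"    return ''.join(flag if octal_digit & bit else '-'\n",
"                   for flag, bit in (('r', 4), ('w', 2), ('x', 1)))\n",
"\n",
"\n",
"for who, digit in zip(('owner', 'group', 'others'), (7, 7, 5)):\n",
"    print(who, describe(digit))  # owner rwx, group rwx, others r-x\n",
"```\n",
"\n",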
"These permission changes now allow us to transfer our local 'sandbox' files to this new directory. The two files that we need to transfer using our SSH or file transfer clients are:\n",
"\n",
"\n", "- `kbpedia_reference_concepts.owl`\n", "- `kko.owl`\n", "\n", "Recall that these are the RDF/XML conversions of the original `*.n3` files. We now have the data available on the remote instance for our SPARQL purposes."
]
},
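{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before loading anything, a short standard-library check can confirm that the transferred files landed where expected and are readable by the current user. This is a minimal sketch. The directory and file names come from this installment, and the helper `check_files` is hypothetical:\n",
"\n",
"```python\n",
"import os\n",
"from pathlib import Path\n",
"\n",
"KBPEDIA_DIR = '/var/data/kbpedia'\n",
"SANDBOX_FILES = ['kbpedia_reference_concepts.owl', 'kko.owl']\n",
"\n",
"\n",
"def check_files(directory, names):\n",
"    # Map each expected file name to 'ok', 'missing' or 'unreadable'\n",
"    status = {}\n",
"    for name in names:\n",
"        path = Path(directory) / name\n",
"        if not path.is_file():\n",
"            status[name] = 'missing'\n",
"        elif not os.access(path, os.R_OK):\n",
"            status[name] = 'unreadable'\n",
"        else:\n",
"            status[name] = 'ok'\n",
"    return status\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
"    for name, state in check_files(KBPEDIA_DIR, SANDBOX_FILES).items():\n",
"        print(f'{name}: {state}')\n",
"```"
]
},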
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**4. Verify access and use of KBpedia and owlready2**\n",
"\n",
"OK, so to see whether some of this is working, I pick up the file-viewing code from [**CWPK #18**](https://www.mkbergman.com/2348/cwpk-18-basic-terminology-and-load-kbpedia/) to see if we can load and view this stuff. I enter this code into a `temp.py` file and run it (`python temp.py`) from the `/var/www/html/sparql/` directory:\n",
"\n",
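"\n", "```python\n", "main = '/var/data/kbpedia/kko.owl'\n", "\n", "# Print the file line by line; end='' avoids doubling each line's newline\n", "with open(main) as fobj:\n", "    for line in fobj:\n", "        print(line, end='')\n", "```\n", "\n", "Good; we see the `kko.owl` file scroll by.\n",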
"\n",
"So, the next test is to see if owlready2 is loaded properly and we can inspect the KBpedia knowledge graph.\n",
"\n",
"Picking up from some of the first tests in [**CWPK #20**](https://www.mkbergman.com/2350/cwpk-20-basic-knowledge-graph-management-i/), I create a script file locally and enter these instructions (note where the `kko.owl` file is now located):\n",
"\n",
"\n", "```python\n", "main = '/var/data/kbpedia/kko.owl'\n", "skos_file = 'http://www.w3.org/2004/02/skos/core'\n", "\n", "from owlready2 import *\n", "kko = get_ontology(main).load()\n", "\n", "skos = get_ontology(skos_file).load()\n", "kko.imported_ontologies.append(skos)\n", "\n", "list(kko.classes())\n", "```\n", "\n", "From the `/var/www/html/sparql` directory, I call up Python (remember to have the `sparql` virtual environment active!), which gives me this command-line feedback:\n",
"\n",
"\n", "```text\n", "(sparql) root@ip-xxx-xx-x-xx:/var/www/html/sparql# python\n", "Python 3.8.5 (default, Sep 4 2020, 07:30:14)\n", "[GCC 7.3.0] :: Anaconda, Inc. on linux\n", "Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n", ">>> \n", "```\n", "\n", "I then paste the code block above at the cursor (`>>>`) and hit Enter at the end of the code block. We see our `kko` classes listed out.\n",
"\n",
"Good, it appears we have the proper packages and directory locations. We can press `Ctrl-d` (since we are on Linux) to exit the Python interactive session."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**5. Create a 'remote_access.py' to verify a SPARQL query against the local version of the remote instance**\n",
"\n",
"So far, so good. We are now ready to test support for SPARQL. We again look to one of our prior installments, [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/), to test whether SPARQL is working for us with all of the constituent KBpedia knowledge graphs. As we did with the prior question, we formulate a code block and invoke it interactively on the remote server with our `python` command. Here is the code (note that we have switched the definition of `main` to the full KBpedia reference concepts graph):\n",
"\n",
"\n", "```python\n", "main = '/var/data/kbpedia/kbpedia_reference_concepts.owl'\n", "skos_file = 'http://www.w3.org/2004/02/skos/core'\n", "kko_file = '/var/data/kbpedia/kko.owl'\n", "\n", "from owlready2 import *\n", "world = World()\n", "kb = world.get_ontology(main).load()\n", "rc = kb.get_namespace('http://kbpedia.org/kko/rc/')\n", "\n", "skos = world.get_ontology(skos_file).load()\n", "kb.imported_ontologies.append(skos)\n", "\n", "kko = world.get_ontology(kko_file).load()\n", "kb.imported_ontologies.append(kko)\n", "\n", "import rdflib\n", "\n", "graph = world.as_rdflib_graph()\n", "\n", "form_1 = list(graph.query_owlready(\"\"\"\n", "  PREFIX rc: <http://kbpedia.org/kko/rc/>\n", "  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n", "  SELECT DISTINCT ?x ?label\n", "  WHERE\n", "  {\n", "    ?x rdfs:subClassOf rc:Mammal.\n", "    ?x skos:prefLabel ?label.\n", "  }\n", "\"\"\"))\n", "\n", "print(form_1)\n", "```\n", "\n", "Fantastic! This works, too, even down to the owlready2 circular reference warnings we received when we first ran this code in [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/)!\n", "\n", "Now, let's also test whether we can query another remote SPARQL endpoint from our remote instance. We again use code from the [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/) installment, this time with the `sparqlwrapper` package:\n",
"\n",
"\n", "```python\n", "main = '/var/data/kbpedia/kbpedia_reference_concepts.owl'\n", "skos_file = 'http://www.w3.org/2004/02/skos/core'\n", "kko_file = '/var/data/kbpedia/kko.owl'\n", "\n", "from owlready2 import *\n", "world = World()\n", "kb = world.get_ontology(main).load()\n", "rc = kb.get_namespace('http://kbpedia.org/kko/rc/')\n", "\n", "skos = world.get_ontology(skos_file).load()\n", "kb.imported_ontologies.append(skos)\n", "\n", "kko = world.get_ontology(kko_file).load()\n", "kb.imported_ontologies.append(kko)\n", "\n", "from SPARQLWrapper import SPARQLWrapper, JSON\n", "from rdflib import Graph\n", "\n", "sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n", "\n", "sparql.setQuery(\"\"\"\n", "  PREFIX schema: <http://schema.org/>\n", "  SELECT ?item ?itemLabel ?wikilink ?itemDescription ?subClass ?subClassLabel WHERE {\n", "    VALUES ?item { wd:Q25297630\n", "                   wd:Q537127\n", "                   wd:Q16831714\n", "                   wd:Q24398318\n", "                   wd:Q11755880\n", "                   wd:Q681337\n", "    }\n", "    ?item wdt:P910 ?subClass.\n", "\n", "    SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", "  }\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "print(results)\n", "```\n", "\n", "Most excellent! We have also confirmed that we can use our remote server to query remote endpoints." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**6. Create a Flask-based SPARQL input form for the local version**\n", "\n", "This progress is rewarding, but the task now becomes substantially harder. We need to set up interfaces that allow external sources to run these queries against our remote instance. There are two ways we can tackle this requirement.\n", "\n", "The first way, the subject of this particular question, is to set up a Web page form that any outside user may access from the Web to issue a SPARQL query via an editable input form. The second way, the subject of question **#9**, is to enable a remote query issued via `sparqlwrapper` and Python that goes directly to the endpoint and bypasses the need for a form.\n",
"\n",
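"Whichever route a query takes, calling `setReturnFormat(JSON)` on a `SPARQLWrapper` object makes `convert()` return a Python dict in the W3C SPARQL 1.1 Query Results JSON Format. This is what the `print(results)` call in step 5 dumps to the screen. As a minimal sketch assuming that format, the results can be flattened into one plain dict per row. The function name and the sample data below are made up for illustration:\n",
"\n",
"```python\n",
"def flatten_bindings(results):\n",
"    # head.vars lists the variables; each binding maps a variable to a dict\n",
"    # with 'type' and 'value' keys, and unbound variables are simply absent\n",
"    variables = results['head']['vars']\n",
"    return [{var: row[var]['value'] if var in row else None for var in variables}\n",
"            for row in results['results']['bindings']]\n",
"\n",
"\n",
"# Hand-made sample in the same shape as a real JSON result\n",
"sample = {\n",
"    'head': {'vars': ['item', 'itemLabel']},\n",
"    'results': {'bindings': [\n",
"        {'item': {'type': 'uri', 'value': 'http://example.org/a'},\n",
"         'itemLabel': {'type': 'literal', 'value': 'first', 'xml:lang': 'en'}},\n",
"        {'item': {'type': 'uri', 'value': 'http://example.org/b'}},\n",
"    ]},\n",
"}\n",
"\n",
"for row in flatten_bindings(sample):\n",
"    print(row)\n",
"```\n",
"\n",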
"Since we have already installed [Flask](https://en.wikipedia.org/wiki/Flask_(web_framework)) and validated it in the last installment, our task under this present question is to set up the Web form (as a Flask template) in which we enter our SPARQL queries. Flask maps Web (HTTP) requests to Python functions, as we showed in the last installment. There, the `/sparql` URI path maps to the `/var/www/html/sparql` directory and to the `hello()` function in its `test_sparql.py` file. Flask runs this code and then returns the results to the browser over HTTP. The `GET` method is the most common, but all HTTP methods may be supported. The Python code invoked may call up templates (based on Jinja) that can then render HTML pages, forms, and various responses.\n",
"\n",
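"To make the request-to-template round trip concrete, here is a minimal sketch of a Flask view that shows a query form on `GET` and echoes the submitted query back on `POST`. It is an illustration only, not the KBpedia form, and no query is actually executed. The route, the inline template, and the field name `query` are all assumptions; a real app would keep the template as a file in the `templates` directory and call `render_template`:\n",
"\n",
"```python\n",
"from flask import Flask, render_template_string, request\n",
"\n",
"app = Flask(__name__)\n",
"\n",
"# Inline Jinja template; Flask autoescapes values rendered from strings\n",
"FORM = '''<form method='post'>\n",
"  <textarea name='query' rows='10' cols='80'>{{ query }}</textarea>\n",
"  <input type='submit' value='Run query'>\n",
"</form>\n",
"{% if query %}<p>Received {{ query|length }} characters of SPARQL.</p>{% endif %}'''\n",
"\n",
"\n",
"@app.route('/', methods=['GET', 'POST'])\n",
"def sparql_form():\n",
"    # GET shows an empty form; POST re-renders it with the submitted text\n",
"    query = request.form.get('query', '') if request.method == 'POST' else ''\n",
"    return render_template_string(FORM, query=query)\n",
"```\n",
"\n",
"Under `mod_wsgi`, such an `app` would be exposed through the `application` name in `wsgi.py`, just as the one in `test_sparql.py` is.\n",
"\n",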
"I noted earlier two SPARQL-related efforts, [pyLDAPI](https://github.com/RDFLib/pyLDAPI) and [adhs](https://github.com/nareike/adhs/blob/master/templates/sparql.html). While neither appears to have a working example, both contain aspects that can inform this task and subsequent ones. A (non-working) implementation of pyLDAPI called [GNAF](https://github.com/CSIRO-enviro-informatics/gnaf-dataset/blob/master/view/templates/page_sparql.html), in particular, has a SPARQL Web page that looked to be useful as a starting template.\n",
"\n",
"If you recall, Flask uses HTML-based templates as its 'view'-related approach to the [model-view-controller](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller) (MVC) design. Besides standard HTML, these templates may also contain Jinja statements and expressions that tie the Web page to various model or controller commands. These templates should be placed in a designated directory (by default, `templates`) within the Flask application's directory structure. Templates can be nested within one another, which is useful, for example, when one wants a header and footer repeated across multiple pages, but for our purposes I chose a single-page template.\n",
"\n",
"In essence, I took the two main text areas from the starting GNAF template and embedded them in duplicate presentations of the header and footer from the [KBpedia](https://kbpedia.org/) current Web page design. (You should know that the server hosting the subject SPARQL page is different from the physical server hosting the standard KBpedia Web site.) I took this approach because I was considering making a SPARQL query form a standard part of the main KBpedia site, which I implement at the conclusion of the next installment. Here is how the resulting Web page form looks:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
" *.ipynb
file. It may take a bit of time for the interactive option to load.