{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "CWPK \\#59: Adding a SPARQL Endpoint - Part I\n", "=======================================\n", "\n", "What Should be Simple Proves Frustratingly Complex\n", "--------------------------\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "Sometimes the installments in this [*Cooking with Python and KBpedia*](https://www.mkbergman.com/cooking-with-python-and-kbpedia/) series come together fairly quickly, sometimes not. This installment has proven to be particularly difficult. Research has spread over days, and progress has been frustratingly slow. As a result, I spread the content of developing a remote SPARQL service across two parts.\n", "\n", "At the outset I thought it would progress rapidly: After all, is not [SPARQL](https://en.wikipedia.org/wiki/SPARQL) a proven query language with central importance to [knowledge graphs](https://en.wikipedia.org/wiki/Knowledge_graph)? But, possibly because our focus in the series is [Python](https://en.wikipedia.org/wiki/Python_(programming_language)), or perhaps for other reasons, I have found a dearth of examples to follow regarding setting up a Python endpoint.\n", "\n", "You will recall we first introduced SPARQL in [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/) in conjunction with the [RDFLib](https://en.wikipedia.org/wiki/RDFLib) package. We showed the flexibility and robustness of this query language to retrieve and filter any and all structural aspects of a knowledge graph. Then, in installment [**CWPK #50**](https://www.mkbergman.com/2396/cwpk-50-querying-external-sources/) we expanded on this basis to describe how SPARQL can be an essential component for querying and retrieving data from external sources, principally [Wikidata](https://en.wikipedia.org/wiki/Wikidata) and [DBpedia](https://en.wikipedia.org/wiki/DBpedia).\n", "\n", "Most all public SPARQL endpoints that presently exist (see [this representative list](https://www.wikidata.org/wiki/Wikidata:Lists/SPARQL_endpoints), which is disappointingly small) are based on [triple stores](https://en.wikipedia.org/wiki/Triplestore) that [come bundled with SPARQL endpoints](https://www.w3.org/wiki/LargeTripleStores). A few are also based on endpoint wrappers based on Java such as [RDF4j](https://en.wikipedia.org/wiki/RDF4J) or [Jena](https://en.wikipedia.org/wiki/Jena_(framework)) and a few languages such as C ([Redland](https://en.wikipedia.org/wiki/Redland_RDF_Application_Framework)) or [JavaScript](https://www.w3.org/community/rdfjs/wiki/Comparison_of_RDFJS_libraries). These options obviously do not meet our Python objectives.\n", "\n", "As we saw in [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/), RDFLib provides SPARQL query support and also has the related [SPARQLwrapper](https://rdflib.dev/sparqlwrapper/) package that enables one to pose queries to external SPARQL endpoints. ([easysparql](https://github.com/oeg-upm/easysparql) provides similar functionality.) However, the objective we have to turn a local or remote instance into a SPARQL-enabled endpoint accessible to outside parties is not so easily supported. A number of years back there were the well-regarded [rdflib-web](https://github.com/RDFLib/rdflib-web) apps that ran within [Flask](https://en.wikipedia.org/wiki/Flask_(web_framework)); unfortunately, this code is out of date and does not run on Python 3. There was also the [adhs](https://github.com/nareike/adhs) package that saw limited development and has not been updated in five years. In my initial diligence for this series I also found the [pyLDAPI](https://pyldapi.readthedocs.io/en/latest/) package that looked promising. However, I have not been able to find a working version of this system, and the I find the approach it takes to [content negotiation](https://en.wikipedia.org/wiki/Content_negotiation) for [linked data](https://en.wikipedia.org/wiki/Linked_data) to be cumbersome and tedious (see next installment).\n", "\n", "So, based on the fragments indicated and found from these researches, I decided to tackle setting up a SPARQL endpoint largely on my own. Having established a toe-hold in our remote Linux server in the last installation, I decided to proceed by baby steps reflecting what I had already learned with our local instance to expose an endpoint on our remote server." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step-wise Approach\n", "We begin our process by setting up our environment, loading needed packages and KBpedia, testing them, and then proceeding to write some code to enable SPARQL queries and then to manage the application. Not knowing if all of these steps will work, I decide to approach these questions in a step-by-step manner.\n", "\n", "**1. Create a 'sparql' conda and Flask address**\n", "\n", "
\n", " Note: I have always found the Linux vi editor to be difficult and hard to navigate, since I only use it on occasion. I now use nano as my editor replacement, since it presents key commands at the bottom of the screen useful to my occasional use, and is also part of the standard distro.\n", "
\n", "\n", "We follow the same steps that we worked out in [**CWPK #58**](https://www.mkbergman.com/2407/cwpk-58-setting-up-a-remote-instance-and-web-page-server/) for setting up a conda virtual environment, that we will name 'sparql':\n", "\n", "conda create -n sparql python=3\n", "\n", "We get the echo to screen as the basic conda environment is created. Remember, this environment is found in the /usr/bin/python-projects/miniconda3/envs/sparql directory location. We then activate the environment:\n", "\n", "conda activate sparql\n", "\n", "We install some basic packages and then create our new sparql directory and the two standard stub files there:\n", "\n", "
\n",
    "conda install flask\n",
    "conda install pip\n",
    "
\n", "\n", "then the two files, beginning with test_sparql.py:\n", "\n", "
\n",
    "from flask import Flask\n",
    "app = Flask(__name__)\n",
    "@app.route(\"/\")\n",
    "def hello():\n",
    "    return \"Hello SPARQL!\"\n",
    "
\n", "\n", "and then wsgi.py:\n", "\n", "
\n",
    "import sys\n",
    "sys.path.insert(0, \"/var/www/html/sparql/\")\n",
    "from test_sparql import app as application\n",
    "
\n", "\n", "We then proceed to set up the Apache2 configurations, placed directly below our prior similar specification in the /etc/apache2/sites-enabled directory in the 000-default.conf file:\n", "\n", "
\n",
    "        WSGIDaemonProcess sparql python-path=/usr/bin/python-projects/miniconda3/envs/sparql/lib/python3.8/site-packages\n",
    "        WSGIScriptAlias /sparql /var/www/html/sparql/wsgi.py\n",
    "        <Directory /var/www/html/sparql>\n",
    "            WSGIProcessGroup sparql\n",
    "            WSGIApplicationGroup %{GLOBAL}\n",
    "            Order deny,allow\n",
    "            Allow from all\n",
    "        </Directory>\n",
    "
\n", "\n", "then you can check whether the configuration is OK and re-start the server. Then, when we enter:\n", "\n", "http://54.227.249.140/sparql\n", "\n", "We see that the right message appears and our configuration is OK." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2. Install all needed Python packages**\n", "\n", "If you recall from the last installment, we used the minimal miniconda3 package installer for our remote Linux (Ubuntu) instance. This minimal footprint largely only installs conda and Python. That means we must install all of the needed additional packages for our current application.\n", " \n", "We noted the pip installer before, but we are best off using one of the conda-related channels since they better check configuration dependencies. To expand our package availability from what is standard in the conda channel, we may need to add some additional channels to our base package. One of the most useful of these is conda-forge. To install it:\n", " \n", "conda config --add channels conda-forge\n", " \n", "It is best to install packages in bulk, since dependencies are checked at install time. One does this by listing the packages in the same command line. When doing so, you may encounter messages that one or more of the packages was not found. In these cases, you should go to the search box at https://anaconda.com, search for the package, and then note the channel in which the package is found. If that channel is not already part of your configuration, add it.\n", "\n", "Many of the needed packages for our SPARQL implementation are found under the conda-forge channel. Here is how a bulk install may look:\n", " \n", "conda install networkx owlready2 rdflib sparqlwrapper pandas --channel conda-forge\n", "\n", "We also then need to install *cowpoke* using pip by using this command while in the sparql virtual environment:\n", "\n", "pip install cowpoke\n", "\n", "Everytime we invoke the sparql virtual environment these packages will be available, which you can inspect using:\n", "\n", "conda list\n", "\n", "Also, if you want to share with others the package configuration of your conda environments, you may create the standard configuration file using this command:\n", "\n", " conda env export > environment.yaml\n", "\n", "The file will be written to the directory in which you invoke this command." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3. Install KBpedia 'sandbox' KGs**\n", "\n", "Clearly, besides the Python code, we also need the various knowledge graphs used by KBpedia. These graphs are the same \\*.owl (rdf/xml) files that we first discussed in [**CWPK #18**](https://www.mkbergman.com/2348/cwpk-18-basic-terminology-and-load-kbpedia/) . We will use the same 'sandbox' files from that installment.\n", "\n", "Our first need is to decide where we want to store our KBpedia knowledge graphs. For the same reasons noted above, we choose to create the directory structure of /var/data/kbpedia. Once we create these directories, we need to set up the ownership and access properties for the files we will place there. So, we navigate to the parent directory data of our target kbpedia directory and issue two statements to set the ownership and access rights to this location:\n", "\n", "
\n",
    "sudo chown -R user-owner:user-group kbpedia\n",
    "sudo chmod -R 775 kbpedia\n",
    "
\n", "\n", "The -R switch means that our settings get applied recursively to all files and directories in the target directory. The permissions level (775) means that user owners or groups may write to these files (general users may not).\n", "\n", "These permission changes now allow us to transfer our local 'sandbox' files to this new directory. The two files that we need to transfer using our SSH or file transfer clients are:\n", "\n", "
\n",
    "kbpedia_reference_concepts.owl\n",
    "kko.owl\n",
    "
\n", "\n", "Recall these are the RDF/XML conversions of the original *.n3 files. We now have the data available on the remote instance for our SPARQL purposes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4. Verify access and use of KBpedia and owlready2**\n", "\n", "OK, so to see that some of this is working, I pick up on the file viewing code in [**CWPK #18**](https://www.mkbergman.com/2348/cwpk-18-basic-terminology-and-load-kbpedia/) to see if we can load and view this stuff. I enter this code into a temp.py file and run python (python temp.py) under the /var/www/html/sparql/ directory:\n", "\n", "
\n",
    "main = '/var/data/kbpedia/kko.owl'  \n",
    "\n",
    "with open(main) as fobj:                           \n",
    "    for line in fobj:\n",
    "        print (line)\n",
    "
\n", "\n", "Good; we see the kko.owl file scroll by.\n", "\n", "So, the next test is to see if owlready2 is loaded properly and we can inspect the KBpedia knowledge graph.\n", "\n", "Picking up from some of the first tests in [**CWPK #20**](https://www.mkbergman.com/2350/cwpk-20-basic-knowledge-graph-management-i/), I create a script file locally and enter these instructions (note where the kko.owl file is now located):\n", "\n", "
\n",
    "main = '/var/data/kbpedia/kko.owl'\n",
    "skos_file = 'http://www.w3.org/2004/02/skos/core'\n",
    "\n",
    "from owlready2 import *\n",
    "kko = get_ontology(main).load()\n",
    "\n",
    "skos = get_ontology(skos_file).load()\n",
    "kko.imported_ontologies.append(skos) \n",
    "\n",
    "list(kko.classes())\n",
    "
\n", "\n", "When in the sparql directory under /var/www/html/sparql, I call up Python (remember to have the sparql virtual environment active!), which gives me this command line feedback:\n", "\n", "
\n",
    "(sparql) root@ip-xxx-xx-x-xx:/var/www/html/sparql# python\n",
    "Python 3.8.5 (default, Sep  4 2020, 07:30:14)\n",
    "[GCC 7.3.0] :: Anaconda, Inc. on linux\n",
    "Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n",
    ">>> \n",
    "
\n", "\n", "and I paste the code block above at the cursor (>>>). I then hit Enter at the end of the code block, and we then see our kko classes get listed out.\n", "\n", "Good, it appears we have the proper packages and directory locations. We can Ctrl-d (since we are on Linux) to exit the Python interactive session." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**5. Create a 'remote_access.py' to verify a SPARQL query against the local version of the remote instance**\n", "\n", "So far, so good. We are now ready to test support for SPARQL. We again look to one of our prior installments, [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/), to test whether SPARQL is working for us with all of the constituent KBpedia knowledge graphs. As we did with the prior question, we formulate a code block and invoke it interactively on the remote server with our python command. Here is the code (note that we have switched the definition of main to the full KBpedia reference concepts graph):\n", "\n", "
\n",
    "main = '/var/data/kbpedia/kbpedia_reference_concepts.owl'\n",
    "skos_file = 'http://www.w3.org/2004/02/skos/core' \n",
    "kko_file = '/var/data/kbpedia/kko.owl'\n",
    "\n",
    "from owlready2 import *\n",
    "world = World()\n",
    "kb = world.get_ontology(main).load()\n",
    "rc = kb.get_namespace('http://kbpedia.org/kko/rc/')\n",
    "\n",
    "skos = world.get_ontology(skos_file).load()\n",
    "kb.imported_ontologies.append(skos)\n",
    "\n",
    "kko = world.get_ontology(kko_file).load()\n",
    "kb.imported_ontologies.append(kko)\n",
    "\n",
    "import rdflib\n",
    "\n",
    "graph = world.as_rdflib_graph()\n",
    "\n",
    "form_1 = list(graph.query_owlready(\"\"\"\n",
    "  PREFIX rc: <http://kbpedia.org/kko/rc/>\n",
    "  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n",
    "  SELECT DISTINCT ?x ?label\n",
    "  WHERE\n",
    "  {\n",
    "    ?x rdfs:subClassOf rc:Mammal.\n",
    "    ?x skos:prefLabel  ?label. \n",
    "  }\n",
    "\"\"\"))\n",
    "\n",
    "print(form_1)\n",
    "
\n", "\n", "Fantastic! This works, too, even to the level of giving us the owlready2 circular reference warnings we received when we first invoked [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/)!\n", "\n", "Now, let's also test if we can query using SPARQL to another remote endpoint from our remote instance using again more code from the [**CWPK #25**](https://www.mkbergman.com/2358/cwpk-25-querying-kbpedia-with-sparql/) installment and also after importing the sparqlwrapper package:\n", "\n", "
\n",
    "main = '/var/data/kbpedia/kbpedia_reference_concepts.owl'\n",
    "skos_file = 'http://www.w3.org/2004/02/skos/core' \n",
    "kko_file = '/var/data/kbpedia/kko.owl'\n",
    "\n",
    "from owlready2 import *\n",
    "world = World()\n",
    "kb = world.get_ontology(main).load()\n",
    "rc = kb.get_namespace('http://kbpedia.org/kko/rc/')\n",
    "\n",
    "skos = world.get_ontology(skos_file).load()\n",
    "kb.imported_ontologies.append(skos)\n",
    "\n",
    "kko = world.get_ontology(kko_file).load()\n",
    "kb.imported_ontologies.append(kko)\n",
    "\n",
    "from SPARQLWrapper import SPARQLWrapper, JSON\n",
    "from rdflib import Graph\n",
    "\n",
    "sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n",
    "\n",
    "sparql.setQuery(\"\"\"\n",
    "  PREFIX schema: <http://schema.org/>\n",
    "  SELECT ?item ?itemLabel ?wikilink ?itemDescription ?subClass ?subClassLabel WHERE {\n",
    "  VALUES ?item { wd:Q25297630\n",
    "  wd:Q537127\n",
    "  wd:Q16831714\n",
    "  wd:Q24398318\n",
    "  wd:Q11755880\n",
    "  wd:Q681337\n",
    "}\n",
    "  ?item wdt:P910 ?subClass.\n",
    "\n",
    "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n",
    "}\n",
    "\"\"\")\n",
    "sparql.setReturnFormat(JSON)\n",
    "results = sparql.query().convert()\n",
    "print(results)\n",
    "
\n", "\n", "Most excellent! We have also confirmed we can use our remote server for remote endpoint queries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**6. Create a Flask-based SPARQL input form for the local version**\n", "This progress is rewarding, but the task now becomes substantially harder. We need to set up interfaces that will allow these queries to be run from external sources to our remote instance. There are two ways we can tackle this requirement.\n", "\n", "The first way, the subject of this particular question, is to set up a Web page form that any outside user may access from the Web to issue a SPARQL query via an editable input form. The second way, the subject of question **#9**, is to enable a remote query issued via sparqlwrapper and Python that goes directly to the endpoint and bypasses the need for a form.\n", "\n", "Since we already have installed [Flask](https://en.wikipedia.org/wiki/Flask_(web_framework)) and validated it in the last installment, our task under this present question is to set up the Web form (in the form of a template as used by Flask) in which we enter our SPARQL queries. Flask maps Web (HTTP) requests to Python functions, which we showed in the last installment where the /sparql URI fragment maps to the /var/www/html/sparql path and its test_sparql.py function. Flask runs this code and then displays results to the browser using HTTP protocols, with the GET method being the most common, but all HTTP methods may be supported. The Python code invoked may call up templates (based on Jinja) that can then invoke HTML pages forms and various response functions. \n", "\n", "I noted earlier two SPARQL-related efforts, [pyLDAPI](https://github.com/RDFLib/pyLDAPI) and [adhs](https://github.com/nareike/adhs/blob/master/templates/sparql.html). While neither appears to have a working example, both contain aspects that can inform this task and subsequent ones. A (non-working) implementation of pyLDAPI called [GNAF](https://github.com/CSIRO-enviro-informatics/gnaf-dataset/blob/master/view/templates/page_sparql.html), in particular, has a SPARQL Web page that looked to be useful as a starting template.\n", "\n", "If you recall, Flask uses HTML-based templates as its 'view'-related approach to the [model-view-controller](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller) (MVC) design. Besides embedding standard HTML, these templates may also contain set Flask statements that relate the Web page to various model or controller commands. These templates should be placed into a set directory under the Flask directory structure. The templates can be nested within one another, useful, for example, when one wants a header and footer repeated across multiple pages, but for our instance I chose a single-page template.\n", "\n", "In essence, I took the two main text areas from the starting GNAF template and embedded them in duplicate presentations of the header and footer from the [KBpedia](https://kbpedia.org/) current Web page design. (You should know that the server hosting the subject SPARQL page is different from the physical server hosting the standard KBpedia Web site.) I took this approach because I was considering making a SPARQL query form a standard part of the main KBpedia site, which I implement at the conclusion of the next installment. Here is how the resulting Web page form looks:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "
\n", "\n", "\"KBpedia\n", "\n", "
\n", "\n", "
\n", "\n", "Figure 1: KBpedia SPARQL Form\n", "\n", "
\n", "\n", "Though located on a remote server different than the standard [KBpedia Web site](https://kbpedia.org/), we have designed the KBpedia SPARQL form to mimic the look of that standard site **(1)** with the same menu options, and both interact seamlessly. Sample SPARQL queries are provided both for the internal KBpedia knowledge graph and for external sites **(2)**, including links **(2)** to additional query examples. These queries, whether samples or ones of your own crafting, can be pasted into the query entry box **(3)**. Once pasted, you have the option to enter an external SPARQL query URL **(4)**, pick whether your query should be directed internally to KBpedia or externally **(4)** (if the query is external), and to select amongst about 8 output formats **(4)**, including standard RDF/XML, JSON, CSV, HTML, etc. Then, when you submit the query **(4)**, the results appear in the final text box **(5)**. If the results are helpful, you may copy them and paste them into a local file.\n", "\n", "You can inspect this resulting SPARQL Web page at the following address (View Page Source to see the HTML):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "http://sparql.kbpedia.org/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will note that besides logo and menu items similar to the standard KBpedia site, that this form has two text areas, one for entering the SPARQL query and one for viewing subsequent results. There are also some switches regarding input and output forms. It is these switches and the two text areas that relate most directly to the next question.\n", "\n", "Tieing this form to (which, of course was actually developed in conjunction with) its accompanying code was the most difficult coding effort I have undertaken with this **CWPK** series to date. I cover this coding development, along with the remaining questions and related topics, in our next installment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site. The cowpoke Python code listing covering the series is also available from GitHub.\n", "
\n", "\n", "
\n", "\n", "NOTE: This CWPK \n", "installment is available both as an online interactive\n", "file or as a direct download to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
\n", "\n", "
\n", "
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment -- which is part of the fun of Python -- and to notify me should you make improvements. \n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 5 }