{ "metadata": { "name": "", "signature": "sha256:b036a8f34d3ae154887c94d2301145e771c48227107b88e1f727373c7f5b6ec3" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook you will learn how to create your own in-memory Linked Data RDF triplestore from one or more files of triples stored in a Turtle (*.ttl*) datafile.\n", "\n", "[See also the *Getting Started With GP Data - Linked Data demo* notebook in the *gpdata* directory for an example of how to create an RDF graph of data that originates in a tabular data formatted dataset such as a CSV file.]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Turtle data files for this example come from http://learning-provider.data.ac.uk/\n", " \n", "* Learning Providers (basic) - 180K \n", " * RDF Descriptions of university data supplied from http://www.ukrlp.co.uk/. It should be considered as being from an authoritative source.\n", "\n", "* University Groups and Consortia - 26K \n", " * RDF Descriptions of groups and consortia (eg. Russell Group), including name and website of members.\n", "\n", "* Linkset to DBPedia - 22K \n", " * RDF Turtle containing just a list of owl:sameAs startments connection the URIs for learning providers and consortia to the DBPedia URI for them.\n", "\n", "* Linkset to \"Gateway to Research\" URIs - 22K \n", " * RDF Turtle containing just a list of owl:sameAs startments connection the URIs for learning providers to the Gateway to Research URI for that organsation, where available." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using this data, we should be able to generate our own Linked Data graph describing UK HE organisations." ] }, { "cell_type": "code", "collapsed": false, "input": [ "#List the triple files\n", "!ls linkedDataTriples/" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `rdflib` package to manage a simple in-memory RDF triplestore." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import rdflib\n", "g = rdflib.Graph()\n", "g.parse('linkedDataTriples/groups.ttl', format='n3')\n", "for stmt in g:\n", " print(stmt)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use it to run SPARQL queries over the triple store." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def rdfQuery(graph,q):\n", " ans=graph.query(q)\n", " for row in ans:\n", " for el in row:\n", " print(el,end=\" \")\n", " print()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT DISTINCT ?y ?z {\n", " ?y ?z \n", "}\n", "'''\n", "rdfQuery(g,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT DISTINCT ?y ?z {\n", " ?y ?z \n", "}\n", "'''\n", "rdfQuery(g,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT DISTINCT ?group {\n", " ?group \n", "}\n", "'''\n", "rdfQuery(g,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rdfQuery(g,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "r = rdflib.Graph()\n", "r.parse('linkedDataTriples/gtr-linkset.ttl', format='n3')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT DISTINCT ?y ?z {\n", " ?y ?z \n", "}\n", "'''\n", "rdfQuery(r,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head linkedDataTriples/gtr-linkset.ttl" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT (COUNT(*) as ?n) WHERE { ?x ?y ?z }\n", "'''\n", "\n", "rdfQuery(r,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "#dbpedia-linkset.ttl groups.ttl gtr-linkset.ttl learning-providers.ttl\n", "l = rdflib.Graph()\n", "l.parse('linkedDataTriples/dbpedia-linkset.ttl', format='n3')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head linkedDataTriples/dbpedia-linkset.ttl" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "p = rdflib.Graph()\n", "p.parse('linkedDataTriples/learning-providers.ttl', format='n3')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head -n 20 linkedDataTriples/learning-providers.ttl" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT ?y ?z {\n", " ?y ?z \n", "}\n", "'''\n", "rdfQuery(l,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can add the triples contained within multiple graphs together to create a new graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT (COUNT(*) as ?count) {\n", " ?x ?y ?z \n", "}\n", "'''\n", "rdfQuery(l,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rdfQuery(l+g,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT ?y ?z {\n", " ?y ?z \n", "}\n", "'''\n", "rdfQuery(l+g,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "rdfQuery(p,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT ?group {\n", " ?y ?z.\n", " ?r ?y ?z.\n", " ?group ?r.\n", "}\n", "'''\n", "rdfQuery(l,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "q='''\n", "SELECT DISTINCT ?group {\n", " ?group .\n", "}\n", "'''\n", "rdfQuery(l+g,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head -n 22 linkedDataTriples/dbpedia-linkset.ttl" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }