{ "metadata": { "name": "", "signature": "sha256:8471b3e107b94917f99dcdf649ada33387dd1210c469b34c8957353b7bb6744c" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "OU Linked Data Demo - Aggregation Operators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with SQL, SPARQL supports a range of aggregation operators that can be used to generate summary reports over a dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, let's import the necessary libraries and set up the endpoint URL." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from SPARQLWrapper import SPARQLWrapper, JSON\n", "endpoint_ou=\"http://data.open.ac.uk/query\"" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And drawing on the previous OU Linked Data Demo notebook, bring in some helper functions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "#We should perhaps show how to create a simple package in the first OU notebook that we can then just import?\n", "def runQuery(endpoint,prefix,q):\n", " ''' Run a SPARQL query with a declared prefix over a specified endpoint '''\n", " sparql = SPARQLWrapper(endpoint)\n", " sparql.setQuery(prefix+q)\n", " sparql.setReturnFormat(JSON)\n", " return sparql.query().convert()\n", "\n", "import pandas as pd\n", "#And some more helpers\n", "def dict2df(results):\n", " ''' Hack a function to flatten the SPARQL query results and return the column values '''\n", " data=[]\n", " for result in results[\"results\"][\"bindings\"]:\n", " tmp={}\n", " for el in result:\n", " tmp[el]=result[el]['value']\n", " data.append(tmp)\n", "\n", " df = pd.DataFrame(data)\n", " return df\n", "\n", "def dfResults(endpoint,prefix,q):\n", " ''' Generate a data frame containing the results of running\n", " a SPARQL query with a declared prefix over a specified endpoint '''\n", " return dict2df( runQuery( endpoint, prefix, q ) )\n" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SPARQL queries support a similar set of aggregation operators to SQL. For example, we can group results using the `GROUP BY` operator." ] }, { "cell_type": "code", "collapsed": false, "input": [ "prefix='''\n", "PREFIX mlo: \n", "PREFIX aiiso: \n", "'''\n", "\n", "#We can rename a query variable or expression using the AS command within a set of brackets:\n", "## ( CONSTRUCTION AS ?newname)\n", "q='''\n", "SELECT ?level (COUNT(?course) AS ?count)\n", "FROM \n", "WHERE {\n", " ?course mlo:location .\n", " ?course a aiiso:Module.\n", " #?course \"1\"^^.\n", " ?course ?level.\n", "} GROUP BY ?level\n", "'''\n", "dfResults(endpoint_ou,prefix,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also limit searches based on the results of grouped aggregation operations by using the `HAVING` limit." ] }, { "cell_type": "code", "collapsed": false, "input": [ "prefix='''\n", "PREFIX mlo: \n", "PREFIX aiiso: \n", "'''\n", "\n", "q='''\n", "SELECT ?level (COUNT(?course) AS ?count)\n", "FROM \n", "WHERE {\n", " ?course mlo:location .\n", " ?course a aiiso:Module.\n", " ?course ?level.\n", "}\n", "GROUP BY ?level\n", "HAVING (COUNT(?level) >= 25)\n", "'''\n", "dfResults(endpoint_ou,prefix,q)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with other query languages such as SQL, the ability to use aggregation operators as the basis of reporting means that, to a certain extent, you can push computational requirements onto the query engine and minimise the amount of data that needs to be transported, and then computed on, within the client." ] } ], "metadata": {} } ] }