{
"metadata": {
"name": "",
"signature": "sha256:8471b3e107b94917f99dcdf649ada33387dd1210c469b34c8957353b7bb6744c"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"OU Linked Data Demo - Aggregation Operators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As with SQL, SPARQL supports a range of aggregation operators that can be used to generate summary reports over a dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As usual, let's import the necessary libraries and set up the endpoint URL."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from SPARQLWrapper import SPARQLWrapper, JSON\n",
"endpoint_ou=\"http://data.open.ac.uk/query\""
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And drawing on the previous OU Linked Data Demo notebook, bring in some helper functions."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We should perhaps show how to create a simple package in the first OU notebook that we can then just import?\n",
"def runQuery(endpoint,prefix,q):\n",
" ''' Run a SPARQL query with a declared prefix over a specified endpoint '''\n",
" sparql = SPARQLWrapper(endpoint)\n",
" sparql.setQuery(prefix+q)\n",
" sparql.setReturnFormat(JSON)\n",
" return sparql.query().convert()\n",
"\n",
"import pandas as pd\n",
"#And some more helpers\n",
"def dict2df(results):\n",
" ''' Hack a function to flatten the SPARQL query results and return the column values '''\n",
" data=[]\n",
" for result in results[\"results\"][\"bindings\"]:\n",
" tmp={}\n",
" for el in result:\n",
" tmp[el]=result[el]['value']\n",
" data.append(tmp)\n",
"\n",
" df = pd.DataFrame(data)\n",
" return df\n",
"\n",
"def dfResults(endpoint,prefix,q):\n",
" ''' Generate a data frame containing the results of running\n",
" a SPARQL query with a declared prefix over a specified endpoint '''\n",
" return dict2df( runQuery( endpoint, prefix, q ) )\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"SPARQL queries support a similar set of aggregation operators to SQL. For example, we can group results using the `GROUP BY` operator."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"prefix='''\n",
"PREFIX mlo: \n",
"PREFIX aiiso: \n",
"'''\n",
"\n",
"#We can rename a query variable or expression using the AS command within a set of brackets:\n",
"## ( CONSTRUCTION AS ?newname)\n",
"q='''\n",
"SELECT ?level (COUNT(?course) AS ?count)\n",
"FROM \n",
"WHERE {\n",
" ?course mlo:location .\n",
" ?course a aiiso:Module.\n",
" #?course \"1\"^^.\n",
" ?course ?level.\n",
"} GROUP BY ?level\n",
"'''\n",
"dfResults(endpoint_ou,prefix,q)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also limit searches based on the results of grouped aggregation operations by using the `HAVING` limit."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"prefix='''\n",
"PREFIX mlo: \n",
"PREFIX aiiso: \n",
"'''\n",
"\n",
"q='''\n",
"SELECT ?level (COUNT(?course) AS ?count)\n",
"FROM \n",
"WHERE {\n",
" ?course mlo:location .\n",
" ?course a aiiso:Module.\n",
" ?course ?level.\n",
"}\n",
"GROUP BY ?level\n",
"HAVING (COUNT(?level) >= 25)\n",
"'''\n",
"dfResults(endpoint_ou,prefix,q)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As with other query languages such as SQL, the ability to use aggregation operators as the basis of reporting means that, to a certain extent, you can push computational requirements onto the query engine and minimise the amount of data that needs to be transported, and then computed on, within the client."
]
}
],
"metadata": {}
}
]
}