{
 "metadata": {
  "name": "",
  "signature": "sha256:8471b3e107b94917f99dcdf649ada33387dd1210c469b34c8957353b7bb6744c"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "OU Linked Data Demo - Aggregation Operators"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As with SQL, SPARQL supports a range of aggregation operators that can be used to generate summary reports over a dataset."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As usual, let's import the necessary libraries and set up the endpoint URL."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from SPARQLWrapper import SPARQLWrapper, JSON\n",
      "endpoint_ou=\"http://data.open.ac.uk/query\""
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And drawing on the previous OU Linked Data Demo notebook, bring in some helper functions."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#We should perhaps show how to create a simple package in the first OU notebook that we can then just import?\n",
      "def runQuery(endpoint,prefix,q):\n",
      "    ''' Run a SPARQL query with a declared prefix over a specified endpoint '''\n",
      "    sparql = SPARQLWrapper(endpoint)\n",
      "    sparql.setQuery(prefix+q)\n",
      "    sparql.setReturnFormat(JSON)\n",
      "    return sparql.query().convert()\n",
      "\n",
      "import pandas as pd\n",
      "#And some more helpers\n",
      "def dict2df(results):\n",
      "    ''' Hack a function to flatten the SPARQL query results and return the column values '''\n",
      "    data=[]\n",
      "    for result in results[\"results\"][\"bindings\"]:\n",
      "        tmp={}\n",
      "        for el in result:\n",
      "            tmp[el]=result[el]['value']\n",
      "        data.append(tmp)\n",
      "\n",
      "    df = pd.DataFrame(data)\n",
      "    return df\n",
      "\n",
      "def dfResults(endpoint,prefix,q):\n",
      "    ''' Generate a data frame containing the results of running\n",
      "        a SPARQL query with a declared prefix over a specified endpoint '''\n",
      "    return dict2df( runQuery( endpoint, prefix, q ) )\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "SPARQL queries support a similar set of aggregation operators to SQL. For example, we can group results using the `GROUP BY` operator."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "prefix='''\n",
      "PREFIX mlo: <http://purl.org/net/mlo/>\n",
      "PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>\n",
      "'''\n",
      "\n",
      "#We can rename a query variable or expression using the AS command within a set of brackets:\n",
      "## ( CONSTRUCTION AS ?newname)\n",
      "q='''\n",
      "SELECT ?level (COUNT(?course) AS ?count)\n",
      "FROM <http://data.open.ac.uk/context/course>\n",
      "WHERE {\n",
      "    ?course mlo:location <http://sws.geonames.org/2328926/> .\n",
      "    ?course a aiiso:Module.\n",
      "    #?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> \"1\"^^<http://www.w3.org/2001/XMLSchema#string>.\n",
      "    ?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?level.\n",
      "} GROUP BY ?level\n",
      "'''\n",
      "dfResults(endpoint_ou,prefix,q)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also limit searches based on the results of grouped aggregation operations by using the `HAVING` limit."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "prefix='''\n",
      "PREFIX mlo: <http://purl.org/net/mlo/>\n",
      "PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>\n",
      "'''\n",
      "\n",
      "q='''\n",
      "SELECT ?level (COUNT(?course) AS ?count)\n",
      "FROM <http://data.open.ac.uk/context/course>\n",
      "WHERE {\n",
      "    ?course mlo:location <http://sws.geonames.org/2328926/> .\n",
      "    ?course a aiiso:Module.\n",
      "    ?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?level.\n",
      "}\n",
      "GROUP BY ?level\n",
      "HAVING (COUNT(?level) >= 25)\n",
      "'''\n",
      "dfResults(endpoint_ou,prefix,q)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As with other query languages such as SQL, the ability to use aggregation operators as the basis of reporting means that, to a certain extent, you can push computational requirements onto the query engine and minimise the amount of data that needs to be transported, and then computed on, within the client."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}