{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Querying hierarchical data\n",
    "\n",
    "In this notebook, we explore how to query hierarchical databases."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The database\n",
    "\n",
    "We start with loading a sample hierarchical database.  Our sample database is derived from the dataset of all employees of the city of Chicago ([source](https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{AbstractString,Any} with 1 entry:\n",
       "  \"departments\" => Any[Dict{AbstractString,Any}(\"name\"=>\"WATER MGMNT\",\"employee…"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ENV[\"LINES\"] = 15\n",
    "include(\"../citydb_json.jl\")\n",
    "\n",
    "citydb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In hierarchical data model, data is organized in a tree-like structure.  In this database, data is stored as a JSON document organized in a 2-level hierarchy:\n",
    "\n",
    "* the top level object contains field `\"departments\"` with an array of department objects;\n",
    "* each department object has fields `\"name\"`, the name of the department, and `\"employees\"`, an array of employees;\n",
    "* each employee object has fields `\"name\"`, `\"surname\"`, `\"position\"`, `\"salary\"` describing the employee.\n",
    "\n",
    "$$\n",
    "\\text{departments} \\quad\n",
    "\\begin{cases}\n",
    "    \\text{name} \\\\\n",
    "    \\text{employees} \\quad\n",
    "        \\begin{cases}\n",
    "            \\text{name} \\\\\n",
    "            \\text{surname} \\\\\n",
    "            \\text{position} \\\\\n",
    "            \\text{salary}\n",
    "        \\end{cases}\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "Here is a fragment of the dataset:\n",
    "\n",
    "```json\n",
    "    {\n",
    "        \"departments\": [\n",
    "            {\n",
    "                \"name\": \"WATER MGMNT\",\n",
    "                \"employees\": [\n",
    "                    {\n",
    "                        \"name\": \"ALVA\",\n",
    "                        \"surname\": \"A\",\n",
    "                        \"position\": \"WATER RATE TAKER\",\n",
    "                        \"salary\": 87228\n",
    "                    },\n",
    "                    ...\n",
    "                ]\n",
    "            },\n",
    "            ...\n",
    "        ]\n",
    "    }\n",
    "```"
   ]
  },
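  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make this layout concrete, the following sketch builds a tiny database of the same shape by hand and walks its two levels with plain indexing.  The variable `minidb` and all names and salaries in it are invented for illustration and do not come from the real dataset.\n",
    "\n",
    "```julia\n",
    "# Hypothetical miniature of the database, hand-written for illustration.\n",
    "# The names and salaries are made up, not taken from the real dataset.\n",
    "minidb = Dict(\n",
    "    \"departments\" => [\n",
    "        Dict(\"name\" => \"DEPT A\",\n",
    "             \"employees\" => [\n",
    "                 Dict(\"name\" => \"ANN\", \"surname\" => \"X\",\n",
    "                      \"position\" => \"CLERK\", \"salary\" => 50000),\n",
    "                 Dict(\"name\" => \"BOB\", \"surname\" => \"Y\",\n",
    "                      \"position\" => \"ENGINEER\", \"salary\" => 120000)]),\n",
    "        Dict(\"name\" => \"DEPT B\",\n",
    "             \"employees\" => [\n",
    "                 Dict(\"name\" => \"CAT\", \"surname\" => \"Z\",\n",
    "                      \"position\" => \"ANALYST\", \"salary\" => 101000)])])\n",
    "\n",
    "# Walk the two levels by hand: departments first, then employees.\n",
    "first_dept = minidb[\"departments\"][1]\n",
    "first_dept[\"name\"]                      # \"DEPT A\"\n",
    "first_dept[\"employees\"][2][\"salary\"]    # 120000\n",
    "```\n",
    "\n",
    "Every query in this notebook is, in the end, a structured way of performing such walks."
   ]
  },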
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combinators\n",
    "\n",
    "We may want to ask some questions about the data.  For example,\n",
    "\n",
    "* *What are the departments in the city of Chicago?*\n",
    "* *How many employees in each department?*\n",
    "* *What is the top salary among all the employees?*\n",
    "* and so on...\n",
    "\n",
    "Even though the raw dataset does not immediately contain any answers to these questions, it has enough information so that the answers could be inferred from the data if we are willing to write some code (we use [Julia](http://julialang.org/) programming language).\n",
    "\n",
    "Take a relatively complicated problem:\n",
    "\n",
    "> *For each department, find the number of employees with the salary higher than $100k.*\n",
    "\n",
    "It can be solved as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"WATER MGMNT\",\"N100k\"=>179)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE\",\"N100k\"=>1493)        \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"GENERAL SERVICES\",\"N100k\"=>79)\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"CITY COUNCIL\",\"N100k\"=>54)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"STREETS & SAN\",\"N100k\"=>39)   \n",
       " ⋮                                                            \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BOARD OF ETHICS\",\"N100k\"=>2)  \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE BOARD\",\"N100k\"=>0)     \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BUDGET & MGMT\",\"N100k\"=>12)   \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"ADMIN HEARNG\",\"N100k\"=>3)     \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"LICENSE APPL COMM\",\"N100k\"=>0)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Depts_With_Num_Well_Paid_Empls(data) =\n",
    "    map(d -> Dict(\n",
    "            \"name\" => d[\"name\"],\n",
    "            \"N100k\" => length(filter(e -> e[\"salary\"] > 100000, d[\"employees\"]))),\n",
    "        data[\"departments\"])\n",
    "\n",
    "Depts_With_Num_Well_Paid_Empls(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Is it a good solution?  Possibly.  It is certainly compact, due to our use of `map` and `filter` to traverse the structure of the database.  On the other hand, to write or understand code like this, one needs solid understanding of non-trivial CS concepts such as high-order and anonymous functions.  One needs to be a professional programmer.\n",
    "\n",
    "Is there a way to write this query without use of `map` and `filter` (or, equivalently, nested loops)?  Indeed, there is, but to show how to do it, we need to introduce some new primitives and operations.  We start with the notion of *JSON combinators*.\n",
    "\n",
    "A JSON combinator is simply a function that maps JSON input to JSON output.  Two trivial examples of JSON combinators are:\n",
    "\n",
    "* `Const(val)`, which maps each input value to constant value `val`.\n",
    "* `This()`, which copies the input to the output without changes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(42,42,42)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Const(val) = x -> val\n",
    "\n",
    "C = Const(42)\n",
    "C(true), C(42), C([1,2,3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, `Const(42)` creates a new constant combinator.  It is then applied to various input JSON values, always producing the same output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(true,42,[1,2,3])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "This() = x -> x\n",
    "\n",
    "I = This()\n",
    "I(true), I(42), I([1,2,3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, `This()` creates a new identity combinator.  We test it with different input values to assure ourselves that it does not change the input.\n",
    "\n",
    "Notice the pattern:\n",
    "\n",
    "* First, we create a combinator *(construct a query)* using combinator constructors.\n",
    "* Then, we apply the combinator *(execute the query)* against the data.\n",
    "\n",
    "In short, by designing a collection of useful combinators, we are creating a query language (embedded in the host language, but this is immaterial)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Traversing the hierarchy\n",
    "\n",
    "Now let us define a more interesting combinator.  `Field(name)` extracts a field value from a JSON object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Field (generic function with 1 method)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Field(name) = x -> x[name]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "216210"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Salary = Field(\"salary\")\n",
    "Salary(Dict(\"name\" => \"RAHM\", \"surname\" => \"E\", \"position\" => \"MAYOR\", \"salary\" => 216210))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, to demonstrate field extractors, we defined `Salary`, a combinator that extracts value of field `\"salary\"` from the input JSON object.\n",
    "\n",
    "To build interesting queries, we need a way to construct complex combinators from primitives.  Let us define composition `(F >> G)` of combinators `F` and `G` that ties `F` and `G` by sending the output of `F` to the input of `G`.\n",
    "\n",
    "Our first, naive attempt to implement composition is as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       ">> (generic function with 86 methods)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import Base: >>\n",
    "\n",
    "(F >> G) = x -> G(F(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can traverse the structure of hierarchical data by chaining field extractors with the composition operator.\n",
    "\n",
    "$$\n",
    "\\textbf{departments}\\gg \\quad\n",
    "\\begin{cases}\n",
    "    \\gg\\textbf{name} \\\\\n",
    "    \\text{employees} \\quad\n",
    "        \\begin{cases}\n",
    "            \\text{name} \\\\\n",
    "            \\text{surname} \\\\\n",
    "            \\text{position} \\\\\n",
    "            \\text{salary}\n",
    "        \\end{cases}\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "For example, let us *find the names of all departments.*  We can do it by composing extractors for fields `\"departments\"` and `\"name\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "ename": "LoadError",
     "evalue": "LoadError: indexing Array{Any,1} with types Tuple{ASCIIString} is not supported\nwhile loading In[8], in expression starting on line 5",
     "output_type": "error",
     "traceback": [
      "LoadError: indexing Array{Any,1} with types Tuple{ASCIIString} is not supported\nwhile loading In[8], in expression starting on line 5",
      "",
      " in error at ./error.jl:21",
      " in getindex at abstractarray.jl:483",
      " in anonymous at In[5]:1",
      " in anonymous at In[7]:3",
      " [inlined code] from essentials.jl:114"
     ]
    }
   ],
   "source": [
    "Departments = Field(\"departments\")\n",
    "Name = Field(\"name\")\n",
    "\n",
    "Dept_Names = Departments >> Name\n",
    "Dept_Names(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is going on?  We expected to get a list of department names, but instead we got an error.\n",
    "\n",
    "Here is a problem.  With the current definition of the ``>>`` operator, expression\n",
    "```julia\n",
    "    (Departments >> Name)(citydb)\n",
    "```\n",
    "is translated to\n",
    "```julia\n",
    "    citydb[\"departments\"][\"name\"]\n",
    "```\n",
    "But this fails because `citydb[\"departments\"]` is a array and thus doesn't have a field called `\"name\"`.\n",
    "\n",
    "Let us demonstrate the behavior of `>>` on the *duplicating* combinator.   Combinator `Dup` duplicates its input, that is, for any input value `x`, it produces an array `[x, x]`.  See what happens when we compose `Dup` with itself, once or several times:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(Any[0,0],Any[Any[0,0],Any[0,0]],Any[Any[Any[0,0],Any[0,0]],Any[Any[0,0],Any[0,0]]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Dup = x -> Any[x, x]\n",
    "\n",
    "Dup(0), (Dup >> Dup)(0), (Dup >> Dup >> Dup)(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need composition `(F >> G)` to be smarter.  When `F` produces an array, the composition should apply `G` to *each* element of the array.  In addition, if `G` also produces array values, `(F >> G)` concatenates all outputs to produce a single array value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "_expand (generic function with 1 method)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(F >> G) = x -> _flat(_map(G, F(x)))\n",
    "\n",
    "_flat(z) =\n",
    "    isa(z, Array) ? foldr(vcat, [], z) : z\n",
    "_map(G, y) =\n",
    "    isa(y, Array) ? map(_expand, map(G, y)) : G(y)\n",
    "_expand(z_i) =\n",
    "    isa(z_i, Array) ? z_i : z_i != nothing ? [z_i] : []"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us test the updated `>>` operator with `Dup` again.  We see that the output arrays are now flattened:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(Any[0,0],Any[0,0,0,0],Any[0,0,0,0,0,0,0,0])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Dup(0), (Dup >> Dup)(0), (Dup >> Dup >> Dup)(0)"
   ]
  },
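  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the flattening step in isolation, here is a minimal sketch of the same idea written as a named function instead of an operator.  The names `compose`, `expand` and `dup` are hypothetical stand-ins for `>>`, `_expand` and `Dup`, and the code assumes a recent Julia 1.x release rather than the version used to run this notebook.  It is a simplification, not a drop-in replacement: when `F` returns a non-array value, the result of `G` is passed through as is.\n",
    "\n",
    "```julia\n",
    "# Hypothetical stand-in for `_expand`: arrays pass through, `nothing` becomes\n",
    "# an empty array, and any other value is wrapped in a one-element array.\n",
    "expand(z) = z isa Array ? z : z === nothing ? Any[] : Any[z]\n",
    "\n",
    "# Hypothetical stand-in for the array-aware `>>`.\n",
    "function compose(F, G)\n",
    "    return function (x)\n",
    "        y = F(x)\n",
    "        # Scalar output of F: pass it straight on to G.\n",
    "        y isa Array || return G(y)\n",
    "        # Array output of F: apply G to each element, wrap, and concatenate.\n",
    "        return reduce(vcat, map(e -> expand(G(e)), y); init=Any[])\n",
    "    end\n",
    "end\n",
    "\n",
    "dup(x) = Any[x, x]\n",
    "\n",
    "compose(dup, dup)(0)                  # Any[0, 0, 0, 0]\n",
    "compose(dup, x -> nothing)(0)         # Any[]\n",
    "```\n",
    "\n",
    "Each `dup` doubles the number of elements, while `vcat` keeps the result one level deep.  The second call shows that `nothing` outputs simply disappear from the concatenated array."
   ]
  },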
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can get back to our original example, *finding the names of all departments*.  Now we get the result we expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " \"WATER MGMNT\"      \n",
       " \"POLICE\"           \n",
       " \"GENERAL SERVICES\" \n",
       " \"CITY COUNCIL\"     \n",
       " \"STREETS & SAN\"    \n",
       " ⋮                  \n",
       " \"BOARD OF ETHICS\"  \n",
       " \"POLICE BOARD\"     \n",
       " \"BUDGET & MGMT\"    \n",
       " \"ADMIN HEARNG\"     \n",
       " \"LICENSE APPL COMM\""
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Dept_Names = Departments >> Name\n",
    "Dept_Names(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, we can list values of any attribute in the hierarchy tree.  For example, let us *find the names of all employees*.\n",
    "\n",
    "$$\n",
    "\\textbf{departments}\\gg \\quad\n",
    "\\begin{cases}\n",
    "    \\text{name} \\\\\n",
    "    \\gg\\textbf{employees}\\gg \\quad\n",
    "        \\begin{cases}\n",
    "            \\gg\\textbf{name} \\\\\n",
    "            \\text{surname} \\\\\n",
    "            \\text{position} \\\\\n",
    "            \\text{salary}\n",
    "        \\end{cases}\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "We can do it by composing field extractors on the path from the root of the hierarchy to the `\"name\"` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "32181-element Array{Any,1}:\n",
       " \"ELVIA\"     \n",
       " \"VICENTE\"   \n",
       " \"MUHAMMAD\"  \n",
       " \"GIRLEY\"    \n",
       " \"DILAN\"     \n",
       " ⋮           \n",
       " \"NANCY\"     \n",
       " \"DARCI\"     \n",
       " \"THADDEUS\"  \n",
       " \"RACHENETTE\"\n",
       " \"MICHELLE\"  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Employees = Field(\"employees\")\n",
    "\n",
    "Empl_Names = Departments >> Employees >> Name\n",
    "Empl_Names(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summarizing data\n",
    "\n",
    "Field extractors and composition give us a way to traverse the hierarchy tree.  We still need a way to summarize data.\n",
    "\n",
    "Consider a query: *find the number of departments*.  To write it down, we need a combinator that can count the number of elements in an array.\n",
    "\n",
    "Here is our first attempt to implement it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Count (generic function with 1 method)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Count() = x -> length(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We compose it with a combinator that generates an array of departments to *calculate the number of departments:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " 2\n",
       " 2\n",
       " 2\n",
       " 2\n",
       " 2\n",
       " ⋮\n",
       " 2\n",
       " 2\n",
       " 2\n",
       " 2\n",
       " 2"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Num_Depts = Departments >> Count()\n",
    "Num_Depts(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But that's not what we expected!  Here is the problem: the composition operator does not let `Count()` see the whole array.  Instead, `Departments >> Count()` submits each array element to `Count()` one by one and then concatenates the outputs of `Count()`.  `Count()`, when its input is a JSON object, returns the number of fields in the object (in this case, 2 fields for all department objects).\n",
    "\n",
    "The right way to implement `Count()` is to add an array-producing combinator as a parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Count (generic function with 2 methods)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Count(F) = x -> length(F(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Num_Depts = Count(Departments)\n",
    "Num_Depts(citydb)"
   ]
  },
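  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking back at the first attempt, the column of `2`s comes from the way `length` treats a dictionary: it counts key-value pairs.  A small sketch, using a hypothetical department object `dept` with made-up employee names:\n",
    "\n",
    "```julia\n",
    "# Hypothetical department object with two fields, as in the real database.\n",
    "dept = Dict(\"name\" => \"DEPT A\", \"employees\" => [\"ANN\", \"BOB\", \"CAT\"])\n",
    "\n",
    "length(dept)                 # 2: the number of fields in the object\n",
    "length(dept[\"employees\"])    # 3: the number of elements in the array\n",
    "```\n",
    "\n",
    "So `Count()` did its job; it was simply handed one department at a time instead of the whole array."
   ]
  },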
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How to use composition with `Count()` correctly?  Here is an example: *show the number of employees for each department*.  Consider this: *number of employees* is a (derived) property of *each department*, which suggests us to compose two combinators: one generating department objects and the other calculating the number of employees for a given department.  We get:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       "  1848\n",
       " 13570\n",
       "   924\n",
       "   397\n",
       "  2090\n",
       "     ⋮\n",
       "     9\n",
       "     2\n",
       "    43\n",
       "    39\n",
       "     1"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Num_Empls_Per_Dept = Departments >> Count(Employees)\n",
    "Num_Empls_Per_Dept(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the other hand, if we'd like to *calculate the total number of employees*, the parameter of `Count()` should be the combinator that generates all the employees:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "32181"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Num_Empls = Count(Departments >> Employees)\n",
    "Num_Empls(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We could add other summarizing or *aggregate* combinators.  For example, let us define a combinator that finds the maximum value in an array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Max (generic function with 1 method)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Max(F) = x -> maximum(F(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Aggregate combinators could be combined to answer complex questions.  For example, let us *find the maximum number of employees per department*.  We already have a combinator that generates the number of employees for each department, all is left is to apply `Max()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "13570"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Max_Empls_Per_Dept = Max(Num_Empls_Per_Dept) # Max(Departments >> Count(Employees))\n",
    "Max_Empls_Per_Dept(citydb)"
   ]
  },
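  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Other aggregates follow the same pattern.  As a sketch, here are two more, `Sum` and `Mean`; they are hypothetical additions that are not used elsewhere in this notebook.  `This()` is repeated so that the snippet can be read on its own.\n",
    "\n",
    "```julia\n",
    "# Hypothetical aggregate combinators, built the same way as Count(F) and Max(F).\n",
    "Sum(F) = x -> sum(F(x))\n",
    "Mean(F) = x -> sum(F(x)) / length(F(x))\n",
    "\n",
    "# Identity combinator, as defined earlier in the notebook.\n",
    "This() = x -> x\n",
    "\n",
    "Sum(This())([10, 20, 30])     # 60\n",
    "Mean(This())([10, 20, 30])    # 20.0\n",
    "```\n",
    "\n",
    "`Mean` divides by the length of the array, so, like `Max`, it has no meaningful answer for an empty array; a fuller implementation would have to decide what to return in that case."
   ]
  },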
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Constructing objects\n",
    "\n",
    "We learned how to traverse and summarize data.  Let us show how to create new structured data.\n",
    "\n",
    "Combinator `Select()` constructs JSON objects.  It is parameterized with a list of field names and constructors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Select (generic function with 1 method)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Select(fields...) =\n",
    "    x -> Dict(map(f -> f.first => f.second(x), fields))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each input, `Select()` constructs a new JSON object with field values generated by field constructors applied to the input.\n",
    "\n",
    "Here is a simple example of `Select()` summarizing the input array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{ASCIIString,Int64} with 2 entries:\n",
       "  \"max\" => 30\n",
       "  \"len\" => 3"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S = Select(\"len\" => Count(This()), \"max\" => Max(This()))\n",
    "S([10, 20, 30])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, we can summarize any hierarchical dataset. Let us modify the query that *finds the number of employees for each department*.  Instead of a raw list of numbers, we will generate a table with the name of the department and its size (the number of employees):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"WATER MGMNT\",\"size\"=>1848)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE\",\"size\"=>13570)        \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"GENERAL SERVICES\",\"size\"=>924)\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"CITY COUNCIL\",\"size\"=>397)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"STREETS & SAN\",\"size\"=>2090)  \n",
       " ⋮                                                            \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BOARD OF ETHICS\",\"size\"=>9)   \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE BOARD\",\"size\"=>2)      \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BUDGET & MGMT\",\"size\"=>43)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"ADMIN HEARNG\",\"size\"=>39)     \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"LICENSE APPL COMM\",\"size\"=>1) "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Depts_With_Size =\n",
    "    Departments >> Select(\n",
    "        \"name\" => Name,\n",
    "        \"size\" => Count(Employees))\n",
    "\n",
    "Depts_With_Size(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This query could easily be expanded to add more information about the department.  For that, we only need to add extra field definitions to the `Select()` clause.  Notably, change in one field constructor cannot in any way affect the values of the other fields.\n",
    "\n",
    "Let us additionally determine *the top salary for each department*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"WATER MGMNT\",\"max_salary\"=>169512,\"size\"=>1848)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE\",\"max_salary\"=>260004,\"size\"=>13570)        \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"GENERAL SERVICES\",\"max_salary\"=>157092,\"size\"=>924)\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"CITY COUNCIL\",\"max_salary\"=>160248,\"size\"=>397)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"STREETS & SAN\",\"max_salary\"=>157092,\"size\"=>2090)  \n",
       " ⋮                                                                                 \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BOARD OF ETHICS\",\"max_salary\"=>131688,\"size\"=>9)   \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE BOARD\",\"max_salary\"=>97728,\"size\"=>2)       \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BUDGET & MGMT\",\"max_salary\"=>169992,\"size\"=>43)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"ADMIN HEARNG\",\"max_salary\"=>156420,\"size\"=>39)     \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"LICENSE APPL COMM\",\"max_salary\"=>69888,\"size\"=>1)  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Depts_With_Size_And_Max_Salary =\n",
    "    Departments >> Select(\n",
    "        \"name\" => Name,\n",
    "        \"size\" => Count(Employees),\n",
    "        \"max_salary\" => Max(Employees >> Salary))\n",
    "\n",
    "Depts_With_Size_And_Max_Salary(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering data\n",
    "\n",
    "Remember the problem we stated in the beginning: *find the number of employees with the salary higher than $100k*.  We have almost all pieces we need to construct a solution of this problem.  One piece that appears to be missing is a way to refine data.  We need a combinator that, given a set of values and a predicate, produces the values that satisfy the predicate condition.\n",
    "\n",
    "Here is how we can implement it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Sieve (generic function with 1 method)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Sieve(P) = x -> P(x) ? x : nothing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Combinator `Sieve(P)` is parameterized with a predicate combinator `P`.  A predicate is a combinator that, for any input, returns `true` or `false`. For example, a predicate combinator `(F < G)` with two parameters `F` and `G` returns, for any input `x`, the result of comparison `F(x) < G(x)`.\n",
    "\n",
    "Let us implement common predicate (and also some arithmetic) combinators:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "rem (generic function with 133 methods)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import Base: >, >=, <, <=, ==, !=, !, &, |, +, -, /, %\n",
    "\n",
    "(>)(F::Function, G::Function) = x -> F(x) > G(x)\n",
    "(>)(F::Function, n::Number) = F > Const(n)\n",
    "\n",
    "(>=)(F::Function, G::Function) = x -> F(x) >= G(x)\n",
    "(>=)(F::Function, n::Number) = F >= Const(n)\n",
    "\n",
    "(<)(F::Function, G::Function) = x -> F(x) < G(x)\n",
    "(<)(F::Function, n::Number) = F < Const(n)\n",
    "\n",
    "(<=)(F::Function, G::Function) = x -> F(x) <= G(x)\n",
    "(<=)(F::Function, n::Number) = F <= Const(n)\n",
    "\n",
    "(==)(F::Function, G::Function) = x -> F(x) == G(x)\n",
    "(==)(F::Function, n::Number) = F == Const(n)\n",
    "\n",
    "(!=)(F::Function, G::Function) = x -> F(x) != G(x)\n",
    "(!=)(F::Function, n::Number) = F != Const(n)\n",
    "\n",
    "(!)(F::Function) = x -> !F(x)\n",
    "(&)(F::Function, G::Function) = x -> F(x) && G(x)\n",
    "(|)(F::Function, G::Function) = x -> F(x) || G(x)\n",
    "\n",
    "(+)(F::Function, G::Function) = x -> F(x) + G(x)\n",
    "(+)(F::Function, n::Number) = F + Const(n)\n",
    "\n",
    "(-)(F::Function, G::Function) = x -> F(x) - G(x)\n",
    "(-)(F::Function, n::Number) = F - Const(n)\n",
    "\n",
    "(/)(F::Function, G::Function) = x -> F(x) / G(x)\n",
    "(/)(F::Function, n::Number) = F / Const(n)\n",
    "\n",
    "(%)(F::Function, G::Function) = x -> F(x) % G(x)\n",
    "(%)(F::Function, n::Number) = F % Const(n)"
   ]
  },
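  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most of the definitions above follow one pattern: a binary operator on values becomes a binary operator on combinators, and a plain number on the right-hand side is wrapped in `Const`.  The pattern can be sketched without extending operators from `Base`, which some readers may prefer because the overloads above apply to every pair of functions in the session.  The helper names `lift2` and `as_combinator` are hypothetical, and the snippet assumes a recent Julia 1.x release:\n",
    "\n",
    "```julia\n",
    "# Hypothetical helpers, not part of the notebook's query language.\n",
    "# Functions pass through; plain values are wrapped as constant combinators.\n",
    "as_combinator(F::Function) = F\n",
    "as_combinator(v) = x -> v\n",
    "\n",
    "# Turn a binary function `op` on values into a binary combinator.\n",
    "lift2(op, F, G) =\n",
    "    let f = as_combinator(F), g = as_combinator(G)\n",
    "        x -> op(f(x), g(x))\n",
    "    end\n",
    "\n",
    "salary = x -> x[\"salary\"]\n",
    "well_paid = lift2(>, salary, 100000)\n",
    "\n",
    "well_paid(Dict(\"salary\" => 120000))    # true\n",
    "well_paid(Dict(\"salary\" => 50000))     # false\n",
    "```\n",
    "\n",
    "The operator overloads trade this explicitness for brevity: `Salary > 100000` reads like an ordinary comparison, which is the point of an embedded query language."
   ]
  },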
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Sieve(P)` tests its input on the predicate condition `P`.  If the input satisfies the condition, it is returned without changes.  Otherwise, `nothing` is returned.\n",
    "\n",
    "Here is a trivial example to demonstrate how `Sieve()` works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(5,nothing)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Take_Odd = Sieve(This() % 2 == 1)\n",
    "Take_Odd(5), Take_Odd(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the composition operator accumulates values for array output, it drops `nothing` values.  Thus, in a composition `(F >> Sieve(P))` with an array-generating combinator `F`, `Sieve(P)` filters the elements of the array.\n",
    "\n",
    "Let us use this feature to *list the departments with more than 1000 employees*.  We already defined a combinator producing departments with the number of employees, we just need to filter its output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7-element Array{Any,1}:\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"WATER MGMNT\",\"size\"=>1848)  \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE\",\"size\"=>13570)      \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"STREETS & SAN\",\"size\"=>2090)\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"AVIATION\",\"size\"=>1344)     \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"FIRE\",\"size\"=>4875)         \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"OEMC\",\"size\"=>1135)         \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"TRANSPORTN\",\"size\"=>1200)   "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Size = Field(\"size\")\n",
    "\n",
    "Large_Depts = Depts_With_Size >> Sieve(Size > 1000)\n",
    "Large_Depts(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, we can *list positions of employees with salary higher than 200k*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3-element Array{Any,1}:\n",
       " \"SUPERINTENDENT OF POLICE\"\n",
       " \"FIRE COMMISSIONER\"       \n",
       " \"MAYOR\"                   "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Position = Field(\"position\")\n",
    "\n",
    "Very_Well_Paid_Posns =\n",
    "    Departments >> Employees >> Sieve(Salary > 200000) >> Position\n",
    "\n",
    "Very_Well_Paid_Posns(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `Sieve()` defined, we are finally able to answer the original question using combinators:\n",
    "\n",
    "> *For each department, find the number of employees with salary higher than 100k.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"WATER MGMNT\",\"N100k\"=>179)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE\",\"N100k\"=>1493)        \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"GENERAL SERVICES\",\"N100k\"=>79)\n",
       " Dict{ASCIIString,Any}(\"name\"=>\"CITY COUNCIL\",\"N100k\"=>54)    \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"STREETS & SAN\",\"N100k\"=>39)   \n",
       " ⋮                                                            \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BOARD OF ETHICS\",\"N100k\"=>2)  \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"POLICE BOARD\",\"N100k\"=>0)     \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"BUDGET & MGMT\",\"N100k\"=>12)   \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"ADMIN HEARNG\",\"N100k\"=>3)     \n",
       " Dict{ASCIIString,Any}(\"name\"=>\"LICENSE APPL COMM\",\"N100k\"=>0)"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Better_Depts_With_Num_Well_Paid_Empls =\n",
    "    Departments >> Select(\n",
    "        \"name\" => Name,\n",
    "        \"N100k\" => Count(Employees >> Sieve(Salary > 100000)))\n",
    "\n",
    "Better_Depts_With_Num_Well_Paid_Empls(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare it with the original solution.  The new one reads much better!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Depts_With_Num_Well_Paid_Empls (generic function with 1 method)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Depts_With_Num_Well_Paid_Empls(data) =\n",
    "    map(d -> Dict(\n",
    "            \"name\" => d[\"name\"],\n",
    "            \"N100k\" => length(filter(e -> e[\"salary\"] > 100000, d[\"employees\"]))),\n",
    "        data[\"departments\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parameters\n",
    "\n",
    "We achieved our goal of sketching (a prototype of) a query language for hierarchical databases.  Let us explore how it could be developed further.  One possible way to improve it is by adding query parameters.\n",
    "\n",
    "Consider a problem: *find the number of employees whose annual salary exceeds 200k*.  We have all the tools to solve it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Num_Very_Well_Paid_Empls =\n",
    "    Count(Departments >> Employees >> Sieve(Salary >= 200000))\n",
    "\n",
    "Num_Very_Well_Paid_Empls(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, imagine that we'd like to *find the number of employees with salary in a certain range*, but we don't know the range at the time we construct the query.  Instead, we want to specify the range when we *execute* the query.\n",
    "\n",
    "Let us introduce a *query context*, a collection of parameters and their values.  We'd like the query context to travel with the input, where each combinator could access it if necessary.  Thus, we have an updated definition of a JSON combinator: a function that maps JSON input and query context to JSON output.\n",
    "\n",
    "We need to update existing combinators to make them context-aware:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "rem (generic function with 133 methods)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Const(val) = (x, ctx...) -> val\n",
    "This() = (x, ctx...) -> x\n",
    "\n",
    "(F >> G) = (x, ctx...) -> _flat(_map(G, F(x, ctx...), ctx...))\n",
    "_map(G, y, ctx...) =\n",
    "    isa(y, Array) ? map(_expand, map(yi -> G(yi, ctx...), y)) : G(y, ctx...)\n",
    "\n",
    "Field(name) = (x, ctx...) -> x[name]\n",
    "Select(fields...) =\n",
    "    (x, ctx...) -> Dict(map(f -> f.first => f.second(x, ctx...), fields))\n",
    "\n",
    "Count(F) = (x, ctx...) -> length(F(x, ctx...))\n",
    "Max(F) = (x, ctx...) -> maximum(F(x, ctx...))\n",
    "\n",
    "Sieve(P) = (x, ctx...) -> P(x, ctx...) ? x : nothing\n",
    "(>)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) > G(x, ctx...)\n",
    "(>=)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) >= G(x, ctx...)\n",
    "(<)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) < G(x, ctx...)\n",
    "(<=)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) <= G(x, ctx...)\n",
    "(==)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) == G(x, ctx...)\n",
    "(!=)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) != G(x, ctx...)\n",
    "(!)(F::Function) = (x, ctx...) -> !F(x, ctx...)\n",
    "(&)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) && G(x, ctx...)\n",
    "(|)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) || G(x, ctx...)\n",
    "(+)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) + G(x, ctx...)\n",
    "(-)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) - G(x, ctx...)\n",
    "(/)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) / G(x, ctx...)\n",
    "(%)(F::Function, G::Function) = (x, ctx...) -> F(x, ctx...) % G(x, ctx...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let us add combinator `Var(name)` that extracts the value of a parameter from the query context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Var (generic function with 1 method)"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Var(name) = (x, ctx...) -> Dict(ctx)[name]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can make parameterized queries.  *Find the number of employees with salary in a certain range:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3916"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Min_Salary = Var(\"min_salary\")\n",
    "Max_Salary = Var(\"max_salary\")\n",
    "\n",
    "Departments = Field(\"departments\")\n",
    "Employees = Field(\"employees\")\n",
    "Salary = Field(\"salary\")\n",
    "\n",
    "Num_Empls_By_Salary =\n",
    "    Count(\n",
    "        Departments >>\n",
    "        Employees >>\n",
    "        Sieve((Salary >= Min_Salary) & (Salary < Max_Salary)))\n",
    "\n",
    "Num_Empls_By_Salary(citydb, \"min_salary\" => 100000, \"max_salary\" => 200000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use of context is not limited to query parameters.  We can also update the context dynamically.\n",
    "\n",
    "Consider a problem: *find the employee with the highest salary*.\n",
    "\n",
    "It can be solved in two queries.  First, *find the highest salary:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "260004"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Max_Salary = Max(Departments >> Employees >> Salary)\n",
    "Max_Salary(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Second, *find the employee with the given salary:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1-element Array{Any,1}:\n",
       " Dict{AbstractString,Any}(\"name\"=>\"GARRY\",\"surname\"=>\"M\",\"position\"=>\"SUPERINTENDENT OF POLICE\",\"salary\"=>260004)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "The_Salary = Var(\"salary\")\n",
    "\n",
    "Empl_With_Salary = Departments >> Employees >> Sieve(Salary == The_Salary)\n",
    "Empl_With_Salary(citydb, \"salary\" => 260004)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to automate this sequence of operations.  Specifically, we take a value calculated by one combinator, assign it to some context parameter, and then evaluate the other combinator in the updated context.  That's what `Given()` combinator does:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Given (generic function with 1 method)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Given(F, vars...) =\n",
    "    (x, ctx...) ->\n",
    "        let ctx = (ctx..., map(v -> v.first => v.second(x, ctx...), vars)...)\n",
    "            F(x, ctx...)\n",
    "        end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Combining `Max_Salary` and `Empl_With_Salary` using `Given`, we get:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1-element Array{Any,1}:\n",
       " Dict{AbstractString,Any}(\"name\"=>\"GARRY\",\"surname\"=>\"M\",\"position\"=>\"SUPERINTENDENT OF POLICE\",\"salary\"=>260004)"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Empl_With_Max_Salary = # Given(Empl_With_Salary, \"salary\" => Max_Salary)\n",
    "    Given(\n",
    "        Departments >> Employees >> Sieve(Salary == The_Salary),\n",
    "        \"salary\" => Max(Departments >> Employees >> Salary))\n",
    "\n",
    "Empl_With_Max_Salary(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is not just a convenience feature.  Indeed, let us change this query to *find the highest paid employee for each department*.  To implement it, we need to pull `Departments` from the `Given()` clause:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " Dict{AbstractString,Any}(\"name\"=>\"THOMAS\",\"surname\"=>\"P\",\"position\"=>\"COMMISSIONER OF WATER MGMT\",\"salary\"=>169512)                \n",
       " Dict{AbstractString,Any}(\"name\"=>\"GARRY\",\"surname\"=>\"M\",\"position\"=>\"SUPERINTENDENT OF POLICE\",\"salary\"=>260004)                   \n",
       " Dict{AbstractString,Any}(\"name\"=>\"DAVID\",\"surname\"=>\"R\",\"position\"=>\"COMMISSIONER OF FLEET & FACILITY MANAGEMENT\",\"salary\"=>157092)\n",
       " Dict{AbstractString,Any}(\"name\"=>\"MARLA\",\"surname\"=>\"K\",\"position\"=>\"CHIEF ADMINISTRATIVE OFFICER\",\"salary\"=>160248)               \n",
       " Dict{AbstractString,Any}(\"name\"=>\"CHARLES\",\"surname\"=>\"W\",\"position\"=>\"COMMISSIONER OF STREETS AND SANITATION\",\"salary\"=>157092)   \n",
       " ⋮                                                                                                                                  \n",
       " Dict{AbstractString,Any}(\"name\"=>\"STEVEN\",\"surname\"=>\"B\",\"position\"=>\"EXECUTIVE DIR - BOARD OF ETHICS\",\"salary\"=>131688)           \n",
       " Dict{AbstractString,Any}(\"name\"=>\"MAX\",\"surname\"=>\"C\",\"position\"=>\"EXECUTIVE DIR - POLICE BOARD\",\"salary\"=>97728)                  \n",
       " Dict{AbstractString,Any}(\"name\"=>\"ALEXANDRA\",\"surname\"=>\"H\",\"position\"=>\"BUDGET DIR\",\"salary\"=>169992)                             \n",
       " Dict{AbstractString,Any}(\"name\"=>\"PATRICIA\",\"surname\"=>\"J\",\"position\"=>\"DIR OF ADMINISTRATIVE HEARINGS\",\"salary\"=>156420)          \n",
       " Dict{AbstractString,Any}(\"name\"=>\"MICHELLE\",\"surname\"=>\"G\",\"position\"=>\"STAFF ASST\",\"salary\"=>69888)                               "
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Empls_With_Max_Salary_By_Dept =\n",
    "    Departments >> Given(\n",
    "        Employees >> Sieve(Salary == The_Salary),\n",
    "        \"salary\" => Max(Employees >> Salary))\n",
    "\n",
    "Empls_With_Max_Salary_By_Dept(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Limitations and conclusion\n",
    "\n",
    "Consider a problem: *find the top salary for each department*.  This is an easy one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "35-element Array{Any,1}:\n",
       " 169512\n",
       " 260004\n",
       " 157092\n",
       " 160248\n",
       " 157092\n",
       "      ⋮\n",
       " 131688\n",
       "  97728\n",
       " 169992\n",
       " 156420\n",
       "  69888"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Max_Salary_By_Dept = Departments >> Max(Employees >> Salary)\n",
    "Max_Salary_By_Dept(citydb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now change it to: *find the top salary for each position.*  We can't solve it with our current set of combinators.  Why?\n",
    "\n",
    "Look at the database hierarchy diagram:\n",
    "\n",
    "$$\n",
    "\\text{departments} \\quad\n",
    "\\begin{cases}\n",
    "    \\text{name} \\\\\n",
    "    \\text{employees} \\quad\n",
    "        \\begin{cases}\n",
    "            \\text{name} \\\\\n",
    "            \\text{surname} \\\\\n",
    "            \\text{position} \\\\\n",
    "            \\text{salary}\n",
    "        \\end{cases}\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "The structure of the first query (*top salary for each department*) fits the structure of the database:\n",
    "\n",
    "$$\n",
    "\\textbf{departments}\\gg \\quad\n",
    "\\begin{cases}\n",
    "    \\text{name} \\\\\n",
    "    \\gg\\textbf{employees}\\gg \\quad\n",
    "        \\begin{cases}\n",
    "            \\text{name} \\\\\n",
    "            \\text{surname} \\\\\n",
    "            \\text{position} \\\\\n",
    "            \\gg\\textbf{salary}\n",
    "        \\end{cases}\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "The structure of the second query (*top salary for each position*) violates this structure:\n",
    "\n",
    "$$\n",
    "\\text{departments} \\quad\n",
    "\\begin{cases}\n",
    "    \\text{name} \\\\\n",
    "    \\textbf{employees}\\lessgtr \\quad\n",
    "        \\begin{cases}\n",
    "            \\text{name} \\\\\n",
    "            \\text{surname} \\\\\n",
    "            \\ll\\textbf{position} \\\\\n",
    "            \\gg\\textbf{salary}\n",
    "        \\end{cases}\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "This is not the only limitation.  Let us not forget that real databases are *decidedly* non-hierarchical.  For example, this is the database schema (designed by Charles Tirrell) of our flagship product [RexStudy](http://www.rexdb.org/).  No hierarchy in sight!\n",
    "\n",
    "![RexStudy Data Model](http://i.imgur.com/HRRYysK.png)\n",
    "\n",
    "As a conclusion, combinators are awesome for querying data as long as:\n",
    "\n",
    "1. The data is hierarchical.\n",
    "2. The structure of the query respects the structure of the data.\n",
    "\n",
    "Otherwise, we are out of luck...\n",
    "\n",
    "*... Or are we?*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 0.5.0-dev",
   "language": "julia",
   "name": "julia-0.5"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.5.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}