{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Formal Language Definition"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import lux\n",
    "from lux.vis.VisList import VisList\n",
    "from lux.vis.Vis import Vis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Lux intent specification can be defined as a context-free grammar (CFG). Here, we introduce a formal definition of the intent language in Lux for interested readers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true')\n",
    "df[\"Year\"] = pd.to_datetime(df[\"Year\"], format='%Y') "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Composing a Lux *Intent* with  `Clause` objects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An *intent* in Lux corresponds to the Kleene star of `Clause` objects, i.e., it can have either zero, one, or multiple `Clause`s."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{equation}\n",
    "\\langle Intent\\rangle \\rightarrow \\langle Clause\\rangle^* \\\\\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "spec1 = lux.Clause(\"MilesPerGal\")\n",
    "spec2 = lux.Clause(\"Horsepower\")\n",
    "spec3 = lux.Clause(\"Origin=USA\")\n",
    "intent = [spec1, spec2, spec3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is an example of how we can formulate an intent as a list of `Clause` and generate a visualization. In this tutorial, we will discuss how the `Clause` breaks down to different production rules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "92d2983aa2884f13b33983f926719264",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "LuxWidget(current_vis={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}, 'axis': {'labelCo…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<Vis  (x: MilesPerGal, y: Horsepower -- [Origin=USA]) mark: scatter, score: 0.0 >"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Vis(intent,df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `Clause` can either be an `Axis` specification or a `Filter` specification.\n",
    "Note that it is not possible for a `Clause` to be both an `Axis` and a `Filter`, but they can be specified as separate `Clause`s in the intent.\n",
    "\n",
    "\\begin{equation}\n",
    "    \\begin{split}\n",
    "        \\langle Clause\\rangle &\\rightarrow \\langle Axis \\rangle \\\\\n",
    "        &\\rightarrow \\langle Filter \\rangle \n",
    "    \\end{split}\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axisSpec = lux.Clause(attribute=\"MilesPerGal\") \n",
    "# Equivalent, easier-to-specify Clause syntax : lux.Clause(\"MilesPerGal\") \n",
    "axisSpec"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filterSpec = lux.Clause(attribute=\"Origin\",filter_op=\"=\",value=\"USA\")\n",
    "# Equivalent, easier-to-specify Clause syntax : lux.Clause(\"Origin=USA\") \n",
    "filterSpec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `Axis` specification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An `Axis` requires an `attribute` specification, and an optional `channel`, `aggregation`, or `bin_size` specification.\n",
    "\\begin{equation}\n",
    "    \\langle Axis \\rangle \\rightarrow \\langle attribute \\rangle \\langle channel \\rangle \\langle aggregation \\rangle \\langle bin\\_size \\rangle\n",
    "\\end{equation}        \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An `attribute` can either be a single column in the dataset, a list of columns, or a `wildcard`.\n",
    "\\begin{equation}\n",
    "    \\begin{split}\n",
    "        \\langle attribute \\rangle &\\rightarrow \\textrm{attribute} \\\\\n",
    "        &\\rightarrow \\textrm{attribute} \\cup \\langle attribute \\rangle\\\\\n",
    "        &\\rightarrow \\langle wildcard \\rangle\n",
    "    \\end{split}\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# user is interested in \"MilesPerGal\"\n",
    "attribute = lux.Clause(\"Origin\") \n",
    "\n",
    "# user is interested in \"MilesPerGal\",\"Horsepower\", or \"Weight\"\n",
    "attribute = lux.Clause([\"MilesPerGal\",\"Horsepower\",\"Weight\"]) \n",
    "\n",
    "# user is interested in any attribute\n",
    "attribute = lux.Clause(\"?\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional specification of the `Axis` include : \n",
    "\n",
    "\\begin{equation}\n",
    "    \\begin{aligned}\n",
    "        &\\langle channel\\rangle \\rightarrow (\\textrm{x } |\\textrm{ y }|\\textrm{ color }|\\textrm{ auto})\\\\\n",
    "        &\\langle aggregation\\rangle \\rightarrow (\\textrm{mean }| \\textrm{ sum } | \\textrm{ count } | \\textrm{ min } | \\textrm{ max } | \\textrm{ any numpy aggregation function }| \\textrm{ auto})\\\\\n",
    "        &\\langle bin\\_size \\rangle \\rightarrow ( \\textrm{any integer } | \\textrm{ auto})\\\\\n",
    "    \\end{aligned}\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ensure that \"MilesPerGal\" is placed on the x axis\n",
    "axisSpec = lux.Clause(\"MilesPerGal\",channel=\"x\")\n",
    "\n",
    "# Apply sum on \"MilesPerGal\" \n",
    "axisSpec = lux.Clause(\"MilesPerGal\",aggregation=\"sum\")\n",
    "\n",
    "# Divide \"MilesPerGal\" into 50 bins\n",
    "axisSpec = lux.Clause(\"MilesPerGal\",bin_size=50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example: Effects of optional attribute specification parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, if we specify only an attribute, the system automatically infers the appropriate `channel`, `aggregation`, or `bin_size`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axisSpec = \"MilesPerGal\"\n",
    "Vis([axisSpec],df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can increase the `bin_size` as an optional parameter: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bin50MPG = lux.Clause(\"MilesPerGal\",bin_size=50)\n",
    "Vis([bin50MPG],df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For bar charts, Lux uses a default aggregation of `mean` and displays a horizontal bar chart with `Origin` on the y axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axisSpec1 = \"MilesPerGal\"\n",
    "axisSpec2 = \"Origin\"\n",
    "Vis([axisSpec1,axisSpec2],df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can change the `mean` to a `sum` aggregation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axisSpec1 = lux.Clause(\"MilesPerGal\",aggregation=\"sum\")\n",
    "axisSpec2 = \"Origin\"\n",
    "Vis([axisSpec1,axisSpec2],df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or we can set the `Origin` on the x-axis to get a vertical bar chart instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axisSpec1 = \"MilesPerGal\"\n",
    "axisSpec2 = lux.Clause(\"Origin\",channel=\"x\")\n",
    "Vis([axisSpec1,axisSpec2],df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `Wildcard` attribute specifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `wildcard` consists of an \"any\" specifier (?) with an optional `constraint` clause, that constrains the set of attributes that Lux enumerates over.\n",
    "\n",
    "\\begin{equation}\n",
    "    \\langle wildcard \\rangle \\rightarrow \\textrm{( ? )} \\langle constraint\\rangle\\\\\n",
    "\\end{equation}\n",
    "\n",
    "\n",
    "\\begin{equation}\n",
    "    \\langle constraint \\rangle \\rightarrow \\langle data\\_model\\rangle \\langle data\\_type\\rangle\\\\\n",
    "\\end{equation}\n",
    "\n",
    "\\begin{equation}\n",
    "    \\begin{aligned}\n",
    "        &\\langle data\\_type\\rangle \\rightarrow (\\textrm{quantitative }| \\textrm{ nominal } | \\textrm{ ordinal } | \\textrm{ temporal } | \\textrm{ auto})\\\\\n",
    "        &\\langle data\\_model\\rangle \\rightarrow (\\textrm{dimension }|\\textrm{ measure }|\\textrm{ auto})\\\\\n",
    "    \\end{aligned}\n",
    "\\end{equation}\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# user is interested in any ordinal attribute\n",
    "wildcard = lux.Clause(\"?\",data_type=\"temporal\")\n",
    "\n",
    "# user is interested in any measure attribute\n",
    "wildcard = lux.Clause(\"?\",data_model=\"measure\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example: `Origin` with respect to other measure variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "origin = lux.Clause(\"Origin\") \n",
    "anyMeasure = lux.Clause(\"?\",data_model=\"measure\")\n",
    "VisList([origin, anyMeasure],df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `Filter` specification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{equation}\n",
    "    \\langle Filter \\rangle \\rightarrow \\langle attribute\\rangle (=|>|<|\\leq|\\geq|\\neq) \\langle value\\rangle\\\\\n",
    "\\end{equation}\n",
    "\n",
    "\\begin{equation}\n",
    "    \\begin{split}\n",
    "        \\langle value \\rangle &\\rightarrow \\textrm{value} \\\\\n",
    "        &\\rightarrow \\textrm{value} \\cup \\langle value \\rangle\\\\\n",
    "        &\\rightarrow (\\textrm{?})\n",
    "    \\end{split}\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# user is interested in only Ford cars\n",
    "value = \"ford\"\n",
    "filterSpec = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=value) \n",
    "\n",
    "# user is interested in cars that are either Ford, Chevrolet, or Toyota\n",
    "value = [\"ford\",\"chevrolet\",\"toyota\"]\n",
    "filterSpec = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=value) \n",
    "\n",
    "# user is interested in cars that are of any Brand\n",
    "value = \"?\"\n",
    "filterSpec = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=value) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example: Distribution of `Horsepower`  for different Brands"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "horsepower = lux.Clause(\"Horsepower\")\n",
    "anyBrand = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=\"?\") \n",
    "VisList([horsepower, anyBrand],df)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}