{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Formal Language Definition" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import lux\n", "from lux.vis.VisList import VisList\n", "from lux.vis.Vis import Vis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Lux intent specification can be defined as a context-free grammar (CFG). Here, we introduce a formal definition of the intent language in Lux for interested readers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true')\n", "df[\"Year\"] = pd.to_datetime(df[\"Year\"], format='%Y') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Composing a Lux *Intent* with `Clause` objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An *intent* in Lux corresponds to the Kleene star of `Clause` objects, i.e., it can have either zero, one, or multiple `Clause`s." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\\begin{equation}\n", "\\langle Intent\\rangle \\rightarrow \\langle Clause\\rangle^* \\\\\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "spec1 = lux.Clause(\"MilesPerGal\")\n", "spec2 = lux.Clause(\"Horsepower\")\n", "spec3 = lux.Clause(\"Origin=USA\")\n", "intent = [spec1, spec2, spec3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an example of how we can formulate an intent as a list of `Clause` and generate a visualization. In this tutorial, we will discuss how the `Clause` breaks down to different production rules." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "92d2983aa2884f13b33983f926719264", "version_major": 2, "version_minor": 0 }, "text/plain": [ "LuxWidget(current_vis={'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}, 'axis': {'labelCo…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Vis(intent,df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `Clause` can either be an `Axis` specification or a `Filter` specification.\n", "Note that it is not possible for a `Clause` to be both an `Axis` and a `Filter`, but they can be specified as separate `Clause`s in the intent.\n", "\n", "\\begin{equation}\n", " \\begin{split}\n", " \\langle Clause\\rangle &\\rightarrow \\langle Axis \\rangle \\\\\n", " &\\rightarrow \\langle Filter \\rangle \n", " \\end{split}\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "axisSpec = lux.Clause(attribute=\"MilesPerGal\") \n", "# Equivalent, easier-to-specify Clause syntax : lux.Clause(\"MilesPerGal\") \n", "axisSpec" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filterSpec = lux.Clause(attribute=\"Origin\",filter_op=\"=\",value=\"USA\")\n", "# Equivalent, easier-to-specify Clause syntax : lux.Clause(\"Origin=USA\") \n", "filterSpec" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `Axis` specification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An `Axis` requires an `attribute` specification, and an optional `channel`, `aggregation`, or `bin_size` specification.\n", "\\begin{equation}\n", " \\langle Axis \\rangle \\rightarrow \\langle attribute \\rangle \\langle channel \\rangle \\langle aggregation \\rangle \\langle bin\\_size 
\\rangle\n", "\\end{equation} \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An `attribute` can either be a single column in the dataset, a list of columns, or a `wildcard`.\n", "\\begin{equation}\n", " \\begin{split}\n", " \\langle attribute \\rangle &\\rightarrow \\textrm{attribute} \\\\\n", " &\\rightarrow \\textrm{attribute} \\cup \\langle attribute \\rangle\\\\\n", " &\\rightarrow \\langle wildcard \\rangle\n", " \\end{split}\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# user is interested in \"MilesPerGal\"\n", "attribute = lux.Clause(\"MilesPerGal\") \n", "\n", "# user is interested in \"MilesPerGal\",\"Horsepower\", or \"Weight\"\n", "attribute = lux.Clause([\"MilesPerGal\",\"Horsepower\",\"Weight\"]) \n", "\n", "# user is interested in any attribute\n", "attribute = lux.Clause(\"?\") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optional specifications of the `Axis` include:\n", "\n", "\\begin{equation}\n", " \\begin{aligned}\n", " &\\langle channel\\rangle \\rightarrow (\\textrm{x } |\\textrm{ y }|\\textrm{ color }|\\textrm{ auto})\\\\\n", " &\\langle aggregation\\rangle \\rightarrow (\\textrm{mean }| \\textrm{ sum } | \\textrm{ count } | \\textrm{ min } | \\textrm{ max } | \\textrm{ any numpy aggregation function }| \\textrm{ auto})\\\\\n", " &\\langle bin\\_size \\rangle \\rightarrow ( \\textrm{any integer } | \\textrm{ auto})\\\\\n", " \\end{aligned}\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Ensure that \"MilesPerGal\" is placed on the x axis\n", "axisSpec = lux.Clause(\"MilesPerGal\",channel=\"x\")\n", "\n", "# Apply sum on \"MilesPerGal\" \n", "axisSpec = lux.Clause(\"MilesPerGal\",aggregation=\"sum\")\n", "\n", "# Divide \"MilesPerGal\" into 50 bins\n", "axisSpec = lux.Clause(\"MilesPerGal\",bin_size=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"#### Example: Effects of optional attribute specification parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, if we specify only an attribute, the system automatically infers the appropriate `channel`, `aggregation`, or `bin_size`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "axisSpec = \"MilesPerGal\"\n", "Vis([axisSpec],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can increase the `bin_size` as an optional parameter: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bin50MPG = lux.Clause(\"MilesPerGal\",bin_size=50)\n", "Vis([bin50MPG],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For bar charts, Lux uses a default aggregation of `mean` and displays a horizontal bar chart with `Origin` on the y axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "axisSpec1 = \"MilesPerGal\"\n", "axisSpec2 = \"Origin\"\n", "Vis([axisSpec1,axisSpec2],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can change the `mean` to a `sum` aggregation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "axisSpec1 = lux.Clause(\"MilesPerGal\",aggregation=\"sum\")\n", "axisSpec2 = \"Origin\"\n", "Vis([axisSpec1,axisSpec2],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we can set the `Origin` on the x-axis to get a vertical bar chart instead:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "axisSpec1 = \"MilesPerGal\"\n", "axisSpec2 = lux.Clause(\"Origin\",channel=\"x\")\n", "Vis([axisSpec1,axisSpec2],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `Wildcard` attribute specifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `wildcard` consists of an \"any\" specifier (?) 
with an optional `constraint` clause that constrains the set of attributes Lux enumerates over.\n", "\n", "\\begin{equation}\n", " \\langle wildcard \\rangle \\rightarrow \\textrm{( ? )} \\langle constraint\\rangle\\\\\n", "\\end{equation}\n", "\n", "\n", "\\begin{equation}\n", " \\langle constraint \\rangle \\rightarrow \\langle data\\_model\\rangle \\langle data\\_type\\rangle\\\\\n", "\\end{equation}\n", "\n", "\\begin{equation}\n", " \\begin{aligned}\n", " &\\langle data\\_type\\rangle \\rightarrow (\\textrm{quantitative }| \\textrm{ nominal } | \\textrm{ ordinal } | \\textrm{ temporal } | \\textrm{ auto})\\\\\n", " &\\langle data\\_model\\rangle \\rightarrow (\\textrm{dimension }|\\textrm{ measure }|\\textrm{ auto})\\\\\n", " \\end{aligned}\n", "\\end{equation}\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# user is interested in any temporal attribute\n", "wildcard = lux.Clause(\"?\",data_type=\"temporal\")\n", "\n", "# user is interested in any measure attribute\n", "wildcard = lux.Clause(\"?\",data_model=\"measure\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example: `Origin` with respect to other measure variables" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "origin = lux.Clause(\"Origin\") \n", "anyMeasure = lux.Clause(\"?\",data_model=\"measure\")\n", "VisList([origin, anyMeasure],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `Filter` specification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\\begin{equation}\n", " \\langle Filter \\rangle \\rightarrow \\langle attribute\\rangle (=|>|<|\\leq|\\geq|\\neq) \\langle value\\rangle\\\\\n", "\\end{equation}\n", "\n", "\\begin{equation}\n", " \\begin{split}\n", " \\langle value \\rangle &\\rightarrow \\textrm{value} \\\\\n", " &\\rightarrow \\textrm{value} \\cup \\langle value \\rangle\\\\\n", " &\\rightarrow (\\textrm{?})\n", " 
\\end{split}\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# user is interested in only Ford cars\n", "value = \"ford\"\n", "filterSpec = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=value) \n", "\n", "# user is interested in cars that are either Ford, Chevrolet, or Toyota\n", "value = [\"ford\",\"chevrolet\",\"toyota\"]\n", "filterSpec = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=value) \n", "\n", "# user is interested in cars that are of any Brand\n", "value = \"?\"\n", "filterSpec = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=value) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example: Distribution of `Horsepower` for different Brands" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "horsepower = lux.Clause(\"Horsepower\")\n", "anyBrand = lux.Clause(attribute=\"Brand\", filter_op=\"=\",value=\"?\") \n", "VisList([horsepower, anyBrand],df)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }