{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Julia is getting more and more interesting as a way to translate math into beautiful-looking code. After some halted attempts earlier at Julia 1.0 and 1.4, I'm giving it a try again at 1.6 after it seems like Julia is looking more and more stable. So here I'll write down a simple task that would be trivial to do in Python, except I'll do it in Julia instead. \n",
    "\n",
    "I want to:\n",
    "\n",
    "1. Read a CSV file from the web (i.e. in a github repository) as a dataframe\n",
    "2. Do some data cleaning - text replacement, fill na's, etc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load a CSV File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "using HTTP, CSV, DataFrames, PyCall, DataFramesMeta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use a GET command to get some data, in this case it's a String\n",
    "# r = HTTP.get(\"https://api.github.com/users/banditkings\").body |> String;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://ocw.mit.edu/courses/sloan-school-of-management/15-071-the-analytics-edge-spring-2017/linear-regression/assignment-2/climate_change.csv\";"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Docstrings must be placed before a function call. On the plus side, it seems to render the docstring really nicely:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Basic function call example. In this case we're using it to fetch a CSV file from\n",
    "github here.\n",
    "\n",
    "You can even use latex in here! \\$f(x) = Ax + b\\$\n",
    "\n",
    "<span style=\"color:red\">and HTML!!</span>\n",
    "\n",
    "Parameters\n",
    "----------\n",
    "url : String\n",
    "* url address for the CSV file that you want to read in as a dataframe\n",
    "\n",
    "Returns\n",
    "-------\n",
    "dataframe\n",
    "\"\"\"    \n",
    "function get_data(url)\n",
    "    body = HTTP.get(url).body\n",
    "    df = CSV.read(body, DataFrame);\n",
    "end;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/latex": [
       "Basic function call example. In this case we're using it to fetch a CSV file from github here.\n",
       "\n",
       "You can even use latex in here! $f(x) = Ax + b$\n",
       "\n",
       "<span style=\"color:red\">and HTML!!</span>\n",
       "\n",
       "\\subsection{Parameters}\n",
       "url : String\n",
       "\n",
       "\\begin{itemize}\n",
       "\\item url address for the CSV file that you want to read in as a dataframe\n",
       "\n",
       "\\end{itemize}\n",
       "\\subsection{Returns}\n",
       "dataframe\n",
       "\n"
      ],
      "text/markdown": [
       "Basic function call example. In this case we're using it to fetch a CSV file from github here.\n",
       "\n",
       "You can even use latex in here! $f(x) = Ax + b$\n",
       "\n",
       "<span style=\"color:red\">and HTML!!</span>\n",
       "\n",
       "## Parameters\n",
       "\n",
       "url : String\n",
       "\n",
       "  * url address for the CSV file that you want to read in as a dataframe\n",
       "\n",
       "## Returns\n",
       "\n",
       "dataframe\n"
      ],
      "text/plain": [
       "  Basic function call example. In this case we're using it to fetch a CSV file\n",
       "  from github here.\n",
       "\n",
       "  You can even use latex in here! \u001b[35mf(x) = Ax + b\u001b[39m\n",
       "\n",
       "  <span style=\"color:red\">and HTML!!</span>\n",
       "\n",
       "\u001b[1m  Parameters\u001b[22m\n",
       "\u001b[1m  ============\u001b[22m\n",
       "\n",
       "  url : String\n",
       "\n",
       "    •  url address for the CSV file that you want to read in as a\n",
       "       dataframe\n",
       "\n",
       "\u001b[1m  Returns\u001b[22m\n",
       "\u001b[1m  =========\u001b[22m\n",
       "\n",
       "  dataframe"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "?get_data() # display the docstring we just generated"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "### Control Flows and `try`:`catch`\n",
    "\n",
    "Our little `get_data` function is fine but I don't want to constantly be downloading this file if I don't have to...how about something like python's `try`:`except` and learn a little about how Julia handles control flows?\n",
    "\n",
    "* **Goal:** I want to try to load from CSV file if it's available, and load from a url if it doesn't exist. \n",
    "\n",
    "* Reference: [Control Flows](https://docs.julialang.org/en/v1/manual/control-flow/) and `try`:`catch` [here](https://docs.julialang.org/en/v1/manual/control-flow/#The-try/catch-statement)!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "try\n",
    "    df = CSV.read(\"climate_change.csv\", DataFrame);\n",
    "catch\n",
    "    println(\"No CSV Found, Downloading from source\")\n",
    "    df = get_data(url);\n",
    "    # We can also save the csv file locally\n",
    "    CSV.write(\"climate_change.csv\", df)\n",
    "end;\n",
    "\n",
    "df = CSV.read(\"climate_change.csv\", DataFrame);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python vs Julia: appending an item to a list (and PyCall in Jupyter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another key difference from python is appending lists together. In python, you'll create a list like:\n",
    "\n",
    "```python\n",
    ">>> foo = [1]\n",
    ">>> foo = foo + ['a', 'b']\n",
    ">>> foo\n",
    "[1, 'a', 'b']\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Actually why don't we use the `PyCall` module to demonstrate the above? With `PyCall`, you can call Python and get functions and variables out. I tried this a while back and it seemed to be really bad at supporting pandas dataframes but things like numpy arrays are fine. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3-element Vector{Any}:\n",
       " 1\n",
       "  \"a\"\n",
       "  \"b\""
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Example of using Python within a jupyter notebook cell with a Julia kernel\n",
    "\n",
    "py\"\"\"\n",
    "foo = [1]\n",
    "foo = foo + ['a', 'b']\n",
    "\"\"\"\n",
    "\n",
    "# Reference an existing variable or function\n",
    "py\"foo\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now here's the same thing in Julia:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2-element Vector{Any}:\n",
       " 1\n",
       "  \"a\""
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "foo = Any[1]; # need to set to Any or it'll assume it's an Integer and it'll fail on the two strings there\n",
    "append!(foo, [\"a\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is that `!` symbol?\n",
    "\n",
    "Okay notice the exclamation mark `!` after `append` - this is special Julia notation for functions that modify something inplace. In this case, `append!` adds an item to the list in place. You could use `push!` if adding a single item. And there's no `append` function, only `append!`. How do you do something where you don't modify it in place? You can't, but you can make a copy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3-element Vector{Any}:\n",
       " 1\n",
       "  \"a\"\n",
       "  \"b\""
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bar = foo |> copy\n",
    "append!(bar, [\"b\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "false"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "foo == bar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrame Manipulation\n",
    "\n",
    "With Base Julia:\n",
    "[Documentation](https://dataframes.juliadata.org/stable/man/working_with_dataframes/)\n",
    "\n",
    "`head` and `tail` become `first` and `last` with Julia."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>5 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr><tr><th>2</th><td>1983</td><td>6</td><td>2.167</td><td>345.52</td><td>1633.71</td><td>303.746</td><td>192.057</td><td>351.848</td><td>1366.12</td><td>0.0794</td></tr><tr><th>3</th><td>1983</td><td>7</td><td>1.741</td><td>344.15</td><td>1633.22</td><td>303.795</td><td>192.818</td><td>353.725</td><td>1366.29</td><td>0.0731</td></tr><tr><th>4</th><td>1983</td><td>8</td><td>1.13</td><td>342.25</td><td>1631.35</td><td>303.839</td><td>193.602</td><td>355.633</td><td>1366.42</td><td>0.0673</td></tr><tr><th>5</th><td>1983</td><td>9</td><td>0.428</td><td>340.17</td><td>1648.4</td><td>303.901</td><td>194.392</td><td>357.465</td><td>1366.23</td><td>0.0619</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\t2 & 1983 & 6 & 2.167 & 345.52 & 1633.71 & 303.746 & 192.057 & 351.848 & 1366.12 & 0.0794 & $\\dots$ \\\\\n",
       "\t3 & 1983 & 7 & 1.741 & 344.15 & 1633.22 & 303.795 & 192.818 & 353.725 & 1366.29 & 0.0731 & $\\dots$ \\\\\n",
       "\t4 & 1983 & 8 & 1.13 & 342.25 & 1631.35 & 303.839 & 193.602 & 355.633 & 1366.42 & 0.0673 & $\\dots$ \\\\\n",
       "\t5 & 1983 & 9 & 0.428 & 340.17 & 1648.4 & 303.901 & 194.392 & 357.465 & 1366.23 & 0.0619 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m5×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113  136 ⋯\n",
       "   2 │  1983      6    2.167   345.52  1633.71  303.746  192.057  351.848  136\n",
       "   3 │  1983      7    1.741   344.15  1633.22  303.795  192.818  353.725  136\n",
       "   4 │  1983      8    1.13    342.25  1631.35  303.839  193.602  355.633  136\n",
       "   5 │  1983      9    0.428   340.17  1648.4   303.901  194.392  357.465  136 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first(df, 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also pipe commands as well with Julia's base pipe `|>`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (11 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>308</th><td>2008</td><td>12</td><td>-0.666</td><td>385.56</td><td>1812.88</td><td>322.182</td><td>244.204</td><td>535.005</td><td>1365.69</td><td>0.0046</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t308 & 2008 & 12 & -0.666 & 385.56 & 1812.88 & 322.182 & 244.204 & 535.005 & 1365.69 & 0.0046 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       " 308 │  2008     12   -0.666   385.56  1812.88  322.182  244.204  535.005  136 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df |> last"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How about the last 5 by pipe `|>`? Well it's not as good as R's `dplyr` unfortunately...the base pipe function in Julia can't handle multiple parameters. There are other libraries like `Pipe` and `Chain` that make use of macros, but compared to `dplyr` or even method chaining in Python, they both are too clunky. \n",
    "\n",
    "Libraries like the [Query](https://github.com/queryverse/Query.jl) from the [Queryverse](https://www.queryverse.org) attempt to address this but one of the many issues of trying to do work in a new language that's decades behind Python and R. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Indexing and Slicing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Indexing is just like pandas like:\n",
    "```Julia\n",
    "df[rows, columns]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>6 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>9</td><td>0.428</td><td>340.17</td><td>1648.4</td><td>303.901</td><td>194.392</td><td>357.465</td><td>1366.23</td><td>0.0619</td></tr><tr><th>2</th><td>1983</td><td>10</td><td>0.002</td><td>340.3</td><td>1663.79</td><td>303.97</td><td>195.171</td><td>359.174</td><td>1366.06</td><td>0.0569</td></tr><tr><th>3</th><td>1983</td><td>11</td><td>-0.176</td><td>341.53</td><td>1658.23</td><td>304.032</td><td>195.921</td><td>360.758</td><td>1366.11</td><td>0.0524</td></tr><tr><th>4</th><td>1983</td><td>12</td><td>-0.176</td><td>343.07</td><td>1654.31</td><td>304.082</td><td>196.609</td><td>362.174</td><td>1366.06</td><td>0.0486</td></tr><tr><th>5</th><td>1984</td><td>1</td><td>-0.339</td><td>344.05</td><td>1658.98</td><td>304.13</td><td>197.219</td><td>363.359</td><td>1365.43</td><td>0.0451</td></tr><tr><th>6</th><td>1984</td><td>2</td><td>-0.565</td><td>344.77</td><td>1656.48</td><td>304.194</td><td>197.759</td><td>364.296</td><td>1365.66</td><td>0.0416</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 9 & 0.428 & 340.17 & 1648.4 & 303.901 & 194.392 & 357.465 & 1366.23 & 0.0619 & $\\dots$ \\\\\n",
       "\t2 & 1983 & 10 & 0.002 & 340.3 & 1663.79 & 303.97 & 195.171 & 359.174 & 1366.06 & 0.0569 & $\\dots$ \\\\\n",
       "\t3 & 1983 & 11 & -0.176 & 341.53 & 1658.23 & 304.032 & 195.921 & 360.758 & 1366.11 & 0.0524 & $\\dots$ \\\\\n",
       "\t4 & 1983 & 12 & -0.176 & 343.07 & 1654.31 & 304.082 & 196.609 & 362.174 & 1366.06 & 0.0486 & $\\dots$ \\\\\n",
       "\t5 & 1984 & 1 & -0.339 & 344.05 & 1658.98 & 304.13 & 197.219 & 363.359 & 1365.43 & 0.0451 & $\\dots$ \\\\\n",
       "\t6 & 1984 & 2 & -0.565 & 344.77 & 1656.48 & 304.194 & 197.759 & 364.296 & 1365.66 & 0.0416 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m6×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      9    0.428   340.17  1648.4   303.901  194.392  357.465  136 ⋯\n",
       "   2 │  1983     10    0.002   340.3   1663.79  303.97   195.171  359.174  136\n",
       "   3 │  1983     11   -0.176   341.53  1658.23  304.032  195.921  360.758  136\n",
       "   4 │  1983     12   -0.176   343.07  1654.31  304.082  196.609  362.174  136\n",
       "   5 │  1984      1   -0.339   344.05  1658.98  304.13   197.219  363.359  136 ⋯\n",
       "   6 │  1984      2   -0.565   344.77  1656.48  304.194  197.759  364.296  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get Rows 5-10 (INCLUSIVE), and all columns\n",
    "df[5:10, :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can exclude columns with `Not()`, and you can use regex, too. What a dream!!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (7 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>N2O</th><th>TSI</th><th>Aerosols</th><th>Temp</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>303.677</td><td>1366.1</td><td>0.0863</td><td>0.109</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccc}\n",
       "\t& Year & Month & MEI & N2O & TSI & Aerosols & Temp\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 303.677 & 1366.1 & 0.0863 & 0.109 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m TSI     \u001b[0m\u001b[1m Aerosols \u001b[0m\u001b[1m Temp    \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64 \u001b[0m\n",
       "─────┼────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556  303.677   1366.1    0.0863    0.109"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exclude all columns that have a capital C in it\n",
    "df[1, Not(r\"C.*\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Checking out the data:\n",
    "\n",
    "* `size(df)` : To get dataframe dimensions you need to use `size` instead of `length` or `len` or `shape`:\n",
    "* `describe(df)` : Gives you summary info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(308, 11)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "size(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>11 rows × 7 columns</p><table class=\"data-frame\"><thead><tr><th></th><th>variable</th><th>mean</th><th>min</th><th>median</th><th>max</th><th>nmissing</th><th>eltype</th></tr><tr><th></th><th title=\"Symbol\">Symbol</th><th title=\"Float64\">Float64</th><th title=\"Real\">Real</th><th title=\"Float64\">Float64</th><th title=\"Real\">Real</th><th title=\"Int64\">Int64</th><th title=\"DataType\">DataType</th></tr></thead><tbody><tr><th>1</th><td>Year</td><td>1995.66</td><td>1983</td><td>1996.0</td><td>2008</td><td>0</td><td>Int64</td></tr><tr><th>2</th><td>Month</td><td>6.55195</td><td>1</td><td>7.0</td><td>12</td><td>0</td><td>Int64</td></tr><tr><th>3</th><td>MEI</td><td>0.275555</td><td>-1.635</td><td>0.2375</td><td>3.001</td><td>0</td><td>Float64</td></tr><tr><th>4</th><td>CO2</td><td>363.227</td><td>340.17</td><td>361.735</td><td>388.5</td><td>0</td><td>Float64</td></tr><tr><th>5</th><td>CH4</td><td>1749.82</td><td>1629.89</td><td>1764.04</td><td>1814.18</td><td>0</td><td>Float64</td></tr><tr><th>6</th><td>N2O</td><td>312.392</td><td>303.677</td><td>311.507</td><td>322.182</td><td>0</td><td>Float64</td></tr><tr><th>7</th><td>CFC-11</td><td>251.973</td><td>191.324</td><td>258.344</td><td>271.494</td><td>0</td><td>Float64</td></tr><tr><th>8</th><td>CFC-12</td><td>497.525</td><td>350.113</td><td>528.356</td><td>543.813</td><td>0</td><td>Float64</td></tr><tr><th>9</th><td>TSI</td><td>1366.07</td><td>1365.43</td><td>1365.98</td><td>1367.32</td><td>0</td><td>Float64</td></tr><tr><th>10</th><td>Aerosols</td><td>0.0166571</td><td>0.0016</td><td>0.00575</td><td>0.1494</td><td>0</td><td>Float64</td></tr><tr><th>11</th><td>Temp</td><td>0.256776</td><td>-0.282</td><td>0.248</td><td>0.739</td><td>0</td><td>Float64</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccc}\n",
       "\t& variable & mean & min & median & max & nmissing & eltype\\\\\n",
       "\t\\hline\n",
       "\t& Symbol & Float64 & Real & Float64 & Real & Int64 & DataType\\\\\n",
       "\t\\hline\n",
       "\t1 & Year & 1995.66 & 1983 & 1996.0 & 2008 & 0 & Int64 \\\\\n",
       "\t2 & Month & 6.55195 & 1 & 7.0 & 12 & 0 & Int64 \\\\\n",
       "\t3 & MEI & 0.275555 & -1.635 & 0.2375 & 3.001 & 0 & Float64 \\\\\n",
       "\t4 & CO2 & 363.227 & 340.17 & 361.735 & 388.5 & 0 & Float64 \\\\\n",
       "\t5 & CH4 & 1749.82 & 1629.89 & 1764.04 & 1814.18 & 0 & Float64 \\\\\n",
       "\t6 & N2O & 312.392 & 303.677 & 311.507 & 322.182 & 0 & Float64 \\\\\n",
       "\t7 & CFC-11 & 251.973 & 191.324 & 258.344 & 271.494 & 0 & Float64 \\\\\n",
       "\t8 & CFC-12 & 497.525 & 350.113 & 528.356 & 543.813 & 0 & Float64 \\\\\n",
       "\t9 & TSI & 1366.07 & 1365.43 & 1365.98 & 1367.32 & 0 & Float64 \\\\\n",
       "\t10 & Aerosols & 0.0166571 & 0.0016 & 0.00575 & 0.1494 & 0 & Float64 \\\\\n",
       "\t11 & Temp & 0.256776 & -0.282 & 0.248 & 0.739 & 0 & Float64 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m11×7 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m variable \u001b[0m\u001b[1m mean         \u001b[0m\u001b[1m min       \u001b[0m\u001b[1m median     \u001b[0m\u001b[1m max       \u001b[0m\u001b[1m nmissing \u001b[0m\u001b[1m elt\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Symbol   \u001b[0m\u001b[90m Float64      \u001b[0m\u001b[90m Real      \u001b[0m\u001b[90m Float64    \u001b[0m\u001b[90m Real      \u001b[0m\u001b[90m Int64    \u001b[0m\u001b[90m Dat\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │ Year      1995.66       1983       1996.0      2008              0  Int ⋯\n",
       "   2 │ Month        6.55195       1          7.0        12              0  Int\n",
       "   3 │ MEI          0.275555     -1.635      0.2375      3.001          0  Flo\n",
       "   4 │ CO2        363.227       340.17     361.735     388.5            0  Flo\n",
       "   5 │ CH4       1749.82       1629.89    1764.04     1814.18           0  Flo ⋯\n",
       "   6 │ N2O        312.392       303.677    311.507     322.182          0  Flo\n",
       "   7 │ CFC-11     251.973       191.324    258.344     271.494          0  Flo\n",
       "   8 │ CFC-12     497.525       350.113    528.356     543.813          0  Flo\n",
       "   9 │ TSI       1366.07       1365.43    1365.98     1367.32           0  Flo ⋯\n",
       "  10 │ Aerosols     0.0166571     0.0016     0.00575     0.1494         0  Flo\n",
       "  11 │ Temp         0.256776     -0.282      0.248       0.739          0  Flo\n",
       "\u001b[36m                                                                1 column omitted\u001b[0m"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "describe(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering DataFrames Based on Conditions\n",
    "\n",
    "How do you conditionally filter a dataframe? One way to do it is through indexing, similar to how you would do it in Pandas. \n",
    "\n",
    "Note that you can reference a dataframe column (like a pandas series) both with and without quotes, so `df.Year` is the same as `df.\"Year\"`. Handy for dealing with spaces!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>4 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1985</td><td>1</td><td>-0.561</td><td>345.25</td><td>1663.42</td><td>305.301</td><td>208.537</td><td>373.623</td><td>1365.61</td><td>0.0164</td></tr><tr><th>2</th><td>1985</td><td>2</td><td>-0.602</td><td>346.06</td><td>1666.21</td><td>305.243</td><td>209.543</td><td>374.681</td><td>1365.71</td><td>0.016</td></tr><tr><th>3</th><td>1985</td><td>3</td><td>-0.737</td><td>347.66</td><td>1678.34</td><td>305.165</td><td>210.368</td><td>376.004</td><td>1365.66</td><td>0.0141</td></tr><tr><th>4</th><td>1985</td><td>4</td><td>-0.484</td><td>348.2</td><td>1675.24</td><td>305.093</td><td>211.111</td><td>377.635</td><td>1365.51</td><td>0.0138</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1985 & 1 & -0.561 & 345.25 & 1663.42 & 305.301 & 208.537 & 373.623 & 1365.61 & 0.0164 & $\\dots$ \\\\\n",
       "\t2 & 1985 & 2 & -0.602 & 346.06 & 1666.21 & 305.243 & 209.543 & 374.681 & 1365.71 & 0.016 & $\\dots$ \\\\\n",
       "\t3 & 1985 & 3 & -0.737 & 347.66 & 1678.34 & 305.165 & 210.368 & 376.004 & 1365.66 & 0.0141 & $\\dots$ \\\\\n",
       "\t4 & 1985 & 4 & -0.484 & 348.2 & 1675.24 & 305.093 & 211.111 & 377.635 & 1365.51 & 0.0138 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m4×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1985      1   -0.561   345.25  1663.42  305.301  208.537  373.623  136 ⋯\n",
       "   2 │  1985      2   -0.602   346.06  1666.21  305.243  209.543  374.681  136\n",
       "   3 │  1985      3   -0.737   347.66  1678.34  305.165  210.368  376.004  136\n",
       "   4 │  1985      4   -0.484   348.2   1675.24  305.093  211.111  377.635  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[(df.Year.==1985) .& (df.\"Month\".<=4), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is that `.` doing in front of the `<=`, `==` and `&` sign?\n",
    "\n",
    "The `.` sign there is doing some heavy lifting. It's an 'elementwise' operator, meaning that it does the comparison across every element in the vector `df.Year` and `df.Month`. If you don't have them this is what it would return for the `df.Year` comparison:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "false"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Year==1985 # Returns true if ALL values of Year are equal to 1985"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When what we really want is something like a list (vector) of values that compares each element and returns either `true` or `false` for each element in each vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bool[0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "println((df.Year.==1985 )[1:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bool[1, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "println(([1,1,0].==[1,0,1]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "println((df.Year.==1985) .& (df.\"Month\".<=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`DataFrames` also has a `filter` and `subset` option to index based on row values, but they're a little messy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `filter(f, df)` function lets you filter by passing in a function, like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (11 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1985</td><td>1</td><td>-0.561</td><td>345.25</td><td>1663.42</td><td>305.301</td><td>208.537</td><td>373.623</td><td>1365.61</td><td>0.0164</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1985 & 1 & -0.561 & 345.25 & 1663.42 & 305.301 & 208.537 & 373.623 & 1365.61 & 0.0164 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1985      1   -0.561   345.25  1663.42  305.301  208.537  373.623  136 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filter(:Year=> ==(1985), df) |> first"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is the colon `:` doing in front of `Year`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another nuance with Julia is this little colon in front of a column name, and that's because `Year` is a `Symbol` type. This is a special thing you can do for performance, but you can still index columns by string too - handy if your column name has spaces in it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (11 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1985</td><td>1</td><td>-0.561</td><td>345.25</td><td>1663.42</td><td>305.301</td><td>208.537</td><td>373.623</td><td>1365.61</td><td>0.0164</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1985 & 1 & -0.561 & 345.25 & 1663.42 & 305.301 & 208.537 & 373.623 & 1365.61 & 0.0164 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1985      1   -0.561   345.25  1663.42  305.301  208.537  373.623  136 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filter(:\"Year\"=> ==(1985), df) |> first"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What the heck is the `=>` doing after the column?\n",
    "\n",
    "More nuance, and this is usually the part where I go back to Pandas or Dplyr. What's actually happening here is that it's assigning a `Pair` like `key=>value`. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering Rows "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The row logic makes sense to me more, but it's not as performant"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.190525 seconds (834.12 k allocations: 43.820 MiB, 99.68% compilation time)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (11 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113   13 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@time begin\n",
    "    filter(row -> (row.Year in [1983, 1984]), df) |> first\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember that `->` is an anonymous function and `filter` is expecting a function as its first argument.\n",
    "\n",
    "\n",
    "* See Julia for Data Science for a good introductiont to `filter` and `subset`: [julia for data science](https://juliadatascience.io/filter_subset)\n",
    "* Another Good writeup here in this article [why do we have both subset and filter functions?](https://bkamins.github.io/julialang/2021/05/07/subset.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This row filtering is equivalent to:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.098306 seconds (370.27 k allocations: 18.121 MiB, 99.74% compilation time)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (11 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113   13 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@time begin\n",
    "    df[(df.Year .==1983) .| (df.Year .==1984), :] |> first\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or also:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.071469 seconds (159.67 k allocations: 8.747 MiB, 99.50% compilation time)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (11 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113   13 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# using a list comprehension works for the `in` operator here\n",
    "@time begin\n",
    "    df[[ year in [1983,1984] for year in df.Year], :] |> first\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And you can combine indexing like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.084675 seconds (455.85 k allocations: 23.343 MiB, 99.34% compilation time)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>4 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1984</td><td>1</td><td>-0.339</td><td>344.05</td><td>1658.98</td><td>304.13</td><td>197.219</td><td>363.359</td><td>1365.43</td><td>0.0451</td></tr><tr><th>2</th><td>1984</td><td>2</td><td>-0.565</td><td>344.77</td><td>1656.48</td><td>304.194</td><td>197.759</td><td>364.296</td><td>1365.66</td><td>0.0416</td></tr><tr><th>3</th><td>1984</td><td>3</td><td>0.131</td><td>345.46</td><td>1655.77</td><td>304.285</td><td>198.249</td><td>365.044</td><td>1366.17</td><td>0.0383</td></tr><tr><th>4</th><td>1984</td><td>4</td><td>0.331</td><td>346.77</td><td>1657.68</td><td>304.389</td><td>198.723</td><td>365.692</td><td>1365.57</td><td>0.0352</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1984 & 1 & -0.339 & 344.05 & 1658.98 & 304.13 & 197.219 & 363.359 & 1365.43 & 0.0451 & $\\dots$ \\\\\n",
       "\t2 & 1984 & 2 & -0.565 & 344.77 & 1656.48 & 304.194 & 197.759 & 364.296 & 1365.66 & 0.0416 & $\\dots$ \\\\\n",
       "\t3 & 1984 & 3 & 0.131 & 345.46 & 1655.77 & 304.285 & 198.249 & 365.044 & 1366.17 & 0.0383 & $\\dots$ \\\\\n",
       "\t4 & 1984 & 4 & 0.331 & 346.77 & 1657.68 & 304.389 & 198.723 & 365.692 & 1365.57 & 0.0352 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m4×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1984      1   -0.339   344.05  1658.98  304.13   197.219  363.359  136 ⋯\n",
       "   2 │  1984      2   -0.565   344.77  1656.48  304.194  197.759  364.296  136\n",
       "   3 │  1984      3    0.131   345.46  1655.77  304.285  198.249  365.044  136\n",
       "   4 │  1984      4    0.331   346.77  1657.68  304.389  198.723  365.692  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@time begin \n",
    "    filter(row -> (row.Year in [1983, 1984]) & (row.Month <= 4), df)\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or equivalently with plain indexing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.206541 seconds (533.77 k allocations: 27.184 MiB, 13.27% gc time, 99.64% compilation time)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (11 columns)</p><div class=\"data-frame\"><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113   13 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# using a list comprehension works because there's no elementwise `.in` or `.∈` operator here\n",
    "@time begin\n",
    "    condition1 = [year in [1983,1984] for year in df.Year]\n",
    "    condition2 = [month <= 5 for month in df.Month]\n",
    "    df[condition1 .& condition2, :] |> first\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the nice thing about `filter` is that you can apply functions and stuff to it, like you could do calculations and then do a boolean off of the calculation, etc etc. Either way it's not too slow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subset(df, args)\n",
    "\n",
    "Returns a copy of the dataframe where args are true"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>8 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr><tr><th>2</th><td>1983</td><td>6</td><td>2.167</td><td>345.52</td><td>1633.71</td><td>303.746</td><td>192.057</td><td>351.848</td><td>1366.12</td><td>0.0794</td></tr><tr><th>3</th><td>1983</td><td>7</td><td>1.741</td><td>344.15</td><td>1633.22</td><td>303.795</td><td>192.818</td><td>353.725</td><td>1366.29</td><td>0.0731</td></tr><tr><th>4</th><td>1983</td><td>8</td><td>1.13</td><td>342.25</td><td>1631.35</td><td>303.839</td><td>193.602</td><td>355.633</td><td>1366.42</td><td>0.0673</td></tr><tr><th>5</th><td>1983</td><td>9</td><td>0.428</td><td>340.17</td><td>1648.4</td><td>303.901</td><td>194.392</td><td>357.465</td><td>1366.23</td><td>0.0619</td></tr><tr><th>6</th><td>1983</td><td>10</td><td>0.002</td><td>340.3</td><td>1663.79</td><td>303.97</td><td>195.171</td><td>359.174</td><td>1366.06</td><td>0.0569</td></tr><tr><th>7</th><td>1983</td><td>11</td><td>-0.176</td><td>341.53</td><td>1658.23</td><td>304.032</td><td>195.921</td><td>360.758</td><td>1366.11</td><td>0.0524</td></tr><tr><th>8</th><td>1983</td><td>12</td><td>-0.176</td><td>343.07</td><td>1654.31</td><td>304.082</td><td>196.609</td><td>362.174</td><td>1366.06</td><td>0.0486</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\t2 & 1983 & 6 & 2.167 & 345.52 & 1633.71 & 303.746 & 192.057 & 351.848 & 1366.12 & 0.0794 & $\\dots$ \\\\\n",
       "\t3 & 1983 & 7 & 1.741 & 344.15 & 1633.22 & 303.795 & 192.818 & 353.725 & 1366.29 & 0.0731 & $\\dots$ \\\\\n",
       "\t4 & 1983 & 8 & 1.13 & 342.25 & 1631.35 & 303.839 & 193.602 & 355.633 & 1366.42 & 0.0673 & $\\dots$ \\\\\n",
       "\t5 & 1983 & 9 & 0.428 & 340.17 & 1648.4 & 303.901 & 194.392 & 357.465 & 1366.23 & 0.0619 & $\\dots$ \\\\\n",
       "\t6 & 1983 & 10 & 0.002 & 340.3 & 1663.79 & 303.97 & 195.171 & 359.174 & 1366.06 & 0.0569 & $\\dots$ \\\\\n",
       "\t7 & 1983 & 11 & -0.176 & 341.53 & 1658.23 & 304.032 & 195.921 & 360.758 & 1366.11 & 0.0524 & $\\dots$ \\\\\n",
       "\t8 & 1983 & 12 & -0.176 & 343.07 & 1654.31 & 304.082 & 196.609 & 362.174 & 1366.06 & 0.0486 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m8×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113  136 ⋯\n",
       "   2 │  1983      6    2.167   345.52  1633.71  303.746  192.057  351.848  136\n",
       "   3 │  1983      7    1.741   344.15  1633.22  303.795  192.818  353.725  136\n",
       "   4 │  1983      8    1.13    342.25  1631.35  303.839  193.602  355.633  136\n",
       "   5 │  1983      9    0.428   340.17  1648.4   303.901  194.392  357.465  136 ⋯\n",
       "   6 │  1983     10    0.002   340.3   1663.79  303.97   195.171  359.174  136\n",
       "   7 │  1983     11   -0.176   341.53  1658.23  304.032  195.921  360.758  136\n",
       "   8 │  1983     12   -0.176   343.07  1654.31  304.082  196.609  362.174  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "subset(df, :Year=> ByRow(==(1983)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subset with multiple conditions\n",
    "\n",
    "To do multiple conditions (including an OR operator) we need to use a function. Both `filter` and `subset` are powerful because of the ability to pass functions as arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>4 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1984</td><td>1</td><td>-0.339</td><td>344.05</td><td>1658.98</td><td>304.13</td><td>197.219</td><td>363.359</td><td>1365.43</td><td>0.0451</td></tr><tr><th>2</th><td>1984</td><td>2</td><td>-0.565</td><td>344.77</td><td>1656.48</td><td>304.194</td><td>197.759</td><td>364.296</td><td>1365.66</td><td>0.0416</td></tr><tr><th>3</th><td>1984</td><td>3</td><td>0.131</td><td>345.46</td><td>1655.77</td><td>304.285</td><td>198.249</td><td>365.044</td><td>1366.17</td><td>0.0383</td></tr><tr><th>4</th><td>1984</td><td>4</td><td>0.331</td><td>346.77</td><td>1657.68</td><td>304.389</td><td>198.723</td><td>365.692</td><td>1365.57</td><td>0.0352</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1984 & 1 & -0.339 & 344.05 & 1658.98 & 304.13 & 197.219 & 363.359 & 1365.43 & 0.0451 & $\\dots$ \\\\\n",
       "\t2 & 1984 & 2 & -0.565 & 344.77 & 1656.48 & 304.194 & 197.759 & 364.296 & 1365.66 & 0.0416 & $\\dots$ \\\\\n",
       "\t3 & 1984 & 3 & 0.131 & 345.46 & 1655.77 & 304.285 & 198.249 & 365.044 & 1366.17 & 0.0383 & $\\dots$ \\\\\n",
       "\t4 & 1984 & 4 & 0.331 & 346.77 & 1657.68 & 304.389 & 198.723 & 365.692 & 1365.57 & 0.0352 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m4×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1984      1   -0.339   344.05  1658.98  304.13   197.219  363.359  136 ⋯\n",
       "   2 │  1984      2   -0.565   344.77  1656.48  304.194  197.759  364.296  136\n",
       "   3 │  1984      3    0.131   345.46  1655.77  304.285  198.249  365.044  136\n",
       "   4 │  1984      4    0.331   346.77  1657.68  304.389  198.723  365.692  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function my_filter(year, month)\n",
    "    year_match = (year in [1983, 1984])\n",
    "    month_match = (month<=4)\n",
    "    return year_match & month_match\n",
    "end;\n",
    "\n",
    "# apply this filter for each row\n",
    "subset(df, [:Year, :Month]=> ByRow(my_filter))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`subset` is nice because it can handle missing values, whereas `filter` won't. So in short, for most of your use cases you should stick with indexing or using `subset`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.000168 seconds (507 allocations: 39.328 KiB)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>4 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1984</td><td>1</td><td>-0.339</td><td>344.05</td><td>1658.98</td><td>304.13</td><td>197.219</td><td>363.359</td><td>1365.43</td><td>0.0451</td></tr><tr><th>2</th><td>1984</td><td>2</td><td>-0.565</td><td>344.77</td><td>1656.48</td><td>304.194</td><td>197.759</td><td>364.296</td><td>1365.66</td><td>0.0416</td></tr><tr><th>3</th><td>1984</td><td>3</td><td>0.131</td><td>345.46</td><td>1655.77</td><td>304.285</td><td>198.249</td><td>365.044</td><td>1366.17</td><td>0.0383</td></tr><tr><th>4</th><td>1984</td><td>4</td><td>0.331</td><td>346.77</td><td>1657.68</td><td>304.389</td><td>198.723</td><td>365.692</td><td>1365.57</td><td>0.0352</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1984 & 1 & -0.339 & 344.05 & 1658.98 & 304.13 & 197.219 & 363.359 & 1365.43 & 0.0451 & $\\dots$ \\\\\n",
       "\t2 & 1984 & 2 & -0.565 & 344.77 & 1656.48 & 304.194 & 197.759 & 364.296 & 1365.66 & 0.0416 & $\\dots$ \\\\\n",
       "\t3 & 1984 & 3 & 0.131 & 345.46 & 1655.77 & 304.285 & 198.249 & 365.044 & 1366.17 & 0.0383 & $\\dots$ \\\\\n",
       "\t4 & 1984 & 4 & 0.331 & 346.77 & 1657.68 & 304.389 & 198.723 & 365.692 & 1365.57 & 0.0352 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m4×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1984      1   -0.339   344.05  1658.98  304.13   197.219  363.359  136 ⋯\n",
       "   2 │  1984      2   -0.565   344.77  1656.48  304.194  197.759  364.296  136\n",
       "   3 │  1984      3    0.131   345.46  1655.77  304.285  198.249  365.044  136\n",
       "   4 │  1984      4    0.331   346.77  1657.68  304.389  198.723  365.692  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@time begin\n",
    "    subset(df, [:Year, :Month]=> ByRow(my_filter))\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `@subset` \n",
    "\n",
    "Instead of the filter and subset stuff in base dataframes, we can use DataFramesMeta:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>1 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m1×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113   13 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@subset(df, (:Year .<1984), (:Month .==5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can do `@rsubset` to filter across rows so you don't need to do the elementwise `.` every time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>1 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m1×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113   13 ⋯\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@rsubset(df, :Year<1984, :Month==5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So in the above example we are using an AND statement, it'll keep doing the ANDs across each item there. So if you want to do something fancier like an OR statement you'll need to wrap it in parenthesis:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.134172 seconds (490.13 k allocations: 26.135 MiB, 99.18% compilation time)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>2 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1983</td><td>5</td><td>2.556</td><td>345.96</td><td>1638.59</td><td>303.677</td><td>191.324</td><td>350.113</td><td>1366.1</td><td>0.0863</td></tr><tr><th>2</th><td>1985</td><td>3</td><td>-0.737</td><td>347.66</td><td>1678.34</td><td>305.165</td><td>210.368</td><td>376.004</td><td>1365.66</td><td>0.0141</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1983 & 5 & 2.556 & 345.96 & 1638.59 & 303.677 & 191.324 & 350.113 & 1366.1 & 0.0863 & $\\dots$ \\\\\n",
       "\t2 & 1985 & 3 & -0.737 & 347.66 & 1678.34 & 305.165 & 210.368 & 376.004 & 1365.66 & 0.0141 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1983      5    2.556   345.96  1638.59  303.677  191.324  350.113  136 ⋯\n",
       "   2 │  1985      3   -0.737   347.66  1678.34  305.165  210.368  376.004  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@time begin\n",
    "    @rsubset(df, (:Year<1984 && :Month==5) || (:Year==1985 && :Month==3))\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or you can always pass a custom function too, just like before (this is fast!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.000125 seconds (508 allocations: 39.375 KiB)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>4 rows × 11 columns (omitted printing of 1 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>Year</th><th>Month</th><th>MEI</th><th>CO2</th><th>CH4</th><th>N2O</th><th>CFC-11</th><th>CFC-12</th><th>TSI</th><th>Aerosols</th></tr><tr><th></th><th title=\"Int64\">Int64</th><th title=\"Int64\">Int64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>1984</td><td>1</td><td>-0.339</td><td>344.05</td><td>1658.98</td><td>304.13</td><td>197.219</td><td>363.359</td><td>1365.43</td><td>0.0451</td></tr><tr><th>2</th><td>1984</td><td>2</td><td>-0.565</td><td>344.77</td><td>1656.48</td><td>304.194</td><td>197.759</td><td>364.296</td><td>1365.66</td><td>0.0416</td></tr><tr><th>3</th><td>1984</td><td>3</td><td>0.131</td><td>345.46</td><td>1655.77</td><td>304.285</td><td>198.249</td><td>365.044</td><td>1366.17</td><td>0.0383</td></tr><tr><th>4</th><td>1984</td><td>4</td><td>0.331</td><td>346.77</td><td>1657.68</td><td>304.389</td><td>198.723</td><td>365.692</td><td>1365.57</td><td>0.0352</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccccccccccc}\n",
       "\t& Year & Month & MEI & CO2 & CH4 & N2O & CFC-11 & CFC-12 & TSI & Aerosols & \\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n",
       "\t\\hline\n",
       "\t1 & 1984 & 1 & -0.339 & 344.05 & 1658.98 & 304.13 & 197.219 & 363.359 & 1365.43 & 0.0451 & $\\dots$ \\\\\n",
       "\t2 & 1984 & 2 & -0.565 & 344.77 & 1656.48 & 304.194 & 197.759 & 364.296 & 1365.66 & 0.0416 & $\\dots$ \\\\\n",
       "\t3 & 1984 & 3 & 0.131 & 345.46 & 1655.77 & 304.285 & 198.249 & 365.044 & 1366.17 & 0.0383 & $\\dots$ \\\\\n",
       "\t4 & 1984 & 4 & 0.331 & 346.77 & 1657.68 & 304.389 & 198.723 & 365.692 & 1365.57 & 0.0352 & $\\dots$ \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m4×11 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m Year  \u001b[0m\u001b[1m Month \u001b[0m\u001b[1m MEI     \u001b[0m\u001b[1m CO2     \u001b[0m\u001b[1m CH4     \u001b[0m\u001b[1m N2O     \u001b[0m\u001b[1m CFC-11  \u001b[0m\u001b[1m CFC-12  \u001b[0m\u001b[1m TSI\u001b[0m ⋯\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n",
       "─────┼──────────────────────────────────────────────────────────────────────────\n",
       "   1 │  1984      1   -0.339   344.05  1658.98  304.13   197.219  363.359  136 ⋯\n",
       "   2 │  1984      2   -0.565   344.77  1656.48  304.194  197.759  364.296  136\n",
       "   3 │  1984      3    0.131   345.46  1655.77  304.285  198.249  365.044  136\n",
       "   4 │  1984      4    0.331   346.77  1657.68  304.389  198.723  365.692  136\n",
       "\u001b[36m                                                               3 columns omitted\u001b[0m"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@time begin\n",
    "    @rsubset(df, my_filter(:Year, :Month))\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nice, it's super fast too! I think this is the way."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## @chain from DataFramesMeta\n",
    "\n",
    "Instead of the clunky filtering and indexing, you can try out some of the querying libraries that Julia has to offer. They are all going to be different than pandas and dplyr, but that might not be a bad thing. The downside is that there isn't one obvious/clear standard yet in Julia, so for the practical business user or accidental coder it's still tough sledding. \n",
    "\n",
    "It's November 2021 and we've had stuff like Query and Queryverse for a while now, with DataFramesMeta in the mix. I'll show off `@chain` macro stuff here from the DataFramesMeta package, and we've already been familiar with `@rsubset` from earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = rename(df,:CO2 => \"Carbon Dioxide\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But there's more! Here's [a guide for people coming from dplyr](https://juliadata.github.io/DataFramesMeta.jl/stable/dplyr/) from the DataFramesMeta folks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>4 rows × 3 columns</p><table class=\"data-frame\"><thead><tr><th></th><th>MEI</th><th>Aerosols</th><th>Carbon Dioxide</th></tr><tr><th></th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>-0.561</td><td>0.0164</td><td>345.25</td></tr><tr><th>2</th><td>-0.602</td><td>0.016</td><td>346.06</td></tr><tr><th>3</th><td>-0.737</td><td>0.0141</td><td>347.66</td></tr><tr><th>4</th><td>-0.484</td><td>0.0138</td><td>348.2</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& MEI & Aerosols & Carbon Dioxide\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & -0.561 & 0.0164 & 345.25 \\\\\n",
       "\t2 & -0.602 & 0.016 & 346.06 \\\\\n",
       "\t3 & -0.737 & 0.0141 & 347.66 \\\\\n",
       "\t4 & -0.484 & 0.0138 & 348.2 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m4×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m MEI     \u001b[0m\u001b[1m Aerosols \u001b[0m\u001b[1m Carbon Dioxide \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64 \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64        \u001b[0m\n",
       "─────┼───────────────────────────────────\n",
       "   1 │  -0.561    0.0164          345.25\n",
       "   2 │  -0.602    0.016           346.06\n",
       "   3 │  -0.737    0.0141          347.66\n",
       "   4 │  -0.484    0.0138          348.2"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@chain df2 begin\n",
    "    @rsubset(:Year==1985, :Month<=4)\n",
    "    @select(:MEI, :Aerosols, :\"Carbon Dioxide\")\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using our previous OR filter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"data-frame\"><p>4 rows × 3 columns</p><table class=\"data-frame\"><thead><tr><th></th><th>MEI</th><th>Aerosols</th><th>Carbon Dioxide</th></tr><tr><th></th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th><th title=\"Float64\">Float64</th></tr></thead><tbody><tr><th>1</th><td>-0.339</td><td>0.0451</td><td>344.05</td></tr><tr><th>2</th><td>-0.565</td><td>0.0416</td><td>344.77</td></tr><tr><th>3</th><td>0.131</td><td>0.0383</td><td>345.46</td></tr><tr><th>4</th><td>0.331</td><td>0.0352</td><td>346.77</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& MEI & Aerosols & Carbon Dioxide\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & -0.339 & 0.0451 & 344.05 \\\\\n",
       "\t2 & -0.565 & 0.0416 & 344.77 \\\\\n",
       "\t3 & 0.131 & 0.0383 & 345.46 \\\\\n",
       "\t4 & 0.331 & 0.0352 & 346.77 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m4×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m MEI     \u001b[0m\u001b[1m Aerosols \u001b[0m\u001b[1m Carbon Dioxide \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64 \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64        \u001b[0m\n",
       "─────┼───────────────────────────────────\n",
       "   1 │  -0.339    0.0451          344.05\n",
       "   2 │  -0.565    0.0416          344.77\n",
       "   3 │   0.131    0.0383          345.46\n",
       "   4 │   0.331    0.0352          346.77"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@chain df2 begin\n",
    "    @rsubset(my_filter(:Year, :Month))\n",
    "    @select(:MEI, :Aerosols, :\"Carbon Dioxide\")\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.7.0",
   "language": "julia",
   "name": "julia-1.7"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}