{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 10 minutes to polars\n",
    "This as short introduction to Polars to get you started with the basic concepts of data wrangling. It is very much influenced by [10 minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html).\n",
    "\n",
    "We start by importing Polars. If you run this for the first time, get a coffee. This will take a while"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "use polars::prelude::*;\n",
    "\n",
    "#[macro_use]\n",
    "extern crate polars;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Object creation\n",
    "Creating a `Series` by passing a list of nullable values. Note that we use `Option<T>` to describe missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Series: i32\n",
       "[\n",
       "\t1\n",
       "\t3\n",
       "\t5\n",
       "\tnull\n",
       "\t6\n",
       "\t8\n",
       "]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Series::new(\n",
    "    \"some_values with ones\", \n",
    "    &[Some(1), Some(3), Some(5), None, Some(6), Some(8)]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we dont have any missing values, we can just pass a slice of `T`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Series: i32\n",
       "[\n",
       "\t1\n",
       "\t3\n",
       "\t5\n",
       "\t7\n",
       "\t6\n",
       "\t8\n",
       "]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Series::new(\n",
    "    \"some_non_null_values\", \n",
    "    &[1, 3, 5, 7, 6, 8]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Series` are actually an `Enum` around different typed values of a `ChunkedArray`. \n",
    "You can think of a `ChunedkArray` as an array with a known type. Every `ChunkedArray` has a type alias that makes them more convenient to use. \n",
    "\n",
    "Some examples are:\n",
    "\n",
    "| Type                        | Alias            |\n",
    "|-----------------------------|------------------|\n",
    "| `ChunkedArray<Float64Type>` | `Float64Chunked` |\n",
    "| `ChunkedArray<UInt32Type>`  | `UInt32Chunked`  |\n",
    "| `ChunkedArray<BooleanType>` | `BooleanChunked` |\n",
    "| `ChunkedArray<Utf8Type>`    | `Utf8Chunked`    |\n",
    "\n",
    "See all available data types [here](https://ritchie46.github.io/polars/polars/datatypes/index.html).\n",
    "\n",
    "Create a `ChunkedArray` with null values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PrimitiveArray<Int64>\n",
       "[\n",
       "  null,\n",
       "  1,\n",
       "  2,\n",
       "]]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Int64Chunked::new_from_opt_slice(\"nullable\", &[None, Some(1), Some(2)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or create a `ChunkedArray` without null values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PrimitiveArray<Int64>\n",
       "[\n",
       "  1,\n",
       "  2,\n",
       "  3,\n",
       "]]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Int64Chunked::new_from_slice(\"non-nullable\", &[1, 2, 3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Converting from `Series` to a `ChunkedArray` can be done by defining there type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ok([PrimitiveArray<Int32>\n",
       "[\n",
       "  1,\n",
       "  2,\n",
       "  3,\n",
       "]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let s = Series::new(\"values\", &[1, 2, 3]);\n",
    "s.i32()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will return an `Err` if you specify the wrong type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Err(DataTypeMisMatch)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.i64()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But we can cast a `Series` to the proper type and then unpack."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ok([PrimitiveArray<Int64>\n",
       "[\n",
       "  1,\n",
       "  2,\n",
       "  3,\n",
       "]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.cast::<Int64Type>().unwrap().i64()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we use pattern matching to check if the cast was successful. Note that the clones on a `ChunkedArray` and a `Series` are very cheap, as the underlying data is wrapped by an `Arc`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PrimitiveArray<Int64>\n",
       "[\n",
       "  1,\n",
       "  2,\n",
       "  3,\n",
       "]]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let ca = match s.i64() {\n",
    "    Err(_) => {\n",
    "        s.cast::<Int64Type>()\n",
    "            .unwrap()\n",
    "            .i64()\n",
    "            .map(|ca| ca.clone())\n",
    "            .unwrap()\n",
    "    },\n",
    "    Ok(ca) => ca.clone()\n",
    "};\n",
    "ca"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Converting from a `ChunkedArray` to a `Series`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Series: i64\n",
       "[\n",
       "\t1\n",
       "\t2\n",
       "\t3\n",
       "]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ca.into_series()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `DataFrame` is created from a `Vec` of `Series`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+\n",
       "| dates      | n   | foos  |\n",
       "| ---        | --- | ---   |\n",
       "| date32     | i32 | str   |\n",
       "+============+=====+=======+\n",
       "| 2020-08-21 | 1   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-21 | 2   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-22 | 3   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-23 | 4   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-22 | 5   | \"foo\" |\n",
       "+------------+-----+-------+\n"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let dates = &[\n",
    "    \"2020-08-21\",\n",
    "    \"2020-08-21\",\n",
    "    \"2020-08-22\",\n",
    "    \"2020-08-23\",\n",
    "    \"2020-08-22\",\n",
    "];\n",
    "let fmt = \"%Y-%m-%d\";\n",
    "let s0 = Date32Chunked::parse_from_str_slice(\"dates\", dates, fmt).into();\n",
    "let s1 = Series::new(\"n\", &[1, 2, 3, 4, 5]);\n",
    "let s2 = Utf8Chunked::full(\"foos\", \"foo\", 5).into();\n",
    "\n",
    "let df = DataFrame::new(vec![s0, s1, s2]).expect(\"something went wrong\");\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The columns of the resulting `DataFrame` have different data types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Column: 'dates',\t dtype: Date32(Day)\n",
      "Column: 'n',\t dtype: Int32\n",
      "Column: 'foos',\t dtype: Utf8\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "()"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes()\n",
    "    .iter()\n",
    "    .zip(df.columns().iter())\n",
    "    .for_each(|(dtype, name)| \n",
    "    println!(\"Column: '{}',\\t dtype: {:?}\", name, dtype))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\"dates\", \"n\", \"foos\"]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Viewing data\n",
    "\n",
    "Here is how to view the top and bottom rows of a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+\n",
       "| dates      | n   | foos  |\n",
       "| ---        | --- | ---   |\n",
       "| date32     | i32 | str   |\n",
       "+============+=====+=======+\n",
       "| 2020-08-21 | 1   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-21 | 2   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-22 | 3   | \"foo\" |\n",
       "+------------+-----+-------+\n"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(Some(3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+\n",
       "| dates      | n   | foos  |\n",
       "| ---        | --- | ---   |\n",
       "| date32     | i32 | str   |\n",
       "+============+=====+=======+\n",
       "| 2020-08-22 | 3   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-23 | 4   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-22 | 5   | \"foo\" |\n",
       "+------------+-----+-------+\n"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail(Some(3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sorting by a column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+\n",
       "| dates      | n   | foos  |\n",
       "| ---        | --- | ---   |\n",
       "| date32     | i32 | str   |\n",
       "+============+=====+=======+\n",
       "| 2020-08-23 | 4   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-22 | 3   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-22 | 5   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-21 | 1   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-21 | 2   | \"foo\" |\n",
       "+------------+-----+-------+\n"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let reverse = true;\n",
    "df.sort(\"dates\", reverse).expect(\"column not sortable\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Selection\n",
    "Selecting a single column, which yields a `Result<Series>`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Series: date32(day)\n",
       "[\n",
       "\t2020-08-21\n",
       "\t2020-08-21\n",
       "\t2020-08-22\n",
       "\t2020-08-23\n",
       "\t2020-08-22\n",
       "]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.column(\"dates\")\n",
    "    .expect(\"columns don't exist\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Selecting 1 or multiple columns, which yield another `Result<DataFrame>`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+\n",
       "| dates      |\n",
       "| ---        |\n",
       "| date32     |\n",
       "+============+\n",
       "| 2020-08-21 |\n",
       "+------------+\n",
       "| 2020-08-21 |\n",
       "+------------+\n",
       "| 2020-08-22 |\n",
       "+------------+\n",
       "| 2020-08-23 |\n",
       "+------------+\n",
       "| 2020-08-22 |\n",
       "+------------+\n"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.select(\"dates\")\n",
    "    .expect(\"column does not exist\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+\n",
       "| dates      | n   |\n",
       "| ---        | --- |\n",
       "| date32     | i32 |\n",
       "+============+=====+\n",
       "| 2020-08-21 | 1   |\n",
       "+------------+-----+\n",
       "| 2020-08-21 | 2   |\n",
       "+------------+-----+\n",
       "| 2020-08-22 | 3   |\n",
       "+------------+-----+\n",
       "| 2020-08-23 | 4   |\n",
       "+------------+-----+\n",
       "| 2020-08-22 | 5   |\n",
       "+------------+-----+\n"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.select(&[\"dates\", \"n\"])\n",
    "    .expect(\"column does not exist\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `DataFrame` can also be sliced in to a subset of the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+\n",
       "| dates      | n   | foos  |\n",
       "| ---        | --- | ---   |\n",
       "| date32     | i32 | str   |\n",
       "+============+=====+=======+\n",
       "| 2020-08-22 | 3   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-23 | 4   | \"foo\" |\n",
       "+------------+-----+-------+\n"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let offset = 2;\n",
    "let length = 2;\n",
    "df.slice(offset, length)\n",
    "    .expect(\"slice was not within bounds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Select a column by index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Series: i32\n",
       "[\n",
       "\t1\n",
       "\t2\n",
       "\t3\n",
       "\t4\n",
       "\t5\n",
       "]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.select_at_idx(1)\n",
    "    .expect(\"column was not within bounds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Boolean indexing\n",
    "Boolean indexes can be used to filter data. Note that this also works on `Series` and `ChunkedArray`. We also use the `as_result!` macro. This utility expects a block that returns a `Result<T, PolarsError>`. This makes it to convenient to use the `?` operator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+\n",
       "| dates      | n   | foos  |\n",
       "| ---        | --- | ---   |\n",
       "| date32     | i32 | str   |\n",
       "+============+=====+=======+\n",
       "| 2020-08-22 | 3   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-23 | 4   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-22 | 5   | \"foo\" |\n",
       "+------------+-----+-------+\n"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    // select the n column\n",
    "    let n_s = df.column(\"n\")?;\n",
    "    let mask = n_s.gt(2);\n",
    "\n",
    "    // filter values > 2\n",
    "    df.filter(&mask)\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Filter all values in the \"n\" column greater than 2 and smaller than 5:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+\n",
       "| dates      | n   | foos  |\n",
       "| ---        | --- | ---   |\n",
       "| date32     | i32 | str   |\n",
       "+============+=====+=======+\n",
       "| 2020-08-22 | 3   | \"foo\" |\n",
       "+------------+-----+-------+\n",
       "| 2020-08-23 | 4   | \"foo\" |\n",
       "+------------+-----+-------+\n"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    // select the n column\n",
    "    let n_s = df.column(\"n\")?;\n",
    "    \n",
    "    // create the boolean mask\n",
    "    let mask = (n_s.gt(2) & n_s.lt(5))?;\n",
    "\n",
    "    // filter values > 2\n",
    "    df.filter(&mask)\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For all the comparison methods available on `Series` and `ChunkArrays` check the [ChunkCompare trait](https://ritchie46.github.io/polars/polars/chunked_array/ops/trait.ChunkCompare.html).\n",
    "\n",
    "# Setting\n",
    "Setting a new column can be done with the `hstack` operation. This is operation adds new columns to the existing `DataFrame`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+-------+\n",
       "| dates      | n   | foos  | days  |\n",
       "| ---        | --- | ---   | ---   |\n",
       "| date32     | i32 | str   | str   |\n",
       "+============+=====+=======+=======+\n",
       "| 2020-08-21 | 1   | \"foo\" | \"mo\"  |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-21 | 2   | \"foo\" | \"tue\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 3   | \"foo\" | \"wed\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-23 | 4   | \"foo\" | \"thu\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 5   | \"foo\" | \"fri\" |\n",
       "+------------+-----+-------+-------+\n"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let mut df = df;\n",
    "let s = Series::new(\"days\", &[\"mo\", \"tue\", \"wed\", \"thu\", \"fri\"]);\n",
    "df.hstack(&[s]).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It isn't possible to get mutable access to the columns of a `DataFrame`, because this would give you the possibility to invalidate the `DataFrame` (for instance by replacing the column with a `Series` with a different length).\n",
    "\n",
    "Luckely there are other ways to mutate a DataFrame. We could for instance replace a column in the `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+-------+\n",
       "| dates      | n   | foos  | days  |\n",
       "| ---        | --- | ---   | ---   |\n",
       "| date32     | i32 | str   | str   |\n",
       "+============+=====+=======+=======+\n",
       "| 2020-08-21 | 1   | \"bar\" | \"mo\"  |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-21 | 2   | \"bar\" | \"tue\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 3   | \"bar\" | \"wed\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-23 | 4   | \"bar\" | \"thu\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 5   | \"bar\" | \"fri\" |\n",
       "+------------+-----+-------+-------+\n"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let s = Utf8Chunked::full(\"bars\", \"bar\", 5);\n",
    "df.replace(\"foos\", s).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or if we want to use the column we're replacing to determine the new column's values we can use the `apply` method and use a closure to create the new column.\n",
    "\n",
    "Below we use this determine `n + 1`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+-------+\n",
       "| dates      | n   | foos  | days  |\n",
       "| ---        | --- | ---   | ---   |\n",
       "| date32     | i32 | str   | str   |\n",
       "+============+=====+=======+=======+\n",
       "| 2020-08-21 | 2   | \"bar\" | \"mo\"  |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-21 | 3   | \"bar\" | \"tue\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 4   | \"bar\" | \"wed\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-23 | 5   | \"bar\" | \"thu\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 6   | \"bar\" | \"fri\" |\n",
       "+------------+-----+-------+-------+\n"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply(\"n\", |s| s + 1).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both the `replace` and the `apply` methods exist for selection by index; \n",
    "* `replace_at_idx`\n",
    "* `apply_at_idx`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+-------+-------+\n",
       "| dates      | n   | foos  | days  |\n",
       "| ---        | --- | ---   | ---   |\n",
       "| date32     | i32 | str   | str   |\n",
       "+============+=====+=======+=======+\n",
       "| 2020-08-21 | 4   | \"bar\" | \"mo\"  |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-21 | 6   | \"bar\" | \"tue\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 8   | \"bar\" | \"wed\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-23 | 10  | \"bar\" | \"thu\" |\n",
       "+------------+-----+-------+-------+\n",
       "| 2020-08-22 | 12  | \"bar\" | \"fri\" |\n",
       "+------------+-----+-------+-------+\n"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply_at_idx(1, |s| s * 2)\n",
    "    .unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or we can apply a closure to the values that are valid under a condition constraint:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+---------------------+-------+\n",
       "| dates      | n   | foos                | days  |\n",
       "| ---        | --- | ---                 | ---   |\n",
       "| date32     | i32 | str                 | str   |\n",
       "+============+=====+=====================+=======+\n",
       "| 2020-08-21 | 4   | \"not_within_bounds\" | \"mo\"  |\n",
       "+------------+-----+---------------------+-------+\n",
       "| 2020-08-21 | 6   | \"bar\"               | \"tue\" |\n",
       "+------------+-----+---------------------+-------+\n",
       "| 2020-08-22 | 8   | \"bar\"               | \"wed\" |\n",
       "+------------+-----+---------------------+-------+\n",
       "| 2020-08-23 | 10  | \"not_within_bounds\" | \"thu\" |\n",
       "+------------+-----+---------------------+-------+\n",
       "| 2020-08-22 | 12  | \"not_within_bounds\" | \"fri\" |\n",
       "+------------+-----+---------------------+-------+\n"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    let mask = (df.column(\"n\")?.gt(4) & df.column(\"n\")?.lt(10))?;\n",
    "    \n",
    "    df.may_apply(\"foos\", |s| {\n",
    "        s.utf8()?\n",
    "        .set(&!mask, Some(\"not_within_bounds\"))\n",
    "        }\n",
    "    )\n",
    "    }\n",
    ").unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Iterators\n",
    "Every `ChunkedArray` implements the [IntoIterator trait](https://doc.rust-lang.org/std/iter/trait.IntoIterator.html) which gives us all the powerful trait methods available for iterators."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Some(15)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Ok(())"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    let s = Series::new(\"a\", [1, 2, 3, 4, 5]);\n",
    "    \n",
    "    let v = s.i32()?\n",
    "        .into_iter()\n",
    "        .sum::<Option<i32>>();\n",
    "    \n",
    "    println!(\"{:?}\", v);\n",
    "    \n",
    "    Ok(())\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+------------+-----+---------------------+----------+\n",
       "| dates      | n   | foos                | days     |\n",
       "| ---        | --- | ---                 | ---      |\n",
       "| date32     | i32 | str                 | str      |\n",
       "+============+=====+=====================+==========+\n",
       "| 2020-08-21 | 4   | \"not_within_bounds\" | \"mo_ay\"  |\n",
       "+------------+-----+---------------------+----------+\n",
       "| 2020-08-21 | 6   | \"bar\"               | \"tue_ay\" |\n",
       "+------------+-----+---------------------+----------+\n",
       "| 2020-08-22 | 8   | \"bar\"               | \"wed_ay\" |\n",
       "+------------+-----+---------------------+----------+\n",
       "| 2020-08-23 | 10  | \"not_within_bounds\" | \"thu_ay\" |\n",
       "+------------+-----+---------------------+----------+\n",
       "| 2020-08-22 | 12  | \"not_within_bounds\" | \"fri_ay\" |\n",
       "+------------+-----+---------------------+----------+\n"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    // adds \"ay\" to every word.\n",
    "    fn to_pig_latin(opt_val: Option<&str>) -> Option<String> {\n",
    "        opt_val.map(|val| format!(\"{}_ay\", val))\n",
    "    }\n",
    "    \n",
    "    // may apply takes a closure that may fail.\n",
    "    df.may_apply(\"days\", |s| {\n",
    "        let ca: Utf8Chunked = s.utf8()?\n",
    "            .into_iter()\n",
    "            .map(to_pig_latin)\n",
    "            .collect();\n",
    "        Ok(ca)\n",
    "    });\n",
    "    \n",
    "    Ok(df.clone())\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Concat\n",
    "\n",
    "Polars provides various facilities for easily combining `DataFrames` and `Series`.\n",
    "\n",
    "We can concatenate a `DataFrame` with `hstack`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+------------+-----+---------------------+----------+------------+-----+---------------------+----------+\n",
      "| dates      | n   | foos                | days     | dates      | n   | foos                | days     |\n",
      "| ---        | --- | ---                 | ---      | ---        | --- | ---                 | ---      |\n",
      "| date32     | i32 | str                 | str      | date32     | i32 | str                 | str      |\n",
      "+============+=====+=====================+==========+============+=====+=====================+==========+\n",
      "| 2020-08-21 | 4   | \"not_within_bounds\" | \"mo_ay\"  | 2020-08-21 | 4   | \"not_within_bounds\" | \"mo_ay\"  |\n",
      "+------------+-----+---------------------+----------+------------+-----+---------------------+----------+\n",
      "| 2020-08-21 | 6   | \"bar\"               | \"tue_ay\" | 2020-08-21 | 6   | \"bar\"               | \"tue_ay\" |\n",
      "+------------+-----+---------------------+----------+------------+-----+---------------------+----------+\n",
      "| 2020-08-22 | 8   | \"bar\"               | \"wed_ay\" | 2020-08-22 | 8   | \"bar\"               | \"wed_ay\" |\n",
      "+------------+-----+---------------------+----------+------------+-----+---------------------+----------+\n",
      "| 2020-08-23 | 10  | \"not_within_bounds\" | \"thu_ay\" | 2020-08-23 | 10  | \"not_within_bounds\" | \"thu_ay\" |\n",
      "+------------+-----+---------------------+----------+------------+-----+---------------------+----------+\n",
      "| 2020-08-22 | 12  | \"not_within_bounds\" | \"fri_ay\" | 2020-08-22 | 12  | \"not_within_bounds\" | \"fri_ay\" |\n",
      "+------------+-----+---------------------+----------+------------+-----+---------------------+----------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "{\n",
    "    let mut df1 = df.clone();   \n",
    "    \n",
    "    df1.hstack(df.get_columns());\n",
    "    \n",
    "    println!(\"{:?}\", df1);\n",
    "};"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or append the rows of a second DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+------------+-----+---------------------+----------+\n",
      "| dates      | n   | foos                | days     |\n",
      "| ---        | --- | ---                 | ---      |\n",
      "| date32     | i32 | str                 | str      |\n",
      "+============+=====+=====================+==========+\n",
      "| 2020-08-21 | 4   | \"not_within_bounds\" | \"mo_ay\"  |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-21 | 6   | \"bar\"               | \"tue_ay\" |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-22 | 8   | \"bar\"               | \"wed_ay\" |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-23 | 10  | \"not_within_bounds\" | \"thu_ay\" |\n",
      "+------------+-----+---------------------+----------+\n"
     ]
    }
   ],
   "source": [
    "{\n",
    "    let mut df1 = df.clone();   \n",
    "    \n",
    "    df1.vstack(&df);\n",
    "    \n",
    "    println!(\"{:?}\", df1);\n",
    "};"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Join\n",
    "SQL-style joins. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "| 2020-08-22 | 12  | \"not_within_bounds\" | \"fri_ay\" |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-21 | 4   | \"not_within_bounds\" | \"mo_ay\"  |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-21 | 6   | \"bar\"               | \"tue_ay\" |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-22 | 8   | \"bar\"               | \"wed_ay\" |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-23 | 10  | \"not_within_bounds\" | \"thu_ay\" |\n",
      "+------------+-----+---------------------+----------+\n",
      "| 2020-08-22 | 12  | \"not_within_bounds\" | \"fri_ay\" |\n",
      "+------------+-----+---------------------+----------+\n",
      "\n",
      "+-------+------+\n",
      "| key   | lval |\n",
      "| ---   | ---  |\n",
      "| str   | i32  |\n",
      "+=======+======+\n",
      "| \"foo\" | 1    |\n",
      "+-------+------+\n",
      "| \"foo\" | 2    |\n",
      "+-------+------+\n",
      "\n",
      "+-------+------+\n",
      "| key   | rval |\n",
      "| ---   | ---  |\n",
      "| str   | i32  |\n",
      "+=======+======+\n",
      "| \"foo\" | 4    |\n",
      "+-------+------+\n",
      "| \"foo\" | 5    |\n",
      "+-------+------+\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "+-------+------+------+\n",
       "| key   | lval | rval |\n",
       "| ---   | ---  | ---  |\n",
       "| str   | i32  | i32  |\n",
       "+=======+======+======+\n",
       "| \"foo\" | 1    | 4    |\n",
       "+-------+------+------+\n",
       "| \"foo\" | 2    | 4    |\n",
       "+-------+------+------+\n",
       "| \"foo\" | 1    | 5    |\n",
       "+-------+------+------+\n",
       "| \"foo\" | 2    | 5    |\n",
       "+-------+------+------+\n"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    let left = DataFrame::new(vec![\n",
    "        Series::new(\"key\", &[\"foo\", \"foo\"]),\n",
    "        Series::new(\"lval\", &[1, 2]),\n",
    "    ])?;\n",
    "    \n",
    "    let right = DataFrame::new(vec![\n",
    "        Series::new(\"key\", &[\"foo\", \"foo\"]),\n",
    "        Series::new(\"rval\", &[4, 5]),\n",
    "    ])?;\n",
    "    \n",
    "    println!(\"{:?}\", left);\n",
    "    println!(\"{:?}\", right);\n",
    "    \n",
    "    left.inner_join(&right, \"key\", \"key\")\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another example that can be given is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-------+------+\n",
      "| key   | lval |\n",
      "| ---   | ---  |\n",
      "| str   | i32  |\n",
      "+=======+======+\n",
      "| \"foo\" | 1    |\n",
      "+-------+------+\n",
      "| \"bar\" | 2    |\n",
      "+-------+------+\n",
      "\n",
      "+-------+------+\n",
      "| key   | rval |\n",
      "| ---   | ---  |\n",
      "| str   | i32  |\n",
      "+=======+======+\n",
      "| \"foo\" | 4    |\n",
      "+-------+------+\n",
      "| \"bar\" | 5    |\n",
      "+-------+------+\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "+-------+------+------+\n",
       "| key   | lval | rval |\n",
       "| ---   | ---  | ---  |\n",
       "| str   | i32  | i32  |\n",
       "+=======+======+======+\n",
       "| \"foo\" | 1    | 4    |\n",
       "+-------+------+------+\n",
       "| \"bar\" | 2    | 5    |\n",
       "+-------+------+------+\n"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    \n",
    "    let left = DataFrame::new(vec![\n",
    "        Series::new(\"key\", &[\"foo\", \"bar\"]),\n",
    "        Series::new(\"lval\", &[1, 2]),\n",
    "    ])?;\n",
    "    \n",
    "    let right = DataFrame::new(vec![\n",
    "        Series::new(\"key\", &[\"foo\", \"bar\"]),\n",
    "        Series::new(\"rval\", &[4, 5]),\n",
    "    ])?;\n",
    "    \n",
    "    println!(\"{:?}\", left);\n",
    "    println!(\"{:?}\", right);\n",
    "    \n",
    "    left.inner_join(&right, \"key\", \"key\")\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Grouping\n",
    "\n",
    "By \"group by\" we are referring to a process involving one or more of the following steps:\n",
    "* **Splitting** the data into groups based on some criteria\n",
    "* **Applying** a function to each group independently\n",
    "* **Combining** the results into a data structure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+-------+---------+-----+-----+\n",
       "| A     | B       | C   | D   |\n",
       "| ---   | ---     | --- | --- |\n",
       "| str   | str     | i32 | i32 |\n",
       "+=======+=========+=====+=====+\n",
       "| \"foo\" | \"one\"   | 1   | 1   |\n",
       "+-------+---------+-----+-----+\n",
       "| \"bar\" | \"one\"   | 1   | 2   |\n",
       "+-------+---------+-----+-----+\n",
       "| \"foo\" | \"two\"   | 1   | 3   |\n",
       "+-------+---------+-----+-----+\n",
       "| \"bar\" | \"three\" | 1   | 4   |\n",
       "+-------+---------+-----+-----+\n",
       "| \"foo\" | \"two\"   | 1   | 5   |\n",
       "+-------+---------+-----+-----+\n",
       "| \"bar\" | \"two\"   | 1   | 6   |\n",
       "+-------+---------+-----+-----+\n",
       "| \"foo\" | \"one\"   | 1   | 7   |\n",
       "+-------+---------+-----+-----+\n",
       "| \"foo\" | \"three\" | 1   | 8   |\n",
       "+-------+---------+-----+-----+\n"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let df = DataFrame::new(vec![\n",
    "    Series::new(\"A\", &[\"foo\", \"bar\", \"foo\", \"bar\",\n",
    "                        \"foo\", \"bar\", \"foo\", \"foo\"]),\n",
    "    Series::new(\"B\", &[\"one\", \"one\", \"two\", \"three\",\n",
    "                       \"two\", \"two\", \"one\", \"three\"]),\n",
    "    Int32Chunked::full(\"C\", 1, 8).into(),\n",
    "    Series::new(\"D\", &[1, 2, 3, 4,\n",
    "                      5, 6, 7, 8])\n",
    "]).unwrap();\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Grouping and then applying the `sum()` method to the resulting groups:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+-------+-------+\n",
       "| A     | C_sum |\n",
       "| ---   | ---   |\n",
       "| str   | i32   |\n",
       "+=======+=======+\n",
       "| \"foo\" | 5     |\n",
       "+-------+-------+\n",
       "| \"bar\" | 3     |\n",
       "+-------+-------+\n"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    (&df).groupby(\"A\")?.select(\"C\").sum()\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+-------+---------+-------+\n",
       "| A     | B       | C_sum |\n",
       "| ---   | ---     | ---   |\n",
       "| str   | str     | i32   |\n",
       "+=======+=========+=======+\n",
       "| \"bar\" | \"three\" | 1     |\n",
       "+-------+---------+-------+\n",
       "| \"foo\" | \"one\"   | 2     |\n",
       "+-------+---------+-------+\n",
       "| \"bar\" | \"two\"   | 1     |\n",
       "+-------+---------+-------+\n",
       "| \"foo\" | \"three\" | 1     |\n",
       "+-------+---------+-------+\n",
       "| \"bar\" | \"one\"   | 1     |\n",
       "+-------+---------+-------+\n",
       "| \"foo\" | \"two\"   | 2     |\n",
       "+-------+---------+-------+\n"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    (&df).groupby(&[\"A\", \"B\"])?.select(\"C\").sum()\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pivot tables\n",
    "Pivots create a summary table by a applying a groupby and defining a pivot column and values to aggregate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+---------+-----+-------+-----+\n",
       "| A       | B   | C     | E   |\n",
       "| ---     | --- | ---   | --- |\n",
       "| str     | str | str   | i32 |\n",
       "+=========+=====+=======+=====+\n",
       "| \"one\"   | \"A\" | \"foo\" | 0   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"one\"   | \"B\" | \"foo\" | 1   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"two\"   | \"C\" | \"foo\" | 2   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"three\" | \"A\" | \"bar\" | 3   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"one\"   | \"B\" | \"bar\" | 4   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"one\"   | \"C\" | \"bar\" | 5   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"two\"   | \"A\" | \"foo\" | 6   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"three\" | \"B\" | \"foo\" | 7   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"one\"   | \"C\" | \"foo\" | 8   |\n",
       "+---------+-----+-------+-----+\n",
       "| \"one\"   | \"A\" | \"bar\" | 9   |\n",
       "+---------+-----+-------+-----+\n"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let s0 = Series::new(\"A\", &[\"one\", \"one\", \"two\", \"three\",\n",
    "                        \"one\", \"one\", \"two\", \"three\",\n",
    "                        \"one\", \"one\", \"two\", \"three\"\n",
    "]);\n",
    "let s1 = Series::new(\"B\", &[\"A\", \"B\", \"C\",\n",
    "                            \"A\", \"B\", \"C\",\n",
    "                            \"A\", \"B\", \"C\",\n",
    "                            \"A\", \"B\", \"C\",\n",
    "]);\n",
    "let s2 = Series::new(\"C\", &[\"foo\", \"foo\", \"foo\", \"bar\", \"bar\", \"bar\",\n",
    "                            \"foo\", \"foo\", \"foo\", \"bar\", \"bar\", \"bar\"\n",
    "]);\n",
    "let s3 = Series::new(\"E\", &((0..12).collect::<Vec<_>>()));\n",
    "\n",
    "let df = DataFrame::new(vec![s0, s1, s2, s3]).unwrap();\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+---------+-----+-----+\n",
       "| A       | foo | bar |\n",
       "| ---     | --- | --- |\n",
       "| str     | i32 | i32 |\n",
       "+=========+=====+=====+\n",
       "| \"one\"   | 9   | 18  |\n",
       "+---------+-----+-----+\n",
       "| \"three\" | 7   | 14  |\n",
       "+---------+-----+-----+\n",
       "| \"two\"   | 8   | 10  |\n",
       "+---------+-----+-----+\n"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    (&df).groupby(&[\"A\"])?.pivot(\"C\", \"E\").sum()\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "+---------+-----+------+------+\n",
       "| A       | B   | foo  | bar  |\n",
       "| ---     | --- | ---  | ---  |\n",
       "| str     | str | i32  | i32  |\n",
       "+=========+=====+======+======+\n",
       "| \"three\" | \"A\" | null | 3    |\n",
       "+---------+-----+------+------+\n",
       "| \"one\"   | \"C\" | 8    | 5    |\n",
       "+---------+-----+------+------+\n",
       "| \"two\"   | \"A\" | 6    | null |\n",
       "+---------+-----+------+------+\n",
       "| \"one\"   | \"A\" | 0    | 9    |\n",
       "+---------+-----+------+------+\n",
       "| \"two\"   | \"C\" | 2    | null |\n",
       "+---------+-----+------+------+\n",
       "| \"three\" | \"B\" | 7    | null |\n",
       "+---------+-----+------+------+\n",
       "| \"one\"   | \"B\" | 1    | 4    |\n",
       "+---------+-----+------+------+\n",
       "| \"two\"   | \"B\" | null | 10   |\n",
       "+---------+-----+------+------+\n",
       "| \"three\" | \"C\" | null | 11   |\n",
       "+---------+-----+------+------+\n"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "as_result!({\n",
    "    (&df).groupby(&[\"A\", \"B\"])?.pivot(\"C\", \"E\").sum()\n",
    "}).unwrap()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}