{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table Enforcer Demo\n",
    "\n",
    "\n",
    "## Description\n",
    "\n",
    "A python package to facilitate the iterative process of developing and\n",
    "using schema-like representations of table data to recode and validate\n",
    "instances of these data stored in pandas DataFrames. This is a fairly\n",
    "young attempt to solve a recurrent problem many people have. So far I\n",
    "have looked at multiple solutions, but none really did it for me.\n",
    "\n",
    "They either deal primarily with JSON encoded data or they only really solve\n",
    "the validation side of the problem and consider recoding to be a\n",
    "separate issue. They seem to assume that recoding and cleaning has\n",
    "already been done and all we care about is making sure the final product\n",
    "is sane.\n",
    "\n",
    "To me, this seems backwards.\n",
    "\n",
    "I need to load, recode, and validate tables all day, everyday. Sometimes\n",
    "its simple; I can `pandas.read_table()` and all is good. But sometimes I\n",
    "have a 700 column long RedCap data dump that is complicated af, and it\n",
    "really helps me to develop my recoding logic through an iterative\n",
    "process. For me it makes sense to couple the recoding process directly\n",
    "with the validation process: to write the \"tests\" for each column first,\n",
    "then add recoding logic in steps until the tests pass.\n",
    "\n",
    "So Table Enforcer is my attempt to apply a sort of \"test driven\n",
    "development\" workflow to data cleaning and validation.\n",
    "\n",
    "## Basic Workflow\n",
    "\n",
    "1.  For each column that you care about in your source table:\n",
    "    1.  Define a `Column` object that represents the ideal state of your\n",
    "        data by passing a list of small, independent, reusable validator\n",
    "        functions and some descriptive information.\n",
    "    2.  Use this object to validate the column data from your source\n",
    "        table.\n",
    "          - It will fail.\n",
    "    3.  Add small, composable, reusable recoding functions to the column\n",
    "        object and iterate until your validations pass.\n",
    "2.  Define an `Enforcer` object by passing it a list of your column\n",
    "    representation objects.\n",
    "3.  This enforcer can be used to recode or validate recoded tables of\n",
    "    the same kind as your source table wherever your applications use\n",
    "    that type of data.\n",
    "\n",
    "Please take a look and offer thoughts/advice.\n",
    "\n",
    "  - Free software: MIT license\n",
    "  - Web site: <https://github.com/xguse/table_enforcer>\n",
    "  - Documentation: <https://table-enforcer.readthedocs.io>.\n",
    "\n",
    "## Features\n",
    "\n",
    "  - `Enforcer` and `Column` classes to define what columns should look\n",
    "    like in a table.\n",
    "  - `CompundColumn` class that supports complex operations including\n",
    "    \"one-to-many\" and \"many-to-one\" recoding logic as sometimes a column\n",
    "    tries to do too much and should really be multiple columns as well\n",
    "    as the reverse.\n",
    "  - Growing cadre of built-in validator functions and decorators.\n",
    "  - Decorators for use in defining parameterized validators like\n",
    "    `between_4_and_60()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:50.170525Z",
     "start_time": "2018-02-14T20:36:50.152356Z"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:50.538563Z",
     "start_time": "2018-02-14T20:36:50.172364Z"
    }
   },
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "import re\n",
    "\n",
    "from box import Box\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table Enforcer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:50.561705Z",
     "start_time": "2018-02-14T20:36:50.540462Z"
    }
   },
   "outputs": [],
   "source": [
    "from table_enforcer import Enforcer, Column, CompoundColumn\n",
    "import table_enforcer.errors as e\n",
    "\n",
    "from table_enforcer import validate as v\n",
    "from table_enforcer import recode as r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Constants"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load or create your Table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:50.742115Z",
     "start_time": "2018-02-14T20:36:50.563415Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table_path = \"../tests/files/demo_table.csv\"\n",
    "df = pd.read_csv(table_path)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Validator Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Built-in Validators\n",
    "\n",
    "Some validator functions come built-in (See the `table_enforcer/validate/funcs.py` module code for current list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:50.764140Z",
     "start_time": "2018-02-14T20:36:50.746789Z"
    }
   },
   "outputs": [],
   "source": [
    "# ! cat $v.funcs.__file__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validator Call Signatures\n",
    "\n",
    "In general, validators take a single `pandas.Series` object as input and return a `pandas.Series` of the same shape and indexes containing `True` or `False` relative to which items passed the validation logic. \n",
    "\n",
    "This changes a little if you are using a decorator like `@maxmin` (_See section on Decorating Validators_)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing Custom Validators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:50.952317Z",
     "start_time": "2018-02-14T20:36:50.929601Z"
    }
   },
   "outputs": [],
   "source": [
    "def gte2(series):\n",
    "    return series >= 2\n",
    "\n",
    "def lte10(series):\n",
    "    return series <= 10\n",
    "\n",
    "def length_is_one(series):\n",
    "    return series.str.len() == 1\n",
    "\n",
    "def valid_sex(series):\n",
    "    sex = set(['M', 'F'])\n",
    "    return series.isin(sex)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Decorating Validators\n",
    "You can define decorators that extend the call signature of validators to accomadate more complex things like ranges.\n",
    "\n",
    "Here we can define a single validator that accomplishes the same thing as the combination of the first two validators that we defined above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:51.064903Z",
     "start_time": "2018-02-14T20:36:51.036035Z"
    }
   },
   "outputs": [],
   "source": [
    "@v.decorators.minmax(low=2, high=10)\n",
    "def bt_2_and_10(series):\n",
    "    \"\"\"Test that the data items fall within range: 2 <= x <= 10.\"\"\"\n",
    "    return series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Note:__ pay attention to the fact that here we return the __original__ series object. _We didn't do ANYTHING to it._ The testing gets done in the `@v.decorators.minmax` decorator function. This allows our validators to always expect a single argument at run-time and always return a single argument.  This is _very_ important.\n",
    "\n",
    "### Writing Custom Validator Decorators\n",
    "\n",
    "This is a bit beyond this demo's scope but here is the `table_enforcer/validate/decorators.py` code so that you can get an idea oh how to do it if you are brave."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:51.162212Z",
     "start_time": "2018-02-14T20:36:51.145349Z"
    }
   },
   "outputs": [],
   "source": [
    "# !cat $v.decorators.__file__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Defining Columns\n",
    "\n",
    "Now that we have a few validator functions, its time to put together a representation of our table and use that object to help us get us a final data table that is in a form we like. We will start with `col1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:51.371062Z",
     "start_time": "2018-02-14T20:36:51.320455Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets say for this example we **only** care about `col1`,`col3`, and `col4`.\n",
    "\n",
    "We create a column object for each column we want that contains the name of the column we want to represent. We set the options to describe our ideal format. We set the values for each option for each column including a a list of validator functions that should all pass for all items in each column if we got our way.\n",
    "\n",
    "For now, we will ignore the `recoders` argument. What we end up with is the three column objects below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:51.518575Z",
     "start_time": "2018-02-14T20:36:51.455214Z"
    }
   },
   "outputs": [],
   "source": [
    "col1 = Column(name='col1',\n",
    "             dtype=np.int,\n",
    "             unique=False,\n",
    "             validators=[v.funcs.not_null, v.funcs.positive, bt_2_and_10],\n",
    "             recoders=None)\n",
    "\n",
    "col3 = Column(name='col3',\n",
    "             dtype=np.int,\n",
    "             unique=True,\n",
    "             validators=[v.funcs.not_null, v.funcs.positive, bt_2_and_10],\n",
    "             recoders=None)\n",
    "\n",
    "col4 = Column(name='col4',\n",
    "             dtype=str,\n",
    "             unique=False,\n",
    "             validators=[v.funcs.upper, length_is_one, valid_sex],\n",
    "             recoders=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validating a Column object\n",
    "For each column object we store the validator functions in a `dict`.\n",
    "\n",
    "Here is the value for `col4`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:51.643722Z",
     "start_time": "2018-02-14T20:36:51.562661Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'length_is_one': <function __main__.length_is_one>,\n",
       " 'upper': <function table_enforcer.validate.funcs.upper>,\n",
       " 'valid_sex': <function __main__.valid_sex>}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col4.validators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We validate a column by passing our entire original dataframe to the column object. The tests get run independently and after the data has passed through each test, we get a dataframe of results.  We can see right away which data items fail which tests.\n",
    "\n",
    "This helps us plan our recoding efforts later. We can immediately see what sorts of operations need to be run to convert those `False` tests to `True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:51.754279Z",
     "start_time": "2018-02-14T20:36:51.670249Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>length_is_one</th>\n",
       "      <th>upper</th>\n",
       "      <th>valid_sex</th>\n",
       "      <th>dtype</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   length_is_one  upper  valid_sex  dtype\n",
       "0          False  False      False   True\n",
       "1           True  False      False   True\n",
       "2           True   True       True   True\n",
       "3          False  False      False   True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col4.validate(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In reality we can have hundreds or thousands of rows and it would be nice to focus on only those that fail validation. Here is how you do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:51.880391Z",
     "start_time": "2018-02-14T20:36:51.782530Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>length_is_one</th>\n",
       "      <th>upper</th>\n",
       "      <th>valid_sex</th>\n",
       "      <th>dtype</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   length_is_one  upper  valid_sex  dtype\n",
       "0          False  False      False   True\n",
       "1           True  False      False   True\n",
       "3          False  False      False   True"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col4.validate(df, failed_only=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wait a minute... Where did that validation test called `dtype` come from?\n",
    "\n",
    "Turns out we get that one for free since we defined that this column needs to be of `dtype` `np.int`. We get a similar `unique` validation check inserted if we set that option to `True`. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Recoder Functions\n",
    "\n",
    "Recoder functions have a very similar structure to validator functions. But their purpose is to coerce the raw data into a more useful form. For example `Col4` is obviously meant to represent 'male'/'female' logic. But we don't want to have to interpret all possible ways to represent that logic every time we want to use that column. So we write a recoder function to do convert all items to either \"M\" or \"F\" so we only need to understand two choices from now on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:52.029978Z",
     "start_time": "2018-02-14T20:36:51.999938Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Built-in Recoders\n",
    "\n",
    "Some recoder functions come built-in (See the `table_enforcer/recode/funcs.py` module code for current list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:52.125086Z",
     "start_time": "2018-02-14T20:36:52.107938Z"
    }
   },
   "outputs": [],
   "source": [
    "# !cat $r.funcs.__file__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recoder Call Signatures\n",
    "\n",
    "Like validators, recoders take a single `pandas.Series` object as input and return a `pandas.Series` of the same shape and indexes as the original series object. However, instead of returning a series of `True`/`False` values, it performs some operation on the data that gets the column data closer to being how you want it to look during analysis operations.\n",
    "\n",
    "## Writing Custom Recoders\n",
    "\n",
    "### Recoders are Meant to be Composable\n",
    "\n",
    "You are able to build a single, monolithic recoder that transforms a column all the way into what you want in a single step. But it may be better to write recoders that perform only a single step on the way to getting a particular column into shape. \n",
    "\n",
    "This results in reusable functions that are \"composable\". Meaning that for the NEXT column, you may not even need to write a new function. All you may need to do is list a sequence of recoders that already exist. \n",
    "\n",
    "This is because, recoders are applied in a pipeline model.  The output from the first is supplied to the second etc.\n",
    "\n",
    "Now, lets write our recoders to transform `col4`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:52.274906Z",
     "start_time": "2018-02-14T20:36:52.224742Z"
    }
   },
   "outputs": [],
   "source": [
    "def standardize_sex(series):\n",
    "    \"\"\"Return a series where common representations of 'male'/'female' are standardized.\n",
    "    \n",
    "    Things like ['M', 'MALE', 'M', 'BOY', ...] are converted to `M`.\n",
    "    Representations of female are treated similarly. \n",
    "    \"\"\"\n",
    "    mapper = {\"M\": \"M\",\n",
    "              \"MALE\": \"M\",\n",
    "              \"BOY\": \"M\",\n",
    "              \"F\": \"F\",\n",
    "              \"FEMALE\": \"F\",\n",
    "              \"GIRL\": \"F\",\n",
    "             }\n",
    "    if series.str.islower().any():\n",
    "        raise ValueError(\"standardize_sex expects input series to contain only UPPERCASE letters.\")\n",
    "    else:\n",
    "        return series.apply(lambda x: mapper[x])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since we plan to call the built-in recoder `upper` first, we only need to support uppercase text here. This is kind of a silly requirement in practice but it demonstrates how you can write composable recoders to fail if expectations are not met."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Iteration: Validate Recoded Table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets revisit our original definition of `col4` and add the recoders we have now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:52.627560Z",
     "start_time": "2018-02-14T20:36:52.609558Z"
    }
   },
   "outputs": [],
   "source": [
    "col4 = Column(name='col4',\n",
    "             dtype=str,\n",
    "             unique=False,\n",
    "             validators=[v.funcs.upper, length_is_one, valid_sex],\n",
    "             recoders=[r.funcs.upper, standardize_sex])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now what do we have after we recode this column?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:52.746143Z",
     "start_time": "2018-02-14T20:36:52.721506Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col4\n",
       "0    M\n",
       "1    M\n",
       "2    F\n",
       "3    F"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col4.recode(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That looks pretty good. But let's not rely on our eyes and instead rely on the validation tests we defined. \n",
    "\n",
    "Now we try `col4.recode` again but this time tell it to perform validation after recoding and see what happens.\n",
    "\n",
    "If all is well, we will simply get the recoded column back. If validation fails, we will get a `ValidationError` raised."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:52.872898Z",
     "start_time": "2018-02-14T20:36:52.840705Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col4\n",
       "0    M\n",
       "1    M\n",
       "2    F\n",
       "3    F"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col4.recode(df, validate=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are good! We can now move on to our other columns and repeat the process until all the validation check pass on all of our columns. Then we can populate our `Enforcer` object and we will be nearing the end of our initial sanity check / recoding phase for this table. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-11-14T22:36:25.371934Z",
     "start_time": "2017-11-14T22:36:25.347337Z"
    }
   },
   "source": [
    "# Defining Table Enforcers\n",
    "\n",
    "Just pass a list of the column objects we created to the `Enforcer`. The validation/recoding api for the `Enforcer` mirrors the one for `Columns` so you can reuse what we learned above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:53.073198Z",
     "start_time": "2018-02-14T20:36:53.055550Z"
    }
   },
   "outputs": [],
   "source": [
    "demo = Enforcer(columns=[col1,col3,col4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:53.170458Z",
     "start_time": "2018-02-14T20:36:53.074747Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<table_enforcer.main_classes.Column at 0x7f6bfc99b9e8>,\n",
       " <table_enforcer.main_classes.Column at 0x7f6bfc99bf28>,\n",
       " <table_enforcer.main_classes.Column at 0x7f6bfc9371d0>]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demo.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:53.319126Z",
     "start_time": "2018-02-14T20:36:53.172985Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col3 col4\n",
       "0     7    10    M\n",
       "1     2     6    M\n",
       "2     6     2    F\n",
       "3     5     5    F"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "demo.recode(df, validate=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__NOTE:__\n",
    "\n",
    "- we will see what happens when the `demo.recode(df, validate=True)` encounters problems in recoding or validating when we pass some bad data to our `load_csv()` function later."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing loading functions that use enforcers\n",
    "\n",
    "Use these enforcers to recode or validate recoded tables of the same kind as your source table wherever your applications use that type of data like in your loading functions.\n",
    "\n",
    "You can pass them into a function as an argument or reference them from the script's local scope as you would an imported function.\n",
    "\n",
    "I pass it in here so that I can easily switch between different enforcers for the examples. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:53.400218Z",
     "start_time": "2018-02-14T20:36:53.320895Z"
    }
   },
   "outputs": [],
   "source": [
    "def load_csv(path, enforcer):\n",
    "    df = pd.read_csv(path)\n",
    "    return enforcer.recode(df, validate=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading Good Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:53.512678Z",
     "start_time": "2018-02-14T20:36:53.402456Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:53.637337Z",
     "start_time": "2018-02-14T20:36:53.514450Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col3 col4\n",
       "0     7    10    M\n",
       "1     2     6    M\n",
       "2     6     2    F\n",
       "3     5     5    F"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = load_csv(path=table_path, enforcer=demo)\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading Bad Data \n",
    "In this case we load a CSV that is slightly different that the one we built our `enforcer` on: perhaps this is represents a new data dump or data from another source.\n",
    "\n",
    "## The problems\n",
    "1. `col1` includes a value less than 2 and we have a validator that requires that all values be between 2 and 10. \n",
    "    - Perhaps this is because this value represents some lab test that has a limit of detection of 2 so we can't confidently distinguish between 0-2.\n",
    "    - So we want to set all values below the detection limit to the detection limit itself. However, we neglected to include a recoder for this since we didnt see these types of values in our first data-set.\n",
    "    - We will see how this causes a `ValidationError` with helpful information to be raised.\n",
    "\n",
    "2. `Feemale` is not found in the `standardize_sex()` recoder's `mapper` dictionary.\n",
    "    - Here we encounter a value that actually has nothing to do with a validation check failing, but breaks our recoder itself.\n",
    "    - **This time**, it causes a helpful `RecodingError` to be raised.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:53.731006Z",
     "start_time": "2018-02-14T20:36:53.641883Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Feemale</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3     col4     col5  col6  col7  col8\n",
       "0     7     3    10     male  01:one*     0     1     1\n",
       "1     2     4     6        m  3:Three     1     0     0\n",
       "2     1     6     2        F   10:ten     0     1     0\n",
       "3     6     5     5  Feemale    6:Six     1     1     1"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table_path_bad = \"../tests/files/demo_table_bad.csv\"\n",
    "df_bad = pd.read_csv(table_path_bad)\n",
    "df_bad"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ValidationErrors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `ValidationError` is raised when we try to get a recoded table or column while setting the `recode` method's option `validate=True`.\n",
    "\n",
    "Lets see what happens when we try to load this table with our original `demo` enforcer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:54.263306Z",
     "start_time": "2018-02-14T20:36:54.072997Z"
    }
   },
   "outputs": [
    {
     "ename": "ValidationError",
     "evalue": "Rows that failed to validate for column 'col1':\n   bt_2_and_10  not_null  positive  dtype\n2        False      True      True   True",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValidationError\u001b[0m                           Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-27-f46767c92254>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mload_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable_path_bad\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0menforcer\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdemo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m<ipython-input-23-c3b512fc9549>\u001b[0m in \u001b[0;36mload_csv\u001b[0;34m(path, enforcer)\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mload_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0menforcer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m     \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0menforcer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/src/repos/git/project-repos/table_enforcer/table_enforcer/main_classes.py\u001b[0m in \u001b[0;36mrecode\u001b[0;34m(self, table, validate)\u001b[0m\n\u001b[1;32m     67\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     68\u001b[0m         \u001b[0;32mfor\u001b[0m \u001b[0mcolumn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 69\u001b[0;31m             \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcolumn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate_dataframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalidate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     70\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     71\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/src/repos/git/project-repos/table_enforcer/table_enforcer/main_classes.py\u001b[0m in \u001b[0;36mupdate_dataframe\u001b[0;34m(self, df, table, validate)\u001b[0m\n\u001b[1;32m     81\u001b[0m         \u001b[0;34m\"\"\"Perform ``self.recode`` and add resulting column(s) to ``df`` and return ``df``.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     82\u001b[0m         \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 83\u001b[0;31m         \u001b[0mrecoded_columns\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalidate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     84\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconcat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrecoded_columns\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     85\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/src/repos/git/project-repos/table_enforcer/table_enforcer/main_classes.py\u001b[0m in \u001b[0;36mrecode\u001b[0;34m(self, table, validate)\u001b[0m\n\u001b[1;32m    202\u001b[0m             \u001b[0mfailed_rows\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfind_failed_rows\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalidate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_frame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    203\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mfailed_rows\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 204\u001b[0;31m                 \u001b[0;32mraise\u001b[0m \u001b[0mValidationError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Rows that failed to validate for column '{self.name}':\\n{failed_rows}\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    205\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    206\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_frame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValidationError\u001b[0m: Rows that failed to validate for column 'col1':\n   bt_2_and_10  not_null  positive  dtype\n2        False      True      True   True"
     ]
    }
   ],
   "source": [
    "load_csv(path=table_path_bad, enforcer=demo)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A column object raised a `ValidationError` because it saw some data that did not pass validation while inside the `recode` method because we told it to validate. Also, it gives us some helpful information:\n",
    "\n",
    "1. The column name\n",
    "2. The rows that did not pass validation\n",
    "3. And which validations happened to fail on each row\n",
    "\n",
    "Now we add a recoder that deals with that value and move on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:50:57.031675Z",
     "start_time": "2018-02-14T20:50:56.978604Z"
    }
   },
   "outputs": [],
   "source": [
    "# write another recoder\n",
    "def handle_values_below_detection_limit(series):\n",
    "    series[series < 2] = 2\n",
    "    return series\n",
    "\n",
    "# add the recoder to a Column representing `col1`\n",
    "col1_new = Column(name='col1',\n",
    "                  dtype=np.int,\n",
    "                  unique=False,\n",
    "                  validators=[v.funcs.not_null, v.funcs.positive, bt_2_and_10],\n",
    "                  recoders=[handle_values_below_detection_limit])\n",
    "\n",
    "# build enforcer with the corrected Column object\n",
    "demo2 = Enforcer(columns=[col1_new,\n",
    "                          col3,\n",
    "                          col4])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets try this again with `demo2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:37:38.919074Z",
     "start_time": "2018-02-14T20:37:38.805609Z"
    }
   },
   "outputs": [
    {
     "ename": "RecodingError",
     "evalue": "Recoder 'standardize_sex' raised the following error on column 'col4': KeyError('FEEMALE',).",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[0;32m~/src/repos/git/project-repos/table_enforcer/table_enforcer/main_classes.py\u001b[0m in \u001b[0;36mrecode\u001b[0;34m(self, table, validate)\u001b[0m\n\u001b[1;32m    196\u001b[0m             \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 197\u001b[0;31m                 \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrecoder\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    198\u001b[0m             \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mBaseException\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-16-a3488642c87b>\u001b[0m in \u001b[0;36mstandardize_sex\u001b[0;34m(series)\u001b[0m\n\u001b[1;32m     16\u001b[0m     \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0mseries\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mmapper\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/.anaconda/envs/table_enforcer/lib/python3.6/site-packages/pandas/core/series.py\u001b[0m in \u001b[0;36mapply\u001b[0;34m(self, func, convert_dtype, args, **kwds)\u001b[0m\n\u001b[1;32m   2550\u001b[0m                 \u001b[0mvalues\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0masobject\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2551\u001b[0;31m                 \u001b[0mmapped\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmap_infer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconvert\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mconvert_dtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2552\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/src/inference.pyx\u001b[0m in \u001b[0;36mpandas._libs.lib.map_infer\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32m<ipython-input-16-a3488642c87b>\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(x)\u001b[0m\n\u001b[1;32m     16\u001b[0m     \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0mseries\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mmapper\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m: 'FEEMALE'",
      "\nDuring handling of the above exception, another exception occurred:\n",
      "\u001b[0;31mRecodingError\u001b[0m                             Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-53-80a005ab7cfc>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf3\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mload_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable_path_bad\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0menforcer\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdemo2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mdf3\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m<ipython-input-23-c3b512fc9549>\u001b[0m in \u001b[0;36mload_csv\u001b[0;34m(path, enforcer)\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mload_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0menforcer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m     \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0menforcer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/src/repos/git/project-repos/table_enforcer/table_enforcer/main_classes.py\u001b[0m in \u001b[0;36mrecode\u001b[0;34m(self, table, validate)\u001b[0m\n\u001b[1;32m     67\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     68\u001b[0m         \u001b[0;32mfor\u001b[0m \u001b[0mcolumn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 69\u001b[0;31m             \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcolumn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate_dataframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalidate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     70\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     71\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/src/repos/git/project-repos/table_enforcer/table_enforcer/main_classes.py\u001b[0m in \u001b[0;36mupdate_dataframe\u001b[0;34m(self, df, table, validate)\u001b[0m\n\u001b[1;32m     81\u001b[0m         \u001b[0;34m\"\"\"Perform ``self.recode`` and add resulting column(s) to ``df`` and return ``df``.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     82\u001b[0m         \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 83\u001b[0;31m         \u001b[0mrecoded_columns\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalidate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     84\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconcat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrecoded_columns\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     85\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/src/repos/git/project-repos/table_enforcer/table_enforcer/main_classes.py\u001b[0m in \u001b[0;36mrecode\u001b[0;34m(self, table, validate)\u001b[0m\n\u001b[1;32m    197\u001b[0m                 \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrecoder\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    198\u001b[0m             \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mBaseException\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 199\u001b[0;31m                 \u001b[0;32mraise\u001b[0m \u001b[0mRecodingError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcol\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrecoder\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    200\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    201\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mRecodingError\u001b[0m: Recoder 'standardize_sex' raised the following error on column 'col4': KeyError('FEEMALE',)."
     ]
    }
   ],
   "source": [
    "df3 = load_csv(path=table_path_bad, enforcer=demo2)\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Problem 1__ seems to be fixed, but now we get this new `RecodingError`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## RecodingErrors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `RecodingError` is raised when a recoder function fails and raises an exception: any exception.\n",
    "\n",
    "Like the `ValidationError` it tries to give you as much helpful information about where and how it failed.\n",
    "\n",
    "1. The recoder that raised an exception\n",
    "2. The column name\n",
    "3. And the exception and its value.\n",
    "\n",
    "In this case we need to update our recoder to deal with this new typo of the word \"female\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:51:07.170254Z",
     "start_time": "2018-02-14T20:51:07.098889Z"
    }
   },
   "outputs": [],
   "source": [
    "# update the recoder\n",
    "def standardize_sex2(series):\n",
    "    \"\"\"Return a series where common representations of 'male'/'female' are standardized.\n",
    "    \n",
    "    Things like ['M', 'MALE', 'M', 'BOY', ...] are converted to `M`.\n",
    "    Representations of female are treated similarly. \n",
    "    \"\"\"\n",
    "    mapper = {\"M\": \"M\",\n",
    "              \"MALE\": \"M\",\n",
    "              \"BOY\": \"M\",\n",
    "              \"F\": \"F\",\n",
    "              \"FEMALE\": \"F\",\n",
    "              \"GIRL\": \"F\",\n",
    "              \"FEEMALE\": \"F\",\n",
    "             }\n",
    "    if series.str.islower().any():\n",
    "        raise ValueError(\"standardize_sex expects input series to contain only UPPERCASE letters.\")\n",
    "    else:\n",
    "        return series.apply(lambda x: mapper[x])\n",
    "\n",
    "# add the recoder to a Column representing `col4`\n",
    "col4_new = Column(name='col4',\n",
    "                  dtype=str,\n",
    "                  unique=False,\n",
    "                  validators=[v.funcs.upper, length_is_one, valid_sex],\n",
    "                  recoders=[r.funcs.upper, standardize_sex2])\n",
    "\n",
    "# build enforcer with the corrected Column object\n",
    "demo3 = Enforcer(columns=[col1_new,\n",
    "                          col3,\n",
    "                          col4_new])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-02T16:19:56.618954Z",
     "start_time": "2018-02-02T16:19:56.583069Z"
    }
   },
   "source": [
    "OK one more time with `demo3` this time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:51:10.259969Z",
     "start_time": "2018-02-14T20:51:10.202687Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>5</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col3 col4\n",
       "0     7    10    M\n",
       "1     2     6    M\n",
       "2     2     2    F\n",
       "3     6     5    F"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4 = load_csv(path=table_path_bad, enforcer=demo3)\n",
    "df4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Success! __Problem 2__ is no also fixed and we are ready to get back to work.\n",
    "\n",
    "\n",
    "...until the next head ache."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The CompoundColumn Class "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Overview of `CompoundColumn` class:__\n",
    "\n",
    "- The `CompoundColumn` class is based on the same class that the `Column` inherits from (`BaseColumn`).\n",
    "- This means it has many methods with the same names and general behavior as the `Column` class that you are used to.\n",
    "    - namely: `validate`, `recode`, and `update_dataframe`\n",
    "        - `update_dataframe` is used by the `Enforcer` class and determines how the Column object adds columns to the final table.\n",
    "- However they behave slightly differently because this class has a slightly different job to do than `Column`.\n",
    "- This class is actually contains and coordinates **multiple** `Column` objects.\n",
    "- These objects allow more complex transformation and validation logic that spans multiple columns in either or both the original table and final table.\n",
    "    - Transformations like **Many-To-One** and **One-To-Many** column conversions for example.\n",
    "- It is structured sort of like a container of `Column`s and will actually contain three plain `Column` objects.\n",
    "- Here we introduce the concept of *input_columns* and *output_columns*.\n",
    "- Its call signature is fairly different than its cousin the `Column`.\n",
    "- It accepts only three arguments:\n",
    "    - `input_columns`: A list of fully defined `Column` objects, each representing a single column in the **original** `Dataframe`. \n",
    "    - `output_columns`: A list of fully defined `Column` objects, each representing a single column in the **final** `DataFrame`. \n",
    "    - `column_transform`: A function accepting the input `DataFrame`, performing transformations to it and returning a new `DataFrame` containing the TRANSFORMED columns only.\n",
    "    \n",
    "__Interaction with `Enforcer` objects:__\n",
    "\n",
    "- These are subclasses of `BaseColumn` and implement the three essential methods compatible with the interface that `Enforcer` objects expect each column to use. \n",
    "    - `validate`\n",
    "    - `recode`\n",
    "    - `update_dataframe`\n",
    "- So adding them to `Enforcer` objects works exactly like the `Column` objects we already know."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:55.141538Z",
     "start_time": "2018-02-14T20:36:55.124259Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Construct a new ``CompoundColumn`` object.\n",
      "\n",
      "        Args:\n",
      "            input_columns (list, Column): A list of ``Column`` objects representing column(s) from the SOURCE table.\n",
      "            output_columns (list, Column): A list of ``Column`` objects representing column(s) from the FINAL table.\n",
      "            column_transform (Callable): Function accepting the table object, performing transformations to it and returning a DataFrame containing the NEW columns only.\n",
      "        \n"
     ]
    }
   ],
   "source": [
    "print(CompoundColumn.__init__.__doc__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## One-to-many Column Transformation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Now we will be focusing on column 5.\n",
    "- Here we want to split it into two columns because it represents complex information encoded into a single column.\n",
    "- It would be better for us to treat it as separate columns.\n",
    "- In these examples we will split `col5` into two new columns so we must keep tract of how to deal with **three** columns:\n",
    "    - the original column plus two derivative columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:55.384938Z",
     "start_time": "2018-02-14T20:36:55.351781Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Definition and Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:55.612668Z",
     "start_time": "2018-02-14T20:36:55.485509Z"
    }
   },
   "outputs": [],
   "source": [
    "\n",
    "## Validators and Recoders\n",
    "bad_chars = re.compile(pattern=\"\"\"[*(]\"\"\")\n",
    "\n",
    "def no_bad_characters(series):\n",
    "    \"\"\"Validator\"\"\"\n",
    "    def test(x):\n",
    "        if bad_chars.search(x) is None:\n",
    "            return True\n",
    "        else:\n",
    "            return False\n",
    "    return series.astype(str).apply(test)\n",
    "\n",
    "\n",
    "def fix_bad_characters(series):\n",
    "    \"\"\"Recoder\"\"\"\n",
    "    def fix(x):\n",
    "        return bad_chars.sub(repl='', string=x)\n",
    "    return series.astype(str).apply(fix)\n",
    "\n",
    "def recode_upper(series):\n",
    "    return series.astype(str).str.upper()\n",
    "\n",
    "def to_int(series):\n",
    "    return series.astype(np.int)\n",
    "\n",
    "def to_str(series):\n",
    "    return series.astype(str)\n",
    "\n",
    "\n",
    "# Transformation function\n",
    "def split_on_colon(df):\n",
    "    cols = Box()\n",
    "    cols.col5_number = df.col5.apply(lambda x: x.split(\":\")[0])\n",
    "    cols.col5_word = df.col5.apply(lambda x: x.split(\":\")[1])\n",
    "    \n",
    "    new_columns = pd.DataFrame(cols)[[\"col5_number\", \"col5_word\"]]\n",
    "    return new_columns\n",
    "\n",
    "\n",
    "\n",
    "## Defining the Input Column\n",
    "col5 = Column(\n",
    "    name='col5',\n",
    "    dtype=str,\n",
    "    unique=False,\n",
    "    validators=[v.funcs.not_null, no_bad_characters],\n",
    "    recoders=[fix_bad_characters])\n",
    "\n",
    "## Defining the Output Columns (col5_a/col5_b)\n",
    "col5_a = Column(\n",
    "    name='col5_number',\n",
    "    dtype=np.int,\n",
    "    unique=False,\n",
    "    validators=[v.funcs.not_null,],\n",
    "    recoders=[to_int],)\n",
    "\n",
    "col5_b = Column(\n",
    "    name='col5_word',\n",
    "    dtype=str,\n",
    "    unique=False,\n",
    "    validators=[v.funcs.not_null, v.funcs.upper, no_bad_characters],\n",
    "    recoders=[to_str, recode_upper],)\n",
    "\n",
    "## Defining the Compound Column\n",
    "col5_split = CompoundColumn(input_columns=[col5],\n",
    "                            output_columns=[col5_a, col5_b],\n",
    "                            column_transform=split_on_colon)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- When we run `col5_split.validate` we get a more complex dataframe than before.\n",
    "- It is the total validations for each column (in this case 3)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:55.784520Z",
     "start_time": "2018-02-14T20:36:55.614270Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>dtype</th>\n",
       "      <th>no_bad_characters</th>\n",
       "      <th>not_null</th>\n",
       "      <th>upper</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>validation_type</th>\n",
       "      <th>column_name</th>\n",
       "      <th>row</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">input</th>\n",
       "      <th rowspan=\"4\" valign=\"top\">col5</th>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"8\" valign=\"top\">output</th>\n",
       "      <th rowspan=\"4\" valign=\"top\">col5_number</th>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">col5_word</th>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 dtype  no_bad_characters  not_null  upper\n",
       "validation_type column_name row                                           \n",
       "input           col5        0     True              False      True   True\n",
       "                            1     True               True      True   True\n",
       "                            2     True               True      True   True\n",
       "                            3     True               True      True   True\n",
       "output          col5_number 0    False               True      True   True\n",
       "                            1    False               True      True   True\n",
       "                            2    False               True      True   True\n",
       "                            3    False               True      True   True\n",
       "                col5_word   0     True              False      True  False\n",
       "                            1     True               True      True  False\n",
       "                            2     True               True      True  False\n",
       "                            3     True               True      True  False"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col5_split.validate(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `col5_split.recode` returns a `pd.DataFrame` representing the `output_columns` in the provided order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:55.866106Z",
     "start_time": "2018-02-14T20:36:55.786364Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col5_number</th>\n",
       "      <th>col5_word</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>ONE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>THREE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10</td>\n",
       "      <td>TEN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>SIX</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col5_number col5_word\n",
       "0            1       ONE\n",
       "1            3     THREE\n",
       "2           10       TEN\n",
       "3            6       SIX"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col5_split.recode(df, validate=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Many-To-One Column Transformation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Lets look at columns col6, col7, and col8.\n",
    "    - Imagine that these columns result from a situation were a subject's tissue was submitted for zero or more different tests.\n",
    "    - These columns track which tests were done for each subject.\n",
    "- In a pure SQL-like relational context we would likely want to break these data out into a supporting table with a one-to-many relation (1 subject to *N* tests).\n",
    "- But what if the data is destined for a more flexible format like MongoDB, where you are able to store these data in a list-like format.\n",
    "    - In that case we may want to collect these columns into a single column where each item is a `list` representing the names of which if any of the tests were done for each subject.\n",
    "- We can do this using the `CompoundColumn` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:56.015486Z",
     "start_time": "2018-02-14T20:36:55.986607Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Definition and Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:56.445739Z",
     "start_time": "2018-02-14T20:36:56.128775Z"
    }
   },
   "outputs": [],
   "source": [
    "# helper functions\n",
    "def is_subset(x, ref_set):\n",
    "    if not isinstance(ref_set, set):\n",
    "        valid = set(ref_set)\n",
    "        \n",
    "    if isinstance(x, (list, tuple, set)):\n",
    "        set_x = set(x)\n",
    "    else:\n",
    "        set_x = set([x])\n",
    "        \n",
    "    return set_x.issubset(ref_set)\n",
    "\n",
    "\n",
    "# Transformation function\n",
    "def join_as_tuple(df):\n",
    "    cols = Box()\n",
    "    cols.col6_7_8 = df[[\"col6\", \"col7\", \"col8\"]].apply(lambda row: (row.col6, row.col7, row.col8,), axis=1)\n",
    "    \n",
    "    new_columns = pd.DataFrame(cols)\n",
    "    return new_columns\n",
    "    \n",
    "\n",
    "# Validators\n",
    "def col6_valid_values(series):\n",
    "    \"\"\"Validator\"\"\"\n",
    "    valid = [None, \"DNASeq\"]\n",
    "    return series.apply(is_subset, ref_set=valid)\n",
    "\n",
    "def col7_valid_values(series):\n",
    "    \"\"\"Validator\"\"\"\n",
    "    valid = [None, \"Protein Function\"]\n",
    "    return series.apply(is_subset, ref_set=valid)\n",
    "\n",
    "def col8_valid_values(series):\n",
    "    \"\"\"Validator\"\"\"\n",
    "    valid = [None, \"RNASeq\"]\n",
    "    return series.apply(is_subset, ref_set=valid)\n",
    "\n",
    "\n",
    "def col6_7_8_valid_values(series):\n",
    "    \"\"\"Validator\"\"\"\n",
    "    valid = set([\"DNASeq\", \"Protein Function\", \"RNASeq\"])\n",
    "    return series.apply(is_subset, ref_set=valid)\n",
    "\n",
    "# Recoders\n",
    "def translate_col6(series):\n",
    "    \"\"\"Recode 0-> None; 1-> 'DNASeq' \"\"\"\n",
    "    def rcode(x):\n",
    "        mapping = {0: None, 1: \"DNASeq\"}\n",
    "        return mapping[x]\n",
    "    return series.apply(rcode)\n",
    "    \n",
    "def translate_col7(series):\n",
    "    \"\"\"Recode 0-> None; 1-> 'Protein Function' \"\"\"\n",
    "    def rcode(x):\n",
    "        mapping = {0: None, 1: \"Protein Function\"}\n",
    "        return mapping[x]\n",
    "    return series.apply(rcode)\n",
    "    \n",
    "def translate_col8(series):\n",
    "    \"\"\"Recode 0-> None; 1-> 'RNASeq' \"\"\"\n",
    "    def rcode(x):\n",
    "        mapping = {0: None, 1: \"RNASeq\"}\n",
    "        return mapping[x]\n",
    "    return series.apply(rcode)\n",
    "\n",
    "def setify_drop_nones(series):\n",
    "    \"\"\"Convert to sets and drop ``None`` values.\"\"\"\n",
    "    def drop_nones(x):\n",
    "        x.discard(None)\n",
    "        return x\n",
    "    return series.apply(lambda x: set(x)).apply(drop_nones).apply(list)\n",
    "\n",
    "# Defining the Input Columns\n",
    "col6 = Column(\n",
    "    name='col6',\n",
    "    dtype=(str, type(None)),\n",
    "    unique=False,\n",
    "    validators=[col6_valid_values],\n",
    "    recoders=[translate_col6],)\n",
    "\n",
    "col7 = Column(\n",
    "    name='col7',\n",
    "    dtype=(str, type(None)),\n",
    "    unique=False,\n",
    "    validators=[col7_valid_values],\n",
    "    recoders=[translate_col7],)\n",
    "\n",
    "col8 = Column(\n",
    "    name='col8',\n",
    "    dtype=(str, type(None)),\n",
    "    unique=False,\n",
    "    validators=[col8_valid_values],\n",
    "    recoders=[translate_col8],)\n",
    "\n",
    "\n",
    "# Defining the Output Column\n",
    "col6_7_8 = Column(\n",
    "    name='col6_7_8',\n",
    "    dtype=list,\n",
    "    unique=False,\n",
    "    validators=[v.funcs.not_null, col6_7_8_valid_values],\n",
    "    recoders=[setify_drop_nones],)\n",
    "\n",
    "# Defining the Compound Column\n",
    "col6_7_8_join = CompoundColumn(input_columns=[col6, col7, col8],\n",
    "                       output_columns=[col6_7_8],\n",
    "                       column_transform=join_as_tuple)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:56.595678Z",
     "start_time": "2018-02-14T20:36:56.447377Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- When we run `col6_7_8_join.validate` we, again,  get a more complex dataframe than with a `Column` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:56.748596Z",
     "start_time": "2018-02-14T20:36:56.597387Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>col6_7_8_valid_values</th>\n",
       "      <th>col6_valid_values</th>\n",
       "      <th>col7_valid_values</th>\n",
       "      <th>col8_valid_values</th>\n",
       "      <th>dtype</th>\n",
       "      <th>not_null</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>validation_type</th>\n",
       "      <th>column_name</th>\n",
       "      <th>row</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"12\" valign=\"top\">input</th>\n",
       "      <th rowspan=\"4\" valign=\"top\">col6</th>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">col7</th>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">col8</th>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">output</th>\n",
       "      <th rowspan=\"4\" valign=\"top\">col6_7_8</th>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 col6_7_8_valid_values  col6_valid_values  \\\n",
       "validation_type column_name row                                             \n",
       "input           col6        0                     True              False   \n",
       "                            1                     True              False   \n",
       "                            2                     True              False   \n",
       "                            3                     True              False   \n",
       "                col7        0                     True               True   \n",
       "                            1                     True               True   \n",
       "                            2                     True               True   \n",
       "                            3                     True               True   \n",
       "                col8        0                     True               True   \n",
       "                            1                     True               True   \n",
       "                            2                     True               True   \n",
       "                            3                     True               True   \n",
       "output          col6_7_8    0                    False               True   \n",
       "                            1                    False               True   \n",
       "                            2                    False               True   \n",
       "                            3                    False               True   \n",
       "\n",
       "                                 col7_valid_values  col8_valid_values  dtype  \\\n",
       "validation_type column_name row                                                \n",
       "input           col6        0                 True               True  False   \n",
       "                            1                 True               True  False   \n",
       "                            2                 True               True  False   \n",
       "                            3                 True               True  False   \n",
       "                col7        0                False               True  False   \n",
       "                            1                False               True  False   \n",
       "                            2                False               True  False   \n",
       "                            3                False               True  False   \n",
       "                col8        0                 True              False  False   \n",
       "                            1                 True              False  False   \n",
       "                            2                 True              False  False   \n",
       "                            3                 True              False  False   \n",
       "output          col6_7_8    0                 True               True  False   \n",
       "                            1                 True               True  False   \n",
       "                            2                 True               True  False   \n",
       "                            3                 True               True  False   \n",
       "\n",
       "                                 not_null  \n",
       "validation_type column_name row            \n",
       "input           col6        0        True  \n",
       "                            1        True  \n",
       "                            2        True  \n",
       "                            3        True  \n",
       "                col7        0        True  \n",
       "                            1        True  \n",
       "                            2        True  \n",
       "                            3        True  \n",
       "                col8        0        True  \n",
       "                            1        True  \n",
       "                            2        True  \n",
       "                            3        True  \n",
       "output          col6_7_8    0        True  \n",
       "                            1        True  \n",
       "                            2        True  \n",
       "                            3        True  "
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col6_7_8_join.validate(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:56.848036Z",
     "start_time": "2018-02-14T20:36:56.751350Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col6_7_8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[RNASeq, Protein Function]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[DNASeq]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[Protein Function]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[DNASeq, RNASeq, Protein Function]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             col6_7_8\n",
       "0          [RNASeq, Protein Function]\n",
       "1                            [DNASeq]\n",
       "2                  [Protein Function]\n",
       "3  [DNASeq, RNASeq, Protein Function]"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col6_7_8_join.recode(df, validate=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building an Enforcer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Now lets see how these fit together into an `Enforcer` object.\n",
    "- We are using the same table as before but now we add our new `CompoundColumns`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:56.983776Z",
     "start_time": "2018-02-14T20:36:56.849652Z"
    }
   },
   "outputs": [],
   "source": [
    "now_w_moar_complexity = Enforcer(columns=[col1,\n",
    "                                          col3,\n",
    "                                          col4,\n",
    "                                          col5_split,\n",
    "                                          col6_7_8_join])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- We can do all the same stuff with this `Enforcer`.\n",
    "- This is because each subclass of `BaseColumn` includes a special set of methods that allow it to interact with `Enforcer` objects in their specific ways."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.165249Z",
     "start_time": "2018-02-14T20:36:56.985662Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# this should fail\n",
    "now_w_moar_complexity.validate(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.281155Z",
     "start_time": "2018-02-14T20:36:57.167100Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5_number</th>\n",
       "      <th>col5_word</th>\n",
       "      <th>col6_7_8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>M</td>\n",
       "      <td>1</td>\n",
       "      <td>ONE</td>\n",
       "      <td>[RNASeq, Protein Function]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>M</td>\n",
       "      <td>3</td>\n",
       "      <td>THREE</td>\n",
       "      <td>[DNASeq]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10</td>\n",
       "      <td>TEN</td>\n",
       "      <td>[Protein Function]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>F</td>\n",
       "      <td>6</td>\n",
       "      <td>SIX</td>\n",
       "      <td>[DNASeq, RNASeq, Protein Function]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col3 col4  col5_number col5_word                            col6_7_8\n",
       "0     7    10    M            1       ONE          [RNASeq, Protein Function]\n",
       "1     2     6    M            3     THREE                            [DNASeq]\n",
       "2     6     2    F           10       TEN                  [Protein Function]\n",
       "3     5     5    F            6       SIX  [DNASeq, RNASeq, Protein Function]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# this should not\n",
    "now_w_moar_complexity.recode(df, validate=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- The original table again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.329325Z",
     "start_time": "2018-02-14T20:36:57.282806Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "      <th>col6</th>\n",
       "      <th>col7</th>\n",
       "      <th>col8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>10</td>\n",
       "      <td>male</td>\n",
       "      <td>01:one*</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>m</td>\n",
       "      <td>3:Three</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10:ten</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>Female</td>\n",
       "      <td>6:Six</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3    col4     col5  col6  col7  col8\n",
       "0     7     3    10    male  01:one*     0     1     1\n",
       "1     2     4     6       m  3:Three     1     0     0\n",
       "2     6     6     2       F   10:ten     0     1     0\n",
       "3     5     5     5  Female    6:Six     1     1     1"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Testing Facilitation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.435377Z",
     "start_time": "2018-02-14T20:36:57.331086Z"
    }
   },
   "outputs": [],
   "source": [
    "def sort_columns(df):\n",
    "    return df.T.sort_index().T.sort_index()\n",
    "\n",
    "def check_and_print(df, column):\n",
    "    if sort_columns(column.validate(df).reset_index()).equals(sort_columns(pd.read_json(validate_all_json))):\n",
    "        print(f\"validate_all_json = '''{validate_all_json}'''\")\n",
    "    else:\n",
    "        raise ValueError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## OTM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.561967Z",
     "start_time": "2018-02-14T20:36:57.437018Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "validate_input_json = '''{\"validation_type\":{\"0\":\"input\",\"1\":\"input\",\"2\":\"input\",\"3\":\"input\"},\"column_name\":{\"0\":\"col5\",\"1\":\"col5\",\"2\":\"col5\",\"3\":\"col5\"},\"row\":{\"0\":0,\"1\":1,\"2\":2,\"3\":3},\"no_bad_characters\":{\"0\":false,\"1\":true,\"2\":true,\"3\":true},\"not_null\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true},\"dtype\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true}}'''\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column_name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>no_bad_characters</th>\n",
       "      <th>not_null</th>\n",
       "      <th>row</th>\n",
       "      <th>validation_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  column_name  dtype  no_bad_characters  not_null  row validation_type\n",
       "0        col5   True              False      True    0           input\n",
       "1        col5   True               True      True    1           input\n",
       "2        col5   True               True      True    2           input\n",
       "3        col5   True               True      True    3           input"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "validate_input = col5_split._validate_input(df).reset_index()\n",
    "validate_input\n",
    "validate_input_json = validate_input.to_json()\n",
    "\n",
    "print(f\"validate_input_json = '''{validate_input_json}'''\")\n",
    "pd.read_json(validate_input_json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.674692Z",
     "start_time": "2018-02-14T20:36:57.564495Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "validate_output_json = '''{\"validation_type\":{\"0\":\"output\",\"1\":\"output\",\"2\":\"output\",\"3\":\"output\",\"4\":\"output\",\"5\":\"output\",\"6\":\"output\",\"7\":\"output\"},\"column_name\":{\"0\":\"col5_number\",\"1\":\"col5_number\",\"2\":\"col5_number\",\"3\":\"col5_number\",\"4\":\"col5_word\",\"5\":\"col5_word\",\"6\":\"col5_word\",\"7\":\"col5_word\"},\"row\":{\"0\":0,\"1\":1,\"2\":2,\"3\":3,\"4\":0,\"5\":1,\"6\":2,\"7\":3},\"dtype\":{\"0\":false,\"1\":false,\"2\":false,\"3\":false,\"4\":true,\"5\":true,\"6\":true,\"7\":true},\"no_bad_characters\":{\"0\":null,\"1\":null,\"2\":null,\"3\":null,\"4\":false,\"5\":true,\"6\":true,\"7\":true},\"not_null\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":true,\"5\":true,\"6\":true,\"7\":true},\"upper\":{\"0\":null,\"1\":null,\"2\":null,\"3\":null,\"4\":false,\"5\":false,\"6\":false,\"7\":false}}'''\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column_name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>no_bad_characters</th>\n",
       "      <th>not_null</th>\n",
       "      <th>row</th>\n",
       "      <th>upper</th>\n",
       "      <th>validation_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>0.0</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>1.0</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column_name  dtype  no_bad_characters  not_null  row  upper validation_type\n",
       "0  col5_number  False                NaN      True    0    NaN          output\n",
       "1  col5_number  False                NaN      True    1    NaN          output\n",
       "2  col5_number  False                NaN      True    2    NaN          output\n",
       "3  col5_number  False                NaN      True    3    NaN          output\n",
       "4    col5_word   True                0.0      True    0    0.0          output\n",
       "5    col5_word   True                1.0      True    1    0.0          output\n",
       "6    col5_word   True                1.0      True    2    0.0          output\n",
       "7    col5_word   True                1.0      True    3    0.0          output"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "validate_output = col5_split._validate_output(df).reset_index()\n",
    "validate_output\n",
    "validate_output_json = validate_output.to_json()\n",
    "\n",
    "print(f\"validate_output_json = '''{validate_output_json}'''\")    \n",
    "pd.read_json(validate_output_json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.788188Z",
     "start_time": "2018-02-14T20:36:57.679358Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "validate_all_json = '''{\"validation_type\":{\"0\":\"input\",\"1\":\"input\",\"2\":\"input\",\"3\":\"input\",\"4\":\"output\",\"5\":\"output\",\"6\":\"output\",\"7\":\"output\",\"8\":\"output\",\"9\":\"output\",\"10\":\"output\",\"11\":\"output\"},\"column_name\":{\"0\":\"col5\",\"1\":\"col5\",\"2\":\"col5\",\"3\":\"col5\",\"4\":\"col5_number\",\"5\":\"col5_number\",\"6\":\"col5_number\",\"7\":\"col5_number\",\"8\":\"col5_word\",\"9\":\"col5_word\",\"10\":\"col5_word\",\"11\":\"col5_word\"},\"row\":{\"0\":0,\"1\":1,\"2\":2,\"3\":3,\"4\":0,\"5\":1,\"6\":2,\"7\":3,\"8\":0,\"9\":1,\"10\":2,\"11\":3},\"dtype\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":false,\"5\":false,\"6\":false,\"7\":false,\"8\":true,\"9\":true,\"10\":true,\"11\":true},\"no_bad_characters\":{\"0\":false,\"1\":true,\"2\":true,\"3\":true,\"4\":true,\"5\":true,\"6\":true,\"7\":true,\"8\":false,\"9\":true,\"10\":true,\"11\":true},\"not_null\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":true,\"5\":true,\"6\":true,\"7\":true,\"8\":true,\"9\":true,\"10\":true,\"11\":true},\"upper\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":true,\"5\":true,\"6\":true,\"7\":true,\"8\":false,\"9\":false,\"10\":false,\"11\":false}}'''\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column_name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>no_bad_characters</th>\n",
       "      <th>not_null</th>\n",
       "      <th>row</th>\n",
       "      <th>upper</th>\n",
       "      <th>validation_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>True</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>False</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>False</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>True</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>col5</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>True</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>True</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>True</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>col5_number</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>True</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>col5_word</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    column_name  dtype  no_bad_characters  not_null  row  upper  \\\n",
       "0          col5   True              False      True    0   True   \n",
       "1          col5   True               True      True    1   True   \n",
       "10    col5_word   True               True      True    2  False   \n",
       "11    col5_word   True               True      True    3  False   \n",
       "2          col5   True               True      True    2   True   \n",
       "3          col5   True               True      True    3   True   \n",
       "4   col5_number  False               True      True    0   True   \n",
       "5   col5_number  False               True      True    1   True   \n",
       "6   col5_number  False               True      True    2   True   \n",
       "7   col5_number  False               True      True    3   True   \n",
       "8     col5_word   True              False      True    0  False   \n",
       "9     col5_word   True               True      True    1  False   \n",
       "\n",
       "   validation_type  \n",
       "0            input  \n",
       "1            input  \n",
       "10          output  \n",
       "11          output  \n",
       "2            input  \n",
       "3            input  \n",
       "4           output  \n",
       "5           output  \n",
       "6           output  \n",
       "7           output  \n",
       "8           output  \n",
       "9           output  "
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "validate_all = col5_split.validate(df).reset_index()\n",
    "validate_all\n",
    "validate_all_json = validate_all.to_json()\n",
    "\n",
    "print(f\"validate_all_json = '''{validate_all_json}'''\")    \n",
    "pd.read_json(validate_all_json)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MTO"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:57.921233Z",
     "start_time": "2018-02-14T20:36:57.789925Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "validate_input_json = '''{\"validation_type\":{\"0\":\"input\",\"1\":\"input\",\"2\":\"input\",\"3\":\"input\",\"4\":\"input\",\"5\":\"input\",\"6\":\"input\",\"7\":\"input\",\"8\":\"input\",\"9\":\"input\",\"10\":\"input\",\"11\":\"input\"},\"column_name\":{\"0\":\"col6\",\"1\":\"col6\",\"2\":\"col6\",\"3\":\"col6\",\"4\":\"col7\",\"5\":\"col7\",\"6\":\"col7\",\"7\":\"col7\",\"8\":\"col8\",\"9\":\"col8\",\"10\":\"col8\",\"11\":\"col8\"},\"row\":{\"0\":0,\"1\":1,\"2\":2,\"3\":3,\"4\":0,\"5\":1,\"6\":2,\"7\":3,\"8\":0,\"9\":1,\"10\":2,\"11\":3},\"col6_valid_values\":{\"0\":false,\"1\":false,\"2\":false,\"3\":false,\"4\":null,\"5\":null,\"6\":null,\"7\":null,\"8\":null,\"9\":null,\"10\":null,\"11\":null},\"col7_valid_values\":{\"0\":null,\"1\":null,\"2\":null,\"3\":null,\"4\":false,\"5\":false,\"6\":false,\"7\":false,\"8\":null,\"9\":null,\"10\":null,\"11\":null},\"col8_valid_values\":{\"0\":null,\"1\":null,\"2\":null,\"3\":null,\"4\":null,\"5\":null,\"6\":null,\"7\":null,\"8\":false,\"9\":false,\"10\":false,\"11\":false},\"dtype\":{\"0\":false,\"1\":false,\"2\":false,\"3\":false,\"4\":false,\"5\":false,\"6\":false,\"7\":false,\"8\":false,\"9\":false,\"10\":false,\"11\":false}}'''\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col6_valid_values</th>\n",
       "      <th>col7_valid_values</th>\n",
       "      <th>col8_valid_values</th>\n",
       "      <th>column_name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>row</th>\n",
       "      <th>validation_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>1</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>2</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>3</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>2</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>3</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>1</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>2</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>3</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>1</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    col6_valid_values  col7_valid_values  col8_valid_values column_name  \\\n",
       "0                 0.0                NaN                NaN        col6   \n",
       "1                 0.0                NaN                NaN        col6   \n",
       "10                NaN                NaN                0.0        col8   \n",
       "11                NaN                NaN                0.0        col8   \n",
       "2                 0.0                NaN                NaN        col6   \n",
       "3                 0.0                NaN                NaN        col6   \n",
       "4                 NaN                0.0                NaN        col7   \n",
       "5                 NaN                0.0                NaN        col7   \n",
       "6                 NaN                0.0                NaN        col7   \n",
       "7                 NaN                0.0                NaN        col7   \n",
       "8                 NaN                NaN                0.0        col8   \n",
       "9                 NaN                NaN                0.0        col8   \n",
       "\n",
       "    dtype  row validation_type  \n",
       "0   False    0           input  \n",
       "1   False    1           input  \n",
       "10  False    2           input  \n",
       "11  False    3           input  \n",
       "2   False    2           input  \n",
       "3   False    3           input  \n",
       "4   False    0           input  \n",
       "5   False    1           input  \n",
       "6   False    2           input  \n",
       "7   False    3           input  \n",
       "8   False    0           input  \n",
       "9   False    1           input  "
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "validate_input = col6_7_8_join._validate_input(df).reset_index()\n",
    "validate_input\n",
    "validate_input_json = validate_input.to_json()\n",
    "\n",
    "print(f\"validate_input_json = '''{validate_input_json}'''\")\n",
    "pd.read_json(validate_input_json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:58.013834Z",
     "start_time": "2018-02-14T20:36:57.925570Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "validate_output_json = '''{\"validation_type\":{\"0\":\"output\",\"1\":\"output\",\"2\":\"output\",\"3\":\"output\"},\"column_name\":{\"0\":\"col6_7_8\",\"1\":\"col6_7_8\",\"2\":\"col6_7_8\",\"3\":\"col6_7_8\"},\"row\":{\"0\":0,\"1\":1,\"2\":2,\"3\":3},\"col6_7_8_valid_values\":{\"0\":false,\"1\":false,\"2\":false,\"3\":false},\"not_null\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true},\"dtype\":{\"0\":false,\"1\":false,\"2\":false,\"3\":false}}'''\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col6_7_8_valid_values</th>\n",
       "      <th>column_name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>not_null</th>\n",
       "      <th>row</th>\n",
       "      <th>validation_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col6_7_8_valid_values column_name  dtype  not_null  row validation_type\n",
       "0                  False    col6_7_8  False      True    0          output\n",
       "1                  False    col6_7_8  False      True    1          output\n",
       "2                  False    col6_7_8  False      True    2          output\n",
       "3                  False    col6_7_8  False      True    3          output"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "validate_output = col6_7_8_join._validate_output(df).reset_index()\n",
    "validate_output\n",
    "validate_output_json = validate_output.to_json()\n",
    "\n",
    "print(f\"validate_output_json = '''{validate_output_json}'''\")\n",
    "pd.read_json(validate_output_json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:58.154965Z",
     "start_time": "2018-02-14T20:36:58.015578Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "validate_all_json = '''{\"validation_type\":{\"0\":\"input\",\"1\":\"input\",\"2\":\"input\",\"3\":\"input\",\"4\":\"input\",\"5\":\"input\",\"6\":\"input\",\"7\":\"input\",\"8\":\"input\",\"9\":\"input\",\"10\":\"input\",\"11\":\"input\",\"12\":\"output\",\"13\":\"output\",\"14\":\"output\",\"15\":\"output\"},\"column_name\":{\"0\":\"col6\",\"1\":\"col6\",\"2\":\"col6\",\"3\":\"col6\",\"4\":\"col7\",\"5\":\"col7\",\"6\":\"col7\",\"7\":\"col7\",\"8\":\"col8\",\"9\":\"col8\",\"10\":\"col8\",\"11\":\"col8\",\"12\":\"col6_7_8\",\"13\":\"col6_7_8\",\"14\":\"col6_7_8\",\"15\":\"col6_7_8\"},\"row\":{\"0\":0,\"1\":1,\"2\":2,\"3\":3,\"4\":0,\"5\":1,\"6\":2,\"7\":3,\"8\":0,\"9\":1,\"10\":2,\"11\":3,\"12\":0,\"13\":1,\"14\":2,\"15\":3},\"col6_7_8_valid_values\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":true,\"5\":true,\"6\":true,\"7\":true,\"8\":true,\"9\":true,\"10\":true,\"11\":true,\"12\":false,\"13\":false,\"14\":false,\"15\":false},\"col6_valid_values\":{\"0\":false,\"1\":false,\"2\":false,\"3\":false,\"4\":true,\"5\":true,\"6\":true,\"7\":true,\"8\":true,\"9\":true,\"10\":true,\"11\":true,\"12\":true,\"13\":true,\"14\":true,\"15\":true},\"col7_valid_values\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":false,\"5\":false,\"6\":false,\"7\":false,\"8\":true,\"9\":true,\"10\":true,\"11\":true,\"12\":true,\"13\":true,\"14\":true,\"15\":true},\"col8_valid_values\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":true,\"5\":true,\"6\":true,\"7\":true,\"8\":false,\"9\":false,\"10\":false,\"11\":false,\"12\":true,\"13\":true,\"14\":true,\"15\":true},\"dtype\":{\"0\":false,\"1\":false,\"2\":false,\"3\":false,\"4\":false,\"5\":false,\"6\":false,\"7\":false,\"8\":false,\"9\":false,\"10\":false,\"11\":false,\"12\":false,\"13\":false,\"14\":false,\"15\":false},\"not_null\":{\"0\":true,\"1\":true,\"2\":true,\"3\":true,\"4\":true,\"5\":true,\"6\":true,\"7\":true,\"8\":true,\"9\":true,\"10\":true,\"11\":true,\"12\":true,\"13\":true,\"14\":true,\"15\":true}}'''\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col6_7_8_valid_values</th>\n",
       "      <th>col6_valid_values</th>\n",
       "      <th>col7_valid_values</th>\n",
       "      <th>col8_valid_values</th>\n",
       "      <th>column_name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>not_null</th>\n",
       "      <th>row</th>\n",
       "      <th>validation_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6_7_8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>output</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>col6</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>2</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>col7</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>col8</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>1</td>\n",
       "      <td>input</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    col6_7_8_valid_values  col6_valid_values  col7_valid_values  \\\n",
       "0                    True              False               True   \n",
       "1                    True              False               True   \n",
       "10                   True               True               True   \n",
       "11                   True               True               True   \n",
       "12                  False               True               True   \n",
       "13                  False               True               True   \n",
       "14                  False               True               True   \n",
       "15                  False               True               True   \n",
       "2                    True              False               True   \n",
       "3                    True              False               True   \n",
       "4                    True               True              False   \n",
       "5                    True               True              False   \n",
       "6                    True               True              False   \n",
       "7                    True               True              False   \n",
       "8                    True               True               True   \n",
       "9                    True               True               True   \n",
       "\n",
       "    col8_valid_values column_name  dtype  not_null  row validation_type  \n",
       "0                True        col6  False      True    0           input  \n",
       "1                True        col6  False      True    1           input  \n",
       "10              False        col8  False      True    2           input  \n",
       "11              False        col8  False      True    3           input  \n",
       "12               True    col6_7_8  False      True    0          output  \n",
       "13               True    col6_7_8  False      True    1          output  \n",
       "14               True    col6_7_8  False      True    2          output  \n",
       "15               True    col6_7_8  False      True    3          output  \n",
       "2                True        col6  False      True    2           input  \n",
       "3                True        col6  False      True    3           input  \n",
       "4                True        col7  False      True    0           input  \n",
       "5                True        col7  False      True    1           input  \n",
       "6                True        col7  False      True    2           input  \n",
       "7                True        col7  False      True    3           input  \n",
       "8               False        col8  False      True    0           input  \n",
       "9               False        col8  False      True    1           input  "
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "validate_all = col6_7_8_join.validate(df).reset_index()\n",
    "validate_all\n",
    "validate_all_json = validate_all.to_json()\n",
    "\n",
    "print(f\"validate_all_json = '''{validate_all_json}'''\")\n",
    "pd.read_json(validate_all_json)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Full Enforcer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-02-14T20:36:58.283355Z",
     "start_time": "2018-02-14T20:36:58.157359Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "enforcer_recode_json = '''{\"col1\":{\"0\":7,\"1\":2,\"2\":6,\"3\":5},\"col3\":{\"0\":10,\"1\":6,\"2\":2,\"3\":5},\"col4\":{\"0\":\"M\",\"1\":\"M\",\"2\":\"F\",\"3\":\"F\"},\"col5_number\":{\"0\":1,\"1\":3,\"2\":10,\"3\":6},\"col5_word\":{\"0\":\"ONE\",\"1\":\"THREE\",\"2\":\"TEN\",\"3\":\"SIX\"},\"col6_7_8\":{\"0\":[\"RNASeq\",\"Protein Function\"],\"1\":[\"DNASeq\"],\"2\":[\"Protein Function\"],\"3\":[\"DNASeq\",\"RNASeq\",\"Protein Function\"]}}'''\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5_number</th>\n",
       "      <th>col5_word</th>\n",
       "      <th>col6_7_8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>M</td>\n",
       "      <td>1</td>\n",
       "      <td>ONE</td>\n",
       "      <td>[RNASeq, Protein Function]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>M</td>\n",
       "      <td>3</td>\n",
       "      <td>THREE</td>\n",
       "      <td>[DNASeq]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>F</td>\n",
       "      <td>10</td>\n",
       "      <td>TEN</td>\n",
       "      <td>[Protein Function]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>F</td>\n",
       "      <td>6</td>\n",
       "      <td>SIX</td>\n",
       "      <td>[DNASeq, RNASeq, Protein Function]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col3 col4  col5_number col5_word                            col6_7_8\n",
       "0     7    10    M            1       ONE          [RNASeq, Protein Function]\n",
       "1     2     6    M            3     THREE                            [DNASeq]\n",
       "2     6     2    F           10       TEN                  [Protein Function]\n",
       "3     5     5    F            6       SIX  [DNASeq, RNASeq, Protein Function]"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "enforcer_recode = now_w_moar_complexity.recode(df, validate=True)\n",
    "\n",
    "enforcer_recode_json = enforcer_recode.to_json()\n",
    "print(f\"enforcer_recode_json = '''{enforcer_recode_json}'''\")\n",
    "pd.read_json(enforcer_recode_json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "table_enforcer",
   "language": "python",
   "name": "table_enforcer"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "991px",
    "left": "0px",
    "right": "2145px",
    "top": "207.011px",
    "width": "361px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}