{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "703b5e61",
   "metadata": {},
   "source": [
    "![brainome logo](./images/brainome_logo.png)\n",
    "# 101 Quick Start\n",
    "Running brainome in five easy steps:\n",
    "1. Install brainome via pip\n",
    "2. Download data sets\n",
    "3. Create your first predictor\n",
    "4. Validate the model\n",
    "5. Making predictions on new data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27f5d236",
   "metadata": {},
   "source": [
    "## 1. Install brainome via pip\n",
    "Pip will automatically include dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01365cb1",
   "metadata": {
    "tags": [
     "output_scroll"
    ]
   },
   "outputs": [],
   "source": [
    "!python3 -m pip install brainome\n",
    "print(\"\\n\\nChecking brainome version number:\")\n",
    "!brainome --version"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88df264a",
   "metadata": {},
   "source": [
    "### Troubleshooting installation\n",
    "Sometimes pip requires `--user` parameter in order to install successfully:\n",
    "\n",
    "> `python3 -m pip install brainome --user`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ae71992",
   "metadata": {},
   "source": [
    "## 2. Download this tutorial's data sets.\n",
    "The titanic data set is a commonly used for introduction to data science. It is a passenger manifest of the Titanic including whether they survived the disaster or not. For more information, refer to [kaggle.com/c/titanic](https://www.kaggle.com/c/titanic)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "446e86f8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import urllib.request as request\n",
    "response1 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')\n",
    "response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_validate.csv', 'titanic_validate.csv')\n",
    "response3 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')\n",
    "%ls -lh titanic_train.csv titanic_validate.csv titanic_predict.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e8a3b84",
   "metadata": {},
   "source": [
    "### Preview training data\n",
    "The goal of the training is to predict which passenger survived the diaster.\n",
    "\n",
    "The passenger roster contains 11 features (PassengerId, Cabin_Class, Name, etc) for 800 passengers that can be used to create a model. Hence, the target column is 'Survived'.\n",
    "\n",
    "You can download the training data at [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ea0d21b6",
   "metadata": {
    "tags": [
     "output_scroll"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": "     PassengerId  Cabin_Class  \\\n0              1            3   \n1              2            1   \n2              3            3   \n3              4            1   \n4              5            3   \n..           ...          ...   \n795          796            2   \n796          797            1   \n797          798            3   \n798          799            3   \n799          800            3   \n\n                                                  Name     Sex   Age  \\\n0                              Braund, Mr. Owen Harris    male  22.0   \n1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0   \n2                               Heikkinen, Miss. Laina  female  26.0   \n3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n4                             Allen, Mr. William Henry    male  35.0   \n..                                                 ...     ...   ...   \n795                                 Otter, Mr. Richard    male  39.0   \n796                        Leader, Dr. Alice (Farnham)  female  49.0   \n797                                   Osman, Mrs. Mara  female  31.0   \n798                       Ibrahim Shawah, Mr. Yousseff    male  30.0   \n799  Van Impe, Mrs. Jean Baptiste (Rosalie Paula Go...  female  30.0   \n\n     Sibling_Spouse  Parent_Children     Ticket_Number     Fare Cabin_Number  \\\n0                 1                0         A/5 21171   7.2500          NaN   \n1                 1                0          PC 17599  71.2833          C85   \n2                 0                0  STON/O2. 3101282   7.9250          NaN   \n3                 1                0            113803  53.1000         C123   \n4                 0                0            373450   8.0500          NaN   \n..              ...              ...               ...      ...          ...   
\n795               0                0             28213  13.0000          NaN   \n796               0                0             17465  25.9292          D17   \n797               0                0            349244   8.6833          NaN   \n798               0                0              2685   7.2292          NaN   \n799               1                1            345773  24.1500          NaN   \n\n    Port_of_Embarkation  Survived  \n0                     S      died  \n1                     C  survived  \n2                     S  survived  \n3                     S  survived  \n4                     S      died  \n..                  ...       ...  \n795                   S      died  \n796                   S  survived  \n797                   S  survived  \n798                   C      died  \n799                   S      died  \n\n[800 rows x 12 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>PassengerId</th>\n      <th>Cabin_Class</th>\n      <th>Name</th>\n      <th>Sex</th>\n      <th>Age</th>\n      <th>Sibling_Spouse</th>\n      <th>Parent_Children</th>\n      <th>Ticket_Number</th>\n      <th>Fare</th>\n      <th>Cabin_Number</th>\n      <th>Port_of_Embarkation</th>\n      <th>Survived</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>3</td>\n      <td>Braund, Mr. Owen Harris</td>\n      <td>male</td>\n      <td>22.0</td>\n      <td>1</td>\n      <td>0</td>\n      <td>A/5 21171</td>\n      <td>7.2500</td>\n      <td>NaN</td>\n      <td>S</td>\n      <td>died</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2</td>\n      <td>1</td>\n      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n      <td>female</td>\n      <td>38.0</td>\n      <td>1</td>\n      <td>0</td>\n      <td>PC 17599</td>\n      <td>71.2833</td>\n      <td>C85</td>\n      <td>C</td>\n      <td>survived</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3</td>\n      <td>3</td>\n      <td>Heikkinen, Miss. Laina</td>\n      <td>female</td>\n      <td>26.0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>STON/O2. 3101282</td>\n      <td>7.9250</td>\n      <td>NaN</td>\n      <td>S</td>\n      <td>survived</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>4</td>\n      <td>1</td>\n      <td>Futrelle, Mrs. 
Jacques Heath (Lily May Peel)</td>\n      <td>female</td>\n      <td>35.0</td>\n      <td>1</td>\n      <td>0</td>\n      <td>113803</td>\n      <td>53.1000</td>\n      <td>C123</td>\n      <td>S</td>\n      <td>survived</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>5</td>\n      <td>3</td>\n      <td>Allen, Mr. William Henry</td>\n      <td>male</td>\n      <td>35.0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>373450</td>\n      <td>8.0500</td>\n      <td>NaN</td>\n      <td>S</td>\n      <td>died</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>795</th>\n      <td>796</td>\n      <td>2</td>\n      <td>Otter, Mr. Richard</td>\n      <td>male</td>\n      <td>39.0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>28213</td>\n      <td>13.0000</td>\n      <td>NaN</td>\n      <td>S</td>\n      <td>died</td>\n    </tr>\n    <tr>\n      <th>796</th>\n      <td>797</td>\n      <td>1</td>\n      <td>Leader, Dr. Alice (Farnham)</td>\n      <td>female</td>\n      <td>49.0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>17465</td>\n      <td>25.9292</td>\n      <td>D17</td>\n      <td>S</td>\n      <td>survived</td>\n    </tr>\n    <tr>\n      <th>797</th>\n      <td>798</td>\n      <td>3</td>\n      <td>Osman, Mrs. Mara</td>\n      <td>female</td>\n      <td>31.0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>349244</td>\n      <td>8.6833</td>\n      <td>NaN</td>\n      <td>S</td>\n      <td>survived</td>\n    </tr>\n    <tr>\n      <th>798</th>\n      <td>799</td>\n      <td>3</td>\n      <td>Ibrahim Shawah, Mr. 
Yousseff</td>\n      <td>male</td>\n      <td>30.0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>2685</td>\n      <td>7.2292</td>\n      <td>NaN</td>\n      <td>C</td>\n      <td>died</td>\n    </tr>\n    <tr>\n      <th>799</th>\n      <td>800</td>\n      <td>3</td>\n      <td>Van Impe, Mrs. Jean Baptiste (Rosalie Paula Go...</td>\n      <td>female</td>\n      <td>30.0</td>\n      <td>1</td>\n      <td>1</td>\n      <td>345773</td>\n      <td>24.1500</td>\n      <td>NaN</td>\n      <td>S</td>\n      <td>died</td>\n    </tr>\n  </tbody>\n</table>\n<p>800 rows × 12 columns</p>\n</div>"
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# preview uses pandas to read and display csv data\n",
    "%pip install pandas --quiet\n",
    "import pandas as pd\n",
    "pd.read_csv('titanic_train.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cecc2102",
   "metadata": {},
   "source": [
    "## 3. Create your first predictor\n",
    "In its simplest invocation, brainome will automatically measure your data, identify the best model, build it, train it, and validate it.  \n",
    "\n",
    "It will automatically split your data into training and validation.\n",
    "\n",
    "The output is python source code in `predictor_101.py`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d0dde1a",
   "metadata": {},
   "outputs": [],
   "source": [
    "!brainome titanic_train.csv --yes -o predictor_101.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c259641",
   "metadata": {},
   "source": [
    "Open `predictor_101.py` to browse the predictor's source code. Notice it is on the order of 38k bytes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b43a6ff5",
   "metadata": {},
   "outputs": [],
   "source": [
    "%ls -lh predictor_101.py\n",
    "%pycat predictor_101.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a37fc6e",
   "metadata": {},
   "source": [
    "## 4. Validate the model\n",
    "Running your predictor on an unseen data set demonstrates its effectiveness.\n",
    "\n",
    "You can download the validation data at [titanic_validate.csv](https://download.brainome.ai/data/public/titanic_validate.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7adb3709",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python3 predictor_101.py -validate titanic_validate.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53ba80e7",
   "metadata": {},
   "source": [
    "## 5. Making predictions on new data\n",
    "Run your predictor on an unlabeled data set to generate predictions for other passengers.\n",
    "\n",
    "You can download the prediction data at [titanic_predict.csv](https://download.brainome.ai/data/public/titanic_predict.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d3b591f",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python3 predictor_101.py titanic_predict.csv > predictions_101.csv\n",
    "pd.read_csv('predictions_101.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc4bc793",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "- Check out [102 Using CLI](./brainome_102_Using_CLI.ipynb)\n",
    "- Check out [Using Measurement to Create Better Models](./brainome_200_Using_Measurement.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}