{ "cells": [ { "cell_type": "markdown", "id": "703b5e61", "metadata": {}, "source": [ "![brainome logo](./images/brainome_logo.png)\n", "# 101 Quick Start\n", "Running brainome in five easy steps:\n", "1. Install brainome via pip\n", "2. Download data sets\n", "3. Create your first predictor\n", "4. Validate the model\n", "5. Making predictions on new data" ] }, { "cell_type": "markdown", "id": "27f5d236", "metadata": {}, "source": [ "## 1. Install brainome via pip\n", "Pip automatically installs brainome's dependencies." ] }, { "cell_type": "code", "execution_count": null, "id": "01365cb1", "metadata": { "tags": [ "output_scroll" ] }, "outputs": [], "source": [ "!python3 -m pip install brainome\n", "print(\"\\n\\nChecking brainome version number:\")\n", "!brainome --version" ] }, { "cell_type": "markdown", "id": "88df264a", "metadata": {}, "source": [ "### Troubleshooting installation\n", "Sometimes pip requires the `--user` option in order to install successfully:\n", "\n", "> `python3 -m pip install brainome --user`" ] }, { "cell_type": "markdown", "id": "7ae71992", "metadata": {}, "source": [ "## 2. Download this tutorial's data sets\n", "The Titanic data set is commonly used as an introduction to data science. It is a passenger manifest of the Titanic that records whether each passenger survived the disaster. 
For more information, refer to [kaggle.com/c/titanic](https://www.kaggle.com/c/titanic)" ] }, { "cell_type": "code", "execution_count": null, "id": "446e86f8", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# download the training, validation, and prediction CSV files into the working directory\n", "import urllib.request as request\n", "response1 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')\n", "response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_validate.csv', 'titanic_validate.csv')\n", "response3 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')\n", "%ls -lh titanic_train.csv titanic_validate.csv titanic_predict.csv" ] }, { "cell_type": "markdown", "id": "7e8a3b84", "metadata": {}, "source": [ "### Preview training data\n", "The goal of the training is to predict which passengers survived the disaster.\n", "\n", "The passenger roster contains 11 features (PassengerId, Cabin_Class, Name, etc.) for 800 passengers that can be used to create a model. The twelfth column, 'Survived', is the target column.\n", "\n", "You can download the training data at [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv)" ] }, { "cell_type": "code", "execution_count": 1, "id": "ea0d21b6", "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001B[33mDEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. 
If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621\u001B[0m\r\n", "Requirement already satisfied: pandas in /usr/local/lib/python3.9/site-packages (1.3.1)\r\n", "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.9/site-packages (from pandas) (2021.1)\r\n", "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.9/site-packages (from pandas) (2.8.2)\r\n", "Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.9/site-packages (from pandas) (1.20.0)\r\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas) (1.16.0)\r\n", "Note: you may need to restart the kernel to use updated packages.\n" ] }, { "data": { "text/plain": " PassengerId Cabin_Class \\\n0 1 3 \n1 2 1 \n2 3 3 \n3 4 1 \n4 5 3 \n.. ... ... \n795 796 2 \n796 797 1 \n797 798 3 \n798 799 3 \n799 800 3 \n\n Name Sex Age \\\n0 Braund, Mr. Owen Harris male 22.0 \n1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n2 Heikkinen, Miss. Laina female 26.0 \n3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n4 Allen, Mr. William Henry male 35.0 \n.. ... ... ... \n795 Otter, Mr. Richard male 39.0 \n796 Leader, Dr. Alice (Farnham) female 49.0 \n797 Osman, Mrs. Mara female 31.0 \n798 Ibrahim Shawah, Mr. Yousseff male 30.0 \n799 Van Impe, Mrs. Jean Baptiste (Rosalie Paula Go... female 30.0 \n\n Sibling_Spouse Parent_Children Ticket_Number Fare Cabin_Number \\\n0 1 0 A/5 21171 7.2500 NaN \n1 1 0 PC 17599 71.2833 C85 \n2 0 0 STON/O2. 3101282 7.9250 NaN \n3 1 0 113803 53.1000 C123 \n4 0 0 373450 8.0500 NaN \n.. ... ... ... ... ... \n795 0 0 28213 13.0000 NaN \n796 0 0 17465 25.9292 D17 \n797 0 0 349244 8.6833 NaN \n798 0 0 2685 7.2292 NaN \n799 1 1 345773 24.1500 NaN \n\n Port_of_Embarkation Survived \n0 S died \n1 C survived \n2 S survived \n3 S survived \n4 S died \n.. ... ... 
\n795 S died \n796 S survived \n797 S survived \n798 C died \n799 S died \n\n[800 rows x 12 columns]", "text/html": "
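<div>
<table border=\"1\" class=\"dataframe\">
<thead>
<tr><th></th><th>PassengerId</th><th>Cabin_Class</th><th>Name</th><th>Sex</th><th>Age</th><th>Sibling_Spouse</th><th>Parent_Children</th><th>Ticket_Number</th><th>Fare</th><th>Cabin_Number</th><th>Port_of_Embarkation</th><th>Survived</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>1</td><td>3</td><td>Braund, Mr. Owen Harris</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>A/5 21171</td><td>7.2500</td><td>NaN</td><td>S</td><td>died</td></tr>
<tr><th>1</th><td>2</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.2833</td><td>C85</td><td>C</td><td>survived</td></tr>
<tr><th>2</th><td>3</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.9250</td><td>NaN</td><td>S</td><td>survived</td></tr>
<tr><th>3</th><td>4</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.1000</td><td>C123</td><td>S</td><td>survived</td></tr>
<tr><th>4</th><td>5</td><td>3</td><td>Allen, Mr. William Henry</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>373450</td><td>8.0500</td><td>NaN</td><td>S</td><td>died</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>795</th><td>796</td><td>2</td><td>Otter, Mr. Richard</td><td>male</td><td>39.0</td><td>0</td><td>0</td><td>28213</td><td>13.0000</td><td>NaN</td><td>S</td><td>died</td></tr>
<tr><th>796</th><td>797</td><td>1</td><td>Leader, Dr. Alice (Farnham)</td><td>female</td><td>49.0</td><td>0</td><td>0</td><td>17465</td><td>25.9292</td><td>D17</td><td>S</td><td>survived</td></tr>
<tr><th>797</th><td>798</td><td>3</td><td>Osman, Mrs. Mara</td><td>female</td><td>31.0</td><td>0</td><td>0</td><td>349244</td><td>8.6833</td><td>NaN</td><td>S</td><td>survived</td></tr>
<tr><th>798</th><td>799</td><td>3</td><td>Ibrahim Shawah, Mr. Yousseff</td><td>male</td><td>30.0</td><td>0</td><td>0</td><td>2685</td><td>7.2292</td><td>NaN</td><td>C</td><td>died</td></tr>
<tr><th>799</th><td>800</td><td>3</td><td>Van Impe, Mrs. Jean Baptiste (Rosalie Paula Go...</td><td>female</td><td>30.0</td><td>1</td><td>1</td><td>345773</td><td>24.1500</td><td>NaN</td><td>S</td><td>died</td></tr>
</tbody>
</table>
<p>800 rows × 12 columns</p>
</div>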
" }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# preview uses pandas to read and display csv data\n", "%pip install pandas --quiet\n", "import pandas as pd\n", "pd.read_csv('titanic_train.csv')" ] }, { "cell_type": "markdown", "id": "cecc2102", "metadata": {}, "source": [ "## 3. Create your first predictor\n", "In its simplest invocation, brainome will automatically measure your data, identify the best model, build it, train it, and validate it. \n", "\n", "It will automatically split your data into training and validation sets.\n", "\n", "The output is Python source code in `predictor_101.py`." ] }, { "cell_type": "code", "execution_count": null, "id": "8d0dde1a", "metadata": {}, "outputs": [], "source": [ "!brainome titanic_train.csv --yes -o predictor_101.py" ] }, { "cell_type": "markdown", "id": "0c259641", "metadata": {}, "source": [ "Open `predictor_101.py` to browse the predictor's source code. Notice that it is on the order of 38 KB." ] }, { "cell_type": "code", "execution_count": null, "id": "b43a6ff5", "metadata": {}, "outputs": [], "source": [ "%ls -lh predictor_101.py\n", "%pycat predictor_101.py" ] }, { "cell_type": "markdown", "id": "7a37fc6e", "metadata": {}, "source": [ "## 4. Validate the model\n", "Running your predictor on a data set it has not seen during training shows how well it generalizes.\n", "\n", "You can download the validation data at [titanic_validate.csv](https://download.brainome.ai/data/public/titanic_validate.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "7adb3709", "metadata": {}, "outputs": [], "source": [ "!python3 predictor_101.py -validate titanic_validate.csv" ] }, { "cell_type": "markdown", "id": "53ba80e7", "metadata": {}, "source": [ "## 5. 
Making predictions on new data\n", "Run your predictor on an unlabeled data set to generate predictions for other passengers.\n", "\n", "You can download the prediction data at [titanic_predict.csv](https://download.brainome.ai/data/public/titanic_predict.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "8d3b591f", "metadata": {}, "outputs": [], "source": [ "!python3 predictor_101.py titanic_predict.csv > predictions_101.csv\n", "pd.read_csv('predictions_101.csv')" ] }, { "cell_type": "markdown", "id": "fc4bc793", "metadata": {}, "source": [ "## Next steps\n", "- Check out [102 Using CLI](./brainome_102_Using_CLI.ipynb)\n", "- Check out [Using Measurement to Create Better Models](./brainome_200_Using_Measurement.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "id": "5ac74412", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }