{ "cells": [ { "cell_type": "markdown", "id": "895a5bb2", "metadata": {}, "source": [ "![brainome logo](./images/brainome_logo.png)\n", "# 104 Using Brainome's Predictor CLI\n", "The predictor generated by Brainome is capable of being used by the command line interface (CLI).\n", "\n", "\n", "1. Predictor --help\n", "2. Validate test csv dataset\n", "3. Classify unlabeled csv dataset\n", "4. Feature engineering predictions" ] }, { "cell_type": "markdown", "id": "431b880e", "metadata": {}, "source": [ "## Prerequisites\n", "This notebook assumes brainome is installed as per notebook [brainome_101_Quick_Start](brainome_101_Quick_Start.ipynb)\n", "\n", "The data sets are:\n", "* [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv) for training data\n", "* [titanic_validate.csv](https://download.brainome.ai/data/public/titanic_validate.csv) for validation\n", "* [titanic_predict.csv](https://download.brainome.ai/data/public/titanic_predict.csv) for predictions" ] }, { "cell_type": "code", "execution_count": 2, "id": "1bf0381a", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001B[33mDEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621\u001B[0m\n", "brainome v1.006-18-beta\n", "\n", "\n", "\n", "\n", "-rw-r--r-- 1 andys staff 858B Sep 20 16:58 titanic_predict.csv\n", "-rw-r--r-- 1 andys staff 56K Sep 20 16:58 titanic_train.csv\n", "-rw-r--r-- 1 andys staff 5.7K Sep 20 16:58 titanic_validate.csv\n" ] } ], "source": [ "!python3 -m pip install brainome --quiet\n", "!brainome -version\n", "\n", "import urllib.request as request\n", "response1 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')\n", "response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_validate.csv', 'titanic_validate.csv')\n", "response3 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')\n", "%ls -lh titanic_train.csv titanic_validate.csv titanic_predict.csv\n" ] }, { "cell_type": "markdown", "id": "b8d121a1", "metadata": {}, "source": [ "## Generate a predictor" ] }, { "cell_type": "code", "execution_count": 8, "id": "57882e55", "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING: Could not detect a GPU. Neural Network generation will be slow.\n", "\n", "The predictor filename is predictor_104.py\n", "-rw-r--r-- 1 andys staff 38K Sep 20 17:00 predictor_104.py\n" ] } ], "source": [ "!brainome titanic_train.csv -rank -y -o predictor_104.py -modelonly -q\n", "print(\"The predictor filename is predictor_104.py\")\n", "%ls -lh predictor_104.py\n", "# Preview predictor\n", "%pycat predictor_104.py" ] }, { "cell_type": "markdown", "id": "8a76f187", "metadata": {}, "source": [ "## 1. Predictor help\n", "Brainome predictors are really short and sweet. They just validate and classify data.\n", "\n", "While the predictor source code is portable, it does require numpy to run and optionally scipy to generate the confusion matrices." ] }, { "cell_type": "code", "execution_count": 9, "id": "81c35599", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: predictor_104.py [-h] [-validate] [-headerless] [-json] [-trim] csvfile\r\n", "\r\n", "Predictor trained on ['titanic_train.csv']\r\n", "\r\n", "positional arguments:\r\n", " csvfile CSV file containing test set (unlabeled).\r\n", "\r\n", "optional arguments:\r\n", " -h, --help show this help message and exit\r\n", " -validate Validation mode. csvfile must be labeled. Output is\r\n", " classification statistics rather than predictions.\r\n", " -headerless Do not treat the first line of csvfile as a header.\r\n", " -json report measurements as json\r\n", " -trim If true, the prediction will not output ignored columns.\r\n" ] } ], "source": [ "!python3 predictor_104.py --help" ] }, { "cell_type": "markdown", "id": "c65d5739", "metadata": {}, "source": [ "## 2. Validate test dataset\n", "The validate function takes a csv data set identical to the training data set and, with the `-validate` parameter, compares outcomes." ] }, { "cell_type": "code", "execution_count": 10, "id": "781a9407", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Classifier Type: Random Forest\r\n", "System Type: 2-way classifier\r\n", "\r\n", "Accuracy:\r\n", " Best-guess accuracy: 61.25%\r\n", " Model accuracy: 80.00% (64/80 correct)\r\n", " Improvement over best guess: 18.75% (of possible 38.75%)\r\n", "\r\n", "Model capacity (MEC): 17 bits\r\n", "Generalization ratio: 3.62 bits/bit\r\n", "\r\n", "Confusion Matrix:\r\n", "\r\n", " Actual | Predicted \r\n", " ----------------------------\r\n", " died | 45 4\r\n", " survived | 12 19\r\n", "\r\n", "Accuracy by Class:\r\n", "\r\n", " target | TP FP TN FN TPR TNR PPV NPV F1 TS\r\n", " -------- | -- -- -- -- ------- ------- ------- ------- ------- -------\r\n", " died | 45 12 19 4 91.84% 61.29% 78.95% 82.61% 84.91% 73.77%\r\n", " survived | 19 4 45 12 61.29% 91.84% 82.61% 78.95% 70.37% 54.29%\r\n" ] } ], "source": [ "!python3 predictor_104.py -validate titanic_validate.csv" ] }, { "cell_type": "markdown", "id": "05028412", "metadata": {}, "source": [ "## 3. Classify unlabeled dataset\n", "The predictor can classify a similar to training/validation data set sans target column.\n", "\n", "It will generate a complete data set with the \"Prediction\" column appended." ] }, { "cell_type": "code", "execution_count": 6, "id": "478ea137", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Viewing classification predictions.\n", "\u001B[33mDEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621\u001B[0m\n", "Note: you may need to restart the kernel to use updated packages.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdCabin_ClassNameSexAgeSibling_SpouseParent_ChildrenTicket_NumberFareCabin_NumberPort_of_EmbarkationPrediction
08812Shelley, Mrs. William (Imanita Parrish Hall)female25.00123043326.0000NaNSsurvived
18823Markun, Mr. Johannmale33.0003492577.8958NaNSdied
28833Dahlberg, Miss. Gerda Ulrikafemale22.000755210.5167NaNSdied
38842Banfield, Mr. Frederick Jamesmale28.000C.A./SOTON 3406810.5000NaNSdied
48853Sutehall, Mr. Henry Jrmale25.000SOTON/OQ 3920767.0500NaNSdied
58863Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQdied
68872Montvila, Rev. Juozasmale27.00021153613.0000NaNSdied
78881Graham, Miss. Margaret Edithfemale19.00011205330.0000B42Ssurvived
88893Johnston, Miss. Catherine Helen Carrie\"femaleNaN12W./C. 660723.4500NaNSsurvived
98901Behr, Mr. Karl Howellmale26.00011136930.0000C148Cdied
108913Dooley, Mr. Patrickmale32.0003703767.7500NaNQdied
\n", "
" ], "text/plain": [ " PassengerId Cabin_Class Name \\\n", "0 881 2 Shelley, Mrs. William (Imanita Parrish Hall) \n", "1 882 3 Markun, Mr. Johann \n", "2 883 3 Dahlberg, Miss. Gerda Ulrika \n", "3 884 2 Banfield, Mr. Frederick James \n", "4 885 3 Sutehall, Mr. Henry Jr \n", "5 886 3 Rice, Mrs. William (Margaret Norton) \n", "6 887 2 Montvila, Rev. Juozas \n", "7 888 1 Graham, Miss. Margaret Edith \n", "8 889 3 Johnston, Miss. Catherine Helen Carrie\" \n", "9 890 1 Behr, Mr. Karl Howell \n", "10 891 3 Dooley, Mr. Patrick \n", "\n", " Sex Age Sibling_Spouse Parent_Children Ticket_Number Fare \\\n", "0 female 25.0 0 1 230433 26.0000 \n", "1 male 33.0 0 0 349257 7.8958 \n", "2 female 22.0 0 0 7552 10.5167 \n", "3 male 28.0 0 0 C.A./SOTON 34068 10.5000 \n", "4 male 25.0 0 0 SOTON/OQ 392076 7.0500 \n", "5 female 39.0 0 5 382652 29.1250 \n", "6 male 27.0 0 0 211536 13.0000 \n", "7 female 19.0 0 0 112053 30.0000 \n", "8 female NaN 1 2 W./C. 6607 23.4500 \n", "9 male 26.0 0 0 111369 30.0000 \n", "10 male 32.0 0 0 370376 7.7500 \n", "\n", " Cabin_Number Port_of_Embarkation Prediction \n", "0 NaN S survived \n", "1 NaN S died \n", "2 NaN S died \n", "3 NaN S died \n", "4 NaN S died \n", "5 NaN Q died \n", "6 NaN S died \n", "7 B42 S survived \n", "8 NaN S survived \n", "9 C148 C died \n", "10 NaN Q died " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "!python3 predictor_104.py titanic_predict.csv > classifications_104.csv\n", "print('Viewing classification predictions.')\n", "%pip install pandas --quiet\n", "import pandas as pd\n", "pd.read_csv('classifications_104.csv')\n" ] }, { "cell_type": "markdown", "id": "bc4c0b48", "metadata": {}, "source": [ "## 4. Feature engineering predictions\n", "While feature engineering, it is desired to only view the features that contributed to the prediction. \n", "\n", "With the `-trim` parameter, the output will only show the features deemed important by the model." ] }, { "cell_type": "code", "execution_count": 7, "id": "c9280328", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Viewing important features classification predictions.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdCabin_ClassNameSexAgeSibling_SpouseParent_ChildrenTicket_NumberFareCabin_NumberPort_of_EmbarkationPrediction
08812Shelley, Mrs. William (Imanita Parrish Hall)female25.00123043326.0000NaNSsurvived
18823Markun, Mr. Johannmale33.0003492577.8958NaNSdied
28833Dahlberg, Miss. Gerda Ulrikafemale22.000755210.5167NaNSdied
38842Banfield, Mr. Frederick Jamesmale28.000C.A./SOTON 3406810.5000NaNSdied
48853Sutehall, Mr. Henry Jrmale25.000SOTON/OQ 3920767.0500NaNSdied
58863Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQdied
68872Montvila, Rev. Juozasmale27.00021153613.0000NaNSdied
78881Graham, Miss. Margaret Edithfemale19.00011205330.0000B42Ssurvived
88893Johnston, Miss. Catherine Helen Carrie\"femaleNaN12W./C. 660723.4500NaNSsurvived
98901Behr, Mr. Karl Howellmale26.00011136930.0000C148Cdied
108913Dooley, Mr. Patrickmale32.0003703767.7500NaNQdied
\n", "
" ], "text/plain": [ " PassengerId Cabin_Class Name \\\n", "0 881 2 Shelley, Mrs. William (Imanita Parrish Hall) \n", "1 882 3 Markun, Mr. Johann \n", "2 883 3 Dahlberg, Miss. Gerda Ulrika \n", "3 884 2 Banfield, Mr. Frederick James \n", "4 885 3 Sutehall, Mr. Henry Jr \n", "5 886 3 Rice, Mrs. William (Margaret Norton) \n", "6 887 2 Montvila, Rev. Juozas \n", "7 888 1 Graham, Miss. Margaret Edith \n", "8 889 3 Johnston, Miss. Catherine Helen Carrie\" \n", "9 890 1 Behr, Mr. Karl Howell \n", "10 891 3 Dooley, Mr. Patrick \n", "\n", " Sex Age Sibling_Spouse Parent_Children Ticket_Number Fare \\\n", "0 female 25.0 0 1 230433 26.0000 \n", "1 male 33.0 0 0 349257 7.8958 \n", "2 female 22.0 0 0 7552 10.5167 \n", "3 male 28.0 0 0 C.A./SOTON 34068 10.5000 \n", "4 male 25.0 0 0 SOTON/OQ 392076 7.0500 \n", "5 female 39.0 0 5 382652 29.1250 \n", "6 male 27.0 0 0 211536 13.0000 \n", "7 female 19.0 0 0 112053 30.0000 \n", "8 female NaN 1 2 W./C. 6607 23.4500 \n", "9 male 26.0 0 0 111369 30.0000 \n", "10 male 32.0 0 0 370376 7.7500 \n", "\n", " Cabin_Number Port_of_Embarkation Prediction \n", "0 NaN S survived \n", "1 NaN S died \n", "2 NaN S died \n", "3 NaN S died \n", "4 NaN S died \n", "5 NaN Q died \n", "6 NaN S died \n", "7 B42 S survived \n", "8 NaN S survived \n", "9 C148 C died \n", "10 NaN Q died " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "!python3 predictor_104.py titanic_predict.csv -trim > trimmed_classifications_104.csv\n", "print('Viewing important features classification predictions.')\n", "# preview uses pandas to read and display csv data\n", "import pandas as pd\n", "pd.read_csv('trimmed_classifications_104.csv')" ] }, { "cell_type": "markdown", "id": "86883ae1", "metadata": {}, "source": [ "## Advanced Predictor Usage\n", "See notebook [300 Put your model to work](./brainome_300_Integrating_Predictors.ipynb) for integrating the predictor within your python program." ] }, { "cell_type": "markdown", "id": "83c337d6", "metadata": {}, "source": [ "## Next Steps\n", "- Check out [106 Describe Your CSV](./brainome_106_Describe_Your_CSV.ipynb)\n", "- Check out [Using Measurement to Create Better Models](./brainome_200_Using_Measurement.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }