{ "cells": [ { "cell_type": "markdown", "id": "895a5bb2", "metadata": {}, "source": [ "![brainome logo](./images/brainome_logo.png)\n", "# 106 Describing Your Data Set\n", "Brainome assumes your CSV file has certain characteristics: \n", "* the first row is the column headers\n", "* the target is the last column\n", "* we train using all columns\n", "\n", "\n", "Use these parameters to change our assumptions.\n", "\n", "1. -headerless CSV file\n", "2. Selecting the -target column\n", "3. -ignorecolumns to omit unique identifiers" ] }, { "cell_type": "markdown", "id": "431b880e", "metadata": {}, "source": [ "## Prerequisites\n", "This notebook assumes brainome is installed as per notebook [brainome_101_Quick_Start](./brainome_101_Quick_Start.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python3 -m pip install brainome --quiet\n", "!brainome -version" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "id": "8a76f187", "metadata": {}, "source": [ "## 1. -headerless CSV file\n", "Brainome assumes your CSV file has a header row. \n", "\n", "Use `-headerless` when your CSV file omits the header row.\n", "\n", "In this example, we use [bank.csv](https://download.brainome.ai/data/public/bank.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "81c35599", "metadata": {}, "outputs": [], "source": [ "import urllib.request as request\n", "response1 = request.urlretrieve('https://download.brainome.ai/data/public/bank.csv', 'bank.csv')\n", "print(\" Headerless data set bank.csv \".center(80,\"-\"))\n", "!head -4 bank.csv\n", "print(\"\\n\",\" Ranking an headerless data file \".center(80,\"-\"))\n", "!brainome bank.csv -headerless -y -o predictor_106_headerless.py | grep -A 6 \"Attribute Ranking:\"" ] }, { "cell_type": "markdown", "id": "c65d5739", "metadata": {}, "source": [ "## 2. Selecting the -target column\n", "Brainome assumes the last column is the target. \n", "\n", "Use `-target` to specify a different column.\n", "\n", "In this example, we use [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv) but rather than predicting *Survived*, we predict *Cabin_Class*\n" ] }, { "cell_type": "code", "execution_count": null, "id": "781a9407", "metadata": {}, "outputs": [], "source": [ "!brainome https://download.brainome.ai/data/public/titanic_train.csv -target Cabin_Class -y -o predictor_106_target.py | grep \"Target Column:\"" ] }, { "cell_type": "markdown", "id": "05028412", "metadata": {}, "source": [ "## 3. -ignorecolumns to omit unique identifiers\n", "Brainome will use all the columns in your data set. Most data sets include unique identifiers to tie the predictions to an external source.\n", "\n", "Use `-ignorecolumns` to omit features from your model.\n", "\n", "In this example, we ignore *PassengerId* and *Ticket_Number* from [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "478ea137", "metadata": {}, "outputs": [], "source": [ "!brainome https://download.brainome.ai/data/public/titanic_train.csv -ignorecolumns \"PassengerId,Ticket_Number\" -y -o predictor_106_ignorecolumns.py | grep -A 10 \"Attribute Ranking:\"" ] }, { "cell_type": "markdown", "id": "83c337d6", "metadata": {}, "source": [ "## Next Steps\n", "- Check out [200 Using measurements to improve your model](./brainome_200_Using_Measurement.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }