{ "cells": [ { "cell_type": "markdown", "id": "5d6df1f3", "metadata": {}, "source": [ "![brainome logo](./images/brainome_logo.png)\n", "# 105 Sourcing Your Data Set\n", "Brainome accepts CSV files from several sources:\n", "\n", "1. Local file system\n", "2. HTTP/HTTPS URL\n", "3. Compressed data sets\n", "4. Multiple data sets" ] }, { "cell_type": "markdown", "id": "431b880e", "metadata": {}, "source": [ "## Prerequisites\n", "This notebook assumes brainome is installed as per notebook [brainome_101_Quick_Start](brainome_101_Quick_Start.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python3 -m pip install brainome --quiet\n", "!brainome -version" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "id": "8a76f187", "metadata": {}, "source": [ "## 1. Local file system\n", "Brainome defaults to reading data files from the current directory.\n", "\n", "In this example, we download [cancer.csv](https://download.brainome.ai/data/public/cancer.csv) to the local file system before using it." ] }, { "cell_type": "code", "execution_count": null, "id": "81c35599", "metadata": {}, "outputs": [], "source": [ "import urllib.request as request\n", "response1 = request.urlretrieve('https://download.brainome.ai/data/public/cancer.csv', 'cancer.csv')\n", "print(\"Downloaded cancer.csv to local file system\")\n", "%ls -lh cancer.csv\n", "print(\"\\nRunning brainome\")\n", "!brainome cancer.csv -y -o predictor_105_local.py | grep -A 6 \"Data:\"" ] }, { "cell_type": "markdown", "id": "c65d5739", "metadata": {}, "source": [ "## 2. 
HTTP/HTTPS URL\n", "Brainome can download a CSV data set directly from an HTTP or HTTPS URL.\n", "\n", "\n", "In this example, we use [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "781a9407", "metadata": {}, "outputs": [], "source": [ "!brainome https://download.brainome.ai/data/public/titanic_train.csv -y -o predictor_105_http.py | grep -A 6 \"Data:\"" ] }, { "cell_type": "markdown", "id": "05028412", "metadata": {}, "source": [ "## 3. Compressed data sets\n", "Brainome can stream a compressed data set.\n", "\n", "In this example, we use [titanic_compressed.csv.gz](https://download.brainome.ai/data/public/titanic_compressed.csv.gz)" ] }, { "cell_type": "code", "execution_count": null, "id": "478ea137", "metadata": {}, "outputs": [], "source": [ "!brainome https://download.brainome.ai/data/public/titanic_compressed.csv.gz -y -o predictor_105_gz.py | grep -A 6 \"Data:\"" ] }, { "cell_type": "markdown", "id": "d6904d21", "metadata": {}, "source": [ "## 4. Multiple data sets\n", "Brainome can accept multiple data sets. 
They must all have the same columns.\n", "\n", "In this example, we use [vehicle.csv](https://download.brainome.ai/data/public/vehicle.csv), [vehicle_A.csv.gz](https://download.brainome.ai/data/public/vehicle_A.csv.gz), and [vehicle_B.csv.gz](https://download.brainome.ai/data/public/vehicle_B.csv.gz)" ] }, { "cell_type": "code", "execution_count": null, "id": "870431d1", "metadata": {}, "outputs": [], "source": [ "!brainome https://download.brainome.ai/data/public/vehicle.csv https://download.brainome.ai/data/public/vehicle_A.csv.gz https://download.brainome.ai/data/public/vehicle_B.csv.gz -y -o predictor_105_multi.py | grep -A 6 \"Data:\"" ] }, { "cell_type": "markdown", "id": "83c337d6", "metadata": {}, "source": [ "## Next Steps\n", "- Check out [106 Describe Your CSV](brainome_106_Describe_Your_CSV.ipynb)\n", "- Check out [Using Measurement to Create Better Models](./brainome_200_Using_Measurement.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }