{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5d6df1f3",
   "metadata": {},
   "source": [
    "![brainome logo](./images/brainome_logo.png)\n",
    "# 105 Sourcing Your Data Set\n",
    "Brainome accepts CSV files from many sources\n",
    "\n",
    "1. Local file system\n",
    "2. HTTP/HTTPS URL\n",
    "3. Compressed data sets\n",
    "4. Multiple data sets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "431b880e",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "This notebook assumes brainome is installed as per notebook [brainome_101_Quick_Start](brainome_101_Quick_Start.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "!python3 -m pip install brainome -quiet\n",
    "!brainome -version"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "id": "8a76f187",
   "metadata": {},
   "source": [
    "## 1. Local file system\n",
    "Brainome defaults to reading data files from the current directory.\n",
    "\n",
    "In this example, we download [cancer.csv](https://download.brainome.ai/data/public/cancer.csv) to the local file system before using it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81c35599",
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request as request\n",
    "response1 = request.urlretrieve('https://download.brainome.ai/data/public/cancer.csv', 'cancer.csv')\n",
    "print(\"Downloaded cancer.csv to local file system\")\n",
    "%ls -lh cancer.csv\n",
    "print(\"\\nRunning brainome\")\n",
    "!brainome cancer.csv -y  -o predictor_105_local.py | grep -A 6 \"Data:\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c65d5739",
   "metadata": {},
   "source": [
    "## 2. HTTP/HTTPS URL\n",
    "Brainome can download a CSV data set from an HTTP URL.\n",
    "\n",
    "\n",
    "In this example, we use [titanic_train.csv](https://download.brainome.ai/data/public/titanic_train.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "781a9407",
   "metadata": {},
   "outputs": [],
   "source": [
    "!brainome https://download.brainome.ai/data/public/titanic_train.csv -y -o predictor_105_http.py | grep -A 6 \"Data:\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05028412",
   "metadata": {},
   "source": [
    "## 3. Compressed data sets\n",
    "Brainome can stream a compressed data set.\n",
    "\n",
    "In this example, we use [titanic_compressed.csv.gz](https://download.brainome.ai/data/public/titanic_compressed.csv.gz)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "478ea137",
   "metadata": {},
   "outputs": [],
   "source": [
    "!brainome https://download.brainome.ai/data/public/titanic_compressed.csv.gz -y  -o predictor_105_gz.py | grep -A 6 \"Data:\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6904d21",
   "metadata": {},
   "source": [
    "## 4. Multiple data sets\n",
    "Brainome can accept multiple data sets. They need to all have the same columns.\n",
    "\n",
    "In this example, we use [vehicle.csv](https://download.brainome.ai/data/public/vehicle.csv), [vehicle_A.csv.gz](https://download.brainome.ai/data/public/vehicle_A.csv.gz), and [vehicle_B.csv.gz](https://download.brainome.ai/data/public/vehicle_B.csv.gz)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "870431d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!brainome https://download.brainome.ai/data/public/vehicle.csv https://download.brainome.ai/data/public/vehicle_A.csv.gz https://download.brainome.ai/data/public/vehicle_B.csv.gz -y  -o predictor_105_multi.py | grep -A 6 \"Data:\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83c337d6",
   "metadata": {},
   "source": [
    "## Next Steps\n",
    "- Check out [106 Describe Your CSV](brainome_106_Describe_Your_CSV.ipynb)\n",
    "- Check out [Using Measurement to Create Better Models](./brainome_200_Using_Measurement.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}