{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Civis Python API Client\n", "[Stephen Hoover](https://github.com/stephen-hoover), Lead Data Scientist
\n", "August 2017" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Civis Platform provides you with a Data Science API which gives you direct access to Civis Platform's cloud-based infrastructure, data science tools, and data. You can query large datasets, train a dozen models at once (and set them to re-train on a schedule), and create or update dashboards to show off your work. Using the Data Science API, you can write code in scripts or notebooks as if you're working on your laptop, but with all the resources of the Civis Platform.\n", "\n", "Civis Analytics provides [API clients](#what-is-an-api-client) for both Python and R. This notebook introduces you to the abstractions used in the [Civis Python API Client](https://civis-python.readthedocs.io/) and provides a few use examples. If you aren't running this notebook in the Civis Platform, follow the instructions in [Section A.3](#local-machine) for setup instructions. If you aren't a Civis Platform subscriber, sign up for a [free trial](https://www.civisanalytics.com/civis-platform-signup/) today!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using Civis Python API Client version 1.6.0.\n" ] } ], "source": [ "print(f\"Using Civis Python API Client version {civis.__version__}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "\n", "1. [What's Available?](#whats-available)\n", "2. [Data Access](#data-access)
\n", "2.1 [Reading a table from Civis](#reading-table)
\n", "2.2 [Writing tables to Civis](#writing-tables)
\n", "2.3 [What is the CivisFuture?](#civisfuture)
\n", "2.4 [Executing a SQL query](#executing-sql)
\n", "2.5 [Writing and reading files](#file-io)
\n", "2.6 [Other useful I/O functions](#other-io)
\n", "3. [Machine Learning](#ml)
\n", "3.1 [Training your model](#ml-train)
\n", "3.2 [Making predictions](#ml-predict)
\n", "4. [Direct API Access](#raw-api)
\n", "4.1 [Tables](#tables)
\n", "4.2 [Paginated responses](#pagination)
\n", "4.3 [The API Response](#api-response)
\n", "5. [Build something new](#something-new)
\n", "5.1 [Creating and running Container Scripts](#container-scripts)
\n", "5.2 [Custom Scripts](#custom-scripts)
\n", "6. [A Data Science API](#ds-api)\n", "\n", "### Appendix\n", "A.1 [What is an API client?](#what-is-an-api-client)
\n", "A.2 [Rate limits and retries](#retries)
\n", "A.3 [Using the Python API client outside of Civis](#local-machine)
\n", "A.4 [Where can I go from here?](#next-steps)
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# 1. What's Available?\n", "\n", "The Python API client has two kinds of functionality.\n", "\n", "First, you can interact directly with the Civis Data Science API by using a `civis.APIClient` object. This translates the native REST API into Python code, so that you can pass parameters to functions rather than writing out http requests by hand. These functions all immediately return the response from Civis Platform.\n", "\n", "The second kind of functionality is higher-level functions which make common tasks easier, such as copying a table from Redshift into a `pandas.DataFrame`, or training a machine learning model. You can access these functions through the `civis` namespace.\n", "- [`civis.io`](https://civis-python.readthedocs.io/en/latest/io.html) : Data input, output, and transfer, as well as SQL queries on Redshift tables\n", "- [`civis.ml`](https://civis-python.readthedocs.io/en/latest/ml.html) : Machine learning\n", "- [`civis.parallel`](https://civis-python.readthedocs.io/en/latest/parallel.html) : Tools for doing batch computing in Civis Platform\n", "\n", "When you start a new Civis Jupyter notebook, you already have the `civis` namespace imported and a `civis.APIClient` object named `client` created and ready to go!" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Uncomment the following two lines if you run this notebook outside of Civis Platform\n", "#import civis\n", "#client = civis.APIClient()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# 2. Data Access\n", "\n", "You can use the functions provided in the [`civis.io`](https://civis-python.readthedocs.io/en/latest/io.html) namespace to move data in and out of Civis Platform. Here's a few examples of how that works. This notebook assumes that all of the data we'll use are in the same database, defined below. 
If your data aren't in the \"Civis Database\" database, change the following cell to use the correct name." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "DATABASE = \"Civis Database\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2.1 Reading a table from Civis\n", "\n", "Sometimes you need to move a table from your Civis Redshift cluster into RAM so that you can manipulate it. The `civis.io.read_civis` function will do that for you.\n", "\n", "This is the first example of a *wrapper function*, which is a special piece of code designed to do a common task (in this case, read a table from your Civis Redshift cluster and return it as a list or a `pandas.DataFrame`). There are a number of wrapper functions in `civis.io` designed to assist with getting data in and out of Civis Platform. They will make your life easier than e.g. working with the raw API endpoints or clicking through the GUI. *The recommended best practice is to use wrapper functions whenever possible, rather than the client directly.*" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# First, use \"?\" to investigate the parameters of civis.io.read_civis\n", "civis.io.read_civis?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's read out a table of data on public transit ridership in Chicago. The docstring tells us that unless `use_pandas` is True (default=False), the function will return a list. We want a `DataFrame` here, so set `use_pandas` to True." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The table's shape is (739054, 5).\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
station_idstationnamedatedaytyperides
040200Randolph/Wabash2001-01-01U834
140870Francisco2001-01-01U196
240060Belmont-O'Hare2001-01-02W4046
340730Washington/Wells2001-01-02W6788
44143087th2001-01-02W4577
\n", "
" ], "text/plain": [ " station_id stationname date daytype rides\n", "0 40200 Randolph/Wabash 2001-01-01 U 834\n", "1 40870 Francisco 2001-01-01 U 196\n", "2 40060 Belmont-O'Hare 2001-01-02 W 4046\n", "3 40730 Washington/Wells 2001-01-02 W 6788\n", "4 41430 87th 2001-01-02 W 4577" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = civis.io.read_civis(table='public.cta_ridership_daily',\n", " database=DATABASE, \n", " use_pandas=True)\n", "print(f\"The table's shape is {df.shape}.\")\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the table in our notebook, we can inspect it and use Python functions to modify it. Let's turn it into a table of ridership by month for each station, starting in 2010." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The grouped table's shape is (8948, 3).\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
stationnamemonthrides
018th2010-0137136
118th2010-0237605
218th2010-0342990
318th2010-0441964
418th2010-0540943
\n", "
" ], "text/plain": [ " stationname month rides\n", "0 18th 2010-01 37136\n", "1 18th 2010-02 37605\n", "2 18th 2010-03 42990\n", "3 18th 2010-04 41964\n", "4 18th 2010-05 40943" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df['month'] = pd.DatetimeIndex(df['date']).to_period('M')\n", "rides_post_2009 = df[df['month'] >= pd.Period('2010-01', 'M')]\n", "rides_by_month = (rides_post_2009.groupby(['stationname', 'month'])[['rides']]\n", " .sum()\n", " .reset_index())\n", "print(f\"The grouped table's shape is {rides_by_month.shape}.\")\n", "rides_by_month.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2.2 Writing tables to Civis\n", "\n", "Now that we have our modified data, let's put it back into your Redshift cluster. Use the function `civis.io.dataframe_to_civis` to do the upload. We'll put it into a table in the \"scratch\" schema. That's a customary location for tables we don't intend to keep around for long." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "rbm_tablename = 'scratch.rides_by_month'\n", "fut = civis.io.dataframe_to_civis(\n", " df=rides_by_month,\n", " database=DATABASE,\n", " table=rbm_tablename,\n", " distkey='month',\n", " sortkey1='month',\n", " existing_table_rows='drop',\n", ") # This is non-blocking\n", "print(fut)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'error': None,\n", " 'finished_at': '2017-08-29T14:39:59.000Z',\n", " 'id': 58250695,\n", " 'import_id': 7072332,\n", " 'is_cancel_requested': False,\n", " 'started_at': '2017-08-29T14:39:51.000Z',\n", " 'state': 'succeeded'}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fut.result() # This blocks (warning: can take a few minutes to run)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2.3 What is the CivisFuture?\n", "\n", "Notice that, although the `civis.io.read_civis` function waited until your download was done to finish executing, the `civis.io.dataframe_to_civis` function returned immediately, even though Civis Platform hadn't finished creating your table. When working with the client, you will often need to start jobs that will take some time to complete. To deal with this, the Civis client includes `CivisFuture` objects, which allow you to process multiple long running jobs simultaneously. \n", "\n", "The `CivisFuture` object is a subclass of the standard library [`concurrent.futures.Future`](https://docs.python.org/3/library/concurrent.futures.html#future-objects) object and tracks a Civis Platform run. This abstraction allows you to start multiple jobs at once, rather than wait for one to finish before starting the other. 
You can keep working while your table creation happens, and only stop to wait (by calling `CivisFuture.result()` or `concurrent.futures.wait`) once you reach a step which relies on your run having finished.\n", "\n", "Find more information on `CivisFuture` in the User Guide: http://civis-python.readthedocs.io/en/latest/user_guide.html#civis-futures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2.4 Executing a SQL query\n", "\n", "You can also use functions in the `civis.io` namespace to run SQL in Civis Platform as if you were working with Query. You can use this same method in your scripts to create or drop tables, assign permissions, or do anything else you would want to do in Query. \n", "\n", "Let's use a Query to pull out the March 2015 traffic at one of the stations in downtown Chicago. Here we're immediately asking for the result of the query by calling `.result()` on the returned `CivisFuture`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Washington/Wells station had 181116 riders in 2015-03.\n" ] } ], "source": [ "station_name = \"Washington/Wells\"\n", "month = \"2015-03\"\n", "result = civis.io.query_civis(database=DATABASE,\n", " sql=(f\"SELECT rides FROM {rbm_tablename} \"\n", " f\"WHERE stationname = '{station_name}' \" \n", " f\"AND month = '{month}'\"),\n", " ).result()\n", "print(f\"The {station_name} station had {result['result_rows'][0][0]} riders in {month}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's clean up that scratch table. We don't need to wait for Civis Platform to finish, so this time we won't block on the output of `civis.io.query_civis`. Civis Platform will keep running the table action as we move to the next cells of this notebook."
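This non-blocking pattern follows the standard `concurrent.futures` interface, so the same workflow can be sketched with only the standard library. In this sketch (no Civis Platform access required), the invented `slow_job` function stands in for a long-running Platform run:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def slow_job(n):
    """Stand-in for a long-running Platform job."""
    time.sleep(0.1)
    return n * n

executor = ThreadPoolExecutor(max_workers=2)
# Start two jobs without blocking, just as dataframe_to_civis returns immediately.
futures = [executor.submit(slow_job, n) for n in (2, 3)]
# ...keep doing other work here...
# Block only once the results are actually needed.
done, _ = wait(futures)
results = sorted(f.result() for f in done)
executor.shutdown()
print(results)  # [4, 9]
```

`CivisFuture` objects can be passed to `wait` and `as_completed` in exactly the same way.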
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "fut_drop = civis.io.query_civis(database=DATABASE, \n", " sql=f\"DROP TABLE IF EXISTS {rbm_tablename}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2.5 Writing and reading files\n", "\n", "You can store arbitrary files in Civis Platform by using `civis.io.file_to_civis` to store and `civis.io.civis_to_file` to retrieve data. Let's grab the current status of [Chicago's bike share network](https://www.divvybikes.com/) and store the data in a Civis File." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloaded data on 584 stations.\n" ] } ], "source": [ "import requests\n", "\n", "divvy_api = 'https://feeds.divvybikes.com/stations/stations.json'\n", "bikes = requests.get(divvy_api).json()\n", "print(f\"Downloaded data on {len(bikes['stationBeanList'])} stations.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upload your data by sending it as an open file object to `civis.io.file_to_civis`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File uploaded to file number 6008066.\n" ] } ], "source": [ "import io\n", "import json\n", "\n", "buf = io.TextIOWrapper(io.BytesIO()) # `json` writes text\n", "json.dump(bikes, buf)\n", "buf.seek(0)\n", "bike_file_id = civis.io.file_to_civis(buf.buffer, 'Divvy status')\n", "print(f\"File uploaded to file number {bike_file_id}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then you can use that file ID to download the file into a new buffer." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "buf_down = io.TextIOWrapper(io.BytesIO())\n", "civis.io.civis_to_file(bike_file_id, buf_down.buffer)\n", "buf_down.seek(0)\n", "bikes_down = json.load(buf_down)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bikes == bikes_down" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because retrieving JSON from a Civis File is such a common occurence, there's a simpler function for files you know are formatted in JSON: `civis.io.file_to_json`. Similarly, if you know that a file is a CSV, you could use `civis.io.file_to_dataframe` to access it as a `pandas.DataFrame`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The file I stored in Civis has data on 584 stations.\n" ] } ], "source": [ "bikes_again = civis.io.file_to_json(bike_file_id)\n", "print(\"The file I stored in Civis has data on \"\n", " f\"{len(bikes_again['stationBeanList'])} stations.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2.6 Other useful I/O functions\n", "\n", "The following functions handle moving structured data to and from Civis:\n", "* civis_to_csv(filename, sql, database[, ...])\tExport data from Civis to a local CSV file.\n", "* csv_to_civis(filename, database, table[, ...])\tUpload the contents of a local CSV file to Civis.\n", "* dataframe_to_civis(df, database, table[, ...])\tUpload a `pandas.DataFrame` into a Civis table.\n", "* read_civis(table, database[, columns, ...])\tRead data from a Civis table.\n", "* read_civis_sql(sql, database[, use_pandas, ...])\tRead data from Civis using a custom SQL string." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# 3. 
Machine Learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, we will walk through how to build a model using [CivisML](https://www.civisanalytics.com/blog/civisml-scikit-learn-at-scale/), a Civis Platform feature with a high-level interface in the Civis API client.\n", "\n", "You can use CivisML to leverage Civis Platform's infrastructure to do predictive modeling. CivisML is built on `scikit-learn`, so you have lots of flexibility to define your own modeling algorithms. Check out the [official documentation](http://civis-python.readthedocs.io/en/latest/ml.html) for more information, or read the [example](https://www.civisanalytics.com/blog/models-intro-solving-sticky-data-science-problems-quickly-civisml/) on our blog." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3.1 Training your model\n", "\n", "To use CivisML, start by constructing a `civis.ml.ModelPipeline` object. The `ModelPipeline` defines the algorithm you want to use, as well as the name of the dependent variable. You can then call the `train` and `predict` methods to learn from your data or to make new predictions.\n", "\n", "Let's use the API client to help us predict which customers are most likely to upgrade to a premium service, using the demo \"Brandable\" dataset. We can quickly start three different models training by looping over the parameters we want for each.\n", "\n", "For this example, we're using Civis's [pre-defined](https://civis-python.readthedocs.io/en/v1.6.0/ml.html#pre-defined-models) algorithms, but if those don't fit your problem, you can create your own algorithms to use." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Define the algorithms and model parameters to use\n", "MODELS = ['sparse_logistic', 'random_forest_classifier', 'extra_trees_classifier']\n", "DV = 'upgrade' # Column name in the training table\n", "PKEY = 'brandable_user_id' # Column name in the training table\n", "EXCLUDE = ['residential_zip'] # Don't train on these columns, if present\n", "training_table = 'brandable_upgrades.brandable_training_data'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the training set by joining the Brandable upgrade labels to the customer data." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "'succeeded'" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = f\"\"\"DROP TABLE IF EXISTS {training_table};\n", "CREATE TABLE {training_table} AS \n", "(SELECT u.*, p.upgrade FROM brandable_customers.brandable_all_users u \n", "JOIN brandable_customers.brandable_pilot p \n", "ON p.brandable_user_id = u.brandable_user_id)\"\"\"\n", "civis.io.query_civis(database=DATABASE, sql=sql).result().state" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created custom script 7072357.\n", "Started training the \"\"sparse_logistic\" model for upgrade\" model.\n", "Created custom script 7072358.\n", "Started training the \"\"random_forest_classifier\" model for upgrade\" model.\n", "Created custom script 7072360.\n", "Started training the \"\"extra_trees_classifier\" model for upgrade\" model.\n" ] } ], "source": [ "from civis.ml import ModelPipeline \n", "\n", "models = {}\n", "for m in MODELS:\n", " name = f'\"{m}\" model for {DV}'\n", " model = ModelPipeline(model=m,\n", " dependent_variable=DV,\n", " primary_key=PKEY,\n", " excluded_columns=EXCLUDE,\n", 
" model_name=name)\n", "\n", " train = model.train(table_name=training_table, database_name=DATABASE)\n", " models[train] = model\n", " print(f'Started training the \"{name}\" model.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "CivisML automatically evaluates the predictive performance of each model using several standard metrics. Now that we've started some models training, we'll check the area under the ROC curve of each model as it finishes training. Once all of the models finish training, we'll pull out the best of them." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model# 7072358 on DV \"upgrade\" (\"\"random_forest_classifier\" model for upgrade\") has a ROC AUC of 0.809.\n", "Model# 7072357 on DV \"upgrade\" (\"\"sparse_logistic\" model for upgrade\") has a ROC AUC of 0.846.\n", "Model# 7072360 on DV \"upgrade\" (\"\"extra_trees_classifier\" model for upgrade\") has a ROC AUC of 0.784.\n", "The \"\"sparse_logistic\" model for upgrade\" model has the best ROC AUC.\n" ] } ], "source": [ "from concurrent.futures import as_completed\n", "aucs = {}\n", "for train in as_completed(models):\n", " if train.succeeded():\n", " print(f\"Model# {train.train_job_id} on DV \"\n", " f\"\\\"{train.metadata['data']['target_columns'][0]}\\\" \"\n", " f'(\"{models[train].model_name}\") '\n", " f\"has a ROC AUC of {round(train.metrics['roc_auc'], 3)}.\")\n", " aucs[train.metrics['roc_auc']] = train\n", "best_model = models[aucs[max(aucs)]]\n", "print(f\"The \\\"{best_model.model_name}\\\" model has the best ROC AUC.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3.2 Making predictions\n", "\n", "Once you've trained a model, you can use it to make predictions. CivisML will automatically parallelize predictions when you have a large dataset, so no matter how big the dataset, you won't need to wait too long. 
Let's use the best model we found from the previous step to make predictions about which users are most likely to upgrade in the future." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created custom script 7072405.\n" ] } ], "source": [ "predict = best_model.predict(table_name='brandable_customers.brandable_all_users', \n", " database_name=DATABASE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you wanted to store the predictions in a Redshift table, you could have provided an `output_table` parameter. Since this is a relatively small dataset, it's faster to skip the table write and pull down the predictions directly. Let's find the 5% of users who are most likely to upgrade." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
upgrade_1
brandable_user_id
000093b8981b93a0.277570
00056ee5e2b4e580.130607
0006ec438f8bc4f0.279233
000951546d5fa580.155526
000cb76061daad70.283429
\n", "
" ], "text/plain": [ " upgrade_1\n", "brandable_user_id \n", "000093b8981b93a 0.277570\n", "00056ee5e2b4e58 0.130607\n", "0006ec438f8bc4f 0.279233\n", "000951546d5fa58 0.155526\n", "000cb76061daad7 0.283429" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predict.table.head()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The most likely 4746 of 94920 users to upgrade have scores ranging from 0.8303479210953384 to 1.0.\n" ] } ], "source": [ "n_users = len(predict.table)\n", "most_likely = (predict.table\n", " .sort_values(by=\"upgrade_1\", ascending=False))[:int(0.05 * n_users)]\n", "print(f'The most likely {len(most_likely)} of {len(predict.table)} users to upgrade '\n", " f'have scores ranging from {most_likely.iloc[-1, 0]} to {most_likely.iloc[0, 0]}.')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# 4. Direct API Access" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can inspect the `client` object and read documentation about individual functions just as you would with any other Python code. For example, you can tab-complete after typing \"`client.`\" to get a list of API \"endpoints\", and further tab-complete from \"`client.users.`\" to find a list of API calls related to users. Here's the way you can ask Civis Platform who it thinks you are:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "client.users.list_me?" 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'created_at': '2015-05-07T13:27:36.000Z',\n", " 'custom_branding': None,\n", " 'email': 'civistestuser3@gmail.com',\n", " 'feature_flags': {'civis_explore_insights': True,\n", " 'cmo_multitarget': True,\n", " 'container_scripts': True,\n", " 'notebook_api': True,\n", " 'notebook_r_kernel': True,\n", " 'notebook_ui': True,\n", " 'paro_frontend': True,\n", " 'paro_modeling_wizard': True,\n", " 'python_3_scripts': True,\n", " 'r_scripts': True,\n", " 'report_templates': True,\n", " 'script_params': True,\n", " 'table_create_statement': True,\n", " 'table_person_matching': True},\n", " 'groups': [{'id': 10, 'name': 'Demo', 'organization_id': 13},\n", " {'id': 365, 'name': 'Credentials Test', 'organization_id': 2}],\n", " 'id': 923,\n", " 'initials': 'JS',\n", " 'last_checked_announcements': '2017-08-23T21:48:32.000Z',\n", " 'name': 'Jane Smith',\n", " 'organization_name': 'demo',\n", " 'preferences': {'civis_explore_skip_intro': False,\n", " 'data_pane_collapsed': 'false',\n", " 'data_pane_width': '235',\n", " 'enhancement_index_author_filter': '1491',\n", " 'enhancement_index_order_dir': 'desc',\n", " 'enhancement_index_order_field': 'created_at',\n", " 'export_index_author_filter': '1441',\n", " 'export_index_order_dir': 'asc',\n", " 'export_index_order_field': 'created_at',\n", " 'export_index_status_filter': 'succeeded',\n", " 'import_index_author_filter': '923',\n", " 'import_index_order_dir': 'desc',\n", " 'import_index_order_field': 'created_at',\n", " 'import_index_type_filter': 'GdocImport',\n", " 'model_index_order_dir': 'desc',\n", " 'model_index_order_field': 'updated_at',\n", " 'model_index_thumbnail_view': 'false',\n", " 'notebook_order_dir': 'desc',\n", " 'notebook_order_field': 'created_at',\n", " 'preferred_server_id': 107,\n", " 'project_detail_order_dir': 'asc',\n", " 'project_detail_order_field': 'name',\n", " 'project_index_order_dir': 
'asc',\n", " 'project_index_order_field': 'name',\n", " 'report_index_thumbnail_view': 'true',\n", " 'result_index_order_dir': 'desc',\n", " 'result_index_order_field': 'created_at',\n", " 'script_index_order_dir': 'desc',\n", " 'script_index_order_field': 'last_run.updated_at',\n", " 'upgrade_requested': '2017-02-22T21:32:57.649Z',\n", " 'welcome_order_dir': 'desc',\n", " 'welcome_order_field': 'created_at',\n", " 'welcome_status_filter': 'failed,running,scheduled,succeeded'},\n", " 'roles': ['cua', 'sdm'],\n", " 'sign_in_count': 37,\n", " 'username': 'jsmith'}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.users.list_me()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 4.1 Tables\n", "\n", "Next, let's list the tables available in a single schema. The Civis Data Science API often uses unique IDs instead of names, and the `APIClient` gives you convenience functions to look up those IDs if you know the name. In this case, we need to know the database ID of our database, rather than the name." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cta_count\n", "cta_count_test\n", "cta_ridership_daily\n", "cta_ridership_daily_pasttwoyears\n" ] } ], "source": [ "db_id = client.get_database_id(DATABASE)\n", "my_tables = client.tables.list(database_id=db_id, schema='public')\n", "\n", "# Print all tables in the schema\n", "for tt in my_tables:\n", " if tt['name'].startswith('cta'):\n", " print(tt['name'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's use the API to look up some information about the CTA daily ridership table. `my_tables` is a list of API responses. Because searching through lists like this is common, the Civis Python API client provides helper functions (`civis.find` and `civis.find_one`) which will locate the entry or entries you're interested in. 
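A stdlib picture of what `civis.find_one` does: return the first entry whose attributes match, or `None`. The table entries below are invented for illustration, and real responses are `Response` objects rather than plain dicts:

```python
tables = [
    {"id": 101, "name": "cta_count"},
    {"id": 102, "name": "cta_ridership_daily"},
]

# Roughly equivalent to civis.find_one(tables, name='cta_ridership_daily'):
match = next((t for t in tables if t["name"] == "cta_ridership_daily"), None)
print(match["id"])  # 102
```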
Let's find the ID of the \"cta_ridership_daily\" table and use that to look up the names and types of each of the columns." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'station_id': 'integer', 'stationname': 'character varying(1024)', 'date': 'date', 'daytype': 'character varying(1024)', 'rides': 'integer'}\n" ] } ], "source": [ "cta_table = civis.find_one(my_tables, name='cta_ridership_daily')\n", "tb_info = client.tables.get(cta_table.id)\n", "col_types = {c.name: c.sql_type for c in tb_info.columns}\n", "print(col_types)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 4.2 Paginated responses\n", "\n", "Some endpoints may contain a lot of data which Civis Platform will only serve over multiple requests. For example, `client.tables.list()` will return information on a maximum of 1,000 tables in a single call (the default is 50). Therefore, if we need to collect data on 4,000 different tables, we'll need to make at least 4 separate requests to get all of the data. (Use the `page_num` argument to select additional \"pages\" of data.) To make this easier, the client includes a special `iterator` parameter on endpoints which may require multiple requests to return all of their data. Iterating over a large endpoint can generate many API calls, so use `iterator=True` sparingly!\n", "\n", "Let's pretend that the \"public\" schema has more tables than we want to list at once and iterate through it to find all of the tables with 5 columns."
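One way to picture `iterator=True` is as a generator that keeps requesting pages until an empty page comes back. A stdlib mock of the idea, where the invented `fake_list` function stands in for a paginated endpoint like `client.tables.list` (it is not a real API call):

```python
def fake_list(page_num=1, limit=3):
    """Stand-in for a paginated list endpoint holding 7 records in total."""
    records = [{"id": i} for i in range(1, 8)]
    start = (page_num - 1) * limit
    return records[start:start + limit]

def iterate_pages(limit=3):
    """Yield every record, requesting one page at a time."""
    page_num = 1
    while True:
        page = fake_list(page_num=page_num, limit=limit)
        if not page:
            return  # an empty page means we've seen everything
        yield from page
        page_num += 1

print([r["id"] for r in iterate_pages()])  # [1, 2, 3, 4, 5, 6, 7]
```

Each `yield` hands back one record at a time, which is why the real iterator can cover arbitrarily many tables without loading them all at once.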
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tables with five columns: ['cta_ridership_daily', 'cta_ridership_daily_pasttwoyears', 'iris', 'testimport', 'upgrade_likelihood'].\n" ] } ], "source": [ "# Traditional method for listing tables \n", "# (set to list a max of 3 different tables)\n", "# This returns multiple tables at the same time.\n", "# Increase the \"page_num\" to see more tables.\n", "my_three_tables = client.tables.list(database_id=db_id, schema='public',\n", " limit=3, page_num=1)\n", "\n", "# Iterating request (will return all available tables, may take some time to run)\n", "# When iterator is set to True, the function yields a single table at a time.\n", "tb_iter = client.tables.list(database_id=db_id, schema='public', iterator=True)\n", "five_col_tbs = [t for t in tb_iter if t['column_count'] == 5]\n", "print(f\"Tables with five columns: {[t['name'] for t in five_col_tbs]}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 4.3 The API Response\n", "\n", "Every time you communicate with the Civis Data Science API, you get a response. In fact, it's a `civis.response.Response` object. Its contents be accessed either like a dictionary or as normal attributes. The `Response` always comes back immediately, even if it's to acknowledge that you've started something that will take a long time to finish. It will contain either the information you've asked for or an acknowledgement of the action you took. Here's an example of the `Response` when we ask for the status of the best model we built in section 3." 
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'container_id': 7072357,\n", " 'error': None,\n", " 'finished_at': '2017-08-29T14:44:11.000Z',\n", " 'id': 58250804,\n", " 'is_cancel_requested': False,\n", " 'started_at': '2017-08-29T14:42:45.000Z',\n", " 'state': 'succeeded'}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "client.scripts.get_containers_runs(best_model.train_result_.job_id, \n", " best_model.train_result_.run_id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 5. Build something new\n", "\n", "The most flexible way to interact with Civis Platform is by writing your own code and using Civis Platform to run it. For example, you could imagine wanting to write a program that counts from 1 to 100 and replaces every number that's evenly divisible by 3 with \"fizz\", any number divisible by 5 with \"buzz\", and numbers divisible by both 3 and 5 with \"fizzbuzz\". There's no Data Science API function that implements FizzBuzz, so you would need to write that yourself, but you can use Civis Platform to schedule it, share it, and run it in the cloud while you free up your laptop for other purposes. Container Scripts are our general-purpose solution for taking any code and running it in Civis Platform.\n", "\n", "Container Scripts become really powerful when you pair the flexibility of bring-your-own-code with the power of the Data Science API. One of our favorite design patterns is writing code that calls the Data Science API as part of a more customized workflow. For example, we might use the Data Science API to pull a table into a pandas dataframe, write special-purpose pandas code for manipulating the dataframe, use the Data Science API again to build a model, write more code to analyze the results of the model, and finally publish those analysis results as a report in Civis Platform. 
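
Incidentally, the FizzBuzz task described above needs nothing from the Data Science API; one possible implementation is just a few lines of ordinary Python:

```python
def fizzbuzz(n):
    # Replace multiples of 3 with 'fizz', of 5 with 'buzz', of both with 'fizzbuzz'.
    out = []
    for i in range(1, n + 1):
        word = ('fizz' * (i % 3 == 0)) + ('buzz' * (i % 5 == 0))
        out.append(word or i)
    return out

print(fizzbuzz(15))
```

Civis Platform doesn't need to understand this logic; a Container Script simply runs whatever code you hand it.
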
The most sophisticated data science code we write is delivered and shared via Container Scripts because of how easy it is to write software in Python or R (or, really, any language) calling API functions for accessing Civis Platform." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 5.1 Creating and running Container Scripts\n", "\n", "Let's take our earlier example of checking the status of the Chicago bike share system and package it into a script which we can schedule to run regularly. Here we're writing our task as a function and using `cloudpickle`, an open-source Python library which can pickle dynamically-defined functions, to send it to Civis Platform. You could also write this code as a text file and run it as a script." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Uploaded Divvy function to file 6008167.\n" ] } ], "source": [ "import cloudpickle\n", "import io\n", "import json\n", "import os\n", "import requests\n", "\n", "def get_bike_status(api_url=divvy_api):\n", " bikes = requests.get(api_url).json()\n", " \n", " buf = io.TextIOWrapper(io.BytesIO()) # `json` writes text\n", " json.dump(bikes, buf)\n", " buf.seek(0)\n", " bike_file_id = civis.io.file_to_civis(buf.buffer, 'Divvy status')\n", " print(f\"Stored Divvy station data at {bike_file_id}.\")\n", "\n", " client = civis.APIClient()\n", " job_id = os.environ[\"CIVIS_JOB_ID\"]\n", " run_id = os.environ[\"CIVIS_RUN_ID\"]\n", " client.scripts.post_containers_runs_outputs(job_id, run_id, \"File\", bike_file_id)\n", "\n", "code_file_id = civis.io.file_to_civis(\n", " io.BytesIO(cloudpickle.dumps(get_bike_status)), 'Divvy script')\n", "print(f\"Uploaded Divvy function to file {code_file_id}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've uploaded the function, we tell Civis Platform to run it. 
A Container Script consists of an environment (the \"Container\", which is a Docker container) and a bash command to run inside that container. Civis provides some general-purpose Docker images, or you can use any public Docker image. Here we're using \"[datascience-python](https://github.com/civisanalytics/datascience-python)\". Note that we're using a specific image tag, rather than the default \"latest\". It's good practice to pin an image tag. The \"latest\" tag will change with new releases, and that could unexpectedly cause a job which used to work to start failing.\n", "\n", "In this example, I'm storing my code in a Civis Platform file, but Container Scripts can also access code which you've stored in GitHub. A file is great for small, quick examples like this, but GitHub is a better way to handle larger or production code. Version control is your friend!\n", "\n", "Like many operations with the Civis Data Science API, running a Container Script takes two steps. First, you create the job (with `client.scripts.post_containers`). Second, you tell Civis Platform to start running the job. You can use `client.scripts.post_containers_runs` to start a run (this will return a `Response`), or you can use the convenience function `civis.utils.run_job` to start a run. If you use `civis.utils.run_job`, you'll get back a `CivisFuture`, which is a convenient way to track when your run has finished." 
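, "
The `CivisFuture` follows the standard `concurrent.futures.Future` interface, so the usual tools from that module apply to it. Here's the pattern with a plain executor future standing in for a platform run (a purely local sketch, not an API call):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def pretend_platform_run():
    # Stand-in for work that Civis Platform would do remotely.
    return 'succeeded'

with ThreadPoolExecutor() as pool:
    fut = pool.submit(pretend_platform_run)
    wait([fut])            # block until the run finishes
    assert fut.done()
    print(fut.result())    # -> succeeded
```

With a real `CivisFuture`, `result()` raises an exception if the platform run fails, which makes failures hard to miss in scripted workflows."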
] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DoneAndNotDoneFutures(done={}, not_done=set())" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from concurrent.futures import wait\n", "\n", "cmd = f\"\"\"civis files download {code_file_id} myscript.pkl; \n", "python -c \"import cloudpickle; cloudpickle.load(open(\\\\\\\"myscript.pkl\\\\\\\", \\\\\\\"rb\\\\\\\"))()\" \"\"\"\n", "container_job = client.scripts.post_containers(\n", " required_resources = {\"cpu\": 256, \"memory\": 512, \"diskSpace\": 2},\n", " name=\"Divvy download script\",\n", " docker_command = cmd,\n", " docker_image_name = \"civisanalytics/datascience-python\", \n", " docker_image_tag = \"3.1.0\")\n", "run = civis.utils.run_job(container_job.id)\n", "wait([run])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've stored the bike station data as a JSON file in Civis Platform, and set a \"run output\" on the script which retrieved the data. Run outputs are a way for you to transfer data from one job to another. You can inspect this job to find its run outputs, and use the file ID you find there to retrieve the data about the Chicago bike sharing network." 
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bike data are stored at file# 6008168.\n" ] } ], "source": [ "remote_output_file_id = client.scripts.list_containers_runs_outputs(\n", " container_job.id, run.poller_args[1])[0].object_id\n", "print(f\"Bike data are stored at file# {remote_output_file_id}.\")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'altitude': '',\n", " 'availableBikes': 19,\n", " 'availableDocks': 8,\n", " 'city': 'Chicago',\n", " 'id': 287,\n", " 'is_renting': True,\n", " 'landMark': '057',\n", " 'lastCommunicationTime': '2017-08-29 09:44:05',\n", " 'latitude': 41.880317,\n", " 'location': '',\n", " 'longitude': -87.635185,\n", " 'postalCode': '60606',\n", " 'stAddress1': 'Franklin St & Monroe St',\n", " 'stAddress2': '',\n", " 'stationName': 'Franklin St & Monroe St',\n", " 'status': 'IN_SERVICE',\n", " 'statusKey': 1,\n", " 'statusValue': 'In Service',\n", " 'testStation': False,\n", " 'totalDocks': 27}\n" ] } ], "source": [ "import pprint\n", "\n", "station_data = civis.io.file_to_json(remote_output_file_id)\n", "for station in station_data['stationBeanList']:\n", " if station['stationName'] == 'Franklin St & Monroe St':\n", " pprint.pprint(station)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 5.2 Custom Scripts\n", "\n", "Remember that prediction we made about which customers are likely to upgrade? We didn't store it in a table at the time. What if we change our minds? We could download it in this notebook and then use `civis.io.dataframe_to_civis` to make a new table. (Most of the time this will be the right thing to do.) 
However, we could also use the \"Import from URL\" Template to create a Custom Script which will do that for us.\n", "\n", "If you (or one of your colleagues) have created an especially useful Container Script which you'll want to run over and over, you can turn it into a Template. Once you have access to a templated script (Civis provides a few that we've found useful), you can run it for yourself by creating a \"Custom Script\". The Custom Script lets you modify a few parameters and then run the code that your colleague wrote.\n", "\n", "If you know the ID of a Template script, you can use `client.scripts.post_custom` to create a new job. As with the Container Script, we'll use `civis.utils.run_job` to start the run so that we get back a `CivisFuture`. " ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "prediction_tablename = 'scratch.brandable_predictions'\n", "template_id = civis.find_one(client.templates.list_scripts(limit=1000), \n", " name='Import from URL').id\n", "url = client.files.get(predict.metadata['output_file_ids'][0])['file_url']\n", "upgrade_prediction_import = client.scripts.post_custom(\n", " from_template_id=template_id,\n", " name=\"Import Brandable Predictions\",\n", " arguments={'URL': url,\n", " 'TABLE_NAME': prediction_tablename,\n", " 'IF_EXISTS': 'drop',\n", " 'DATABASE_NAME': DATABASE})\n", "import_fut = civis.utils.run_job(upgrade_prediction_import.id)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'created_at': '2017-08-29T14:46:30.000Z',\n", " 'error': None,\n", " 'finished_at': '2017-08-29T14:47:02.000Z',\n", " 'id': 58251036,\n", " 'started_at': '2017-08-29T14:46:31.000Z',\n", " 'state': 'succeeded'}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import_fut.result()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Finally, let's keep the database tidy and delete this table." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "civis.io.query_civis(database=DATABASE, sql=f\"DROP TABLE IF EXISTS {prediction_tablename}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# 6. A Data Science API\n", "\n", "This has been a whirlwind tour of the Civis Data Science API. Civis Platform has many more features than we've covered here, such as sharing, enhancements, reports, and more. This tour gives you what you need to get started. Use the [API client documentation](https://civis-python.readthedocs.io) or the [API documentation](https://api.civisanalytics.com) to get a complete picture of everything the API can do, and contact support@civisanalytics.com if you run into trouble. The Civis Data Science API is a powerful toolbox that you can use to build, scale, and deploy your data science workflows!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Appendix\n", "\n", "These sections will give you extra context on what's going on behind the scenes with the Civis API client." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## A.1 What is an API client?\n", "\n", "API: Application Programming Interface\n", "* A set of tools for accessing Civis Platform functionality. An API is an official way for two pieces of code to talk to each other\n", "* Civis Platform itself works by issuing API calls, which are based on HTTP\n", "* But HTTP calls are unwieldy, so the API clients provide a more streamlined way of making these requests\n", "* The API clients can be run interactively or in a script\n", "\n", "There are Civis API clients in Python and R.\n", "\n", "Everything you can do with an API client is supported by a Civis Data Science API Endpoint. 
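
To see what the clients are streamlining away, here is roughly what a single raw HTTP request looks like when assembled by hand with the standard library (a sketch: the request is built but never sent, and the key is a placeholder):

```python
import base64
import urllib.request

API_KEY = 'your-api-key'  # placeholder; a real key comes from your profile page

# The Data Science API accepts HTTP Basic auth with the key as the username.
token = base64.b64encode(f'{API_KEY}:'.encode()).decode()
req = urllib.request.Request(
    'https://api.civisanalytics.com/users/me',
    headers={'Authorization': f'Basic {token}'},
    method='GET',
)
# urllib.request.urlopen(req) would actually send it; the client wraps
# this plumbing (plus retries and pagination) behind simple method calls.
```
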
You can find complete documentation on these endpoints here: https://api.civisanalytics.com\n", "\n", "#### RESTful API conventions\n", "\n", "The Civis Data Science API is \"RESTful\". That means it adheres to a set of conventions about the components of the API and their relationships. The World Wide Web uses REST conventions.\n", "\n", "The API understands some basic HTTP \"verbs\":\n", "* GET → Retrieve information on objects or members [get, list]\n", "* POST → Create a new item or entry in an item [create]\n", "* PUT → Replace something [update]\n", "* DELETE → Delete [delete]\n", "\n", "\n", "#### HTTP Status Codes\n", "When you send a request to an API, it will return a status code. Common codes include:\n", "\n", "\n", "* 100-level codes: Informational\n", "* 200-level codes: Success\n", " * 200 OK\n", "* 300-level codes: Redirection\n", "* 400-level codes: Client error\n", " * 400 Bad request\n", " * 401 Unauthorized (authentication failed) \n", " * 403 Forbidden (similar to 401)\n", " * 404 Not found\n", " * 408 Request timeout\n", " * 409 Conflict in the request, such as an edit conflict\n", " * 429 Too many requests: You need to wait before you can use the API again\n", "* 500-level codes: Server error\n", " * 500 Internal server error\n", "\n", "For example, you might see this error if you try to call a list endpoint with `page_num=0`:\n", "\n", "```\n", "CivisAPIError: (400) invalid 'page_num' 0 - must be an integer greater than zero\n", "```\n", "\n", "The API client has translated the API's reply into a Python exception. 
The `Response` object for that error is:\n", "\n", "```\n", "{'code': 400, \n", " 'error': 'invalid', \n", " 'errorDescription': \"invalid 'page_num' 0 - must be an integer greater than zero\"}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## A.2 Rate limits and retries\n", "\n", "If you query the Civis Data Science API too frequently, Civis Platform may return a 429 error response, indicating that you need to wait a while before you can make another request. The Python API client will automatically wait and resend your request when your rate limit refreshes, so there's nothing for you to do. Be aware, though, that making too many requests too fast will make your code wait for a while.\n", "\n", "Currently, the rate limit is 1000 requests per 5 minutes. You can check your rate limit by looking at `Response.headers['X-RateLimit-Limit']` on any `Response` object that you get back. You can check `Response.calls_remaining` if you're curious how many API calls you have left before you hit the rate limit.\n", "\n", "The Python API client will also automatically retry on certain 500-level errors. That gives your code extra reliability when using the API client over the raw API. The full list of HTTP status codes which the client will retry is:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[429, 502, 503, 504]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "civis.civis.RETRY_CODES" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## A.3 Using the Python API client outside of Civis\n", "\n", "You can also install the Civis Python API client on your own computer. 
The API client is available on PyPI; install it with\n", "\n", "```\n", "pip install civis\n", "```\n", "\n", "Once you have the API client installed, you can create the `APIClient` object:\n", "\n", "```\n", "import civis\n", "client = civis.APIClient()\n", "```\n", "\n", "### Setting up an API Key\n", "To make requests to the Civis Data Science API, you will need an API key that is unique to you. To make an API key, navigate to your [platform profile page](https://platform.civisanalytics.com/users/profile) and create a new key using the panel on the lower right hand side. Set your key to expire after 30 days, which is the maximum currently available.\n", "\n", "By default, the Python API client will look for your key in the environment variable `CIVIS_API_KEY`. To add the API key to your environment, copy the key you generated to your clipboard and follow the instructions below. \n", "\n", "Keep your API keys safe! Don’t check the key into GitHub or share it in plaintext. If you do, immediately cancel the API key from the user profile page and contact support@civisanalytics.com.\n", "\n", "You can add your API keys to your shell configuration file (`~/.zshrc` on macOS or `~/.bashrc` on Linux by default) using a text editor of your choice. An example using emacs is included below:\n", "\n", "```\n", "emacs \n", "```\n", "\n", "Make a new line and enter this:\n", "\n", "```\n", "export CIVIS_API_KEY='yourkeyhere'\n", "```\n", "\n", "To save, type: Control-x Control-s. \n", "\n", "Then run:\n", "\n", "```\n", "source \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## A.4 Where can I go from here?\n", "\n", "If you want to learn even more about the Python API client, you can find all of the code on our [GitHub page](https://github.com/civisanalytics/civis-python/). 
File feature requests or bug reports either as GitHub [issues](https://github.com/civisanalytics/civis-python/issues) or with your Client Success representative at support@civisanalytics.com.\n", "\n", "For a deeper dive into using CivisML through the Python API client, check out our [examples](https://github.com/civisanalytics/civis-python/tree/master/examples)!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 2 }