{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Modeling and Simulation in Python\n",
    "\n",
    "Chapter 5\n",
    "\n",
    "Copyright 2017 Allen Downey\n",
    "\n",
    "License: [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure Jupyter so figures appear in the notebook\n",
    "%matplotlib inline\n",
    "\n",
    "# Configure Jupyter to display the assigned value after an assignment\n",
    "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n",
    "\n",
    "# import functions from the modsim.py module\n",
    "from modsim import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading data\n",
    "\n",
    "Pandas is a library that provides tools for reading and processing data.  `read_html` reads a web page from a file or the Internet and creates one `DataFrame` for each table on the page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pandas import read_html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data directory contains a downloaded copy of https://en.wikipedia.org/wiki/World_population_estimates\n",
    "\n",
    "The arguments of `read_html` specify the file to read and how to interpret the tables in the file.  The result, `tables`, is a sequence of `DataFrame` objects; `len(tables)` reports the length of the sequence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = 'data/World_population_estimates.html'\n",
    "tables = read_html(filename, header=0, index_col=0, decimal='M')\n",
    "len(tables)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can select the `DataFrame` we want using the bracket operator.  The tables are numbered from 0, so `tables[2]` is actually the third table on the page.\n",
    "\n",
    "`head` selects the header and the first five rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "table2 = tables[2]\n",
    "table2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`tail` selects the last five rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "table2.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Long column names are awkard to work with, but we can replace them with abbreviated names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "table2.columns = ['census', 'prb', 'un', 'maddison', \n",
    "                  'hyde', 'tanton', 'biraben', 'mj', \n",
    "                  'thomlinson', 'durand', 'clark']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what the DataFrame looks like now.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "table2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first column, which is labeled `Year`, is special.  It is the **index** for this `DataFrame`, which means it contains the labels for the rows.\n",
    "\n",
    "Some of the values use scientific notation; for example, `2.544000e+09` is shorthand for $2.544 \\cdot 10^9$ or 2.544 billion.\n",
    "\n",
    "`NaN` is a special value that indicates missing data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series\n",
    "\n",
    "We can use dot notation to select a column from a `DataFrame`.  The result is a `Series`, which is like a `DataFrame` with a single column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "census = table2.census\n",
    "census.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "census.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like a `DataFrame`, a `Series` contains an index, which labels the rows.\n",
    "\n",
    "`1e9` is scientific notation for $1 \\cdot 10^9$ or 1 billion."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From here on, we will work in units of billions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "un = table2.un / 1e9\n",
    "un.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "census = table2.census / 1e9\n",
    "census.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what these estimates look like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "plot(census, ':', label='US Census')\n",
    "plot(un, '--', label='UN DESA')\n",
    "    \n",
    "decorate(xlabel='Year',\n",
    "         ylabel='World population (billion)')\n",
    "\n",
    "savefig('figs/chap05-fig01.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following expression computes the elementwise differences between the two series, then divides through by the UN value to produce [relative errors](https://en.wikipedia.org/wiki/Approximation_error), then finds the largest element.\n",
    "\n",
    "So the largest relative error between the estimates is about 1.3%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "max(abs(census - un) / un) * 100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Break down that expression into smaller steps and display the intermediate results, to make sure you understand how it works.\n",
    "\n",
    "1.  Compute the elementwise differences, `census - un`\n",
    "2.  Compute the absolute differences, `abs(census - un)`\n",
    "3.  Compute the relative differences, `abs(census - un) / un`\n",
    "4.  Compute the percent differences, `abs(census - un) / un * 100`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`max` and `abs` are built-in functions provided by Python, but NumPy also provides version that are a little more general.  When you import `modsim`, you get the NumPy versions of these functions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Constant growth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can select a value from a `Series` using bracket notation.  Here's the first element:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "census[1950]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the last value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "census[2016]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But rather than \"hard code\" those dates, we can get the first and last labels from the `Series`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_0 = get_first_label(census)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_end = get_last_label(census)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "elapsed_time = t_end - t_0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can get the first and last values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_0 = get_first_value(census)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_end = get_last_value(census)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we can compute the average annual growth in billions of people per year."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "total_growth = p_end - p_0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "annual_growth = total_growth / elapsed_time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TimeSeries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's create a `TimeSeries` to contain values generated by a linear growth model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = TimeSeries()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initially the `TimeSeries` is empty, but we can initialize it so the starting value, in 1950, is the 1950 population estimated by the US Census."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "results[t_0] = census[t_0]\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After that, the population in the model grows by a constant amount each year."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "for t in linrange(t_0, t_end):\n",
    "    results[t+1] = results[t] + annual_growth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's what the results looks like, compared to the actual data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot(census, ':', label='US Census')\n",
    "plot(un, '--', label='UN DESA')\n",
    "plot(results, color='gray', label='model')\n",
    "\n",
    "decorate(xlabel='Year', \n",
    "         ylabel='World population (billion)',\n",
    "         title='Constant growth')\n",
    "\n",
    "savefig('figs/chap05-fig02.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model fits the data pretty well after 1990, but not so well before."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercises\n",
    "\n",
    "**Optional Exercise:**  Try fitting the model using data from 1970 to the present, and see if that does a better job.\n",
    "\n",
    "Hint: \n",
    "\n",
    "1. Copy the code from above and make a few changes.  Test your code after each small change.\n",
    "\n",
    "2. Make sure your `TimeSeries` starts in 1950, even though the estimated annual growth is based on later data.\n",
    "\n",
    "3. You might want to add a constant to the starting value to match the data better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "census.loc[1960:1970]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}