{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab 1 6,000xp\n",
    "\n",
    "\n",
    "## Part 1 - Fun with Crickets\n",
    "\n",
    "<img src=\"http://bugwoodcloud.org/images/384x256/5387688.jpg\" width=\"400\" />\n",
    "\n",
    "A long, long time ago, in the summer of 1898, Ernst Athearn Bessey and his brother Carl decided to observe tree crickets in Lincoln, Nebraska. (Ernst was 21 at the time and just finished his MA degree) They were interested in the relationship between the speed of cricket chirps and outdoor temperature. The thing that made this a bit easier was that they found \"that each cricket remained in the same tree for days at a time.\"\n",
    "\n",
    "\n",
    "\n",
    "<img src=\"http://www.cybertruffle.org.uk/pics/0001727a.jpg\" width= \"300\" />\n",
    "\n",
    "\n",
    "We are going to examine the data they collected.\n",
    "\n",
    "First let's import the numpy library\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we will read the data file from the web and place it in an np.array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## read in the cricket file\n",
    "cricket_data = np.genfromtxt('http://zacharski.org/files/crickets.csv', delimiter=',')"
   ]
  },
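  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`np.genfromtxt` accepts more than URLs and file names: it also takes a list of strings, treating each string as one line of the file. Here is a minimal sketch that parses two comma-separated lines from memory. The values are typed in to approximate the first two rows printed below; they are not read from the real file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Two hypothetical lines in the chirps,temperature layout of crickets.csv\n",
    "lines = ['20,88.6', '16,71.6']\n",
    "\n",
    "# delimiter=',' splits each line on commas, just as in the cell above\n",
    "sample = np.genfromtxt(lines, delimiter=',')\n",
    "print(sample.shape)   # two rows, two columns\n",
    "print(sample[:, 0])   # first column: the chirp values"
   ]
  },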
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 20.          88.59999847]\n",
      " [ 16.          71.59999847]\n",
      " [ 19.79999924  93.30000305]\n",
      " [ 18.39999962  84.30000305]\n",
      " [ 17.10000038  80.59999847]\n",
      " [ 15.5         75.19999695]\n",
      " [ 14.69999981  69.69999695]\n",
      " [ 17.10000038  82.        ]\n",
      " [ 15.39999962  69.40000153]\n",
      " [ 16.20000076  83.30000305]\n",
      " [ 15.          79.59999847]\n",
      " [ 17.20000076  82.59999847]\n",
      " [ 16.          80.59999847]\n",
      " [ 17.          83.50000125]\n",
      " [ 14.39999962  76.30000305]\n",
      " [ 17.1         81.50055115]\n",
      " [ 13.1         68.36003334]\n",
      " [ 13.65        70.        ]\n",
      " [ 14.2         72.00011123]\n",
      " [ 18.67        86.67711113]\n",
      " [ 19.1226667   88.21000242]\n",
      " [ 19.01        84.05999912]\n",
      " [ 17.61        83.18999999]\n",
      " [ 16.125       77.30100011]\n",
      " [ 13.          67.26666667]\n",
      " [ 12.          62.72740003]\n",
      " [ 12.49        66.33333333]\n",
      " [ 14.111       77.81221677]\n",
      " [ 17.6         85.        ]]\n"
     ]
    }
   ],
   "source": [
    "print(cricket_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That first column is the number of chirps and the second column is the temperature.\n",
    "\n",
    "\n",
    "<h2 style=\"color:red\">Task 1: Descriptive Statistics: 800xp</h2>\n",
    "What are the mean and median for both the cricket chirps and the temperature? I would like your report to look like\n",
    "\n",
    "    Cricket chirps\n",
    "        mean: 16.1237471341\n",
    "      median: 16.125\n",
    "\n",
    "    Temperature\n",
    "        mean: 78.3116698272\n",
    "      median: 80.59999847"
   ]
  },
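  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hint, here is a minimal sketch using a small made-up array (`toy` is hypothetical, not the cricket data). `toy[:, 0]` selects every row of column 0, `np.mean` and `np.median` compute the statistics, and `str.rjust(9)` right-aligns each label so the output lines up like the report above. Notice that the mean and median differ here because one value (6.0) is much larger than the others."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical two-column data in the same layout as cricket_data\n",
    "toy = np.array([[1.0, 10.0],\n",
    "                [2.0, 20.0],\n",
    "                [6.0, 60.0]])\n",
    "\n",
    "first_column = toy[:, 0]              # every row, column 0 -> [1., 2., 6.]\n",
    "toy_mean = np.mean(first_column)      # (1 + 2 + 6) / 3 = 3.0\n",
    "toy_median = np.median(first_column)  # middle value of the sorted column = 2.0\n",
    "\n",
    "print('First column')\n",
    "print('mean:'.rjust(9), toy_mean)      # label padded on the left to 9 characters\n",
    "print('median:'.rjust(9), toy_median)"
   ]
  },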
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:red\">Task 2: Correlation: 800xp</h2>\n",
    "\n",
    "Using `corrcoef` I want to know whether there is a correlation between the number of chirps and the temperature. Can you compute the correlation?"
   ]
  },
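  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hint, `np.corrcoef` returns a correlation *matrix* rather than a single number. Here is a minimal sketch with two made-up arrays (`sample_x` and `sample_y` are hypothetical). The off-diagonal entry `[0, 1]` is the correlation between the two arrays: values near 1 mean a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data where sample_y rises roughly twice as fast as sample_x\n",
    "sample_x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])\n",
    "sample_y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])\n",
    "\n",
    "matrix = np.corrcoef(sample_x, sample_y)  # 2x2: [[r_xx, r_xy], [r_yx, r_yy]]\n",
    "r = matrix[0, 1]                          # correlation between sample_x and sample_y\n",
    "print(matrix)\n",
    "print(r)                                  # close to 1: strong positive correlation"
   ]
  },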
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:red\">Cricket Hacker Challenge Part 1: 1000xp</h2>\n",
    "\n",
    "A year before the Bessey brothers counted cricket chirps, Amos Dolbear (shown below)did the same. Dolbear was a physics professor at Tufts University and invented the telephone some 11 years before Bell.\n",
    "\n",
    "<img src=\"https://upload.wikimedia.org/wikipedia/commons/9/94/Charles_Edwin_Bessey.jpg\" width=\"300\"/>\n",
    "\n",
    "\n",
    "When Dolbear was 60 he came up with what is now known as Dolbear's Law can predict temeperature in Fahrenheight from cricket chirps per minute ($N_{60}$):\n",
    "        \n",
    "\n",
    "### $$T_F = 50 + \\frac{N_{60} - 40}{4}$$\n",
    "\n",
    "Can you create a one-column numpy array with the predicted values of Fahrenheit based on the Cricket Chirp column of cricket_data?  As a hint, if you have a numpy array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "old = np.array([10, 20, 30, 40, 50])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and if my formula were \n",
    "\n",
    "$$new = old + 100$$\n",
    "\n",
    "I can create the new array using:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "new = old + 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([110, 120, 130, 140, 150])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new\n"
   ]
  },
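  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before writing the array version, it may help to check the formula by hand. For one hypothetical reading of 120 chirps per minute (a made-up value, not a row of `cricket_data`), Dolbear's Law gives:\n",
    "\n",
    "$$T_F = 50 + \\frac{120 - 40}{4} = 50 + 20 = 70$$\n",
    "\n",
    "so the predicted temperature is 70 degrees Fahrenheit."
   ]
  },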
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Your predicted code here:\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:red\">Cricket Hacker Challenge Part 2: 1000xp</h2>\n",
    "    \n",
    "This is going to require some thinking.  That's why it is called a hacker challenge.\n",
    "\n",
    "We now have the predicted values based on Dolbear's Law and we have the actual values. I would like to know the average error. Here is what I mean by that. \n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "Suppose I think I am pretty good at guessing people's weight. Here are my estimates for 2 people and their actual weights (which I got after I guessed):\n",
    "\n",
    "My Guess |  Actual Weight |\n",
    ":--: | :--:\n",
    "200 |  150 \n",
    "125 |  175 \n",
    "\n",
    "\n",
    "Now I would like to know on average how much I was off. (so on average I was off x pounds per guess). If you can do this example with pencil and paper you are well on your way to cricket success.\n",
    "\n",
    "numpy's absolute value function may be helpful here:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([10, 10, 15, 20,  5])"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.array([10, -10, -15, 20, 5])\n",
    "np.abs(x)"
   ]
  },
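  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The quantity described above is usually called the *mean absolute error*. Here is a minimal sketch on made-up numbers (`guesses` and `actual` are hypothetical and deliberately different from the weight table, so you can still work that one out yourself). Taking the absolute value before averaging matters: otherwise an overestimate of 10 and an underestimate of 10 would cancel out and look like no error at all."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical guesses and true values\n",
    "guesses = np.array([100, 160, 140])\n",
    "actual = np.array([110, 150, 155])\n",
    "\n",
    "errors = guesses - actual              # signed errors: [-10, 10, -15]\n",
    "abs_errors = np.abs(errors)            # how far off each guess was: [10, 10, 15]\n",
    "mean_abs_error = np.mean(abs_errors)   # average distance per guess: 35 / 3\n",
    "print(mean_abs_error)"
   ]
  },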
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Do your work here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 2: Smoking and Birth Weight\n",
    "\n",
    "First I want to expand on what was covered in the Datacamp course. Suppose I have a numpy array that represents this table:\n",
    "\n",
    "\n",
    "Gender | Weight\n",
    " :--: | :--: |\n",
    " Female | 93\n",
    " Female | 113.8\n",
    " Female | 137\n",
    " Male | 183.8\n",
    " Female | 110\n",
    " Male | 152\n",
    " \n",
    " So I create a numpy array:\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['Female' 'Female' 'Female' 'Male' 'Female' 'Male']\n",
      " ['93' '138' '137' '183.8' '110' '152']]\n"
     ]
    }
   ],
   "source": [
    "students = np.array([['Female', 'Female', 'Female', 'Male', 'Female', 'Male',], [93, 138, 137, 183.8, 110, 152]])\n",
    "print(students)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can extract just the weights of the women by:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['93' '1138' '137' '110']\n"
     ]
    }
   ],
   "source": [
    "females = students[1][students[0] == 'Female']\n",
    "print(females)"
   ]
  },
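  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One thing to watch out for: because `students` mixes text and numbers, NumPy stores every entry as a string, which is why the weights print with quotes. Before doing arithmetic on them, convert them with `astype(float)`. Here is a minimal sketch that rebuilds the same array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "students = np.array([['Female', 'Female', 'Female', 'Male', 'Female', 'Male'],\n",
    "                     [93, 138, 137, 183.8, 110, 152]])\n",
    "females = students[1][students[0] == 'Female']   # still strings, e.g. '93'\n",
    "\n",
    "female_weights = females.astype(float)            # convert the strings to floats\n",
    "print(female_weights)\n",
    "print(np.mean(female_weights))                    # (93 + 138 + 137 + 110) / 4"
   ]
  },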
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now on to our data...\n",
    "\n",
    "## Birth Rate Data\n",
    "\n",
    "The birth rate data looks like this:\n",
    "\n",
    "ID | LOW | AGE | LWT | RACE | SMOKE | PTL | HT  | UI | FTV | BWT\n",
    ":--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: \n",
    "85 | 0 | 19 | 182 | 2 | 0 | 0 | 0 | 1 | 0 | 2523\n",
    "86 | 0 | 33 | 155 | 3 | 0 | 0 | 0 | 0 | 3 | 2551\n",
    "87 | 0 | 20 | 105 | 1 | 1 | 0 | 0 | 0 | 1 | 2557\n",
    "88 | 0 | 21 | 108 | 1 | 1 | 0 | 0 | 1 | 2 | 2594\n",
    "89 | 0 | 18 | 107 | 1 | 1 | 0 | 0 | 1 | 0 | 2600\n",
    "91 | 0 | 21 | 124 | 3 | 0 | 0 | 0 | 0 | 0 | 2622\n",
    "\n",
    "\n",
    "First, let's load the data into numpy:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[  85   86   87 ...,   82   83   84]\n",
      " [   0    0    0 ...,    1    1    1]\n",
      " [  19   33   20 ...,   23   17   21]\n",
      " ..., \n",
      " [   1    0    0 ...,    0    0    0]\n",
      " [   0    3    1 ...,    0    0    3]\n",
      " [2523 2551 2557 ..., 2495 2495 2495]]\n"
     ]
    }
   ],
   "source": [
    "\n",
    "birth_data = np.transpose(np.genfromtxt('http://zacharski.org/files/courses/data101/birthweight.csv', delimiter=',', dtype=int))\n",
    "print(birth_data)"
   ]
  },
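  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `np.transpose` is doing, here is a tiny sketch using only the ID and BWT values from the first three rows of the table above (a hypothetical excerpt, not the full file). In the original layout each row is one mother; after transposing, each row is one variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Excerpt laid out like the CSV file: one row per mother, columns ID and BWT\n",
    "table = np.array([[85, 2523],\n",
    "                  [86, 2551],\n",
    "                  [87, 2557]])\n",
    "\n",
    "flipped = np.transpose(table)   # shape (3, 2) becomes (2, 3)\n",
    "ids = flipped[0]                # row 0 now holds every ID\n",
    "weights = flipped[1]            # row 1 now holds every birth weight\n",
    "print(flipped.shape)\n",
    "print(weights)"
   ]
  },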
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In printing the numpy array things look a little flipped. So the first column (the ID column) of the original data is represented by the first row of the numpy array. Oddly enough this is exactly how we want it. \n",
    "\n",
    "\n",
    "Here is what the columns mean. \n",
    "\n",
    "* **ID** the first column is simply the id number of the mother\n",
    "* **BWT** The last column is the birth weight in grams.\n",
    "* **LOW** A zero in this column indicates the birth weight is not considered low a one indicates it is\n",
    "* **AGE** The age of the mother\n",
    "* **LWT** \n",
    "* **RACE** 1: white, 2: black, 3: other\n",
    "* **SMOKE** whether or not the person smokes\n",
    "* **PTL** number of pre-mature labors\n",
    "* **HT** Hypertension\n",
    "* **UI** uterine irratability\n",
    "* **FTV**  GP visits\n",
    "\n",
    "\n",
    "<h2 style=\"color:red\">Task 3: Descriptive Statistics on the Entire Dataset: 1000xp</h2>\n",
    "I would like to know the minimum, average (mean), and maximum birth weights of the dataset in a format that looks like:\n",
    "\n",
    "    BIRTH WEIGHTS\n",
    "    minimum:   00 grams\n",
    "    mean:      00 grams\n",
    "    maximum:   00 grams\n",
    "    std dev    00 grams\n",
    "    \n",
    "There is a <span style=\"color:red\">200xp bonus</span>, if instead your print out looks like:\n",
    "\n",
    "\n",
    "    BIRTH WEIGHTS\n",
    "    minimum:   0 pounds 0 ounces\n",
    "    mean:      0 pounds 0 ounces\n",
    "    maximum:   0 pounds 0 ounces\n",
    "    std dev.   0 pounds 0 ounces\n",
    " "
   ]
  },
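  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the bonus, one possible approach is sketched below, assuming the standard avoirdupois units (1 ounce = 28.349523125 grams and 16 ounces = 1 pound). The helper name `grams_to_lb_oz` is hypothetical. It rounds the *total* number of ounces first and then splits that into pounds and ounces with `divmod`, which avoids awkward results such as `5 pounds 16 ounces`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Standard avoirdupois conversion: 1 ounce = 28.349523125 grams, 16 ounces = 1 pound\n",
    "GRAMS_PER_OUNCE = 28.349523125\n",
    "OUNCES_PER_POUND = 16\n",
    "\n",
    "def grams_to_lb_oz(grams):\n",
    "    # float() also accepts NumPy scalars, such as the result of np.mean\n",
    "    # Round the total number of ounces once, then split into pounds and ounces\n",
    "    total_ounces = int(round(float(grams) / GRAMS_PER_OUNCE))\n",
    "    pounds, ounces = divmod(total_ounces, OUNCES_PER_POUND)\n",
    "    return pounds, ounces\n",
    "\n",
    "pounds, ounces = grams_to_lb_oz(2523)   # 2523 g is the first BWT value in the table\n",
    "print(pounds, 'pounds', ounces, 'ounces')"
   ]
  },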
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:red\">Task 4: Smokers vs Non-Smokers 1000xp</h2>\n",
    "\n",
    "Now I would like to see those statistics for smokers vs. non-smokers\n",
    "\n",
    "    SMOKERS BIRTH WEIGHTS\n",
    "    minimum:   00 grams\n",
    "    mean:      00 grams\n",
    "    maximum:   00 grams    \n",
    "    std dev    00 grams\n",
    "    \n",
    "    NONSMOKERS BIRTH WEIGHTS\n",
    "    minimum:   00 grams\n",
    "    mean:      00 grams\n",
    "    maximum:   00 grams   \n",
    "    std dev.   00 grams\n",
    "    \n",
    "What do you think? Are the differences significant enough for you to think there is a difference between smokers and nonsmokers? (we will learn more formal ways of defining significance later in the course)"
   ]
  },
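  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hint, here is a minimal sketch of one way to organize this, using a hypothetical helper `print_report` and a made-up two-row array in which row 0 is a 0/1 group flag (like SMOKE) and row 1 holds the measurements (like BWT). The helper pads each label with `str.ljust` so the output matches the layout above. Note that `np.std` computes the population standard deviation by default (`ddof=0`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def print_report(title, values):\n",
    "    # Print min, mean, max, and standard deviation in the layout shown above\n",
    "    stats = [('minimum:', np.min(values)),\n",
    "             ('mean:', np.mean(values)),\n",
    "             ('maximum:', np.max(values)),\n",
    "             ('std dev:', np.std(values))]\n",
    "    print(title)\n",
    "    for label, value in stats:\n",
    "        print(label.ljust(10), '{:.0f}'.format(float(value)), 'grams')\n",
    "    print()\n",
    "    return stats\n",
    "\n",
    "# Hypothetical data: row 0 is a 0/1 group flag, row 1 is the measurement\n",
    "toy = np.array([[1, 0, 1, 0, 0],\n",
    "                [2500, 3100, 2700, 2900, 3300]])\n",
    "flag, values = toy[0], toy[1]\n",
    "\n",
    "group1_stats = print_report('GROUP 1', values[flag == 1])   # [2500, 2700]\n",
    "group0_stats = print_report('GROUP 0', values[flag == 0])   # [3100, 2900, 3300]"
   ]
  },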
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Congratulations! \n",
    "You just finished a fairly hard unit on Numpy and looked at some real datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}