{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyzing the effect of weather on policing\n", "> A Summary of lecture \"Analyzing Police Activity with pandas, via datacamp\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Data_Science]\n", "- image: images/tmin-boxplot.png" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring the weather dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the temperature\n", "In this exercise, you'll examine the temperature columns from the weather dataset to assess whether the data seems trustworthy. First you'll print the summary statistics, and then you'll visualize the data using a box plot.\n", "\n", "When deciding whether the values seem reasonable, keep in mind that the temperature is measured in degrees Fahrenheit, not Celsius!" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " TMIN TAVG TMAX\n", "count 4017.000000 1217.000000 4017.000000\n", "mean 43.484441 52.493016 61.268608\n", "std 17.020298 17.830714 18.199517\n", "min -5.000000 6.000000 15.000000\n", "25% 30.000000 39.000000 47.000000\n", "50% 44.000000 54.000000 62.000000\n", "75% 58.000000 68.000000 77.000000\n", "max 77.000000 86.000000 102.000000\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAOo0lEQVR4nO3df4xlZX3H8fdHViqoCMJAEShD241KsVQ6oUaMNWIb0aZgisratKul2SZqS6utrLUWbf8oJETFSEw2QLsau4BoCwVqa4imIdZNB6QqbixbQFz5NSigqIkSv/3jHuplmZ3Zuefe+fHM+5VM7j3Pee6539mz93PPPPee56SqkCS15WkrXYAkafwMd0lqkOEuSQ0y3CWpQYa7JDVow0oXAHDEEUfU9PT0SpchSWvKLbfc8lBVTc23blWE+/T0NLOzsytdhiStKUm+sa91DstIUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGrQqTmKSpFEk6b2NVq9pseiRe5IrkjyY5KtDbc9N8tkkd3S3h3XtSfLhJLuTfDnJKZMsXtL6VlUL/hx//vWL9mnV/gzL/APw6r3atgI3VdVG4KZuGeAMYGP3swX46HjKlCQtxaLhXlX/AXxnr+Yzge3d/e3AWUPtH6uBLwKHJjl6XMVKkvbPqB+oHlVV9wF0t0d27ccA3xzqt6dre4okW5LMJpmdm5sbsQxJ0nzG/W2Z+T7dmHdQq6q2VdVMVc1MTc07Y6UkaUSjhvsDTwy3dLcPdu17gOOG+h0L3Dt6eZKkUYwa7tcBm7v7m4Frh9p/v/vWzEuAR58YvpEkLZ9Fv+eeZAfwCuCIJHuAC4ALgauTnAvcA7y+634j8BpgN/AD4C0TqFmStIhFw72qNu1j1enz9C3gbX2LkiT14/QDktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lq0KIXyJZalmQs2xlcG15aPTxy17pWVQv+HH/+9Yv2Mdi1GhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUG9wj3JnyW5PclXk+xI8owkJyTZmeSOJFclOXBcxUqS9s/I4Z7kGOBPgJmqOgk4ADgHuAj4YFVtBB4Gzh1HoZKk/dd3WGYDcFCSDcDBwH3AK4FruvXbgbN6PockaYlGDveq+hZwMXAPg1B/FLgFeKSqHu+67QGOme/xSbYkmU0yOzc3N2oZkqR59BmWOQw4EzgBeB7wTOCMebrOe252VW2rqpmqmpmamhq1DEnSPPoMy7wKuKuq5qrqx8CngZcCh3bDNADHAvf2rFGStER9wv0e4CVJDs5gar3Tga8BnwPO7vpsBq7tV6Ikaan6jLnvZPDB6a3AV7ptbQPOB96RZDdwOHD5GOqUJC1Br/ncq+oC4IK9mu8ETu2z3bVmHHOCO22s9GQnv//fefSHP+69nemtN/R6/HMOejr/fcFv9q5juXmxjjFYLJint97A3Re+dpmqkdrw6A9/vCpeN33fHFaK0w9IUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIOdzV9PGccGH9XqxB61thruathou+LBWL/agtc1hGUlqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapBnqEpalZ79wq28aPvWlS6DZ78QYGXPch5Fr3BPcihwGXASUMAfAF8HrgKmgbuBN1TVw72qlLTufG/XhSs+dQSs3ekj+g7LXAJ8pqpeAJwM7AK2AjdV1Ubgpm5ZkrSMRg73JIcALwcuB6iqH1XVI8CZwPau23bgrL5FSpKWps+R+88Dc8DfJ/lSksuSPBM4qqruA+huj5zvwUm2JJlNMjs3N9ejDEnS3vqE+wbgFOCjVfVi4PssYQimqrZV1UxVzUxNTfUoQ5K0tz7hvgfYU1U7u+VrGIT9A0mOBuhuH+xXoiRpqUYO96q6H/hmkud3TacDXwOuAzZ3bZuBa3tVKElasr7fc/9j4BNJDgTuBN7C4A3j6iTnAvcAr+/5HCtqHJdpAy/VJml59Qr3qroNmJln1el9truarIbLtMHa/a6tpJXh9AOS1CDDXZIaZLhLUoMMd0lqkOEuSQ1yyl81bTVMG7tWp4zV2ma4q2mrYdpYv8aqleCwjCQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDnlpG0aq2GeXmec9DTV7qEkRjuklalcUz4Nr31hhWfOG6lOCwjSQ0y3CWpQYa7JDXIMfdFrIYr+QzqAK/mI2l/Ge6LWA1X8oHV8a0BSWuHwzKS1CDDXZIaZLhLUoN6h3uSA5J8Kcn13fIJSXYmuSPJVUkO7F+mJGkpxnHkfh6wa2j5IuCDVbUReBg4dwzPIUlagl7hnuRYBt/Pu6xbDvBK4Jquy3bgrD7PIUlaur5H7h8C3gX8pFs+HHikqh7vlvcAx8z3wCRbkswmmZ2bm+tZhiRp2MjhnuS3gAer6pbh5nm61nyPr6ptVTVTVTNTU1OjliFJmkefk5hOA347yWuAZwCHMDiSPzTJhu7o/Vjg3v5lSqNb6RPA1uqUsVrbRg73qno38G6AJK8A/ryqfjfJJ4GzgSuBzcC1Y6hTGknfs4vX85SxWtsm8T3384F3JNnNYAz+8gk8hyRpAWOZW6aqPg98vrt/J3DqOLYrSRqNZ6hKUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUFjmTisdSs9Hzg4J7ikpTHcFzGOubydE1zScnNYRpIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1aORwT3Jcks8l2ZXk9iTnde3PTfLZJHd0t4eNr1xJ0v7oc+T+OPDOqnoh8BLgbUlOBLYCN1XVRuCmblmStIxGDvequq+qbu3ufw/YBRwDnAls77ptB87qW6QkaWnGMuaeZBp4MbATOKqq7oPBGwBw5D4esyXJbJLZubm5cZQhSer0DvckzwI+BfxpVX13fx9XVduqaqaqZqampvqWIUka0ivckzydQbB/oqo+3TU/kOTobv3RwIP9SpQkLVWfb8sEuBzYVVUfGFp1HbC5u78ZuHb08iRJo9jQ47GnAb8HfCXJbV3bXwIXAlcnORe4B3h9vxIlSUs1crhX1c1A9rH69FG3K0nqzzNUJalBhrskNchwl6QG9flAVZJW1OBLe4v0uWjh9VU1pmpWF8Nd0prVajCPg8MyktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNcj53LWujeNiD+C84lp9DHeta4ayWuWwjCQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWrQRMI9yauTfD3J7iRbJ/EckqR9G3u4JzkAuBQ4AzgR2JTkxHE/jyRp3yZx5H4qsLuq7qyqHwFXAmdO4HkkSfswiXA/Bvjm0PKeru1JkmxJMptkdm5ubgJlSNL6NYlwn2+yjqec411V26pqpqpmpqamJlCGJK1fkwj3PcBxQ8vHAvdO4HkkSfswiXD/L2BjkhOSHAicA1w3geeRJO3D2GeFrKrHk7wd+DfgAOCKqrp93M8jSdq3iUz5W1U3AjdOYtuSpMV5hqokNchwl6QGGe6S1CAvszcG47gOp5d7kzROhvsYGMySVhuHZSSpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNymo4ASfJHPCNla5jgo4AHlrpIjQS993a1vr+O76q5r2U3aoI99Ylma2qmZWuQ0vnvlvb1vP+c1hGkhpkuEtSgwz35bFtpQvQyNx3a9u63X+OuUtSgzxyl6QGGe6S1CDDfYmSHJ7ktu7n/iTfGlquJB8f6rshyVyS67vlNyf5SHf/fUl+kOTIof6PLf9vtH4ssu8OTPK6bh++YOgxdyV5/l7b+VCSd3X3T03y+SR3JLk1yQ1JXrTcv1vr+rzuhtqvTfKfe7V9OMl7h5bfk+TSyf9Gk+eVmJaoqr4N/AoMAhp4rKou7pYfA05KclBV/RD4DeBbC2zuIeCdwPkTLVrAwvuua9sE3AycA7yva76yW35/1+dpwNnAaUmOAq4G3lRVX+jWvwz4BeArk/+N1o++r7skhwKnAI8lOaGq7upW/RVwW5JPAAX8IfDiZfiVJs4j9/H7V+C13f1NwI4F+l4BvDHJcydelRaU5FnAacC5DML8CTv2Wn45cHdVfQN4O7D9iWAHqKqbq+qfl6FkPdlir7vfAf6Fn75ZA1BV3wXeA3wEuBT466p6ZOLVLgPDffyuBM5J8gzgl4GdC/R9jEHAn7cchWlBZwGfqar/Ab6T5BSAqvoy8JMkJ3f9zuGnwfFLwK3LXqnms9jr7onA39Hd/39VtQM4DDikqj5OIwz3MevCYJrBf6Ab9+MhHwY2JzlkknVpUZsYBATd7XAA7GAQHBuAM4FPzreBJDuT7EpyyUQr1VMs9Lrrhs9+Ebi5e/N+PMlJQ+uPBX4WeF73F1wTDPfJuA64mIWHZADo/gT8R+Ctky5K80tyOPBK4LIkdwN/wWC4LF2XHcAbgFcBX66qB7v22xmM4wJQVb8GvBd4zjKVrifb1+vujQyOzO/q9u80Tx5qu4TBZyxXAxdMusjlYrhPxhXA31TV/n6o9gHgj/AD7pVyNvCxqjq+qqar6jjgLuBlAFX1v8C3gQt5cnBcCrw5yUuH2g5eppr1VPt63W0CXt3t22ngV+nCPckZwJHAx4C/BV6X5MTlK3lyDPcJqKo9VbXff5pX1UPAPwE/M7mqtIBNDP79h30KeNPQ8g7gBcP9qup+BkeFf5dkd5IvMHij+Mhky9V85nvdJZkGfg744lC/u4DvJvl14EPAW2vg+8C7aGT/Of2AJDXII3dJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhr0fwFK/n64/rZHAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Read 'weather.csv' into a DataFrame named 'weather'\n", "weather = pd.read_csv('./dataset/weather.csv')\n", "\n", "# Describe the temperature columns\n", "print(weather[['TMIN', 'TAVG', 'TMAX']].describe())\n", "\n", "# Create a box plot of the temperature columns\n", "weather[['TMIN', 'TAVG', 'TMAX']].plot(kind='box')\n", "plt.savefig('../images/tmin-boxplot.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the temperature difference\n", "In this exercise, you'll continue to assess whether the dataset seems trustworthy by plotting the difference between the maximum and minimum temperatures.\n", "\n", "What do you notice about the resulting histogram? Does it match your expectations, or do you see anything unusual?" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "count 4017.000000\n", "mean 17.784167\n", "std 6.350720\n", "min 2.000000\n", "25% 14.000000\n", "50% 18.000000\n", "75% 22.000000\n", "max 43.000000\n", "Name: TDIFF, dtype: float64\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAD4CAYAAAAD6PrjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAARWElEQVR4nO3de6xlZX3G8e/jgOIdkQHJDPRgnbSYRpCOSIJNFWyDoIKNWI2tU0OdpmKq0UZH01RtagJJK2raWKdiHKw3vDIVW4t4a/8QHQQVxYbRUpjOhBmVi1cI+usf+z2vh5kzM/twZp19Lt9PsnPW+6619/6xwpznvOvyrlQVkiQBPGjSBUiSFg9DQZLUGQqSpM5QkCR1hoIkqTts0gXMx9FHH11TU1OTLkOSlpTrrrvu+1W1erZ1SzoUpqam2LZt26TLkKQlJcn/7m+dh48kSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqRu0FBIckuSbya5Icm21ndUkquT3Nx+Pqb1J8k7kmxP8o0kpw5ZmyRpXwsxUnhGVZ1SVetbexNwTVWtA65pbYBnAevaayPwzgWoTZI0wyQOH50HbGnLW4DzZ/RfXiNfBo5MctwE6pOkFWvoO5oL+I8kBbyrqjYDx1bVLoCq2pXkmLbtGuC2Ge/d0fp2zfzAJBsZjSQ44YQTBi5fi8HUpqsm8r23XHzuRL5XmqShQ+GMqtrZfvFfneQ7B9g2s/Tt81i4FiybAdavX+9j4yTpEBr08FFV7Ww/dwOfAE4Dbp8+LNR+7m6b7wCOn/H2tcDOIeuTJN3fYKGQ5OFJHjm9DPw+cCOwFdjQNtsAXNmWtwIvaVchnQ7cNX2YSZK0MIY8fHQs8Ikk09/zgar69yRfBa5IciFwK3BB2/7TwDnAduCnwEsHrE2SNIvBQqGqvgecPEv/D4CzZukv4KKh6pEkHZx3NEuSuiX9kB0tHZO6rFTS3DhSkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqnBBP2o/5TOLn8521VDlSkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYOHQpJVSa5P8qnWPjHJtUluTvLhJA9u/Q9p7e1t/dTQtUmS7m8hRgqvBG6a0b4EuLSq1gF3ABe2/guBO6rqCcClbTtJ0gIaNBSSrAXOBd7d2gHOBD7aNtkCnN+Wz2tt2vqz2vaSpAUy9EjhbcBrgV+29mOBO6vqvtbeAaxpy2uA2wDa+rva9veTZGOSbUm27dmzZ8jaJWnFGSwUkjwb2F1V183snmXTGmPdrzqqNlfV+qpav3r16kNQqSRp2mEDfvYZwHOTnAMcATyK0cjhyCSHtdHAWmBn234HcDywI8lhwKOBHw5YnyRpL4ONFKrq9VW1tqqmgBcCn6uqFwOfB57fNtsAXNmWt7Y2bf3nqmqfkYIkaTiTuE/hdcCrk2xndM7gstZ/GfDY1v9qYNMEapOkFW3Iw0ddVX0B+EJb/h5w2izb/By4YCHqkSTNzjuaJUmdoSBJ6hbk8JGWvqlNV026BEkLwJGCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYeNs1GS36qqG4cuRloupjZdNa/333LxuYeoEmluxh0p/FOSryR5eZIjB61IkjQxY4VCVT0NeDFwPLAtyQeS/N6glUmSFtzY5xSq6mbgr4DXAb8LvCPJd5L8wVDFSZIW1lihkORJSS4FbgLOBJ5TVSe15UsHrE+StIDGHSn8A/A14OSquqiqvgZQVTsZjR72keSIdh7i60m+leTNrf/EJNcmuTnJh5M8uPU/pLW3t/VT8/2PkyTNzbihcA7wgar6GUCSByV5GEBVvW8/77kHOLOqTgZOAc5OcjpwCXBpVa0D7gAubNtfCNxRVU9gNPq45IH8B0mSHrhxQ+GzwENntB/W+varRn7cmoe3VzE65PTR1r8FOL8tn9fatPVnJcmY9UmSDoFxQ+GIGb/gacsPO9ibkqxKcgOwG7ga+C5wZ1Xd1zbZAaxpy2uA29rn3wfcBTx2zPokSYfAuKHwkySnTjeS/Dbws4O9qap+UVWnAGuB04CTZtts+mMPsK5LsjHJtiTb9uzZM1bxkqTxjHVHM/Aq4CNJdrb2ccAfjvslVXVnki8ApwNHJjmsjQbWAtOfuYPRfRA7khwGPBr44SyftRnYDLB+/fp9QkOS9MCNe/PaV4HfBP4ceDlwUlVdd6D3JFk9ffdzkocCz2R0Sevngee3zTYAV7blra1NW/+5qvKXviQtoHFHCgBPAabae56chKq6/ADbHwdsSbKKUfhcUVWfSvJt4ENJ/ha4HrisbX8Z8L4k2xmNEF44t/8USdJ8jTsh3vuAXwduAH7RugvYbyhU1TeAJ8/S/z1G5xf27v85cME49UiShjHuSGE98EQP50jS8jbu1Uc3Ao8bshBJ0uSNO1I4Gvh2kq8wulMZgKp67iBVaRDzneNf0vI3bii8acgiJEmLw1ihUFVfTPJrwLqq+myb92jVsKVJkhbauFNnv4zRfETval1rgE8OVZQkaTLGPdF8EXAGcDf0B+4cM1RRkqTJGDcU7qmqe6cbbRoKL0+VpGVm3FD4YpI3AA9tz2b+CPCvw5UlSZqEcUNhE7AH+CbwZ8Cn2c8T1yRJS9e4Vx/9Evjn9pIkLVPjzn30P8xyDqGqHn/IK5IkTcxc5j6adgSjieuOOvTlSJImadznKfxgxuv/quptjJ61LElaRsY9fHTqjOaDGI0cHjlIRZKkiRn38NHfz1i+D7gFeMEhr0aSNFHjXn30jKELkSRN3riHj159oPVV9dZDU44kaZLmcvXRU4Ctrf0c4EvAbUMUJUmajLk8ZOfUqvoRQJI3AR+pqj8dqjBJ0sIbd5qLE4B7Z7TvBaYOeTWSpIkad6TwPuArST7B6M7m5wGXD1aVJGkixr366C1J/g34ndb10qq6friyJEmTMO7hI4CHAXdX1duBHUlOHKgmSdKEjPs4zjcCrwNe37oOB/5lqKIkSZMx7kjhecBzgZ8AVNVOnOZCkpadcUPh3qoq2vTZSR4+XEmSpEkZNxSuSPIu4MgkLwM+iw/ckaRlZ9yrj/6uPZv5buA3gL+uqqsHrUxawaY2XfWA33vLxecewkq00hw0FJKsAj5TVc8EDAJJWsYOGgpV9YskP03y6Kq6ayGK0v7N5y9ISTqYce9o/jnwzSRX065AAqiqvxikKknSRIwbCle1lyRpGTtgKCQ5oapuraotc/3gJMczmh/pccAvgc1V9fYkRwEfZjSh3i3AC6rqjiQB3g6cA/wU+JOq+tpcv1eS9MAd7JLUT04vJPnYHD/7PuA1VXUScDpwUZInApuAa6pqHXBNawM8C1jXXhuBd87x+yRJ83SwUMiM5cfP5YOratf0X/rtOQw3AWuA84DpkccW4Py2fB5weY18mdE9EcfN5TslSfNzsFCo/SzPSZIp4MnAtcCxVbULRsEBHNM2W8P9n+S2o/Xt/Vkbk2xLsm3Pnj0PtCRJ0iwOFgonJ7k7yY+AJ7Xlu5P8KMnd43xBkkcAHwNeVVUHek9m6dsniKpqc1Wtr6r1q1evHqcESdKYDniiuapWzefDkxzOKBDeX1Ufb923Jzmuqna1w0O7W/8O4PgZb18L7JzP90uS5mYuz1OYk3Y10WXATVX11hmrtgIb2vIG4MoZ/S/JyOnAXdOHmSRJC2Pc+xQeiDOAP2Z009sNre8NwMWMJti7ELgVuKCt+zSjy1G3M7ok9aUD1iZJmsVgoVBV/8Xs5wkAzppl+wIuGqoeSdLBDXb4SJK09BgKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6oZ8noL2Y2rTVZMuQZJm5UhBktQ5UpCWmfmMRG+5+NxDWImWIkcKkqTOUJAkdYaCJKkzFCRJnaEgSeq8+ugB8l4DScuRIwVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSN1goJHlPkt1JbpzRd1SSq5Pc3H4+pvUnyTuSbE/yjSSnDlWXJGn/hhwpvBc4e6++TcA1VbUOuKa1AZ4FrGuvjcA7B6xLkrQfg4VCVX0J+OFe3ecBW9ryFuD8Gf2X18iXgSOTHDdUbZKk2S30OYVjq2oXQPt5TOtfA9w2Y7sdrW8fSTYm2ZZk2549ewYtVpJWmsVyojmz9NVsG1bV5qpaX1XrV69ePXBZkrSyLHQo3D59WKj93N36dwDHz9huLbBzgWuTpBVvoZ+nsBXYAFzcfl45o/8VST4EPBW4a/owk6SFM5/nhNxy8bmHsBJNymChkOSDwNOBo5PsAN7IKAyuSHIhcCtwQdv808A5wHbgp8BLh6pLkrR/g4VCVb1oP6vOmmXbAi4aqhZJ0ngWy4lmSdIiYChIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQt9Cypkpap+cywCs6yulg4UpAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpG7F3rw23xttJGk5cqQgSeoMBUlSZyhIkjpDQZLUGQqSpG7FXn0kaXGZzxWBTrt96DhSkCR1hoIkqTMUJEmdoSBJ6gwFSVLn1UeSljyvXDp0FlUoJDkbeDuwCnh3VV084ZIkLXMGyv0tmsNHSVYB/wg8C3gi8KIkT5xsVZK0siymkcJpwPaq+h5Akg8B5wHfnmhVkrQfk5yCf6hRymIKhTXAbTPaO4Cn7r1Rko3Axta8J8mNC1DbcnA08P1JF7EEuJ/G434az2D7KZfM6+2/tr8ViykUMktf7dNRtRnYDJBkW1WtH7qw5cB9NR7303jcT+NZivtp0ZxTYDQyOH5Gey2wc0K1SNKKtJhC4avAuiQnJnkw8EJg64RrkqQVZdEcPqqq+5K8AvgMo0tS31NV3zrI2zYPX9my4b4aj/tpPO6n8Sy5/ZSqfQ7bS5JWqMV0+EiSNGGGgiSpW7KhkOTsJP+dZHuSTZOuZ7FI8p4ku2fev5HkqCRXJ7m5/XzMJGtcDJIcn+TzSW5K8q0kr2z97qu9JDkiyVeSfL3tqze3/hOTXNv21YfbBSIrXpJVSa5P8qnWXlL7aUmGglNiHNB7gbP36tsEXFNV64BrWnuluw94TVWdBJwOXNT+H3Jf7ese4MyqOhk4BTg7yenAJcClbV/dAVw4wRoXk1cCN81oL6n9tCRDgRlTYlTVvcD0lBgrXlV9CfjhXt3nAVva8hbg/AUtahGqql1V9bW2/CNG/4jX4L7aR438uDUPb68CzgQ+2vrdV0CStcC5wLtbOyyx/bRUQ2G2KTHWTKiWpeDYqtoFo1+GwDETrmdRSTIFPBm4FvfVrNohkRuA3cDVwHeBO6vqvraJ/wZH3ga8Fvhlaz+WJbaflmoojDUlhnQwSR4BfAx4VVXdPel6Fquq+kVVncJopoHTgJNm22xhq1pckjwb2F1V183snmXTRb2fFs3Na3PklBhzc3uS46pqV5LjGP21t+IlOZxRILy/qj7eut1XB1BVdyb5AqPzMEcmOaz9Fey/QTgDeG6Sc4AjgEcxGjksqf20VEcKTokxN1uBDW15A3DlBGtZFNqx3suAm6rqrTNWua/2kmR1kiPb8kOBZzI6B/N54PltsxW/r6rq9VW1tqqmGP1O+lxVvZgltp+W7B3NLY3fxq+mxHjLhEtaFJJ8EHg6oyl7bwfeCHwSuAI4AbgVuKCq9j4ZvaIkeRrwn8A3+dXx3zcwOq/gvpohyZMYnSBdxegPySuq6m+SPJ7RRR5HAdcDf1RV90yu0sUjydOBv6yqZy+1/bRkQ0GSdOgt1cNHkqQBGAqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVL3/15hwzC7kmryAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Create a 'TDIFF' column that represents temperature difference\n", "weather['TDIFF'] = weather.TMAX - weather.TMIN\n", "\n", "# Describe the 'TDIFF' column\n", "print(weather.TDIFF.describe())\n", "\n", "# Create a histogram with 20 bins to visualize 'TDIFF'\n", "weather.TDIFF.plot(kind='hist', bins=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Categorizing the weather\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Counting bad weather conditions\n", "The weather DataFrame contains 20 columns that start with 'WT', each of which represents a bad weather condition. For example:\n", "\n", "- WT05 indicates \"Hail\"\n", "- WT11 indicates \"High or damaging winds\"\n", "- WT17 indicates \"Freezing rain\"\n", "\n", "For every row in the dataset, each WT column contains either a 1 (meaning the condition was present that day) or NaN (meaning the condition was not present).\n", "\n", "In this exercise, you'll quantify \"how bad\" the weather was each day by counting the number of 1 values in each row." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAD4CAYAAAAdIcpQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAT80lEQVR4nO3df/BddZ3f8efL4C/cpeDy1caEbMCJruhowK9oy2pV3JUfuyI7s7vJtEqp3agLre7uTBfsTmHs0KEtyi6zW2yQVHAVFkVWVvFHsFuZzoiQQMpvSoAsfEkKWWmFFQoF3/3jnq+5Sb7f77lJvvd7bnKfj5k795z3PeeeN3dIXvmcz7nnpqqQJGkuL+q6AUnS6DMsJEmtDAtJUivDQpLUyrCQJLU6qOsGhuXwww+v5cuXd92GJO03Nm7c+LdVNTHTawdsWCxfvpwNGzZ03YYk7TeS/M1sr3kaSpLUyrCQJLUyLCRJrQwLSVIrw0KS1GpoYZFkXZLHk9zZV/uLJJuax5Ykm5r68iTP9L32ub593prkjiSbk1ycJMPqWZI0s2FeOvsF4E+BK6YLVfXb08tJPgP8uG/7B6pq5QzvcwmwBrgJuB44EfjWEPqVJM1iaCOLqroReGKm15rRwW8BV871HkkWA4dU1Q+qdy/1K4APznevkqS5dTVn8U7gsaq6v692ZJLbknw/yTub2hJgqm+bqaYmSVpAXX2DezU7jyq2Acuq6kdJ3gr8ZZI3AjPNT8z6a01J1tA7ZcWyZcv2urnlZ39zr/fdF1suOKWT40pSmwUfWSQ5CPgN4C+ma1X1bFX9qFneCDwAvI7eSGJp3+5Lga2zvXdVra2qyaqanJiY8fYmkqS90MVpqPcB91bVz04vJZlIsqhZPgpYATxYVduAp5K8o5nn+DDw9Q56lqSxNsxLZ68EfgC8PslUko80L61i94ntdwG3J/kfwFeBj1XV9OT4x4HPA5vpjTi8EkqSFtjQ5iyqavUs9X86Q+0a4JpZtt8AvGlem5Mk7RG/wS1JamVYSJJaGRaSpFaGhSSplWEhSWplWEiSWhkWkqRWhoUkqZVhIUlqZVhIkloZFpKkVoaFJKmVYSFJamVYSJJaGRaSpFaGhSSplWEhSWplWEiSWhkWkqRWhoUkqZVhIUlqNbSwSLIuyeNJ7uyrnZfk0SSbmsfJfa+dk2RzkvuSvL+vfmJT25zk7GH1K0ma3TBHFl8ATpyhflFVrWwe1wMkORpYBbyx2ec/JVmUZBHwZ8BJwNHA6mZbSdICOmhYb1xVNyZZPuDmpwJXVdWzwENJNgPHNa9trqoHAZJc1Wx79zy3K0maQxdzFmclub05TXVYU1sCPNK3zVRTm60+oyRrkmxIsmH79u3z3bckja2FDotLgNcCK4FtwGeaembYtuaoz6iq1lbVZFVNTkxM7GuvkqTG0E5DzaSqHpteTnIp8I1mdQo4om/TpcDWZnm2uiRpgSzoyCLJ4r7V04DpK6WuA1YleWmSI4EVwM3ALcCKJEcmeQm9SfDrFrJnSdIQRxZJrgTeDRyeZAo4F3h3kpX0TiVtAT4KUFV3Jbma3sT188CZVfVC8z5nAd8BFgHrququYfUsSZrZMK+GWj1D+bI5tj8fOH+G+vXA9fPYmiRpD/kNbklSK8NCktTKsJAktTIsJEmtDAtJUivDQpLUyrCQJLUyLCRJrQwLSVIrw0KS1MqwkCS1MiwkSa0MC0lSK8NCktTKsJAktTIsJEmtDAtJUivDQpLUyrCQJLUyLCRJrQwLSVKroYVFknVJHk9yZ1/tPya5N8ntSa5NcmhTX57kmSSbmsfn+vZ5a5I7kmxOcnGSDKtnSdLMhjmy+AJw4i619cCbqurNwP8Ezul77YGqWtk8PtZXvwRYA6xoHru+pyRpyIYWFlV1I/DELrXvVtXzzepNwNK53iPJYuCQqvpBVRVwBfDBYfQrSZpdl3MW/wz4Vt/6kUluS/L9JO9sakuAqb5tpprajJKsSbIhyYbt27fPf8eSNKY6CYsk/xp4HvhSU9oGLKuqY4DfB76c5BBgpvmJmu19q2ptVU1W1eTExMR8ty1JY+ughT5gktOBXwNOaE4tUVXPAs82yxuTPAC8jt5Iov9U1VJg68J2LEla0JFFkhOBPwQ+UFVP99Unkixqlo+iN5H9YFVtA55K8o7mKqgPA19fyJ4lSUMcWSS5Eng3cHiSKeBcelc/vRRY31wBe1Nz5dO7gE8neR54AfhYVU1Pjn+c3pVVL6c3x9E/zyFJWgBDC4uqWj1D+bJZtr0GuGaW1zYAb5rH1iRJe8hvcEuSWhkWkqRWhoUkqZVhIUlqZVhIkloZFpKkVoaFJKmVYSFJamVYSJJaDRQWSfwGtSSNsUFHFp9LcnOS353+KVRJ0vgYKCyq6peBfwwcAWxI8uUkvzLUziRJI2PgOYuquh/4I3q3GP9HwMVJ7k3yG8NqTpI0Ggads3hzkouAe4D3Ar9eVW9oli8aYn+SpBEw6C3K/xS4FPhUVT0zXayqrUn+aCidSZJGxqBhcTLwTFW9AJDkRcDLqurpqvri0LqTJI2EQecsbqD3S3XTDm5qkqQxMGhYvKyq/m56pVk+eDgtSZJGzaBh8ZMkx06vJHkr8Mwc20uSDiCDzll8EvhKkq3N+mLgt4fTkiRp1Az6pbxbgF8CPg78LvCGqtrYtl+SdUkeT3JnX+2VSdYnub95PqypJ8nFSTYnuX2Xkczpzfb3Jzl9T/8jJUn7Zk9uJPg24M3AMcDqJB8eYJ8vACfuUjsb+F5VrQC+16wDnASsaB5rgEugFy7AucDbgeOAc6cDRpK0MAb9Ut4XgQuBX6YXGm8DJtv2q6obgSd2KZ8KXN4sXw58sK9+RfXcBByaZDHwfmB9VT1RVf8bWM/uASRJGqJB5ywmgaOrqubhmK+uqm0AVbUtyaua+hLgkb7tpprabPXdJFlDb1TCsmXL5qFVSRIMfhrqTuDvD7MRIDPUao767sWqtVU1WVWTExMT89qcJI2zQUcWhwN3J7kZeHa6WFUf2ItjPpZkcTOqWAw83tSn6N3VdtpSYGtTf/cu9f+2F8eVJO2lQcPivHk85nXA6cAFzfPX++pnJbmK3mT2j5tA+Q7w7/omtX8VOGce+5EktRgoLKrq+0l+EVhRVTckORhY1LZfkivpjQoOTzJF76qmC4Crk3wEeBj4zWbz6+ndg2oz8DRwRnPsJ5L8W+CWZrtPV9Wuk+aSpCEaKCyS/A69ieNXAq+lN8H8OeCEufarqtWzvLTbfs3k+ZmzvM86YN0gvUqS5t+gE9xnAscDT8LPfgjpVXPuIUk6YAwaFs9W1XPTK0kOYpYrkiRJB55Bw+L7ST4FvLz57e2vAH81vLYkSaNk0LA4G9gO3AF8lN5ktL+QJ0ljYtCroX5K72dVLx1uO5KkUTTo1VAPMcMcRVUdNe8dSZJGzp7cG2ray+h9N+KV89+OJGkUDfp7Fj/qezxaVX8MvHfIvUmSRsSgp6GO7Vt9Eb2Rxs8PpSNJ0sgZ9DTUZ/qWnwe2AL81791IkkbSoFdDvWfYjUiSRtegp6F+f67Xq+qz89OOJGkU7cnVUG+jdxtxgF8HbmTnX7CTJB2g9uTHj46tqqcAkpwHfKWq/vmwGpMkjY5Bb/exDHiub/05YPm8dyNJGkmDjiy+CNyc5Fp63+Q+DbhiaF1JkkbKoFdDnZ/kW8A7m9IZVXXb8NqSJI2SQU9DARwMPFlVfwJMJTlySD1JkkbMQGGR5FzgD4FzmtKLgT8fVlOSpNEy6MjiNOADwE8Aqmor3u5DksbGoGHxXFUVzW3Kk7xieC1JkkbNoGFxdZL/DBya5HeAG9jLH0JK8vokm/oeTyb5ZJLzkjzaVz+5b59zkmxOcl+S9+/NcSVJe2/Qq6EubH57+0ng9cC/qar1e3PAqroPWAmQZBHwKHAtcAZwUVVd2L99kqOBVcAbgdcANyR5XVW9sDfHlyTtudawaP5C/05VvQ/Yq4CYwwnAA1X1N0lm2+ZU4KqqehZ4KMlm4DjgB/PciyRpFq2noZp/wT+d5O8N4firgCv71s9KcnuSdUkOa2pL2PkeVFNNbTdJ1iTZkGTD9u3bh9CuJI2nQecs/i9wR5LLklw8/diXAyd5Cb0rrL7SlC4BXkvvFNU2dvyGxkxDjt1+DxygqtZW1WRVTU5MTOxLe5KkPoPe7uObzWM+nQTcWlWPAUw/AyS5FPhGszoFHNG331Jg6zz3Ikmaw5xhkWRZVT1cVZcP4dir6TsFlWRxVW1rVk8D7myWrwO+nOSz9Ca4VwA3D6EfSdIs2k5D/eX0QpJr5uugSQ4GfgX4Wl/5PyS5I8ntwHuA3wOoqruAq4G7gW8DZ3ollCQtrLbTUP3zBUfN10Gr6mngF3apfWiO7c8Hzp+v40uS9kzbyKJmWZYkjZG2kcVbkjxJb4Tx8maZZr2q6pChdidJGglzhkVVLVqoRiRJo2tPfs9CkjSmDAtJUivDQpLUyrCQJLUyLCRJrQwLSVIrw0KS1GrQu85qASw/e75v7Du4LRec0tmxJY0+RxaSpFaGhSSplWEhSWplWEiSWhkWkqRWhoUkqZVhIUlqZVhIkloZFpKkVp2FRZItSe5IsinJhqb2yiTrk9zfPB/W1JPk4iSbk9ye5Niu+pakcdT1yOI9VbWyqiab9bOB71XVCuB7zTrAScCK5rEGuGTBO5WkMdZ1WOzqVODyZvly4IN99Suq5ybg0CSLu2hQksZRl2FRwHeTbEyypqm9uqq2ATTPr2rqS4BH+vadamqSpAXQ5V1nj6+qrUleBaxPcu8c22aGWu22US901gAsW7ZsfrqUJHU3sqiqrc3z48C1wHHAY9Onl5rnx5vNp4Aj+nZfCmyd4T3XVtVkVU1OTEwMs31JGiudjCySvAJ4UVU91Sz/KvBp4DrgdOCC5vnrzS7XAWcluQp4O/Dj6dNV0t7q6vdD/O0Q7Y+6Og31auDaJNM9fLmqvp3kFuDqJB8BHgZ+s9n+euBkYDPwNHDGwrcsSeOrk7CoqgeBt8xQ/xFwwgz1As5cgNYkSTMYtUtnJUkjyLCQJLUyLCRJrQwLSVIrw0KS1MqwkCS1MiwkSa0MC0lSK8NCktSqy7vOSp3dn0nSnjEsBPiXtqS5eRpKktTKsJAktTIsJEmtDAtJUivDQpLUyrCQJLUyLCRJrQwLSVIrw0KS1MpvcEsLrKtvy2+54JROjqsDw4KPLJIckeSvk9yT5K4kn2jq5yV5NMmm5nFy3z7nJNmc5L4k71/oniVp3HUxsnge+IOqujXJzwMbk6xvXruoqi7s3zjJ0cAq4I3Aa4Abkryuql5Y0K4laYwt+MiiqrZV1a3N8lPAPcCSOXY5Fbiqqp6tqoeAzcBxw+9UkjSt0wnuJMuBY4AfNqWzktyeZF2Sw5raEuCRvt2mmCVckqxJsiHJhu3btw+pa0kaP52FRZKfA64BPllVTwKXAK8FVgLbgM9MbzrD7jXTe1bV2qqarKrJiYmJIXQtSeOpk7BI8mJ6QfGlqvoaQFU9VlUvVNVPgUvZcappCjiib/elwNaF7FeSxl0XV0MFuAy4p6o+21df3LfZacCdzfJ1wKokL01yJLACuHmh+pUkdXM11PHAh4A7kmxqap8CVidZSe8U0xbgowBVdVeSq4G76V1JdaZXQknSwlrwsKiq/87M8xDXz7HP+cD5Q2tKkjQnb/chSWplWEiSWhkWkqRWhoUkqZVhIUlqZVhIkloZFpKkVoaFJKmVYSFJamVYSJJaGRaSpFaGhSSpVRd3nZXUgeVnf7OzY2+54JTOjq354chCktTKsJAktTIsJEmtDAtJUivDQpLUyrCQJLXy0llJQ9fVZbtesjt/HFlIklrtN2GR5MQk9yXZnOTsrvuRpHGyX4RFkkXAnwEnAUcDq5Mc3W1XkjQ+9pc5i+OAzVX1IECSq4BTgbs77UrSSOvyFiddGdY8zf4SFkuAR/rWp4C377pRkjXAmmb175Lct5fHOxz4273c90DjZ7EzP4+d+XnsMBKfRf79Pu3+i7O9sL+ERWao1W6FqrXA2n0+WLKhqib39X0OBH4WO/Pz2Jmfxw4H+mexX8xZ0BtJHNG3vhTY2lEvkjR29pewuAVYkeTIJC8BVgHXddyTJI2N/eI0VFU9n+Qs4DvAImBdVd01xEPu86msA4ifxc78PHbm57HDAf1ZpGq3U/+SJO1kfzkNJUnqkGEhSWplWPTxliI7JDkiyV8nuSfJXUk+0XVPXUuyKMltSb7RdS9dS3Jokq8mubf5f+QfdN1Tl5L8XvPn5M4kVyZ5Wdc9zTfDouEtRXbzPPAHVfUG4B3AmWP+eQB8Arin6yZGxJ8A366qXwLewhh/LkmWAP8SmKyqN9G7CGdVt13NP8Nih5/dUqSqngOmbykylqpqW1Xd2iw/Re8vgyXddtWdJEuBU4DPd91L15IcArwLuAygqp6rqv/TbVedOwh4eZKDgIM5AL8HZljsMNMtRcb2L8d+SZYDxwA/7LaTTv0x8K+An3bdyAg4CtgO/JfmtNznk7yi66a6UlWPAhcCDwPbgB9X1Xe77Wr+GRY7DHRLkXGT5OeAa4BPVtWTXffThSS/BjxeVRu77mVEHAQcC1xSVccAPwHGdo4vyWH0zkIcCbwGeEWSf9JtV/PPsNjBW4rsIsmL6QXFl6rqa13306HjgQ8k2ULv9OR7k/x5ty11agqYqqrpkeZX6YXHuHof8FBVba+q/wd8DfiHHfc07wyLHbylSJ8koXdO+p6q+mzX/XSpqs6pqqVVtZze/xf/taoOuH85Dqqq/hfwSJLXN6UTGO+fC3gYeEeSg5s/NydwAE747xe3+1gIHdxSZNQdD3wIuCPJpqb2qaq6vsOeNDr+BfCl5h9WDwJndNxPZ6rqh0m+CtxK7yrC2zgAb/3h7T4kSa08DSVJamVYSJJaGRaSpFaGhSSplWEhSWplWEiSWhkWkqRW/x/TlHH1jIl+hgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Copy 'WT01' through 'WT22' to a new DataFrame\n", "WT = weather.loc[:, 'WT01':'WT22']\n", "\n", "# Calculate the sum of each row in 'WT'\n", "weather['bad_conditions'] = WT.sum(axis='columns')\n", "\n", "# Replace missing values in 'bad_conditions' with 0\n", "weather['bad_conditions'] = weather.bad_conditions.fillna(0).astype('int')\n", "\n", "# Create a histogram to visualist 'bad_conditions'\n", "weather['bad_conditions'].plot(kind='hist')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rating the weather conditions\n", "In the previous exercise, you counted the number of bad weather conditions each day. In this exercise, you'll use the counts to create a rating system for the weather.\n", "\n", "The counts range from 0 to 9, and should be converted to ratings as follows:\n", "\n", "- Convert 0 to 'good'\n", "- Convert 1 through 4 to 'bad'\n", "- Convert 5 through 9 to 'worse'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1749\n", "1 613\n", "2 367\n", "3 380\n", "4 476\n", "5 282\n", "6 101\n", "7 41\n", "8 4\n", "9 4\n", "Name: bad_conditions, dtype: int64\n", "bad 1836\n", "good 1749\n", "worse 432\n", "Name: rating, dtype: int64\n" ] } ], "source": [ "# Count the unique values in 'bad_conditions' and sort the index\n", "print(weather.bad_conditions.value_counts().sort_index())\n", "\n", "# Create a dictionary that maps integers to strings\n", "mapping = {0:'good', 1:'bad', 2:'bad', 3:'bad', 4:'bad', 5:'worse', \n", " 6:'worse', 7:'worse', 8:'worse', 9:'worse'}\n", "\n", "# Convert the 'bad_conditions' integers to strings using the 'mapping'\n", "weather['rating'] = weather.bad_conditions.map(mapping)\n", "\n", "# Count the unique values in 'rating'\n", "print(weather.rating.value_counts())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the data type to category\n", "Since the rating column only has a few possible values, you'll change its data type to category in order to store the data more efficiently. You'll also specify a logical order for the categories, which will be useful for future exercises.\n", "\n", "**NOTE** : in pandas 1.0.3, ordered categories must be defined with ```pd.api.types.CategoricalDtype```" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 bad\n", "1 bad\n", "2 bad\n", "3 bad\n", "4 bad\n", "Name: rating, dtype: category\n", "Categories (3, object): [good < bad < worse]\n" ] } ], "source": [ "# Create a list of weather ratings in logical order\n", "cats = pd.api.types.CategoricalDtype(categories=['good', 'bad', 'worse'], ordered=True)\n", "# cats = pd.categories=['good', 'bad', 'worse'] # for newer versions of pandas", "\n", "# Change the data type of 'rating' to category\n", "weather['rating'] = weather.rating.astype(cats)\n", "\n", "# Examine the head of 'rating'\n", "print(weather.rating.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merging datasets\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing the DataFrames\n", "In this exercise, you'll prepare the traffic stop and weather rating DataFrames so that they're ready to be merged:\n", "\n", "1. With the ri DataFrame, you'll move the stop_datetime index to a column since the index will be lost during the merge.\n", "2. With the weather DataFrame, you'll select the DATE and rating columns and put them in a new DataFrame." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "ri = pd.read_csv('./dataset/police.csv')\n", "\n", "combined = ri.stop_date.str.cat(ri.stop_time, sep=' ')\n", "\n", "ri['stop_datetime'] = pd.to_datetime(combined)\n", "ri['is_arrested'] = ri['is_arrested'].astype(bool)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " index state stop_date stop_time county_name driver_gender driver_race \\\n", "0 0 RI 2005-01-04 12:55 NaN M White \n", "1 1 RI 2005-01-23 23:15 NaN M White \n", "2 2 RI 2005-02-17 04:15 NaN M White \n", "3 3 RI 2005-02-20 17:15 NaN M White \n", "4 4 RI 2005-02-24 01:20 NaN F White \n", "\n", " violation_raw violation search_conducted search_type \\\n", "0 Equipment/Inspection Violation Equipment False NaN \n", "1 Speeding Speeding False NaN \n", "2 Speeding Speeding False NaN \n", "3 Call for Service Other False NaN \n", "4 Speeding Speeding False NaN \n", "\n", " stop_outcome is_arrested stop_duration drugs_related_stop district \\\n", "0 Citation False 0-15 Min False Zone X4 \n", "1 Citation False 0-15 Min False Zone K3 \n", "2 Citation False 0-15 Min False Zone X4 \n", "3 Arrest Driver True 16-30 Min False Zone X1 \n", "4 Citation False 0-15 Min False Zone X3 \n", "\n", " stop_datetime \n", "0 2005-01-04 12:55:00 \n", "1 2005-01-23 23:15:00 \n", "2 2005-02-17 04:15:00 \n", "3 2005-02-20 17:15:00 \n", "4 2005-02-24 01:20:00 \n", " DATE rating\n", "0 2005-01-01 bad\n", "1 2005-01-02 bad\n", "2 2005-01-03 bad\n", "3 2005-01-04 bad\n", "4 2005-01-05 bad\n" ] } ], "source": [ "# Reset the index of 'ri'\n", "ri.reset_index(inplace=True)\n", "\n", "# Examine the head of 'ri'\n", "print(ri.head())\n", "\n", "# Create a DataFrame from the 'DATE' and 'rating' columns\n", "weather_rating = weather[['DATE', 'rating']]\n", "\n", "# Examine the head of 'weather_rating'\n", "print(weather_rating.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Merging the DataFrames\n", "In this exercise, you'll merge the ri and weather_rating DataFrames into a new DataFrame, ri_weather.\n", "\n", "The DataFrames will be joined using the stop_date column from ri and the DATE column from weather_rating. Thankfully the date formatting matches exactly, which is not always the case!\n", "\n", "Once the merge is complete, you'll set stop_datetime as the index, which is the column you saved in the previous exercise." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(91741, 17)\n", "(91741, 19)\n" ] } ], "source": [ "# Examine the shape of 'ri'\n", "print(ri.shape)\n", "\n", "# Merge 'ri' and 'weather_rating' using left join\n", "ri_weather = pd.merge(left=ri, right=weather_rating, \n", " left_on='stop_date', right_on='DATE', how='left')\n", "\n", "# Examine the shape of 'ri_weather'\n", "print(ri_weather.shape)\n", "\n", "# Set 'stop_datetime' as the index of 'ri_weather'\n", "ri_weather.set_index('stop_datetime', inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Does weather affect the arrest rate?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing arrest rates by weather rating\n", "Do police officers arrest drivers more often when the weather is bad? Find out below!\n", "\n", "- First, you'll calculate the overall arrest rate.\n", "- Then, you'll calculate the arrest rate for each of the weather ratings you previously assigned.\n", "- Finally, you'll add violation type as a second factor in the analysis, to see if that accounts for any differences in the arrest rate.\n", "\n", "Since you previously defined a logical order for the weather categories, good < bad < worse, they will be sorted that way in the results." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.09025408486936048\n", "rating\n", "good 0.086842\n", "bad 0.090479\n", "worse 0.106527\n", "Name: is_arrested, dtype: float64\n", "violation rating\n", "Equipment good 0.058995\n", " bad 0.066311\n", " worse 0.097357\n", "Moving violation good 0.056227\n", " bad 0.058050\n", " worse 0.065860\n", "Other good 0.076923\n", " bad 0.087443\n", " worse 0.062893\n", "Registration/plates good 0.081574\n", " bad 0.098160\n", " worse 0.115625\n", "Seat belt good 0.028587\n", " bad 0.022493\n", " worse 0.000000\n", "Speeding good 0.013404\n", " bad 0.013314\n", " worse 0.016886\n", "Name: is_arrested, dtype: float64\n" ] } ], "source": [ "print(ri_weather.is_arrested.mean())\n", "\n", "print(ri_weather.groupby('rating').is_arrested.mean())\n", "\n", "# Calculate the arrest rate for each 'violation' and 'rating'\n", "print(ri_weather.groupby(['violation', 'rating']).is_arrested.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting from a multi-indexed Series\n", "The output of a single .groupby() operation on multiple columns is a Series with a MultiIndex. Working with this type of object is similar to working with a DataFrame:\n", "\n", "- The outer index level is like the DataFrame rows.\n", "- The inner index level is like the DataFrame columns.\n", "\n", "In this exercise, you'll practice accessing data from a multi-indexed Series using the .loc[] accessor." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "violation rating\n", "Equipment good 0.058995\n", " bad 0.066311\n", " worse 0.097357\n", "Moving violation good 0.056227\n", " bad 0.058050\n", " worse 0.065860\n", "Other good 0.076923\n", " bad 0.087443\n", " worse 0.062893\n", "Registration/plates good 0.081574\n", " bad 0.098160\n", " worse 0.115625\n", "Seat belt good 0.028587\n", " bad 0.022493\n", " worse 0.000000\n", "Speeding good 0.013404\n", " bad 0.013314\n", " worse 0.016886\n", "Name: is_arrested, dtype: float64\n", "0.05804964058049641\n", "rating\n", "good 0.013404\n", "bad 0.013314\n", "worse 0.016886\n", "Name: is_arrested, dtype: float64\n" ] } ], "source": [ "# Save the output of the groupby operation from the last exercise\n", "arrest_rate = ri_weather.groupby(['violation', 'rating']).is_arrested.mean()\n", "\n", "# Print the 'arrest_rate' Series\n", "print(arrest_rate)\n", "\n", "# Print the arrest rate for moving violations in bad weather\n", "print(arrest_rate.loc['Moving violation', 'bad'])\n", "\n", "# Print the arrest rates for speeding violations in all three weather condtions\n", "print(arrest_rate.loc['Speeding'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reshaping the arrest rate data\n", "In this exercise, you'll start by reshaping the arrest_rate Series into a DataFrame. This is a useful step when working with any multi-indexed Series, since it enables you to access the full range of DataFrame methods.\n", "\n", "Then, you'll create the exact same DataFrame using a pivot table. This is a great example of how pandas often gives you more than one way to reach the same result!" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rating good bad worse\n", "violation \n", "Equipment 0.058995 0.066311 0.097357\n", "Moving violation 0.056227 0.058050 0.065860\n", "Other 0.076923 0.087443 0.062893\n", "Registration/plates 0.081574 0.098160 0.115625\n", "Seat belt 0.028587 0.022493 0.000000\n", "Speeding 0.013404 0.013314 0.016886\n", "rating good bad worse\n", "violation \n", "Equipment 0.058995 0.066311 0.097357\n", "Moving violation 0.056227 0.058050 0.065860\n", "Other 0.076923 0.087443 0.062893\n", "Registration/plates 0.081574 0.098160 0.115625\n", "Seat belt 0.028587 0.022493 0.000000\n", "Speeding 0.013404 0.013314 0.016886\n" ] } ], "source": [ "# Unstack the 'arrest_rate' Series into a DataFrame\n", "print(arrest_rate.unstack())\n", "\n", "# Create the same DataFrame using a pivot table\n", "print(ri_weather.pivot_table(index='violation', columns='rating', values='is_arrested'))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }