{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"title: \"Assignment 9: Data Wrangling, in Python\"\n",
"author: \"Student Name\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## OVERVIEW\n",
"\n",
"This exercise accompanies the lessons in Environmental Data Analytics (ENV872L) on data wrangling in Python.
(_This assignment mimics Assignment 4: Data Wrangling)_\n",
"\n",
"Helpful materials: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf and https://duke-nsoe.github.io/ENV872/Python-R.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Directions\n",
"1. Change \"Student Name\" on line 2 (above) with your name.\n",
"\n",
"\n",
"2. Use the lesson as a guide. It contains code that can be modified to complete the assignment.\n",
"\n",
"\n",
"3. Work through the steps, **creating code and output** that fulfill each instruction.\n",
"\n",
"\n",
"4. Be sure to **answer the questions** in this assignment document.
Create a new cell in the Jupyter notebook and set its type to `Markdown`. Type your answer there. \n",
"\n",
"\n",
"5. When you have completed the assignment, save your file as an HTML document via the `File`>`Download as` menu.\n",
"\n",
"\n",
"6. After convertng to HTML, please submit the completed exercise to the dropbox in Sakai. Please add your last name into the file name (e.g., \"Fay_A07_DataWranglingInPython.html\") prior to submission.\n",
"\n",
"The completed exercise is due on **Tuesday March 5** before class begins.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up your session \n",
"1a. Import pandas "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1b. Read in the following CSV files into dataframes named as follows:\n",
" * `EPAair_PM25_NC2017_raw.csv` → `pm_2017`\n",
" * `EPAair_PM25_NC2018_raw.csv` → `pm_2018`\n",
" * `EPAair_O3_NC2017_raw.csv` → `o3_2017`\n",
 * `EPAair_O3_NC2018_raw.csv` → `o3_2018`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm_2017 = \n",
"pm_2018 = \n",
"o3_2017 = \n",
"o3_2018 = "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Generate a few lines of code to get to know your `pm_2017` dataset (basic data summaries, etc.)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Show the structure of the dataframe\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Show a summary of the data in the dataframe\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Show the \"head\" or the first 5 rows of data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Show what data type the \"Date\" field is\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wrangle individual datasets to create processed files.\n",
"3. Change `Date` field's data type to date"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Convert the date field to a datetime object\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. Select the following columns: `Date`, `DAILY_AQI_VALUE`, `Site.Name`, `AQS_PARAMETER_DESC`, `COUNTY`, `SITE_LATITUDE`, `SITE_LONGITUDE`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Select the subset of fields\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"5. For the PM2.5 datasets, create a new column called `AQS_PARAMETER_DESC` filled with the value \"PM2.5\" (all cells in this column should be identical)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"6. Save all four processed datasets in the Processed folder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Combine datasets\n",
"7. Merge (`concat`) the four datasets into one. Make sure your column names are identical prior to running this code. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Concatenate the 4 data frames into one\n",
"allData = \n",
"allData.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"7a. What are the dimensions of the concatenated data frames?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Report the shape of the dataframe\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Subsetting and splitting\n",
"\n",
"#### 8. Wrangle your new dataset...\n",
" 1. Subset only records in Blackstone, Bryson City, Triple Oak \n",
" 1. Create a Boolean mask for each site name (`query` won't work because \"Site Name\" has a space in it!)\n",
" 2. Apply these Boolean masks using the appropriate logical operators (\"`&`\", \"`|`\", \"`~`\") to select records in any of these sites into a new dataframe. \n",
" 3. Make a \"deep\" copy of this dataframe (to avoid warnings about working on a subset of a dataframe...)\n",
" 4. Display the dimensions of the resulting dataframe"
]
},
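{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Not part of the assignment:* a quick illustrative sketch, using a made-up toy dataframe (the site names and values here are invented), of how Boolean masks combine with `|`:\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy dataframe standing in for the EPA data (values are made up)\n",
"df = pd.DataFrame({'Site Name': ['Blackstone', 'Bryson City', 'Garinger', 'Triple Oak'],\n",
"                   'DAILY_AQI_VALUE': [40, 55, 32, 61]})\n",
"\n",
"# One mask per site; | (\"or\") keeps rows matching any of the masks\n",
"maskA = df['Site Name'] == 'Blackstone'\n",
"maskB = df['Site Name'] == 'Triple Oak'\n",
"subset = df[maskA | maskB].copy()  # .copy() returns an independent (deep) copy\n",
"subset.shape  # → (2, 2)\n",
"```"
]
},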
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create 3 Boolean masks, one each for Blackstone, Bryson City, Triple Oak\n",
"blackstoneMask = \n",
"brysonMask = \n",
"tripleoakMask = "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Apply the mask to the dataframe\n",
"allDataSelect = "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Create a deep copy of the above dataframe\n",
"allDataSelect = "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Display the dimensions of the dataframe\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Add columns for \"Month\" and \"Year\" by parsing your \"Date\" column (hint: see [`pd.DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html)...)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#C. Add columns for Month and Year\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**9. Spread (`pivot_table`) your datasets** such that AQI values for ozone and PM2.5 are in separate columns. Each location on a specific date should now occupy only one row."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"allPivot = "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"10. Call up the dimensions of your new tidy dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"11. Save your processed dataset with the following file name: \"EPAair_O3_PM25_NC1718_Processed.csv\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate summary tables\n",
"12. Use the split-apply-combine strategy to generate two new data frames: \n",
" 1. A summary table of mean AQI values for O3 and PM2.5 by month\n",
" 2. A summary table of the mean, minimum, and maximum AQI values of O3 and PM2.5 for each site"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#12a. A summary table of mean AQI values for O3 and PM2.5 by month\n",
"group = allDataSelect.groupby(['Month','AQS_PARAMETER_DESC'])\n",
"aggData = group['DAILY_AQI_VALUE'].mean().reset_index()\n",
"aggData.pivot_table(index='Month',columns='AQS_PARAMETER_DESC',values='DAILY_AQI_VALUE')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#12b. A summary table of the mean, minimum, and maximum AQI values of O3 and PM2.5 for each site\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}