## Note: Sorry this notebook isn't working right now and is under development.

# Berkeley Air Quality Notebook

**Welcome to our notebook on Berkeley Air Quality!** 

In this notebook we will be looking at Air Quality Index (AQI) scores in the surrounding Berkeley, CA area. With so many pollutants in the air, especially as we head into the annual fire season, AQI becomes something we check on daily. For many of us, this AQI map is all too familiar. Throughout this module we will discuss how data can be used to visualize and uncover underlying trends in the world.

**Let's get started!**

<p align="center">
  <img src="images/AQImap.webp" width="" height="" align="center">
</p>

<br>

## Introduction to Jupyter Notebook

Before we get started with the data, let's talk about what Jupyter Notebook is. This lab is set up in a Jupyter Notebook. Notebooks can contain anything from live code to written text, equations, or visualizations. The content of a notebook is written into rectangular sections called **cells**. 

#### Types of Cells
There are two types of cells in Jupyter, **code** cells and **markdown** cells. **Code cells**, as you can imagine, contain code in Python, the programming language that we will be using throughout this notebook.  **Markdown cells**, such as this one, contain written text. You can select any cell by clicking on it once. 

#### Running Cells
'Running' a cell is similar to pressing 'Enter' on a calculator once you've typed in an expression; it computes all of the expressions contained within the cell.

To run a cell, you can do one of the following:

- press **Shift + Enter**
- click the **Run** button on the top tool bar

Running a markdown cell will embed the text into the notebook and running a code cell will evaluate the code and display its output under the cell. 

Let's try it! **Run the code cell below.**

In [1]:
print("Hello World!")

Hello World!


#### Editing and Saving

- To **edit** a cell, simply double click on the desired cell and begin typing. The cell that you are currently working in will be highlighted by a green box.
- To **save** the notebook, either press *Ctrl + S* or navigate to the "File" dropdown and select "Save and Checkpoint"

#### Adding Cells
You can add a cell by clicking <b><code>Insert > Insert Cell Below</code></b> and choosing the cell type in the drop down menu. Try adding a cell below to type in your name!


#### Deleting Cells 
To delete a cell, click on the <b><code>scissors</code></b> at the top or <b><code>Edit > Cut Cells</code></b>. Delete the cell below.

In [2]:
print("Delete this!")

Delete this!


**Important Tip**: Every time you open a Jupyter notebook, it is extremely important to run all the cells from the beginning in order for the notebook to work. 

Now that we have had a brief crash course on Jupyter Notebooks, let's dive into Berkeley AQI!

<br>

## Introduction to the Data

In this notebook we will look at data collected by PurpleAir, a company that manages a network of air quality sensors. The data from these sensors are then used to create maps like the one displayed above, which give an intuitive visualization of the air quality in a specific region. In the dataframe below, you will find several metrics that help us do this.

**Before we begin:**

- Click on <b><code>Cell</code></b> in the top toolbar  
- Click on <b><code>Run All</code></b> in the drop down
- Scroll back up to begin going through the notebook!

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import purpleair
import folium
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from datetime import datetime
from IPython.display import clear_output

<br>

# PurpleAir Data

Before we begin looking at data collected from PurpleAir sensors, let's first take a look at what a sensor is and what it measures. 


> Below is a picture of a real PurpleAir air quality sensor. These sensors can be mounted indoors or outdoors, and they track airborne particulate matter (PM) in real time using PMSX003 laser counters. Particulate matter can include things like dust, smoke, dirt and any other organic or inorganic particles in the air. With multiple sensors mounted in a region, PurpleAir can create a relatively accurate measure of AQI throughout the day as the air quality changes. 

For more information on how sensors work, take a look at the official PurpleAir website [here](https://www2.purpleair.com/community/faq#hc-what-do-the-numbers-on-the-purpleair-map-mean-1)!

<p align="center">
  <img src="images/purpleair-sensor-pm2.5.webp" width="" height="" align="center">
</p>

In order to work with the data, we need to pull it into our workspace. Fortunately, PurpleAir has created an API that allows users to pull in and work with their AQI data. In the code cell below we will import the purpleair API and use it to create a dataframe of data from all PurpleAir sensors, of which there are roughly 20,000!

**Run the code cell below!**

In [4]:
from purpleair.network import SensorList
p = SensorList()
df = p.to_dataframe(sensor_filter='all',
                    channel='parent')

Initialized 22,479 sensors!


The dataframe below contains all the sensor data as of the latest update. It contains everything from the geographical latitude and longitude of each sensor to the last time that sensor measured airborne PM.

In [5]:
# Displaying dataframe with all the PurpleAir Sensor data
df

id,parent,lat,lon,name,location_type,pm_2.5,temp_f,temp_c,humidity,pressure,...,last_update_check,created,uptime,is_owner,10min_avg,30min_avg,1hour_avg,6hour_avg,1day_avg,1week_avg
14633,,37.275561,-121.964134,Hazelwood canary,outside,0.81,65.0,18.333333,54.0,1008.57,...,,,,False,0.88,4.10,8.26,17.72,21.80,15.42
25999,,30.053808,-95.494643,Villages of Bridgestone AQI,outside,35.16,70.0,21.111111,70.0,1011.86,...,,,,False,34.71,33.59,32.77,26.03,16.30,14.88
14091,,37.883620,-122.070087,WC Hillside,outside,1.00,63.0,17.222222,57.0,1003.26,...,,,,False,2.28,2.72,3.16,16.25,25.80,22.91
108226,,38.573703,-121.439113,"""C"" Street Air Shelter",inside,4.76,78.0,25.555556,45.0,1015.66,...,,,,False,4.77,4.49,4.15,3.96,4.75,5.46
49409,,18.759182,99.017172,"""First's Place""",outside,40.83,87.0,30.555556,37.0,986.64,...,,,,False,45.70,49.98,50.08,50.61,46.32,32.53
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64085,,36.785883,127.157040,청룡동행정복지센터,outside,40.35,54.0,12.222222,46.0,1027.31,...,,,,False,37.24,38.78,41.66,57.59,63.40,45.20
64995,,36.691324,126.585255,한서대학교,outside,40.64,63.0,17.222222,33.0,1017.02,...,,,,False,45.03,45.97,43.96,39.68,37.22,28.38
64093,,36.710720,126.548390,해미읍성,outside,62.07,61.0,16.111111,44.0,1027.89,...,,,,False,57.93,56.06,52.51,43.19,40.98,31.83
29747,,36.761236,127.395300,화덕보건진료소,outside,34.77,62.0,16.666667,36.0,1018.16,...,,,,False,40.36,49.29,52.26,49.02,46.31,38.67


Here is a breakdown of the dataframe above and what each column represents. 


|Column Name   | Description |
|--------------|---------|
|lat |The latitude coordinate of the location  |
|lon | The longitude coordinate of the location |
|name |  The name of the location|
|location_type | The nature of the location (i.e., inside or outside) |
|pm_2.5 | The level of fine particulate matter in the air of that location |
|temp_f | The temperature of the location in degrees Fahrenheit|
|temp_c | The temperature of the location in degrees Celsius|
|humidity | The humidity percentage of the location|
|pressure | The atmospheric pressure at the location (in millibars)|
|last_seen | The last seen date and timestamp in UTC |
|model |  Model of the specific sensor |
|flagged | Whether or not the channel was marked as flagged (usually based on a fault)|
|age | Sensor data age (when data was last received) |
|10min_avg | Average PM 2.5 AQI over the last 10 minutes |
|30min_avg | Average PM 2.5 AQI over the last 30 minutes |
|1hour_avg | Average PM 2.5 AQI over the last hour|
|6hour_avg | Average PM 2.5 AQI over the last 6 hours|
|1day_avg | Average PM 2.5 AQI over the last day |
|1week_avg |  Average PM 2.5 AQI over the last week|

<br>

### Airborne Particulate Matter (PM) 2.5 
While many of the column names are relatively straightforward, such as the "name" column (which displays the set name of the particular sensor), the "location_type" column (which indicates whether it is an indoor or outdoor sensor), etc., we would like to draw your attention to the "pm_2.5" column. 

>The "pm_2.5" column represents the level of airborne PM 2.5, in other words, airborne particles that have a diameter of 2.5 micrometers or less. At high levels, PM 2.5 particles can reduce visibility and cause the air to appear hazy. Tracking PM 2.5 is important because prolonged exposure to high levels of PM 2.5 particles can cause adverse health effects, and because it is one of the measures that the US Environmental Protection Agency (EPA) uses to calculate the local Air Quality Index (AQI).

**QUESTION: Which item or object is closest to 1 micrometer?**

a) The length of an ant

b) The diameter of a spider web

c) The length of a grain of rice

**ANSWER**

a) The length of an ant is typically 1 millimeter, which is 1,000 micrometers.

**b) The diameter of a spider web is typically between 8 and 10 micrometers.**

c) The length of a grain of rice is typically 6 millimeters which is 6,000 micrometers.

If you go to the PurpleAir website [here](https://map.purpleair.com/1/mAQI/a10/p604800/cC0#14.67/37.87206/-122.26187), you will see a map of the surrounding Berkeley area. If you click on some of the sensors located on the UC Berkeley campus, you'll find that one of them is named "Le Conte Hall". 

Let's take a closer look at the Le Conte Hall Sensor! In the dataframe below we filter the dataframe by the sensor name ("Le Conte Hall") to pick out the row that corresponds to the specific sensor we are looking for. 

In [6]:
df[df['name'] == "Le Conte Hall"]

id,parent,lat,lon,name,location_type,pm_2.5,temp_f,temp_c,humidity,pressure,...,last_update_check,created,uptime,is_owner,10min_avg,30min_avg,1hour_avg,6hour_avg,1day_avg,1week_avg
77905,,37.872589,-122.257219,Le Conte Hall,inside,0.35,78.0,25.555556,32.0,1004.55,...,,,,False,0.28,0.18,0.3,4.67,11.98,9.49
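
The filter above works in two steps: the comparison `df['name'] == "Le Conte Hall"` produces one True/False value per row, and indexing the dataframe with those values keeps only the rows marked True. Here is a minimal sketch of the two steps on a three-row dataframe copied from the outputs in this notebook. The names `sample`, `mask` and `le_conte_row` are ours, chosen for illustration.

```python
import pandas as pd

# Three sensors copied from the outputs in this notebook
sample = pd.DataFrame(
    {
        "name": ["Hazelwood canary", "WC Hillside", "Le Conte Hall"],
        "location_type": ["outside", "outside", "inside"],
        "pm_2.5": [0.81, 1.00, 0.35],
    },
    index=pd.Index([14633, 14091, 77905], name="id"),
)

# Step 1: the comparison yields one True/False value per row
mask = sample["name"] == "Le Conte Hall"

# Step 2: indexing with the mask keeps only the rows marked True
le_conte_row = sample[mask]

print(mask)
print(le_conte_row)
```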


<br>

The row above gives us loads of information on the state of the air quality in Le Conte Hall at the present moment, but it would be nice to see how it changes over time. Below is a dataframe that contains readings from the Le Conte Hall sensor over roughly the last 7 days. We get it by requesting one week of historical data for that sensor.

In [7]:
## data from Le Conte Hall sensor from the past week
from purpleair.sensor import Sensor
se = Sensor(77905)
le_conte = se.parent.get_historical(weeks_to_get=1,thingspeak_field='secondary')
le_conte['Date'] = [i.date().strftime("%d-%b-%Y") for i in le_conte['created_at']]
le_conte

entry_id,created_at,0.3um/dl,0.5um/dl,1.0um/dl,2.5um/dl,5.0um/dl,10.0um/dl,PM1.0 (CF=ATM) ug/m3,PM10 (CF=ATM) ug/m3,Date
295290,2021-12-01 00:00:48+00:00,1496.78,369.34,43.39,3.25,0.94,0.65,6.37,9.40,01-Dec-2021
295291,2021-12-01 00:02:48+00:00,1557.96,405.95,51.84,4.38,0.72,0.43,7.20,10.44,01-Dec-2021
295292,2021-12-01 00:04:48+00:00,1430.91,364.16,51.11,3.24,0.48,0.43,6.71,9.74,01-Dec-2021
295293,2021-12-01 00:06:48+00:00,1539.02,386.51,60.63,3.50,0.82,0.43,6.96,10.45,01-Dec-2021
295294,2021-12-01 00:08:48+00:00,1392.07,347.67,37.49,1.35,0.46,0.22,6.32,8.12,01-Dec-2021
...,...,...,...,...,...,...,...,...,...,...
300324,2021-12-07 23:50:34+00:00,964.21,210.06,23.83,1.35,0.22,0.00,3.65,4.78,07-Dec-2021
300325,2021-12-07 23:52:34+00:00,966.82,214.64,25.29,0.37,0.00,0.00,3.43,4.50,07-Dec-2021
300326,2021-12-07 23:54:35+00:00,945.52,202.40,15.53,0.22,0.00,0.00,2.88,3.71,07-Dec-2021
300327,2021-12-07 23:56:34+00:00,943.25,209.67,20.72,0.96,0.21,0.21,3.44,4.56,07-Dec-2021


As you can see from the "created_at" column, a reading was taken roughly every two minutes over the past 7 days. The dataframe also contains particle counts at different size thresholds: 0.3, 0.5, 1.0, 2.5, 5.0 and 10.0 micrometers.
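
One way to check the spacing between readings is to subtract each timestamp from the one after it. The sketch below does this for the first five "created_at" values shown above, then works out how many readings a full week would hold at that rate. It assumes pandas is installed; the variable names are ours.

```python
import pandas as pd

# The first five "created_at" values from the dataframe above
created_at = pd.Series(pd.to_datetime([
    "2021-12-01 00:00:48+00:00",
    "2021-12-01 00:02:48+00:00",
    "2021-12-01 00:04:48+00:00",
    "2021-12-01 00:06:48+00:00",
    "2021-12-01 00:08:48+00:00",
]))

# Time elapsed between each reading and the one before it
gaps = created_at.diff().dropna()
print(gaps)

# Readings expected in one week at one reading every two minutes:
# 7 days * 24 hours * 60 minutes, divided by 2 minutes per reading
readings_per_week = 7 * 24 * 60 // 2
print(readings_per_week)
```

On the full `le_conte` dataframe, `le_conte['created_at'].diff()` would show the same gaps, along with any longer ones where the sensor missed a reading.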

<br>

While this dataframe is useful, there are too many rows of data (~5000) to look at! Below is a widget that plots a line graph of the PM 2.5 measure over a specific day. 

**The drop down bar allows you to pick which day you would like graphed, so go ahead and pick a day!**

In [8]:
def f(date):
    # Keep only the readings taken on the selected day
    day = le_conte[le_conte['Date'] == date]
    plt.figure(figsize=(13, 3))
    plt.plot(day['created_at'], day['2.5um/dl'])
    plt.xlabel('Time')
    plt.ylabel('PM 2.5 Particle Count')
    plt.title('Le Conte Hall Sensor PM 2.5')
    plt.show()

# Build a dropdown with one entry per day in the data
interact(f, date = list(le_conte['Date'].unique()));

interactive(children=(Dropdown(description='date', options=('01-Dec-2021', '02-Dec-2021', '03-Dec-2021', '04-D…

The line plot above displays the date and hour along the x-axis and the PM 2.5 particle count along the y-axis.

<br>

**QUESTION: What is the highest index reading on the first time series plot?**

*Your answer here*

<br>

**QUESTION: What trends do you notice about the line plot?**

*Your answer here*

<br>

**QUESTION: Why do you think the index readings fluctuate from point to point?**

*Your answer here*

<br>

While the line plots do show us a trend in the PM 2.5 count over time, we still have no clue how that translates to the AQI Index. The next section will discuss what AQI is and how it is calculated.

### AQI Index
The AQI Index contains 6 categories that air quality can fall into. Each category covers a range of index values on a scale from 0 to 500, and the index is calculated from the region's PM 2.5 measure. The chart below is provided by the US Environmental Protection Agency (EPA) and shows the official AQI Index (these breakpoints were revised in 2012). 

For more information on how AQI Index is calculated, take a look at the AQI Index Factsheet provided by the EPA [here](https://www.epa.gov/sites/default/files/2016-04/documents/2012_aqi_factsheet.pdf)!
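
As a rough illustration of that calculation, the sketch below converts a PM 2.5 concentration into an index value by interpolating linearly inside the matching breakpoint range. The breakpoint table is typed in by hand from the EPA's 2012 PM 2.5 breakpoints, and the function name `pm25_to_aqi` is ours. It also skips parts of the official procedure (the EPA works from a 24-hour average and truncates it to one decimal place), so treat the results as approximate.

```python
# PM 2.5 breakpoints (micrograms per cubic meter) and matching AQI ranges,
# typed in by hand from the EPA's 2012 revision.
PM25_BREAKPOINTS = [
    # (c_low, c_high, aqi_low, aqi_high, category)
    (0.0, 12.0, 0, 50, "Good"),
    (12.1, 35.4, 51, 100, "Moderate"),
    (35.5, 55.4, 101, 150, "Unhealthy for Sensitive Groups"),
    (55.5, 150.4, 151, 200, "Unhealthy"),
    (150.5, 250.4, 201, 300, "Very Unhealthy"),
    (250.5, 350.4, 301, 400, "Hazardous"),
    (350.5, 500.4, 401, 500, "Hazardous"),
]


def pm25_to_aqi(concentration):
    """Return (aqi, category) for a PM 2.5 concentration."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for c_low, c_high, aqi_low, aqi_high, category in PM25_BREAKPOINTS:
        if concentration <= c_high:
            # Readings in the small gap between two ranges (e.g. 12.05)
            # are clamped to the lower edge of the range they fall under.
            c = max(concentration, c_low)
            # Linear interpolation between the two ends of the range
            slope = (aqi_high - aqi_low) / (c_high - c_low)
            return round(slope * (c - c_low) + aqi_low), category
    # Anything above the last breakpoint is capped at the top of the scale
    return 500, "Hazardous"


print(pm25_to_aqi(12.5))
```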

<p align="center">
  <img src="images/AQI-category.png" width="" height="" align="center">
</p>

<br>

**QUESTION: What is the difference between the original and revised breakpoints?**

*Your answer here*

<br>

**QUESTION: At 3:00 on November 30th, 2021 the PM 2.5 reading is 12.5. What category does it fall into?**

a) Good

b) Moderate

c) Unhealthy for Sensitive Groups

**ANSWER: The category is Moderate because it falls into the 12.1 - 35.4 range.**

Now that we know how sensors work, what they measure and how AQI Indexes are calculated, let's see if we can create a visualization of AQI Indexes that is a little closer to home!

First, let's find a group of sensors that are near UC Berkeley. The code cell below does just that. We use a range of longitude and latitude coordinates to decide whether to include or exclude a sensor. 

In [9]:
## UC Berkeley,CA - Lat: 37.871666 / Lon: -122.272781

berkeleyData = df.loc[(df["lat"] >= 37.8) & (df["lat"] <= 37.9) & (df["lon"] >= -122.3) & (df["lon"] <= -122.2)]
# .copy() gives us an independent dataframe that is safe to add columns to later
berkeleyData = berkeleyData[["lat", "lon", "name", "location_type", "pm_2.5", "temp_f", "humidity", "pressure"]].copy()
berkeleyData

id,lat,lon,name,location_type,pm_2.5,temp_f,humidity,pressure
20747,37.838977,-122.205489,1000ft Montclair,outside,1.16,56.0,78.0,979.92
81677,37.889085,-122.264327,"1044 Keith Ave, Berkeley",outside,3.67,59.0,64.0,995.76
79125,37.882941,-122.288017,1094 Tevlin St,inside,0.35,75.0,40.0,1013.70
77685,37.801872,-122.274582,10th and Washington,inside,31.83,77.0,37.0,1014.57
37971,37.883729,-122.290362,"1128 Key Route Blvd, Albany CA",outside,2.04,60.0,61.0,1014.15
...,...,...,...,...,...,...,...,...
26977,37.813427,-122.282483,Xanadu,inside,0.00,84.0,32.0,1015.51
56281,37.813500,-122.282971,Xanadu,outside,0.89,58.0,67.0,1015.93
62619,37.886177,-122.272443,Yolo,inside,0.00,90.0,22.0,1006.36
75223,37.827560,-122.205627,Zinn Drive,outside,2.02,56.0,68.0,986.10
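
The same kind of filter can be wrapped in a small reusable function. This is a minimal sketch, assuming pandas is installed: the helper name `in_bounding_box` and the three-row `sample` dataframe are ours, with coordinates copied from sensors shown in this notebook. It uses `Series.between`, which includes both edges of the range, just like the `>=` and `<=` comparisons above.

```python
import pandas as pd


def in_bounding_box(data, lat_range, lon_range):
    """Return the rows whose coordinates fall inside the box (edges included)."""
    lat_ok = data["lat"].between(*lat_range)
    lon_ok = data["lon"].between(*lon_range)
    return data[lat_ok & lon_ok]


# Three sensors copied from the outputs in this notebook
sample = pd.DataFrame(
    {
        "name": ["Hazelwood canary", "Le Conte Hall", "1000ft Montclair"],
        "lat": [37.275561, 37.872589, 37.838977],
        "lon": [-121.964134, -122.257219, -122.205489],
    },
    index=pd.Index([14633, 77905, 20747], name="id"),
)

# Same latitude and longitude limits as the Berkeley filter above
nearby = in_bounding_box(sample, lat_range=(37.8, 37.9), lon_range=(-122.3, -122.2))
print(nearby)
```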


Now that we have a smaller subset of data to work with, the next step is to use the PM 2.5 measures to assign each sensor to an AQI Index Category and corresponding color. 

In [11]:
# Creating a column that maps each PM 2.5 reading to its AQI category color.
# Each elif is only reached when the checks before it failed, so we only
# need to test the upper limit of each category.
color_code = []
for i in berkeleyData["pm_2.5"].to_list():
    if i <= 12.0:
        color_code.append('green')       # Good
    elif i <= 35.4:
        color_code.append('yellow')      # Moderate
    elif i <= 55.4:
        color_code.append('orange')      # Unhealthy for Sensitive Groups
    elif i <= 150.4:
        color_code.append('red')         # Unhealthy
    elif i <= 250.4:
        color_code.append('purple')      # Very Unhealthy
    else:
        color_code.append('darkpurple')  # Hazardous


berkeleyData['code'] = color_code
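
As an alternative to the loop, pandas can bin a whole column at once with `pd.cut`. The sketch below is one possible way to do it, not the method used in this notebook: it applies the category limits 12.0, 35.4, 55.4, 150.4 and 250.4 to a few PM 2.5 readings copied from the tables above. The names `pm`, `bin_edges`, `colors` and `codes` are ours.

```python
import numpy as np
import pandas as pd

# A few PM 2.5 readings taken from the tables in this notebook
pm = pd.Series([1.16, 31.83, 40.83, 62.07], name="pm_2.5")

# Upper limit of each category; pd.cut treats every bin as (low, high],
# so a reading of exactly 12.0 still lands in the first bin
bin_edges = [-np.inf, 12.0, 35.4, 55.4, 150.4, 250.4, np.inf]
colors = ["green", "yellow", "orange", "red", "purple", "darkpurple"]

# Assign each reading the label of the bin it falls into
codes = pd.cut(pm, bins=bin_edges, labels=colors).astype(str)
print(codes.tolist())
```

On the real data, the equivalent call would be `pd.cut(berkeleyData["pm_2.5"], bins=bin_edges, labels=colors)`.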

<br>

Our last step is to use the longitude and latitude coordinates to map the location of each sensor with its corresponding AQI Index color! The widget below contains two sliders. One represents the Latitude value and the other is the Longitude value. 

**Slide the sliders left and right to display a mapping of the sensors in that latitude and longitude region, or use your cursor to drag the mapping area.**

**Hint: Berkeley, CA - Lat: 37.871666 / Lon: -122.272781** 

In [12]:
def show_map(Latitude, Longitude):
    # Center the map on the coordinates chosen with the sliders
    m = folium.Map(width=500, height=400, location=[Latitude, Longitude])

    # Add one marker per sensor (including the last row), colored by its AQI category
    for _, sensor in berkeleyData.iterrows():
        folium.Marker(
            location=[sensor['lat'], sensor['lon']],
            popup=sensor['name'],
            icon=folium.Icon(color=sensor['code']),
        ).add_to(m)
    display(m)

interact(show_map, Latitude = (36, 38, 0.001) , Longitude = (-123, -121, 0.001));
## UC Berkeley,CA - Lat: 37.871666 / Lon: -122.272781

interactive(children=(FloatSlider(value=37.0, description='Latitude', max=38.0, min=36.0, step=0.001), FloatSl…

Now that we have created a map we can easily see what the AQI index is across the city! 

<br>

**QUESTION: What do you notice about the map?**

*Your answer here*

<br>

Developed By: Melisa Esqueda, Maham Bawaney & Karalyn Chong