<a name="top"></a>
<div style="width:1000 px">

<div style="float:right; width:98 px; height:98px;">
<img src="https://raw.githubusercontent.com/Unidata/MetPy/master/src/metpy/plots/_static/unidata_150x150.png" alt="Unidata Logo" style="height: 98px;">
</div>

<h1>Pandas</h1>
<h3>Unidata AMS 2021 Student Conference</h3>

<div style="clear:both"></div>
</div>

---
<div style="float:right; width:250 px"><img src="../instructors/images/pandas_timeseries_example.png" alt="Timeseries of Temperature and Pressure from Aug 1st to 15th" style="height: 300px;"></div>


### Focuses
* Become familiar with [Pandas](https://pandas.pydata.org/docs/index.html) as a tool for quick and easy data analysis
* Use pandas to organize, process, and plot miscellaneous data
* Learn basic subsetting, displaying, and arithmetic operations with pandas
* Plot data from a pandas DataFrame with [Matplotlib](https://matplotlib.org/)




### Objectives
1. [Create a Pandas DataFrame](#1.-Create-a-Pandas-DataFrame)
1. [Subsetting a DataFrame](#2.-Subsetting-a-DataFrame)
1. [Plot data from our DataFrame](#3.-Plot-data-from-our-DataFrame)
---

### Imports


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

---

## 1. Create a Pandas DataFrame


Below are 15 measurements of temperature and pressure. The temperature data are in kelvin, and the pressure 
data are in bar. We are told the measurements are daily, starting on August 1, 2020. 
To clean up this data, we will turn it into a Pandas DataFrame.

In [None]:
# Below are our sample timeseries of data
pressures = [1.011, 1.009, 1.0085, 1.0089, 1.0099, 1.013, 1.014, 1.013, 1.014, 1.018, 1.017, 1.011, 1.006, 1.001, 1.009]
temperatures = [301.5, 301.1, 301.3, 301.3, 301.7, 302.2, 302.3, 302.3, 302.4, 303.1, 302.9, 302.3, 302.1, 301.2, 301.5] 

# First, let's declare all our data as numpy arrays
temperature_data = np.array(temperatures)
pressure_data = np.array(pressures)

# Next, let's join our two numpy arrays together, so instead of two 15 element arrays, we have one 2x15 array
t_p_arrays = np.array([temperature_data, pressure_data])

# Here we will create the Pandas DataFrame
# To declare a DataFrame we will pass the combined array to pd.DataFrame()
t_p_dataframe = pd.DataFrame(t_p_arrays)
print(t_p_dataframe)

Because we passed the numpy array t_p_arrays to the pd.DataFrame() function, our current DataFrame has two rows, 
one for temperature and one for pressure. There are 15 columns (the index starts at zero), one for each 
daily reading. Let's change our DataFrame to one in which the columns are temperature and pressure and the rows are
each daily reading!

To do this we have to pass the pd.DataFrame() function an array where the rows and columns are
switched. An easy way to do this is to take the transpose of t_p_arrays before passing it to pd.DataFrame().

In [None]:
# Declare a second dataframe using the transpose of t_p_arrays
t_p_arrays_transpose = np.transpose(t_p_arrays) # This line swaps the rows and columns of t_p_arrays
t_p_dataframe_transpose = pd.DataFrame(t_p_arrays_transpose)
print(t_p_dataframe_transpose)

Nice! Now we have a DataFrame where the first column is our index, the second column is our temperature in kelvin, 
and our third column is pressure in bar! In our next section we will work on making this data more readable. 
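
As a side note, the transpose is not the only way to get one column per variable. The minimal sketch below builds the same kind of table from a dictionary of arrays. The variable names and the three shortened sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened sample data (3 days instead of 15)
temps_k = np.array([301.5, 301.1, 301.3])    # kelvin
pres_bar = np.array([1.011, 1.009, 1.0085])  # bar

# Each dictionary key becomes a column name and each array becomes a column
df_from_dict = pd.DataFrame({'Temp': temps_k, 'Pressure': pres_bar})

print(df_from_dict.shape)          # rows x columns
print(list(df_from_dict.columns))  # column names
```

Building from a dictionary names the columns right away, while the transpose approach keeps the focus on how rows and columns map onto the underlying array.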

<a href="#top">Top</a>

---

## 2. Subsetting a DataFrame
Under this objective, we want to learn how to subset data from the DataFrame for easier interpretation. 
We will start by becoming familiar with how to select one row or column from a pandas DataFrame.

In [None]:
# Let's select just the column of our DataFrame that corresponds to temperature
temperatures = t_p_dataframe_transpose[0] # Temperature uses index zero because temps are in the first column
print(temperatures) 

Perfect! Subsetting a column in pandas returns a pd.Series object, which is roughly the pandas equivalent of a 
one-dimensional numpy array.
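
To make the comparison with numpy concrete, here is a small sketch using a stand-alone Series with three made-up temperature values. It shows element-wise arithmetic, a summary method, and conversion to a plain numpy array.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values in kelvin
sample_temps = pd.Series([301.5, 301.1, 301.3])

# Arithmetic applies to every element at once, just like a numpy array
temps_c = sample_temps - 273.15

# Series also carry summary methods such as .max() and .mean()
print(sample_temps.max())

# .to_numpy() returns the underlying values as a numpy array
as_array = sample_temps.to_numpy()
print(type(as_array))
```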

In [None]:
# If we just wanted pressure and temperature from the 10th day then we should locate the row indexed with a 9
tenth_day_temp_and_pressure = t_p_dataframe_transpose.iloc[9]
print(tenth_day_temp_and_pressure)

In [None]:
# Let's index each of the readings by day
dates = pd.date_range("20200801", periods=15) # The date string is 20200801 for August 1st 2020

temp_pres_df = pd.DataFrame(t_p_arrays_transpose, index=dates, columns=['Temp', 'Pressure'])


# Now change our temperature to Celsius and the pressures to hPa
temp_pres_df['Temp'] = temp_pres_df['Temp'] - 273.15 # 0 degrees Celsius is 273.15 K; 1 bar is 1000 hPa
temp_pres_df['Pressure'] = temp_pres_df['Pressure']*1000

display(temp_pres_df) # Use the display() function instead of print for a fancier output!
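
Once the index holds dates, rows can be selected by label with `.loc` instead of by position with `.iloc`. The sketch below assumes a small stand-in DataFrame (`sample_df`, five made-up Celsius values) rather than the full one built above.

```python
import pandas as pd

# Hypothetical stand-in: five daily readings starting August 1st 2020
dates = pd.date_range("20200801", periods=5)
sample_df = pd.DataFrame({'Temp': [28.35, 27.95, 28.15, 28.15, 28.55]}, index=dates)

# Select a single day by its label; this returns a Series for that row
one_day = sample_df.loc[pd.Timestamp('2020-08-03')]
print(one_day['Temp'])

# Label-based slices include BOTH endpoints, unlike positional slices
few_days = sample_df.loc['2020-08-02':'2020-08-04']
print(len(few_days))
```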

#### We also want a second set of data which only includes readings where the Pressure is above 1010 hPa.
This can be done easily with our pandas DataFrame!

In [None]:
temp_pres_above_1010_df = temp_pres_df[temp_pres_df['Pressure'] > 1010.0]
display(temp_pres_above_1010_df)
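
The comparison inside the square brackets produces a Series of True/False values (a boolean mask), and only the rows marked True are kept. The sketch below uses a made-up four-row DataFrame to show the mask on its own and how two conditions can be combined with `&`.

```python
import pandas as pd

# Hypothetical stand-in data (Celsius and hPa)
sample_df = pd.DataFrame({'Temp': [28.35, 27.95, 29.05, 29.95],
                          'Pressure': [1011.0, 1009.0, 1013.0, 1018.0]})

# The comparison alone gives one True/False value per row
mask = sample_df['Pressure'] > 1010.0
print(mask.tolist())

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
warm_and_high = sample_df[(sample_df['Pressure'] > 1010.0) & (sample_df['Temp'] > 29.0)]
print(len(warm_and_high))
```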

<a href="#top">Top</a>

---

## 3. Plot data from our DataFrame

We could plot the temperature and pressure data by subsetting the columns as we did above and passing them to matplotlib. Instead, we will try the plotting functions built into pandas!

In [None]:
# Use the pandas .plot() function on our DataFrame

temp_pres_df.plot()

Hurray! The .plot() function automatically uses the column names and date indices that we specified to create a
plot of pressure and temperature. Unfortunately, temperature and pressure have very different scales, so sharing one axis hides the variation in both. Let's give the temperature and the pressure each their own plot.

<a href="#top">Top</a>

---

In [None]:
temp_pres_df.plot(subplots=True, figsize=(12, 6))

## Almost done!

Finally, let's add some labels and a title to finish our plot!

In [None]:
ax = temp_pres_df.plot(subplots=True, figsize=(12, 6)) 
plt.suptitle('Daily Temperature and Pressure from August 1st - August 15th 2020') # plt.suptitle means 'super title'
ax[0].set_ylabel(r'Temperature ($\degree$C)') # ax[0] is used because it's the first plot; the raw string (r'') keeps the backslash intact
ax[1].set_xlabel('Days') # ax[1] is used because it's the second plot
ax[1].set_ylabel('Pressure ($hPa$)')


<a href="#top">Top</a>

---

## Nice job! You are now ready to conquer timeseries data using pandas!