{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas\n", "\n", "### Unidata AMS 2021 Student Conference\n", "\n", "---\n", "\n", "### Focuses\n", "* Become familiar with [pandas](https://pandas.pydata.org/docs/index.html) as a tool for quick and easy data analysis\n", "* Use pandas to organize, process, and plot miscellaneous data\n", "* Learn basic subsetting, displaying, and arithmetic operations with pandas\n", "* Plot data from a pandas DataFrame with [Matplotlib](https://matplotlib.org/)\n", "\n", "### Objectives\n", "1. [Create a Pandas DataFrame](#1.-Create-a-Pandas-DataFrame)\n", "1. [Subsetting a DataFrame](#2.-Subsetting-a-DataFrame)\n", "1. [Plot data from our DataFrame](#3.-Plot-data-from-our-DataFrame)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Create a Pandas DataFrame\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below are 15 measurements of temperature and pressure. The temperatures are in kelvin, and the pressures \n", "are in bar. We are told the measurements are daily, starting on August 1, 2020. \n", "To clean up this data, we should turn it into a pandas DataFrame." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Below are our sample time series of data\n", "pressures = [1.011, 1.009, 1.0085, 1.0089, 1.0099, 1.013, 1.014, 1.013, 1.014, 1.018, 1.017, 1.011, 1.006, 1.001, 1.009]\n", "temperatures = [301.5, 301.1, 301.3, 301.3, 301.7, 302.2, 302.3, 302.3, 302.4, 303.1, 302.9, 302.3, 302.1, 301.2, 301.5]\n", "\n", "# First, let's convert both lists to numpy arrays\n", "temperature_data = np.array(temperatures)\n", "pressure_data = np.array(pressures)\n", "\n", "# Next, let's stack the two 15-element arrays into one 2x15 array (row 0: temperature, row 1: pressure)\n", "t_p_arrays = np.array([temperature_data, pressure_data])\n", "\n", "# Here we create the pandas DataFrame by passing the 2x15 array to pd.DataFrame()\n", "t_p_dataframe = pd.DataFrame(t_p_arrays)\n", "print(t_p_dataframe)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because we passed the numpy array t_p_arrays to pd.DataFrame(), our current DataFrame has two rows, \n", "one for temperature and one for pressure. There are 15 columns (labeled 0 through 14), one for each \n", "daily reading. Let's change our DataFrame to one in which the columns are temperature and pressure and the rows are\n", "the daily readings!\n", "\n", "To do this, we have to pass pd.DataFrame() an array in which the rows and columns are\n", "switched. An easy way to do this is to take the transpose of t_p_arrays before passing it to pd.DataFrame()." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Declare a second DataFrame using the transpose of t_p_arrays\n", "t_p_arrays_transpose = np.transpose(t_p_arrays) # This line swaps the rows and columns of t_p_arrays\n", "t_p_dataframe_transpose = pd.DataFrame(t_p_arrays_transpose)\n", "print(t_p_dataframe_transpose)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice! Now we have a DataFrame with one row per daily reading. The leftmost values in the printout are the index, \n", "column 0 is our temperature in kelvin, and column 1 is our pressure in bar. In our next section we will work on making this data more readable. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Subsetting a DataFrame\n", "Under this objective, we want to learn how to subset data from the DataFrame for easier interpretation. \n", "We will start by learning how to select one row or column from a pandas DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's select just the column of our DataFrame that corresponds to temperature\n", "temperatures = t_p_dataframe_transpose[0] # Temperature uses label 0 because temps are in the first column\n", "print(temperatures)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perfect! Selecting a single column in pandas returns a pd.Series object, which is like the pandas equivalent of a\n", "one-dimensional numpy array." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If we just want the temperature and pressure from the 10th day, we select the row at position 9 with .iloc\n", "tenth_day_temp_and_pressure = t_p_dataframe_transpose.iloc[9]\n", "print(tenth_day_temp_and_pressure)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's index each of the readings by day\n", "dates = pd.date_range(\"20200801\", periods=15) # \"20200801\" is August 1, 2020; the default frequency is daily\n", "\n", "temp_pres_df = pd.DataFrame(t_p_arrays_transpose, index=dates, columns=['Temp', 'Pressure'])\n", "\n", "# Now convert the temperatures to degrees Celsius and the pressures to hPa\n", "temp_pres_df['Temp'] = temp_pres_df['Temp'] - 273.15 # kelvin to degrees Celsius\n", "temp_pres_df['Pressure'] = temp_pres_df['Pressure']*1000 # bar to hPa (1 bar = 1000 hPa)\n", "\n", "display(temp_pres_df) # Use the display() function instead of print for a fancier output!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We also want a second set of data which only includes readings where the pressure is above 1010 hPa.\n", "This can be done easily with a boolean mask on our pandas DataFrame!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "temp_pres_above_1010_df = temp_pres_df[temp_pres_df['Pressure'] > 1010.0]\n", "display(temp_pres_above_1010_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Plot data from our DataFrame\n", "\n", "We could plot the temperature and pressure data by subsetting the columns as we did above and passing them to matplotlib. Instead, we will use the plotting functions built into pandas!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use the pandas .plot() method on our DataFrame\n", "\n", "temp_pres_df.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hurray! The .plot() method automatically uses the column names and date index that we specified to create a\n", "plot of pressure and temperature. Unfortunately, temperature and pressure have very different scales, so they do not display well on a shared axis. Let's give temperature and pressure each their own plot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "temp_pres_df.plot(subplots=True, figsize=(12, 6))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Almost done!\n", "\n", "Finally, let's add some labels and a title to finish our plot!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax = temp_pres_df.plot(subplots=True, figsize=(12, 6)) # With subplots=True, .plot() returns an array of axes\n", "plt.suptitle('Daily Temperature and Pressure from August 1st - August 15th 2020') # suptitle means 'super title'\n", "ax[0].set_ylabel(r'Temperature ($\\degree$C)') # ax[0] is used because it's the first plot\n", "ax[1].set_xlabel('Date') # ax[1] is used because it's the second plot\n", "ax[1].set_ylabel('Pressure (hPa)')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nice job! You are now ready to conquer time series data using pandas!" 
] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:pyaos-ams-2021]", "language": "python", "name": "conda-env-pyaos-ams-2021-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }