{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "collapsed_sections": [ "c4IvRKVyvJwu", "oE_a6wVuMVmu", "mfL77XTyMrqZ", "1KkMPJ0TG3lx" ] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# **MATE Floats! Coding Notebook** - Day 3\n", "\n", "Created by Ethan C. Campbell for NCAT/MATE/GO-BGC Marine Technology Summer Program\n", "\n", "Wednesday, August 23, 2023" ], "metadata": { "id": "OxvLAQ1SWpeR" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-9O6SthNqtT8" }, "outputs": [], "source": [ "import numpy as np # NumPy is an array and math library\n", "import matplotlib.pyplot as plt # Matplotlib is a visualization (plotting) library\n", "import pandas as pd # Pandas lets us work with spreadsheet (.csv) data\n", "from datetime import datetime, timedelta # Datetime helps us work with dates and times" ] }, { "cell_type": "markdown", "source": [ "## Day 3, Part 1: `datetime` objects" ], "metadata": { "id": "c4IvRKVyvJwu" } }, { "cell_type": "markdown", "source": [ "**How do we track the passage of time in a data set?**\n", "\n", "One option is to count the **time elapsed** since some starting time. For example, we might count the number of seconds, minutes, hours, or days. Instead of only using whole numbers (e.g., 1 hour, 2 hours, 3 hours, 4 hours, etc.), we usually use **fractional times** (units with decimals, like 0.75 hours, 1.0 hours, 1.25 hours, 1.5 hours, etc.).\n", "\n", "As an alternative, we may want to simply track the dates and times themselves. After all, it is important to know what date and what time of day a measurement was taken.\n", "\n", "For this, we use the **`datetime`** package in Python. 
We have already imported it above using:\n", "\n", "> **`from datetime import datetime, timedelta`**" ], "metadata": { "id": "r7LbUikrvUvF" } }, { "cell_type": "markdown", "source": [ "`datetime` allows us to create a new type of variable called a **`datetime` object**. To do this, we use the following function syntax:\n", "\n", "> **`datetime(YEAR,MONTH,DAY,HOUR,MINUTE,SECOND,MICROSECOND)`**\n", "\n", "For example:" ], "metadata": { "id": "zRtGbU7iwlWs" } }, { "cell_type": "code", "source": [ "current_dt = datetime(2023,8,23,12,0,0,0) # This is 8/23/23 at 12:00:00.0 PM (noon)\n", "current_dt = datetime(2023,8,23,12) # Note: this gives the same result\n", "\n", "print(current_dt)" ], "metadata": { "id": "2f19URekvNpt" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "To retrieve part of a datetime from a `datetime` object called `dt`, you can use the following syntax:\n", "\n", "```\n", "dt.year\n", "dt.month\n", "dt.day\n", "dt.hour\n", "dt.minute\n", "dt.second\n", "dt.microsecond\n", "```\n", "\n", "For example:" ], "metadata": { "id": "y34p-U51xrsg" } }, { "cell_type": "code", "source": [ "print(current_dt.year)" ], "metadata": { "id": "Rni5VdTKyIWm" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Try creating your own datetime object. What happens when you subtract one datetime from another?***" ], "metadata": { "id": "UyvOBvlUycq1" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "tx1cYRfcyhUg" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The great thing about `datetime` objects is that you can use them much like numbers:\n", "* You can subtract one from another (the result is a `timedelta`), and you can add or subtract a `timedelta` to shift a datetime.\n", "* You can put them in lists and arrays.\n", "* `Matplotlib` knows to treat datetimes like numbers in plots." 
], "metadata": { "id": "aCycWJMFyKyE" } }, { "cell_type": "markdown", "source": [ "## Day 3, Part 2: Loading sensor time series data" ], "metadata": { "id": "oE_a6wVuMVmu" } }, { "cell_type": "markdown", "source": [ "Up until now, we've been using data that we've typed directly into Python. However, most real-world data is stored in files that we'd like to open using Python.\n", "\n", "The most common type of data file is a **spreadsheet**, which has rows and columns. Generally, the columns will have column labels.\n", "\n", "Spreadsheets are often stored in **comma-separated values (CSV)** format, with the file extension being `.csv`. Data files in this format can be opened using Microsoft Excel or Google Sheets, as well as Python.\n", "\n", "In Python, we use the `pandas` package to work with spreadsheet data. We imported the package earlier using:\n", "\n", "> `import pandas as pd`\n", "\n", "Just like NumPy has arrays, Pandas has two types of objects: a `Series` (a single labeled column of values) and a `DataFrame` (a table with rows and columns). This is what they look like:\n", "![Pandas example.png]()" ], "metadata": { "id": "-ds8ZIbDL1zy" } }, { "cell_type": "markdown", "source": [ "For now, we'll just be applying simple operations to read spreadsheet data using `pandas`. But if you would like to learn more, check out these [lesson slides](https://ethan-campbell.github.io/OCEAN_215/materials/lessons/lesson_9.pdf)." ], "metadata": { "id": "jMxncH-WL1z1" } }, { "cell_type": "markdown", "source": [ "Let's see how we can visualize data from the sensors that you built.\n", "\n", "***First, download two sample data files from Google Drive here:*** https://drive.google.com/drive/folders/18c42CtHgthenSEoPP9WKHEVrC1jn1Dr8?usp=drive_link. They should be named:\n", "* `temp_data_0m.csv` (data from 0 meters depth)\n", "* `temp_data_5m.csv` (data from 5 meters depth)\n", "\n", "Next, we can upload the files to this Google Colab notebook. 
***Click the sidebar folder icon on the left, then use the page-with-arrow icon at the top to select the files and upload them.*** NOTE: uploaded files will be deleted from Google Colab when you refresh this notebook!\n", "\n", "We will specify each **filepath** using string variables:" ], "metadata": { "id": "vqsRjtMRu6h_" } }, { "cell_type": "code", "source": [ "filepath_0m = '/content/temp_data_0m.csv'\n", "filepath_5m = '/content/temp_data_5m.csv'" ], "metadata": { "id": "KPV_riICMNLU" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now, we can load the files using `pandas`:\n", "\n", "> **`pd.read_csv(FILEPATH, ARGUMENTS...)`**\n", "\n", "This function is very customizable using the many optional `ARGUMENTS`, which allow it to handle almost any file. You can find documentation about the arguments [at this link](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).\n", "\n", "***Let's first take a look at the data file using a simple text editor. Notice the long header. 
What argument can we use to exclude the header from being loaded?***\n", "\n", "Below, we'll load a data file using ``pd.read_csv()`` and store the data into a new variable.\n", "\n", "We can look at the data using **`display()`** (which is a fancy version of `print()` for DataFrames):" ], "metadata": { "id": "4a4_4izzMJbi" } }, { "cell_type": "code", "source": [ "data_0m = pd.read_csv(filepath_0m)\n", "\n", "display(data_0m)" ], "metadata": { "id": "rE3uBvQx2Fly" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***What do you notice?***\n", "\n", "It appears that we'll need to specify a few additional arguments within `pd.read_csv()`.\n", "\n", "* `index_col`: this argument accepts an integer (e.g., 0) and tells `pandas` to convert that column into an index\n", "* `header`: this argument specifies which line of the file to use for column labels (default: `header=0`), or use `header=None` if there are no labels\n", "* `names`: to specify column labels, give a list of strings with each label\n", "* `parse_dates`: to tell Python to translate certain column(s) into `datetime` objects, give it those column names or indices inside a list\n", "\n", "Remember that you can consult the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) if you need more information.\n", "\n", "Below, we will load in the two data files using `pd.read_csv()` with the function arguments above:" ], "metadata": { "id": "4tw33xD32AB0" } }, { "cell_type": "code", "source": [ "data_0m = pd.read_csv(filepath_0m,index_col=0,header=None,names=['Datetime','ID','Serial','Temp'],parse_dates=['Datetime'])\n", "data_5m = pd.read_csv(filepath_5m,index_col=0,header=None,names=['Datetime','ID','Serial','Temp'],parse_dates=['Datetime'])\n", "\n", "display(data_5m)" ], "metadata": { "id": "xzlsdqo4MbHp" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Day 3, Part 3: Plotting sensor time series data" ], 
"metadata": { "id": "mfL77XTyMrqZ" } }, { "cell_type": "markdown", "source": [ "The data in a `pandas` DataFrame is similar to a NumPy 2-D array, except we use **column labels** to refer to columns and **index** values to refer to rows.\n", "\n", "To retrieve a specific column, we use bracket notation: **`data_frame[COLUMN_LABEL]`**." ], "metadata": { "id": "4HpB3WGDMaUu" } }, { "cell_type": "code", "source": [ "# For example:\n", "data_0m['Temp']" ], "metadata": { "id": "DXx1Vu-KMaUv" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Using the data variables for each of the two files, try plotting the two time series (time vs. temperature) using `plt.plot()`.***\n", "\n", "IMPORTANT: instead of referencing the datetime column using `data_0m['Datetime']`, we'll have to reference it using `data_0m.index` since we designated it as the index." ], "metadata": { "id": "cXGIFkQx4jQq" } }, { "cell_type": "code", "source": [ "# Write your code below:\n" ], "metadata": { "id": "toiPuz2sM3xi" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "You may want to use two additional Matplotlib functions:\n", "\n", "* **`plt.gcf().autofmt_xdate()`**: this formats the x-axis datetimes more neatly\n", "* **`plt.xlim([START,END])`**: this sets the lower and upper limits of the x-axis; `START` and `END` can be specified as `datetime` objects (or can be `None`)\n", "\n", "***Try repeating the plot above using these functions:***" ], "metadata": { "id": "mCjTNWcK5Mdi" } }, { "cell_type": "code", "source": [ "# Write your code below:\n" ], "metadata": { "id": "LpXuceDK5fWG" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Day 3, Part 4: Other useful `pandas` functions" ], "metadata": { "id": "1KkMPJ0TG3lx" } }, { "cell_type": "markdown", "source": [ "To select a certain index value of a column, use `.loc[]`:\n", "\n", "> **`data[COLUMN_NAME].loc[INDEX_VALUE]`**\n", "\n", "To 
select multiple index values, use slicing:\n", "\n", "> **`data[COLUMN_NAME].loc[INDEX_START:INDEX_END]`**\n", "\n", "For example:\n" ], "metadata": { "id": "goG6rxo3G-gb" } }, { "cell_type": "code", "source": [ "# The full temperature column (a pandas Series with the Datetime index):\n", "data_0m['Temp']" ], "metadata": { "id": "dG-3mgvqGWx8" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Select a single value of temperature\n", "data_0m['Temp'].loc[datetime(2023,7,28,11,49)]" ], "metadata": { "id": "fni0Q0dPHbdS" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Select a range of temperature measurements (.loc slices include both endpoints)\n", "data_0m['Temp'].loc[datetime(2023,7,28,11,49):datetime(2023,7,28,11,51)]" ], "metadata": { "id": "5iNjrVc_GY6E" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "To convert a column from `pandas` format to a NumPy array, use **`.values`**:" ], "metadata": { "id": "4M7kkUrYHicu" } }, { "cell_type": "code", "source": [ "# Pandas format, including the Datetime index:\n", "data_0m['Temp']" ], "metadata": { "id": "UJkZHT9dHqEJ" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Temperature only, converted to a NumPy array:\n", "data_0m['Temp'].values" ], "metadata": { "id": "2K9s2ztHHuoo" }, "execution_count": null, "outputs": [] } ] }