{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "collapsed_sections": [ "c4IvRKVyvJwu", "oE_a6wVuMVmu", "mfL77XTyMrqZ", "1KkMPJ0TG3lx" ] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# **MATE Floats! Coding Notebook** - Day 3\n", "\n", "Created by Ethan C. Campbell for NCAT/MATE/GO-BGC Marine Technology Summer Program\n", "\n", "Wednesday, August 23, 2023" ], "metadata": { "id": "OxvLAQ1SWpeR" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-9O6SthNqtT8" }, "outputs": [], "source": [ "import numpy as np # NumPy is an array and math library\n", "import matplotlib.pyplot as plt # Matplotlib is a visualization (plotting) library\n", "import pandas as pd # Pandas lets us work with spreadsheet (.csv) data\n", "from datetime import datetime, timedelta # Datetime helps us work with dates and times" ] }, { "cell_type": "markdown", "source": [ "## Day 3, Part 1: `datetime` objects" ], "metadata": { "id": "c4IvRKVyvJwu" } }, { "cell_type": "markdown", "source": [ "**How do we track the passage of time in a data set?**\n", "\n", "One option is to count the **time elapsed** since some starting time. For example, we might count the number of seconds, minutes, hours, or days. Instead of only using whole numbers (e.g., 1 hour, 2 hours, 3 hours, 4 hours, etc.), we usually use **fractional times** (units with decimals, like 0.75 hours, 1.0 hours, 1.25 hours, 1.5 hours, etc.).\n", "\n", "As an alternative, we may want to simply track the dates and times themselves. After all, it is important to know what date and what time of day a measurement was taken.\n", "\n", "For this, we use the **`datetime`** package in Python. 
We have already imported it above using:\n", "\n", "> **`from datetime import datetime, timedelta`**" ], "metadata": { "id": "r7LbUikrvUvF" } }, { "cell_type": "markdown", "source": [ "`datetime` allows us to create a new type of variable called a **`datetime` object**. To do this, we use the following function syntax:\n", "\n", "> **`datetime(YEAR,MONTH,DAY,HOUR,MINUTE,SECOND,MICROSECOND)`**\n", "\n", "For example:" ], "metadata": { "id": "zRtGbU7iwlWs" } }, { "cell_type": "code", "source": [ "current_dt = datetime(2023,8,23,12,0,0,0) # This is 8/23/23 at 12:00:00.0p\n", "current_dt = datetime(2023,8,23,12) # Note: this gives the same result\n", "\n", "print(current_dt)" ], "metadata": { "id": "2f19URekvNpt", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "880fed6a-cf4e-416c-c539-3b7b4e67f805" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "2023-08-23 12:00:00\n" ] } ] }, { "cell_type": "markdown", "source": [ "To retrieve part of a datetime from a `datetime` object called `dt`, you can use the following syntax:\n", "\n", "```\n", "dt.year\n", "dt.month\n", "dt.day\n", "dt.hour\n", "dt.minute\n", "dt.second\n", "dt.microsecond\n", "```\n", "\n", "For example:" ], "metadata": { "id": "y34p-U51xrsg" } }, { "cell_type": "code", "source": [ "print(current_dt.year)" ], "metadata": { "id": "Rni5VdTKyIWm", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "a3e5ac63-0cb9-42fd-8ce6-8969f0e39299" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "2023\n" ] } ] }, { "cell_type": "markdown", "source": [ "***Try creating your own datetime object. 
What happens when you subtract one datetime from another?***" ], "metadata": { "id": "UyvOBvlUycq1" } }, { "cell_type": "code", "source": [ "# Write your code here:\n", "new_dt = datetime(2023,8,24,current_dt.hour,0)\n", "print(new_dt - current_dt)" ], "metadata": { "id": "tx1cYRfcyhUg", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "c4850608-a927-4bdf-8ac3-5b2e711f9b34" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "1 day, 0:00:00\n" ] } ] }, { "cell_type": "markdown", "source": [ "The great thing about `datetime` objects is that you can use them much like numbers:\n", "* You can subtract one from another, and you can add or subtract a length of time (a `timedelta`).\n", "* You can put them in lists and arrays.\n", "* `Matplotlib` knows to treat datetimes like numbers in plots." ], "metadata": { "id": "aCycWJMFyKyE" } }, { "cell_type": "markdown", "source": [ "## Day 3, Part 2: Loading sensor time series data" ], "metadata": { "id": "oE_a6wVuMVmu" } }, { "cell_type": "markdown", "source": [ "Up until now, we've been using data that we've typed directly into Python. However, most real-world data is stored in files that we'd like to open using Python.\n", "\n", "The most common type of data file is a **spreadsheet**, which has rows and columns. Generally, the columns will have column labels.\n", "\n", "Spreadsheets are often stored in **comma-separated value (CSV)** format, with the file extension being `.csv`. Data files in this format can be opened using Microsoft Excel or Google Sheets, as well as Python.\n", "\n", "In Python, we use the `pandas` package to work with spreadsheet data. We imported the package earlier using:\n", "\n", "> `import pandas as pd`\n", "\n", "Just like NumPy has arrays, Pandas has two types of objects: `Series` and `DataFrame`. 
This is what they look like:\n", "![Pandas example.png]()" ], "metadata": { "id": "-ds8ZIbDL1zy" } }, { "cell_type": "markdown", "source": [ "For now, we'll just be applying simple operations to read spreadsheet data using `pandas`. But if you would like to learn more, check out these [lesson slides](https://ethan-campbell.github.io/OCEAN_215/materials/lessons/lesson_9.pdf)." ], "metadata": { "id": "jMxncH-WL1z1" } }, { "cell_type": "markdown", "source": [ "Let's see how we can visualize data from the sensors that you built.\n", "\n", "***First, download two sample data files from Google Drive here:*** https://drive.google.com/drive/folders/18c42CtHgthenSEoPP9WKHEVrC1jn1Dr8?usp=drive_link. They should be named:\n", "* `temp_data_0m.csv` (data from 0 meters depth)\n", "* `temp_data_5m.csv` (data from 5 meters depth)\n", "\n", "Next, we can upload the files to this Google Colab notebook. ***Click the sidebar folder icon on the left, then use the page-with-arrow icon at the top to select the files and upload them.*** NOTE: uploaded files will be deleted from Google Colab when you refresh this notebook!\n", "\n", "We will specify each **filepath** using string variables:" ], "metadata": { "id": "vqsRjtMRu6h_" } }, { "cell_type": "code", "source": [ "filepath_0m = '/content/temp_data_0m.csv'\n", "filepath_5m = '/content/temp_data_5m.csv'" ], "metadata": { "id": "KPV_riICMNLU" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now, we can load the files using `pandas`:\n", "\n", "> **`pd.read_csv(FILEPATH, ARGUMENTS...)`**\n", "\n", "This function is very customizable using the many optional `ARGUMENTS`, which allow it to handle almost any file. You can find documentation about the arguments [at this link](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).\n", "\n", "***Let's first take a look at the data file using a simple text editor. Notice the long header. 
What argument can we use to exclude the header from being loaded?***\n", "\n", "Below, we'll load a data file using ``pd.read_csv()`` and store the data into a new variable.\n", "\n", "We can look at the data using **`display()`** (which is a fancy version of `print()` for DataFrames):" ], "metadata": { "id": "4a4_4izzMJbi" } }, { "cell_type": "code", "source": [ "data_0m = pd.read_csv(filepath_0m)\n", "\n", "display(data_0m)" ], "metadata": { "id": "rE3uBvQx2Fly", "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "outputId": "c976929b-c91e-438a-8ba7-0b2f717896f2" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ " 7/28/2023 11:47 1 28af1394 21.375\n", "0 7/28/2023 11:48 1 28af1394 21.3750\n", "1 7/28/2023 11:49 1 28af1394 21.3750\n", "2 7/28/2023 11:50 1 28af1394 21.3750\n", "3 7/28/2023 11:51 1 28af1394 21.5000\n", "4 7/28/2023 11:52 1 28af1394 21.4375\n", ".. ... .. ... ...\n", "105 7/28/2023 13:38 1 28af1394 21.8750\n", "106 7/28/2023 13:39 1 28af1394 21.8750\n", "107 7/28/2023 13:40 1 28af1394 21.9375\n", "108 7/28/2023 13:41 1 28af1394 21.8750\n", "109 7/28/2023 13:42 1 28af1394 21.8750\n", "\n", "[110 rows x 4 columns]" ], "text/html": [ "\n", "
|  | 7/28/2023 11:47 | 1 | 28af1394 | 21.375 |
|---|---|---|---|---|
| 0 | 7/28/2023 11:48 | 1 | 28af1394 | 21.3750 |
| 1 | 7/28/2023 11:49 | 1 | 28af1394 | 21.3750 |
| 2 | 7/28/2023 11:50 | 1 | 28af1394 | 21.3750 |
| 3 | 7/28/2023 11:51 | 1 | 28af1394 | 21.5000 |
| 4 | 7/28/2023 11:52 | 1 | 28af1394 | 21.4375 |
| ... | ... | ... | ... | ... |
| 105 | 7/28/2023 13:38 | 1 | 28af1394 | 21.8750 |
| 106 | 7/28/2023 13:39 | 1 | 28af1394 | 21.8750 |
| 107 | 7/28/2023 13:40 | 1 | 28af1394 | 21.9375 |
| 108 | 7/28/2023 13:41 | 1 | 28af1394 | 21.8750 |
| 109 | 7/28/2023 13:42 | 1 | 28af1394 | 21.8750 |

110 rows × 4 columns
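
In the output above, pandas appears to have used the first row of measurements as the column labels. Below is a minimal sketch of one way to avoid that, assuming the data rows carry no label row of their own: pass `header=None` and supply labels with `names`. The labels `Datetime`, `ID`, `Serial`, and `Temp` and the variable names are illustrative choices, and the three rows are copied from the output above to stand in for the real file, which may also need `skiprows` if it begins with header text.

```python
import io
import pandas as pd

# Three rows copied from the output above, standing in for the real file
sample_csv = io.StringIO(
    "7/28/2023 11:47,1,28af1394,21.375\n"
    "7/28/2023 11:48,1,28af1394,21.3750\n"
    "7/28/2023 11:49,1,28af1394,21.3750\n"
)

# header=None tells pandas there is no row of column labels in the data;
# names=... supplies our own labels (assumed names, not read from the file)
sample = pd.read_csv(sample_csv, header=None,
                     names=['Datetime', 'ID', 'Serial', 'Temp'])

# Convert the date strings into datetimes, then use them as the row index
sample['Datetime'] = pd.to_datetime(sample['Datetime'],
                                    format='%m/%d/%Y %H:%M')
sample = sample.set_index('Datetime')

print(sample)
```

With the datetimes as the index, no measurement is lost to the column labels, and each row can be looked up or plotted by the time it was recorded.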

| Datetime | ID | Serial | Temp |
|---|---|---|---|
| 2023-07-28 10:48:00 | 1 | 28fc3097 | 19.8125 |
| 2023-07-28 11:06:00 | 1 | 28fc3097 | 19.5625 |
| 2023-07-28 11:08:00 | 1 | 28fc3097 | 19.5625 |
| 2023-07-28 11:09:00 | 1 | 28fc3097 | 19.5000 |
| 2023-07-28 11:10:00 | 1 | 28fc3097 | 19.5000 |
| 2023-07-28 11:11:00 | 1 | 28fc3097 | 19.5625 |
| 2023-07-28 11:14:00 | 1 | 28fc3097 | 19.6250 |
| 2023-07-28 11:19:00 | 1 | 28fc3097 | 20.3750 |
| 2023-07-28 11:31:00 | 1 | 28fc3097 | 19.7500 |
| 2023-07-28 11:36:00 | 1 | 28fc3097 | 19.5000 |
| 2023-07-28 11:40:00 | 1 | 28fc3097 | 19.6250 |
| 2023-07-28 11:45:00 | 1 | 28fc3097 | 20.3125 |
| 2023-07-28 11:50:00 | 1 | 28fc3097 | 20.9375 |
| 2023-07-28 11:55:00 | 1 | 28fc3097 | 21.0000 |
| 2023-07-28 12:00:00 | 1 | 28fc3097 | 21.0000 |
| 2023-07-28 12:05:00 | 1 | 28fc3097 | 21.0000 |
| 2023-07-28 12:10:00 | 1 | 28fc3097 | 20.9375 |
| 2023-07-28 12:15:00 | 1 | 28fc3097 | 21.2500 |
| 2023-07-28 12:20:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 12:25:00 | 1 | 28fc3097 | 21.0000 |
| 2023-07-28 12:30:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 12:35:00 | 1 | 28fc3097 | 20.8750 |
| 2023-07-28 12:40:00 | 1 | 28fc3097 | 21.1250 |
| 2023-07-28 12:45:00 | 1 | 28fc3097 | 20.9375 |
| 2023-07-28 12:50:00 | 1 | 28fc3097 | 21.1250 |
| 2023-07-28 12:54:00 | 1 | 28fc3097 | 21.2500 |
| 2023-07-28 12:59:00 | 1 | 28fc3097 | 21.2500 |
| 2023-07-28 13:04:00 | 1 | 28fc3097 | 21.2500 |
| 2023-07-28 13:09:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 13:14:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 13:19:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 13:24:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 13:29:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 13:34:00 | 1 | 28fc3097 | 21.1875 |
| 2023-07-28 13:39:00 | 1 | 28fc3097 | 21.2500 |
| 2023-07-28 13:44:00 | 1 | 28fc3097 | 21.2500 |
| 2023-07-28 13:49:00 | 1 | 28fc3097 | 22.7500 |