{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "collapsed_sections": [ "UszPR4dNY8sW", "J2FPr4LV86OO", "aUDER1Q4Htyy", "F31svxgPNP_t", "zyTGovhuH1_p", "AyEv5zepGLGE", "3TIfVm6TmBVP", "ESibLGhhRp8n", "hv0lpsGFldF_", "tYzchKpM7DlA" ] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# **MATE Floats! Coding Notebook** - Day 2\n", "\n", "Created by Ethan C. Campbell for NCAT/MATE/GO-BGC Marine Technology Summer Program\n", "\n", "Tuesday, August 22, 2023" ], "metadata": { "id": "OxvLAQ1SWpeR" } }, { "cell_type": "markdown", "source": [ "## Part 1: Python and notebooks" ], "metadata": { "id": "UszPR4dNY8sW" } }, { "cell_type": "markdown", "source": [ "**Computer code** allows us to work with data, create visualizations, and create repeatable scientific workflows. It is an integral part of the modern scientific method!\n", "\n", "Every programming language has a specific **syntax**. In English as well as programming languages, syntax describes valid combinations of symbols and words:\n", "* Syntactically invalid: \"boy dog cat\"\n", "* Syntactically valid: \"boy hugs cat\"\n", "* Syntactically valid (but **semantically** invalid): \"cat hugs boy\"\n", "\n", "**Semantics** refer to whether a phrase has meaning. It's up to us to write computer code that has scientific meaning and is useful. The computer will allow us to write code that is syntactically valid but semantically – or scientifically – incorrect!\n", "\n", "---" ], "metadata": { "id": "stif3BkqXGuD" } }, { "cell_type": "markdown", "source": [ "![Programming languages.png]()" ], "metadata": { "id": "Djp9pEel9qA6" } }, { "cell_type": "markdown", "source": [ "(*Image source: [stackoverflow.blog](https://stackoverflow.blog/2017/09/06/incredible-growth-python/)*)\n", "\n", "No programming language is perfect. 
As the inventor of C++ once said, *“There are only two kinds of programming languages: the ones people complain about and the ones nobody uses.”*\n", "\n", "However, there are many reasons that we use Python instead of other programming languages, like MATLAB, Java, or C:\n", "- It's free!\n", "- It's old, so it's very stable (Python was created in 1991)\n", "- It can do almost anything\n", "- It's incredibly popular inside and outside of science (so it could help you land a job)\n", "- It's open source, which means anyone can help to improve it\n", "- It reads a bit like written English, so it's easier to write and understand\n", "\n", "***Question: How many of you have heard of Python before this course? Who has written code in Python (or a different language) before?***" ], "metadata": { "id": "E752-6589-dV" } }, { "cell_type": "markdown", "source": [ "---\n", "\n", "This web page is called a **notebook**. It lets us write and run computer code, and the results get displayed and saved alongside the code. If you download this notebook in the File menu, the file extension will be `.ipynb`.\n", "\n", "Sometimes it makes more sense to create a **script** instead of a notebook. Scripts are code files that run from top to bottom, and they don't save the output.\n", "\n", "***Question: When we run Python code in this notebook, where is the code actually being run?***" ], "metadata": { "id": "YkJewcBFh3eR" } }, { "cell_type": "markdown", "source": [ "---\n", "\n", "First, we always have to load **packages** into the notebook using the `import` command! Packages give us additional **functions** that allow us to get more stuff done.\n", "\n", "To run a coding cell, you can click the \"play\" button or type `Shift`-`Enter` (PC) or `Shift`-`Return` (Mac) on your keyboard. 
***Try this with the cell below:***" ], "metadata": { "id": "db2A18q6WXtQ" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-9O6SthNqtT8" }, "outputs": [], "source": [ "import numpy as np # NumPy is an array and math library\n", "import matplotlib.pyplot as plt # Matplotlib is a visualization (plotting) library\n", "import pandas as pd # Pandas lets us work with spreadsheet (.csv) data\n", "from datetime import datetime, timedelta # Datetime helps us work with dates and times" ] }, { "cell_type": "markdown", "source": [ "When we write `import numpy as np`, we are saying: \"import the package NumPy and we will access it using the abbreviation `np` from here onwards.\" You could technically write any abbreviation, but `np` is standard for NumPy." ], "metadata": { "id": "vmaVXRWMAxMO" } }, { "cell_type": "markdown", "source": [ "Often we'd like to add notes to our code. You can do this using **comments**, notated above using a \\# (hash) symbol. Everything after the \\# is ignored and not treated like code." ], "metadata": { "id": "_b8DR4MoAlCW" } }, { "cell_type": "markdown", "source": [ "## Part 2: Variables and math" ], "metadata": { "id": "J2FPr4LV86OO" } }, { "cell_type": "markdown", "source": [ "We can use Python as a calculator. 
Run the cell below:" ], "metadata": { "id": "ESXwBOoGW8qS" } }, { "cell_type": "code", "source": [ "3 + 9" ], "metadata": { "id": "5LUJIIQ6XWi0" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Note that parentheses can be used to change the order of operations:" ], "metadata": { "id": "SOh05mH6EpLI" } }, { "cell_type": "code", "source": [ "1 + 2 * 3 + 4" ], "metadata": { "id": "P06zR16eEi3O" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "(1 + 2) * (3 + 4)" ], "metadata": { "id": "tLOCMyTBEvck" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "If Python doesn't recognize the code, it will give an **error**.\n", "\n", "***What helpful information does the following error message include?***" ], "metadata": { "id": "s5D32op-iCGK" } }, { "cell_type": "code", "source": [ "3 + hello" ], "metadata": { "id": "uCPfRriciBXp" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Try doing some math yourself below. ***Question: Can you figure out how to multiply and divide numbers?***" ], "metadata": { "id": "P3n99f5yXZPs" } }, { "cell_type": "code", "source": [], "metadata": { "id": "PmQitlaNXfKF" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Usually, Python needs to be told when to \"print\" something to the screen. 
For this, we use the **`print()`** function:" ], "metadata": { "id": "B3kVB6JVXksP" } }, { "cell_type": "code", "source": [ "print('Hello world!')" ], "metadata": { "id": "PQeI0aJbXstQ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Try writing code to print a different message:***" ], "metadata": { "id": "gnjpRXeOiZsz" } }, { "cell_type": "code", "source": [], "metadata": { "id": "WGWdyQjEibZE" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Note how comments are used in two ways below, both to describe a section of code and to annotate a specific line:" ], "metadata": { "id": "qetc0zzL13rG" } }, { "cell_type": "code", "source": [ "# This is a section comment\n", "print('This is not a comment')\n", "print('This is also not a comment') # This is a line comment" ], "metadata": { "id": "CWCYvrgX2IX9" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "In Python, we use **variables** to store information. Variables can hold numbers (**integers** or **floats**), combinations of characters (called **strings**), **booleans** (which are either True or False), or other things that are generally called \"**objects**\".\n", "\n", "To save a variable, we use the equal sign (`=`). You can name your variable anything descriptive, as long as it's one word! Note that underscore (`_`) can be used to join words in a variable name." 
], "metadata": { "id": "2yuXcWy5XxWX" } }, { "cell_type": "code", "source": [ "a = -5 # This variable is an \"integer\" because it is a whole number (a number without a decimal point)\n", "almost_ten = 9.9 # This variable is a \"float\" because it is a floating point number (a number with a decimal point)\n", "scientific = 2e3 # This variable is also a float, and is written in scientific notation: 2.0 x 10^3 = 2000\n", "\n", "mate = 'FLOATS' # This variable is a string\n", "mate_2 = \"FLOATS\" # You can also specify strings using double quotation marks\n", "\n", "boolean = True # This variable is a boolean" ], "metadata": { "id": "OGHwCzCiYOiv" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(a)" ], "metadata": { "id": "p-6FzKFwYNJ9" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(almost_ten)" ], "metadata": { "id": "47-LhFOaYQ0m" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(scientific)" ], "metadata": { "id": "XumLM8cKGAiC" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(mate)\n", "print(mate_2)" ], "metadata": { "id": "qD3PPGarYXdF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(boolean)" ], "metadata": { "id": "Hegia9C2GdUw" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "You can do math at the same time that you create a variable!" ], "metadata": { "id": "0rlqnIOZ9NlL" } }, { "cell_type": "code", "source": [ "result = 2023 - 1913\n", "print(result)" ], "metadata": { "id": "PuWeV09m9VA_" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Try the following:***\n", "1. ***Search on Google for the formula to convert Fahrenheit to Celsius.***\n", "2. ***Save a variable with the current Seattle temperature in Fahrenheit (feel free to guess, or look it up).***\n", "3. 
***Then create a new variable with that temperature converted into Celsius (using math).***\n", "4. ***Print that result!***" ], "metadata": { "id": "OBmYHJ93MZO1" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "oV3vRYprMont" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "You can also change a variable using this compact notation:\n", "* `a += b` is the same as `a = a + b`\n", "* `a -= b` is the same as `a = a - b`\n", "* `a *= b` is the same as `a = a * b`" ], "metadata": { "id": "4KD9Yq3lFHH2" } }, { "cell_type": "code", "source": [ "result += 50\n", "print(result)" ], "metadata": { "id": "zPlOmwLpFcu9" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Note that Python treats booleans (True and False) like the integers 1 and 0, respectively. ***This means you can do math with booleans. What will the code produce below, and why?***" ], "metadata": { "id": "aIjuN0miGoUt" } }, { "cell_type": "code", "source": [ "print((False * 5) + (True * 3))" ], "metadata": { "id": "pIt2B0QQG6TX" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***What happens when you add two strings together? Try it below.***" ], "metadata": { "id": "1rp4Jnh27cgX" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "K1CmVOqj7hYP" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 3: Lists, 1-D arrays, indexing, and slicing" ], "metadata": { "id": "aUDER1Q4Htyy" } }, { "cell_type": "markdown", "source": [ "To store multiple numbers, we use **lists** or **NumPy arrays**. Lists and arrays are types of variables, and NumPy is one of the packages that we imported at the top. 
Here's how we create a list or array:" ], "metadata": { "id": "_u4V8X5zYWnc" } }, { "cell_type": "code", "source": [ "my_list = [1,2,3,4,5]" ], "metadata": { "id": "DeEk5f6tGt1I" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "my_array = np.array([1,2,3,4,5,6,7,8,9])" ], "metadata": { "id": "stU_2biAYpWF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(my_list)\n", "print(my_array)" ], "metadata": { "id": "1ZZAFrtPYqTi" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "You can add elements to the end of a list by **appending**. The syntax is:\n", "\n", "> **`list_name.append(NEW_ELEMENT)`**" ], "metadata": { "id": "KF7f04zmPAva" } }, { "cell_type": "code", "source": [ "# Append to the list that you created earlier:\n", "my_list.append(6)\n", "my_list.append(7)\n", "print(my_list)" ], "metadata": { "id": "3l95QDNjPON5" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "You can convert a list to an array by putting it inside **`np.array()`**:" ], "metadata": { "id": "lakCzdpAOyys" } }, { "cell_type": "code", "source": [ "print(np.array(my_list))" ], "metadata": { "id": "qVHEFrDVO30V" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "A list can store a combination of numbers and strings, while an array can store only one variable type (so just numbers OR just strings):" ], "metadata": { "id": "tTFXh0wFH_WO" } }, { "cell_type": "code", "source": [ "combo_list = ['element #1', 2, 'element #3', 4]\n", "print(combo_list)" ], "metadata": { "id": "cNOev1VOH-rf" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Naturally, we can do math with arrays. 
This is very useful!\n", "\n", "***Before running the cells below, what do you expect will be the result of each line of code?***" ], "metadata": { "id": "55hydvn0YtqH" } }, { "cell_type": "code", "source": [ "my_array + 5" ], "metadata": { "id": "eLmXjAhFYs8U" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "my_array * 2" ], "metadata": { "id": "ob3atI21Y1WW" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "my_array + my_array" ], "metadata": { "id": "185UbNiqY3Db" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***What happens when you add two lists together? Try it!***" ], "metadata": { "id": "_V4cxdvBQwdy" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "SrKe3oWZQ1bg" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "If we want to retrieve certain elements from a list or array, we need to count the position of the elements, which we call an **index**. The plural of index is **indices**, and Python starts counting from 0. For example:\n", "\n", "* List: `['A', 'B', 'C', 'D', 'E', 'F', 'G']`\n", "\n", "* Indices: A = 0, B = 1, C = 2, D = 3, E = 4, F = 5, G = 6\n", "\n", "To extract the element, we can **index** or **slice** into the list or array using a bracket **[ ]** after the variable name:\n", "\n", "* Indexing: **`variable_name[INDEX]`**\n", "* Slicing: **`variable_name[START (optional) : END (optional)]`** (note: `END` is exclusive, so it is the index after the final element that you want)\n", "\n", "***Run each cell below and think about why the results make sense:***" ], "metadata": { "id": "oTg8kxr7GB1i" } }, { "cell_type": "code", "source": [ "year = [2,0,2,3]\n", "print(year)" ], "metadata": { "id": "VfMxSqQESQxF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Examples of indexing:\n", "print(year[0])\n", "print(year[3])\n", "print(year[-1]) # This is pretty cool! 
Negative indexing counts backwards from the end" ], "metadata": { "id": "31P9AAA63yxZ" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Examples of slicing:\n", "print(year[0:4])\n", "print(year[:2])" ], "metadata": { "id": "HF2d3rOc3zD5" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Can you find two different ways to extract the last two elements (`[2, 3]`) of the variable `year`?***\n", "\n", "***Try using one of them to save `[2, 3]` into a new variable.***" ], "metadata": { "id": "UfnTZRSI5Q91" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "9AtXnl7A5tL9" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Similarly, you can use indexing or slicing to assign new values in a list or array:" ], "metadata": { "id": "fzu-AQ4pTbSZ" } }, { "cell_type": "code", "source": [ "array_to_modify = np.array([10,20,30,40,50])\n", "array_to_modify[0] = 0\n", "array_to_modify[1:4] = np.array([21,31,41])\n", "array_to_modify[4] *= 2" ], "metadata": { "id": "wvH6Lpb4Ti9d" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***What will `array_to_modify` be after these modifications? Test your prediction by printing the variable below:***" ], "metadata": { "id": "vlfG--UHT_pY" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "ZSBSfaHEUJCQ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***What happens when you index or slice into a string? 
Try it!***" ], "metadata": { "id": "fW9RymUp9st2" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "CVt-kKZF90xq" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 4: 2-D arrays" ], "metadata": { "id": "F31svxgPNP_t" } }, { "cell_type": "markdown", "source": [ "NumPy arrays can also be **two-dimensional** (or higher dimensions). Whoa!\n", "\n", "This allows us to represent data on multiple **axes** using nested brackets: **[ [ ], [ ], [ ], etc. ]**. Below, I've created a 2-D NumPy array where each column is average monthly temperature for a city. Each row is a different city. I've found the data for [Pasadena, CA](https://en.climate-data.org/north-america/united-states-of-america/california/pasadena-715014/#climate-table) (top row - index 0) and [Seattle, WA](https://en.climate-data.org/north-america/united-states-of-america/washington/seattle-593/#climate-table) (bottom row - index 1) on [climate-data.org](https://en.climate-data.org/)." ], "metadata": { "id": "f-vWngOeHAP7" } }, { "cell_type": "code", "source": [ "temp = np.array([[53.6,53.9,57.3,60.5,64.8,70.1,75.7,76.4,74.1,67.3,59.8,52.9], # (Pasadena)\n", " [40.0,40.6,44.2,48.4,54.9,60.2,66.2,66.7,60.5,52.0,44.5,39.6]]) # (Seattle)\n", "\n", "print(temp)" ], "metadata": { "id": "UtAc_AUKHFZC" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Just like `len()` gives the length of a 1-D array, the command **`.shape`** (a property, not a function!) gives the dimensions of a 2-D (or 3-D, 4-D, etc.) 
array:" ], "metadata": { "id": "3MpPjhtuknQg" } }, { "cell_type": "code", "source": [ "temp.shape # returns: (number of rows, number of columns)" ], "metadata": { "id": "jOqkONCIkwpS" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Axis 0** runs down the rows (from one row to the next) and **axis 1** runs across the columns (from one column to the next).\n", "\n", "We still index and slice into 2-D arrays using brackets, with the index for each dimension separated by a comma (`,`):\n", "\n", "> **`array_name[ROW_INDEX, COLUMN_INDEX]`**\n", "\n", "So we'd get the temperature in Pasadena (row index 0) in June (month #6, so column index 5) by writing:" ], "metadata": { "id": "foHcxcjTJFgo" } }, { "cell_type": "code", "source": [ "print(temp[0,5])" ], "metadata": { "id": "8QsmFBS_JFAW" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Use indexing to retrieve the December average temperature in Seattle. Print your result:***" ], "metadata": { "id": "HvY3DkuCLGaK" } }, { "cell_type": "code", "source": [ "# Write your code below\n" ], "metadata": { "id": "P7Ki5VVqLMY-" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Slicing works the same way. Instead of a single row or column index, use the range of indices:\n", "\n", "> **`array_name[ROW_START:ROW_END, COLUMN_START:COLUMN_END]`**\n", "\n", "To get all the elements along a certain axis, just use a single colon, `:`." 
], "metadata": { "id": "q08mizMUJ9Mn" } }, { "cell_type": "markdown", "source": [ "***Try using slicing to get the temperatures for the first half of the year for Pasadena:***" ], "metadata": { "id": "vFO3sKq0LZtj" } }, { "cell_type": "code", "source": [ "# Write your code below\n" ], "metadata": { "id": "N_iFNlNELfuN" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Next, try using slicing to obtain the average temperatures for both cities in August:***" ], "metadata": { "id": "DRvUqWbrLCeo" } }, { "cell_type": "code", "source": [ "# Write your code below\n" ], "metadata": { "id": "qDGhe5fuLkjj" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Finally, use slicing and mathematical operations to calculate the average temperatures for both cities from December through February (three months). You got this!***" ], "metadata": { "id": "QJ6ZQMAbL0a5" } }, { "cell_type": "code", "source": [ "# Write your code below\n" ], "metadata": { "id": "HB_tMQP_MAFP" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 5: Functions" ], "metadata": { "id": "zyTGovhuH1_p" } }, { "cell_type": "markdown", "source": [ "You already know two functions: `print()` and `np.array()`. Functions usually take at least one input \"**argument**\" inside the parentheses, with multiple arguments separated by commas. Then the function \"**returns**\" or \"**outputs**\" something back.\n", "\n", "Let's learn a few other functions...\n", "\n", "The function **`len(INPUT)`** returns the length of a list, array, or string. 
***Do the following outputs make sense based on the input arguments?***" ], "metadata": { "id": "yyBf1Z4xYlcY" } }, { "cell_type": "code", "source": [ "year = np.array([2,0,2,3])\n", "array_digits = len(year)\n", "print(array_digits)" ], "metadata": { "id": "zzcRF9jzMLGB" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "year = '2023'\n", "str_digits = len(year)\n", "print(str_digits)" ], "metadata": { "id": "TxHtHB09MUcz" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The NumPy function **`np.arange(START, END, INTERVAL)`** creates an array of numbers from START up to (but not including) END, with a certain INTERVAL between each number.\n", "\n", "***Can you guess what the result of the code below will be?***" ], "metadata": { "id": "ccT1EH-aLxgb" } }, { "cell_type": "code", "source": [ "np.arange(0,100,5)" ], "metadata": { "id": "gEV7V5IXZXiD" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Note that **`np.arange(END)`** is a shorter way of writing **`np.arange(0,END,1)`**:" ], "metadata": { "id": "MlBxrW2iSEQr" } }, { "cell_type": "code", "source": [ "print(np.arange(10))\n", "print(np.arange(0,10,1))" ], "metadata": { "id": "Ts_1PauNSMNR" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Additionally, the NumPy package has many useful functions for mathematical operations:\n", "\n", "* `np.mean(INPUT)` calculates the average value of elements in an `INPUT` list or array\n", "* `np.sum(INPUT)` calculates the sum of elements in an `INPUT` list or array\n", "* `np.max(INPUT)` and `np.min(INPUT)` find the maximum or minimum values in `INPUT`\n", "* `np.ones(N)` creates a new array of length `N` filled with the float `1.0`\n", "* `np.zeros(N)` creates a new array of length `N` filled with the float `0.0`\n", "\n", "For example:" ], "metadata": { "id": "j7r_aHOBKj63" } }, { "cell_type": "code", "source": [ "# Do some math on arrays:\n", "test = 
np.array([1,2,3])\n", "print(np.mean(test))\n", "print(np.sum(test))\n", "print(np.max(test))\n", "\n", "# Create new arrays:\n", "print(np.ones(5))\n", "print(np.zeros(5))" ], "metadata": { "id": "57G_kWHPLOis" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Many functions can be **called** (applied) to a variable in two different ways. For example:" ], "metadata": { "id": "SDu-P969RI9_" } }, { "cell_type": "code", "source": [ "np.mean(test) # Option 1" ], "metadata": { "id": "HQEVUCGIROpg" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "test.mean() # Option 2 (same result!)" ], "metadata": { "id": "PSI8XXGTRRho" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "To learn more about a function, you can always consult its online **documentation**! A package's documentation website usually has a page for each function describing its arguments, outputs, and examples of how to use it.\n", "\n", "***Google \"numpy mean\" to find the documentation page for that function. How is the webpage structured, and what information does it tell us about the arguments needed to apply `np.mean()` to 2-D arrays?***\n", "\n", "Now that you've discovered named arguments... ***use `np.mean()` to calculate and print the average annual (yearly) temperature in Seattle using the variable `temp` from earlier:***" ], "metadata": { "id": "Sds1U_tEE1vr" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "A1sFRubGNqvb" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 6*: Logical operations" ], "metadata": { "id": "AyEv5zepGLGE" } }, { "cell_type": "markdown", "source": [ "Often, we will want to compare two numbers or variables. 
We do this using the following **logical operations**:\n", "\n", "* `==` : equal\n", "* `!=` : not equal\n", "* `>` : greater than\n", "* `>=` : greater than or equal to\n", "* `<` : less than\n", "* `<=` : less than or equal to\n", "* `and` or `&` : are both booleans true?\n", "* `or` or `|` : is either boolean true?\n", "* `not` or `~` : reverse the boolean (True -> False, False -> True)\n", "* `in` : is a member\n", "* `not in` : is not a member\n", "\n", "Each logical operation **evaluates to** (returns) a boolean — True or False. Consider the following examples:" ], "metadata": { "id": "daEFmmF-WWwy" } }, { "cell_type": "code", "source": [ "3 == 3" ], "metadata": { "id": "vicDzeODXYQX" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "3 == 3.0 # integers can be compared to floating-point numbers" ], "metadata": { "id": "AYo-JMq9XiI6" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "not 3 == 3" ], "metadata": { "id": "rLtyg4YuYRGj" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "3 == 5" ], "metadata": { "id": "Vf-bE8BdXcsv" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "3 != 5" ], "metadata": { "id": "XkQYFoOsXeO5" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "3 > 5" ], "metadata": { "id": "BVTJ7DsuXfcy" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "5 <= 5" ], "metadata": { "id": "A01RZl1yXrz4" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "(11 == 12) or (12 == 12)" ], "metadata": { "id": "65J31-aKdojg" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "(11 == 12) and (12 == 12)" ], "metadata": { "id": "ZIf4G-zQduE8" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Applying a logical comparison to a NumPy array gives a **boolean array**!" 
], "metadata": { "id": "H5CEY27qXJKx" } }, { "cell_type": "code", "source": [ "x = np.array([1,2,3,4,5,6])\n", "\n", "print(x < 4)\n", "print(x <= 4)" ], "metadata": { "id": "HM9BJ_oGXwIv" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Note: \"not\" can't be applied to an entire boolean array. Instead, we have to use \"~\":\n", "print(~np.array([True,False,True]))" ], "metadata": { "id": "MHyuNtiTZOKj" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Note that membership tests work on lists, arrays, and strings:" ], "metadata": { "id": "4WtW0WZYZthB" } }, { "cell_type": "code", "source": [ "print(3 in x) # this is asking: \"is 3 in x?\"" ], "metadata": { "id": "4jK2SArSZMIl" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(7 in x)" ], "metadata": { "id": "0P0q7toGZMvZ" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print(3 not in x) # this is asking: \"is 3 not in x?\"" ], "metadata": { "id": "W6ntHithZNYR" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print('hello' in 'hello world')" ], "metadata": { "id": "cMOnhhwpZbe3" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print('o w' in 'hello world')" ], "metadata": { "id": "5BQOtilZZlQG" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "print('World' in 'hello world') # note that string membership is case-sensitive" ], "metadata": { "id": "Myd1SxUcZbmb" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Heads up: this next skill is super powerful. We saw above that applying a logical comparison to an array of numbers gives us a boolean array.\n", "\n", "We can use boolean arrays as \"**masks**\" to select certain elements of an array. This is called **boolean indexing**." 
], "metadata": { "id": "dh9dJWyaaCdU" } }, { "cell_type": "code", "source": [ "# Let's revisit the Seattle temperatures from earlier:\n", "seattle_temps = np.array([40.0,40.6,44.2,48.4,54.9,60.2,66.2,66.7,60.5,52.0,44.5,39.6])\n", "\n", "# Applying a logical comparison creates a boolean array, or \"mask\":\n", "print(seattle_temps > 60)" ], "metadata": { "id": "MCeGAg1KazgD" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Now let's use the mask to retrieve only the elements where the mask is True:\n", "seattle_temps[seattle_temps > 60]\n", "\n", "# Note: this only works when the mask is the same length as the array!" ], "metadata": { "id": "ukVZKFk4bSDO" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# The boolean indexing gives the same result as specifying the actual array indices:\n", "seattle_temps[[5,6,7,8]]" ], "metadata": { "id": "2PQ1TvPVcSsY" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***How many months of the year is Seattle 40°F or colder? 
Try using boolean indexing and a function that you've learned to calculate and print the answer:***" ], "metadata": { "id": "DAtgHv27be7b" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "WkQW0l6dbn5Y" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 7*: `if` statements and `for` loops" ], "metadata": { "id": "3TIfVm6TmBVP" } }, { "cell_type": "markdown", "source": [ "We can use logical operations to create **conditional actions** using the **`if-else` statement**.\n", "\n", "If the first condition evaluates to `True`, then the lines inside the first block are executed.\n", "\n", "If the first condition evaluates to `False`, then Python tests the condition in the next `elif` statement.\n", "\n", "If all of the `if` and `elif` conditions are `False`, then Python will finally run the `else` block.\n", "\n", "```\n", "if CONDITION_1:\n", "  ACTION\n", "  ACTION\n", "  etc.\n", "elif CONDITION_2: (optional)\n", "  ACTION\n", "elif CONDITION_3: (optional)\n", "  ACTION\n", "else: (optional)\n", "  ACTION\n", "```\n", "IMPORTANT: note the colons (**`:`**) and how the lines below each condition are indented using a `Tab` or two spaces on your keyboard.\n" ], "metadata": { "id": "JRQXELtbcs-6" } }, { "cell_type": "markdown", "source": [ "***Try changing the value of `rain_chance` below and running the code. Do you understand the control flow?***\n", "\n", "***What range of values for `rain_chance` will trigger the `else` statement?***" ], "metadata": { "id": "15duR-xzeoeE" } }, { "cell_type": "code", "source": [ "rain_chance = 5 # i.e., a 5% chance of rain\n", "\n", "if rain_chance >= 50:\n", " print('Ugh... 
I better bring an umbrella.')\n", "elif rain_chance == 0:\n", "  print('I definitely will not need an umbrella.')\n", "elif rain_chance <= 20:\n", "  print('I should be okay without an umbrella.')\n", "else:\n", "  print('I am not sure what to do.')" ], "metadata": { "id": "BrwnP44Rfmry" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Sometimes, we might want to perform an action again and again. Coding makes this possible using **loops**!\n", "\n", "A **`for` loop** has the following syntax:\n", "\n", "```\n", "for ITEM in COLLECTION:\n", "  LINE_1\n", "  LINE_2\n", "  etc.\n", "```\n", "Here, `COLLECTION` can be a list, array, string, or other collection of elements. You can give `ITEM` any variable name. Each time through the loop, that variable holds the next element of `COLLECTION` (and it keeps its last value after the loop ends).\n", "\n", "***Run and consider the following examples:***" ], "metadata": { "id": "hOg7bceSWEd-" } }, { "cell_type": "code", "source": [ "countdown = [4,3,2,1]\n", "\n", "for item in countdown:\n", "  print(item)" ], "metadata": { "id": "KlpsHwSMhV7Z" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "for character in 'floats':\n", "  print(character)" ], "metadata": { "id": "IyvdevH0hpQ4" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "for even_number in np.arange(0,7,2):\n", "  print(even_number)" ], "metadata": { "id": "LQetubWdhtf-" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***You already learned how to calculate the sum of an array of numbers using `np.sum()`.***\n", "\n", "***Now, try to calculate the average annual Seattle temperature by writing a `for` loop below. 
There are at least two different ways to do this!***" ], "metadata": { "id": "ZQObEHLXh-xs" } }, { "cell_type": "code", "source": [ "seattle_temps = np.array([40.0,40.6,44.2,48.4,54.9,60.2,66.2,66.7,60.5,52.0,44.5,39.6])\n", "\n", "# Write your code below:\n", "\n", "\n", "# Finally, print the average temperature by adding it to the print() statement:\n", "print('The average temperature is:',)" ], "metadata": { "id": "1qSyaJBoiK0P" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 8*: Missing data" ], "metadata": { "id": "ESibLGhhRp8n" } }, { "cell_type": "markdown", "source": [ "In the real world, you'll frequently encounter missing data in an array.\n", "\n", "Missing data is represented by the float **`np.nan`**. Older code may also use the alias **`np.NaN`**, but that alias was removed in NumPy 2.0, so prefer `np.nan`. NaN stands for \"Not a Number\"." ], "metadata": { "id": "ArvW0A6qmKQ0" } }, { "cell_type": "code", "source": [ "pH_measurements = np.array([7.84, 7.91, 8.05, np.nan, 7.96, 8.03])\n", "\n", "print(pH_measurements)" ], "metadata": { "id": "SZn9Cq_7mbcz" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "We can test for missing values using the function **`np.isnan()`**, which returns a boolean (or a boolean array when applied to an array):" ], "metadata": { "id": "sqSHEDzWnNmH" } }, { "cell_type": "code", "source": [ "np.isnan(5)" ], "metadata": { "id": "sfF17YZ3nTcF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "np.isnan(np.nan)" ], "metadata": { "id": "h-bbsucSnVDv" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "np.isnan(pH_measurements)" ], "metadata": { "id": "WzRxDMMWnMSQ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Do you remember boolean indexing? 
We can use it to extract only the valid data from an array (the `~` operator inverts the mask, flipping `True` and `False`):" ], "metadata": { "id": "uCsjn0klnYYO" } }, { "cell_type": "code", "source": [ "pH_measurements[~np.isnan(pH_measurements)]" ], "metadata": { "id": "SwEClP51nh0o" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "It's good to be aware that missing data causes functions like `np.mean()` to return `nan` instead of a useful result:" ], "metadata": { "id": "ljnxDSL-nyup" } }, { "cell_type": "code", "source": [ "np.mean(pH_measurements)" ], "metadata": { "id": "JF6HxqP3n50y" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Many functions have a \"NaN-safe\" version that ignores missing values and still calculates the result, such as **`np.nanmean()`**:" ], "metadata": { "id": "9AiNIclLoANX" } }, { "cell_type": "code", "source": [ "np.nanmean(pH_measurements)" ], "metadata": { "id": "DElGwXISoIQd" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 9: Line and scatter plots" ], "metadata": { "id": "hv0lpsGFldF_" } }, { "cell_type": "markdown", "source": [ "Wow! It's time to start creating visualizations of data, called **plots**.\n", "\n", "Earlier, we imported the package Matplotlib using:\n", "\n", "> `import matplotlib.pyplot as plt`\n", "\n", "Creating a **line plot** is simple. Use the Matplotlib function **`plt.plot()`**. The basic form of the function is:\n", "\n", "> **`plt.plot(X, Y, ...)`**\n", "\n", "where `X` and `Y` are 1-D arrays of data, and the optional formatting arguments (`...`) can be found on Matplotlib's [documentation webpage](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)." 
], "metadata": { "id": "cUQUaS4voeSG" } }, { "cell_type": "code", "source": [ "x = np.array([0,1,2,3,4])\n", "y = np.array([0,4,2,6,4])\n", "\n", "plt.plot(x,y)" ], "metadata": { "id": "pSJL5_gRobxX" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Some formatting arguments include:\n", "* `c` or `color`: line color (options: `'k'` or `'black'` for black, `'red'` for red, etc. – see [this page](https://matplotlib.org/stable/gallery/color/named_colors.html) for color options)\n", "* `lw` or `linewidth`: line width (a number; the default is 1.5)\n", "* `ls` or `linestyle`: line style (options: `'-', '--', '-.', ':'`)\n", "* `marker`: optional marker style (options: `'.', 'o', 'v', '^', '<', '>', 's', '*',` etc.)\n", "\n", "***Try plotting x versus y again, except this time use a \"goldenrod\"-colored dashed line of width 2.5 with star-shaped markers:***" ], "metadata": { "id": "zO_MeypJp4nE" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "D1XzxR0MqmG0" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Some other options include changing the figure size by starting with a call to:\n", "\n", "> **`plt.figure(figsize=(WIDTH,HEIGHT))`**\n", "\n", "Adding x-axis and y-axis labels and a title at the top:\n", "\n", "> **``plt.xlabel(STRING)``**\n", "\n", "> **``plt.ylabel(STRING)``**\n", "\n", "> **``plt.title(STRING)``**\n", "\n", "Or adding grid lines using:\n", "\n", "> **`plt.grid()`**\n", "\n", "Or plotting multiple lines, naming each one with the **`label`** argument in `plt.plot()`, and adding a legend (key) using:\n", "\n", "> **`plt.legend()`**\n", "\n", "Check out these additional formatting options below:" ], "metadata": { "id": "wN74Irogq33z" } }, { "cell_type": "code", "source": [ "plt.figure(figsize=(6,3))\n", "plt.plot(x,y,label='Original data')\n", "plt.plot(x,2*y,label='2 * y') # y-values are multiplied by 2 here\n", "plt.legend()\n", "plt.grid()\n", 
"plt.xlabel('x-values')\n", "plt.ylabel('y-values')\n", "plt.title('This is a title');" ], "metadata": { "id": "58RRTpadrTRJ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "We can also create a **scatter plot** with just the points (no line). The function is similar to ``plt.plot()``:\n", "\n", "> **``plt.scatter(X, Y, s=SIZE, c=COLOR, marker=MARKER_STYLE, etc.)``**" ], "metadata": { "id": "xbe69iiiuh-g" } }, { "cell_type": "code", "source": [ "plt.figure(figsize=(6,3))\n", "plt.scatter(x,y,s=100,c='dodgerblue',marker='^');" ], "metadata": { "id": "CeRzx1aXu63M" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***Let's bring it all together! Below, try plotting the monthly temperatures in Pasadena, CA and Seattle, WA. Use line plots with circle-shaped markers (or add scatter points separately). Include a legend and label the plot appropriately.***" ], "metadata": { "id": "q-Q6iROxso70" } }, { "cell_type": "code", "source": [ "temp = np.array([[53.6,53.9,57.3,60.5,64.8,70.1,75.7,76.4,74.1,67.3,59.8,52.9], # (Pasadena)\n", " [40.0,40.6,44.2,48.4,54.9,60.2,66.2,66.7,60.5,52.0,44.5,39.6]]) # (Seattle)\n", "\n", "# Write your code below:\n" ], "metadata": { "id": "pNENXVMUtB3o" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part 10: Loading and plotting spreadsheet data (example with R/V Rachel Carson CTD casts)" ], "metadata": { "id": "tYzchKpM7DlA" } }, { "cell_type": "markdown", "source": [ "Up until now, we've been using data that we've typed directly into Python. However, most real-world data is stored in files that we'd like to open using Python.\n", "\n", "The most common type of data file is a **spreadsheet**, which has rows and columns. Generally, the columns will have column labels.\n", "\n", "Spreadsheets are often stored in **comma-separated values (CSV)** format, with the file extension being `.csv`. 
Data files in this format can be opened using Microsoft Excel or Google Sheets, as well as Python.\n", "\n", "In Python, we use the `pandas` package to work with spreadsheet data. We imported the package earlier using:\n", "\n", "> `import pandas as pd`\n", "\n", "Just like NumPy has arrays, Pandas has two types of objects: `Series` and `DataFrame`. This is what they look like:\n", "![Pandas example.png]()" ], "metadata": { "id": "1BPxKNjg7SZq" } }, { "cell_type": "markdown", "source": [ "For now, we'll just be applying simple operations to read spreadsheet data using `pandas`. But if you would like to learn more, check out these [lesson slides](https://ethan-campbell.github.io/OCEAN_215/materials/lessons/lesson_9.pdf)." ], "metadata": { "id": "HQr4nB64_8p0" } }, { "cell_type": "markdown", "source": [ "First, let's download two `.csv` data files from Google Drive here: https://drive.google.com/drive/folders/1Am6XdlB-APQ3ccOvLeGK8DFPQ2OnPeJD?usp=share_link. Each file is a CTD cast that was collected from the R/V Rachel Carson off of Carkeek Park near Seattle. ***Save these two files to your computer.***\n", "\n", "Next, we can upload the files to this Google Colab notebook. 
***Click the sidebar folder icon on the left, then use the page-with-arrow icon at the top to select the files and upload them.*** NOTE: uploaded files will be deleted from Google Colab when you refresh this notebook!\n", "\n", "We will specify each **filepath** using string variables:" ], "metadata": { "id": "czGyp7MTAc5T" } }, { "cell_type": "code", "source": [ "filepath_0 = '/content/2023051001001_Carkeek.csv'\n", "filepath_1 = '/content/2023051101001_Carkeek.csv'" ], "metadata": { "id": "gnrD640dB5ds" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now, we can load the files using `pandas`:\n", "\n", "> **`pd.read_csv(FILEPATH, ARGUMENTS...)`**\n", "\n", "This function is very customizable using the many optional `ARGUMENTS`, which allow it to handle almost any file. You can find documentation about the arguments [at this link](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).\n", "\n", "***Let's first take a look at the data file using a simple text editor. Notice the long header. What argument can we use to exclude the header from being loaded?***\n", "\n", "Below, we'll load each data file using ``pd.read_csv()`` and store each file into a new variable.\n", "\n", "We can look at the data using **`display()`** (which is a fancy version of `print()` for DataFrames):" ], "metadata": { "id": "XaUCH7ikB6Sy" } }, { "cell_type": "code", "source": [ "data_0 = pd.read_csv(filepath_0,comment='#')\n", "data_1 = pd.read_csv(filepath_1,comment='#')\n", "\n", "display(data_0)" ], "metadata": { "id": "4boQwvSg7R5J" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The data in a `pandas` DataFrame is similar to a NumPy 2-D array, except we use **column labels** to refer to columns and **index** values to refer to rows.\n", "\n", "To retrieve a specific column, we use bracket notation: **`data_frame[COLUMN_LABEL]`**." 
], "metadata": { "id": "HYem5ZznDUfk" } }, { "cell_type": "code", "source": [ "# For example:\n", "data_0['density00']" ], "metadata": { "id": "-k030Au_Dyd_" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "***With these tools, can you make a line plot of temperature vs. depth that includes both CTD casts? (Alternatively, you could try plotting salinity, oxygen, or fluorescence vs. depth.)***\n", "\n", "You may need the following line of code to flip the y-axis so the surface is at the top: `plt.gca().invert_yaxis()`." ], "metadata": { "id": "TDLpAjCnELuY" } }, { "cell_type": "code", "source": [ "# Write your code here:\n" ], "metadata": { "id": "dh6QZ2Np9gXs" }, "execution_count": null, "outputs": [] } ] }