{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python Fundamentals" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "> Make easy things easy and hard things possible.\n", "> \n", "> \\- A slogan of Perl (a predecessor language to Python)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Applied Review" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Python and Jupyter" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Python is a flexible, general-purpose language that is popular in many fields, but particularly in data science." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Jupyter is an IDE, or *Integrated Development Environment*, that lets us view and run code in notebooks." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- We are using Jupyter via Binder in this course." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Python at Its Simplest: Basic Data Types and Math\n", "While Python can be used to write very complicated programs, one of its strengths is that easy things are still easy.\n", "For example, Python can be a calculator." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 + 2" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "48" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "12 * 4" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python allows you to *comment* your code -- to leave notes for yourself or others about the code.\n", "Comments start with a `#` and are ignored by Python when it runs your code." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The ** operator is exponentiation.\n", "2 ** 3" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Once you start doing math, you may want to keep the values you calculate for later use." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Python allows you to do this with *variables* -- words that you choose to represent values you've stored." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Place the result of \"5 * 2\" in a variable called \"x\".\n", "x = 5 * 2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This process of storing something in a variable is often called **variable assignment**, or simply \"assignment\" for short.\n", "You can assign almost anything to a variable." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# \"Assign\" the value 42 to the variable \"answer\".\n", "answer = 42" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "You can then use the stored values in new calculations." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "47" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "answer + 5" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ten = 10\n", "eleven = 11\n", "ten + eleven" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python lets you name your variables almost anything you want. The rules are that a name may contain only letters, numbers, and underscores, it cannot begin with a number, and it cannot be one of Python's reserved keywords (such as `for` or `class`)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It's a good idea to take advantage of this flexibility and name your variables with descriptions that help you remember what they contain." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For example, calling your variables `x`, `y`, and `z` makes it easy to forget what you've stored where (unless you're working with coordinates, a domain where those names have meanings)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "More descriptive names, like `number_of_items` or `size_of_container`, are better." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Perfectly good variable name\n", "my_3rd_favorite_number = 18" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Legal, but undescriptive, variable name\n", "a = 7" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "ci-skip" ] }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid decimal literal (1815327299.py, line 2)", "output_type": "error", "traceback": [ "\u001b[0;36m Cell \u001b[0;32mIn[11], line 2\u001b[0;36m\u001b[0m\n\u001b[0;31m 4_plus_1 = 4 + 1\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid decimal literal\n" ] } ], "source": [ "# Illegal variable name -- it starts with a number\n", "4_plus_1 = 4 + 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If you try to name a variable something illegal, Python will gently remind you to follow the rules with a `SyntaxError` and an arrow indicating the location of the error." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
Caution!
\n", "Sometimes Python doesn't pinpoint the error precisely, and the actual mistake is not where the arrow points. If the marked spot looks fine, check the code just before it.
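
As a rough illustration of the caution above, the sketch below asks Python where it thinks a syntax error is and compares that with where the mistake was made. The snippet stored in `source` and the name `reported_line` are made up for this example. Which line gets reported depends on the Python version: newer versions tend to point at the unclosed parenthesis itself, while older ones point at the line after it.

```python
# Hypothetical snippet with a deliberate mistake: the parenthesis
# opened on line 2 of the snippet is never closed.
source = '''
total = (1 + 2
print(total)
'''

try:
    # compile() parses the text without running it.
    compile(source, 'example', 'exec')
except SyntaxError as error:
    reported_line = error.lineno
    print('Python reports the problem on line', reported_line)
    print('The mistake itself is on line 2 of the snippet')
```

If the two line numbers differ, the reported location is a place to start looking rather than a precise answer.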
\n", "Caution!
\n", "Most programming languages, including Python, are zero-indexed, so a list with 3 elements has valid locations [0, 1, 2]. This means that there is no element at index 3 in a 3-element list! Trying to access it raises an `IndexError` (list index out of range). This is a common mistake for those new to programming (and sometimes it bites the veterans too).
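
The zero-indexing rule above can be seen with a short sketch. The list `letters` and the other names here are made up for illustration. The `try`/`except` block is only there so the example keeps running after the error.

```python
# A hypothetical three-element list used only for illustration.
letters = ['a', 'b', 'c']

first = letters[0]  # index 0 is the first element
last = letters[2]   # index 2 is the third (and last) element
print(first, last)

try:
    letters[3]  # there is no index 3 in a 3-element list
except IndexError as error:
    message = str(error)
    print('IndexError:', message)
```

A handy check is that the largest valid index is always `len(letters) - 1`.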
\n", "Question
\n", "How might you approach #2 if you needed to look up both the CEO and the CFO?
\n", "What data structure would you use? There are several possible solutions.
\n", "
| | tailnum | year | type | manufacturer | model | engines | seats | speed | engine |
|---|---|---|---|---|---|---|---|---|---|
| 0 | N10156 | 2004.0 | Fixed wing multi engine | EMBRAER | EMB-145XR | 2 | 55 | NaN | Turbo-fan |
| 1 | N102UW | 1998.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 2 | N103US | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 3 | N104UW | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 4 | N10575 | 2002.0 | Fixed wing multi engine | EMBRAER | EMB-145LR | 2 | 55 | NaN | Turbo-fan |
\n", "
Question
\n", "Which of these columns are strings?
\n", "