{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python Fundamentals" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "> Make easy things easy and hard things possible.\n", "> \n", "> \\- A slogan of Perl (a predecessor language to Python)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Applied Review" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Python and Jupyter" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Python is a flexible, general-purpose language that is popular in many fields, but particularly in data science." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Jupyter is an IDE, or *Integrated Development Environment*, that lets us view and run code in notebooks." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- We are using Jupyter via Binder in this course." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Python at Its Simplest: Basic Data Types and Math\n", "While Python can be used to write very complicated programs, one of its strengths is that easy things are still easy.\n", "For example, Python can be a calculator." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 + 2" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "48" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "12 * 4" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python allows you to *comment* your code -- to leave notes for yourself or others about the code.\n", "Comments start with a `#` and are ignored by Python when it runs your code." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The ** operator is exponentiation.\n", "2 ** 3" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Once you start doing math, you may want to keep the values you calculate for later use." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Python allows you to do this with *variables* -- words that you choose to represent values you've stored." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Place the result of \"5 * 2\" in a variable called \"x\".\n", "x = 5 * 2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This process of storing something in a variable is often called **variable assignment**, or simply \"assignment\" for short.\n", "You can assign almost anything to a variable." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# \"Assign\" the value 42 to the variable \"answer\".\n", "answer = 42" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "You can then use the stored values in new calculations." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "47" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "answer + 5" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ten = 10\n", "eleven = 11\n", "ten + eleven" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python lets you name your variables almost anything you want. Names must be composed of letters, digits, and underscores, cannot begin with a digit, and cannot be one of Python's reserved keywords (such as `True` or `for`)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It's a good idea to take advantage of this flexibility and name your variables with descriptions that help you remember what they contain." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For example, calling your variables `x`, `y`, and `z` makes it easy to forget what you've stored where (unless you're working with coordinates, a domain where those names have meanings)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "More descriptive names, like `number_of_items` or `size_of_container`, are better." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Perfectly good variable name\n", "my_3rd_favorite_number = 18" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Legal, but undescriptive, variable name\n", "a = 7" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "ci-skip" ] }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid decimal literal (1815327299.py, line 2)", "output_type": "error", "traceback": [ "\u001b[0;36m Cell \u001b[0;32mIn[11], line 2\u001b[0;36m\u001b[0m\n\u001b[0;31m 4_plus_1 = 4 + 1\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid decimal literal\n" ] } ], "source": [ "# Illegal variable name -- it starts with a number\n", "4_plus_1 = 4 + 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If you try to name a variable something illegal, Python will gently remind you to follow the rules with a `SyntaxError` and an arrow indicating the location of the error." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
**Caution!**\n", "\n", "Sometimes Python cannot pinpoint the error precisely, so the real mistake may be somewhere other than where the arrow points. If the marked spot looks fine, check the surrounding code as well.\n", "
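As a rough sketch of the naming rules (not part of the original lesson), Python strings have an `isidentifier()` method and the standard-library `keyword` module has `iskeyword()`; together they show whether a piece of text could serve as a variable name. The names `good_name`, `bad_name`, and `reserved` are made up for illustration.

```python
import keyword

# str.isidentifier() reports whether a piece of text follows the naming rules.
good_name = 'my_3rd_favorite_number'.isidentifier()
bad_name = '4_plus_1'.isidentifier()

# Keywords such as True follow the naming rules but are still reserved by Python.
reserved = keyword.iskeyword('True')

print(good_name, bad_name, reserved)  # prints: True False True
```

A name is usable as a variable only when `isidentifier()` is true and `iskeyword()` is false.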
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Your Turn\n", "\n", "1. 4k monitors, counterintuitively, typically have a resolution of 3840x2160. Create two variables, `width` and `height`, and store 3840 and 2160 in them (respectively).\n", "2. How many total pixels are in a display with this resolution? *Hint: fill in the blanks with variable names:* `pixels = ___ * ___`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Beyond Integers\n", "\n", "Fortunately, Python can handle values beyond integers.\n", "It's happy to work with decimal numbers." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.3333333333333333" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 / 3" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2.25" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1.5 * 1.5" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In computer science lingo, decimal numbers are often called **floating point numbers**, or **floats** for short." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The name refers to how such numbers are stored by a computer internally, but you don't need to worry about that.\n", "Just be aware that many people on the internet and in data science industry will speak in terms of \"floats\" and \"ints\" when they refer to numbers in Python." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python also can work with text data, like words and sentences." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "my_name = 'ethan'\n", "my_hobbies = 'coding, reading, basketball'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In Python, these bits of text are called **strings** and are enclosed in quotation marks.\n", "Both single quotes (`'`) and double quotes (`\"`) are fine, but most Pythonistas use single quotes as a matter of convention." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Conveniently, Python lets you \"add\" strings together to compose longer strings." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Monty Python'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Monty' + ' ' + 'Python'" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Guido van Rossum'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_name = 'Guido'\n", "last_name = 'van Rossum'\n", "# Remember to add a space between words!\n", "first_name + ' ' + last_name" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The last kind of value that we'll talk about is a **boolean**, or a True/False value." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Python recognizes the words `True` and `False` as **keywords** -- words that have an implicit meaning in the language." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "That means you can assign them to variables as you can with other data types." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "is_the_moon_made_of_cheese = False\n", "is_this_the_best_python_class = True" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Your Turn\n", "\n", "1. Overwrite the `first_name` and `last_name` variables with your name, and run `first_name + ' ' + last_name` again -- make sure it produces what you expect!\n", "2. What happens when you try to add together two different kinds of values, like an integer and a string? Does this behavior make sense?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Lists and Dictionaries" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So far we've worked with single values: numbers, strings, and booleans.\n", "But Python also supports more complex data types, sometimes called *data structures*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The two most common data structures are **lists** and **dictionaries**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Lists" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As you might expect, a list is an ordered collection of things.\n", "Lists are represented using brackets (`[]`)." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A list of integers\n", "numbers = [1, 2, 3]\n", "numbers" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['abc', 'def']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A list of strings\n", "strings = ['abc', 'def']\n", "strings" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Lists are highly flexible.\n", "They can contain heterogeneous data (i.e. strings, booleans, and numbers can all be in the same list) and lists can even contain other lists!" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "combo = ['a', 'b', 3, 4]\n", "combo_2 = [True, 'True', 1, 1.0]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, [4, 5]]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note that the last element of the list is another list!\n", "nested_list = [1, 2, 3, [4, 5]]\n", "nested_list" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Individual elements of a list can be accessed by specifying a location in brackets.\n", "This is called **indexing**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Beware: Python is **zero-indexed**, so the first element is element 0!" 
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'a'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letters = ['a', 'b', 'c']\n", "letters[0]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'c'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letters[2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Specifying an invalid location will raise an error." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [ "ci-skip" ] }, "outputs": [ { "ename": "IndexError", "evalue": "list index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[24], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m letters[\u001b[39m4\u001b[39m]\n", "\u001b[0;31mIndexError\u001b[0m: list index out of range" ] } ], "source": [ "letters[4]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
**Caution!**\n", "\n", "Like most programming languages, Python is zero-indexed, so a list with 3 elements has valid locations [0, 1, 2]. This means there is no element at location 3 in a 3-element list, and trying to access it raises an out-of-range error (`IndexError`). This is a common mistake for those new to programming (and it sometimes bites veterans too).\n", "
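The relationship between a list's length and its valid locations can be sketched as follows. This is a minimal illustration that uses the built-in `len` function and a `try`/`except` block, neither of which has been covered in this lesson yet; the names `last_location`, `last_letter`, and `out_of_range` are made up for the example.

```python
letters = ['a', 'b', 'c']

# len() gives the number of elements, so the last valid location is len(letters) - 1.
last_location = len(letters) - 1
last_letter = letters[last_location]

# Asking for location 3 in a 3-element list raises IndexError.
try:
    letters[3]
    out_of_range = False
except IndexError:
    out_of_range = True

print(last_location, last_letter, out_of_range)  # prints: 2 c True
```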
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Not only can you read individual elements using indexing; you can also *overwrite* elements." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['alpha', 'beta', 'gamma']" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "greek = ['alpha', 'beta', 'delta']\n", "greek[2] = 'gamma'\n", "greek" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Dictionaries" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Dictionaries are collections of **key-value pairs**.\n", "Think of a real dictionary -- you look up a word (a *key*), to find its definition (a *value*).\n", "Any given key can have only one value." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This concept has many names depending on language: map, associative array, dictionary, and more. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In Python, dictionaries are represented with curly braces. Colons separate a key from its value, and (like lists) commas delimit elements." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "{'first_name': 'Ethan',\n", " 'last_name': 'Swan',\n", " 'alma_mater': 'Notre Dame',\n", " 'employer': '84.51˚',\n", " 'zip_code': 45208}" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ethan = {'first_name': 'Ethan',\n", " 'last_name': 'Swan',\n", " 'alma_mater': 'Notre Dame',\n", " 'employer': '84.51˚',\n", " 'zip_code': 45208}\n", "ethan" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'first_name': 'Brad',\n", " 'last_name': 'Boehmke',\n", " 'alma_mater': 'NDSU',\n", " 'employer': '84.51˚',\n", " 'zip_code': 45385}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "brad = {'first_name': 'Brad',\n", " 'last_name': 'Boehmke',\n", " 'alma_mater': 'NDSU',\n", " 'employer': '84.51˚',\n", " 'zip_code': 45385}\n", "brad" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Values can be looked up and set by passing a key in brackets." 
] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "45208" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ethan['zip_code']" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'84.51˚'" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ethan['employer']" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'first_name': 'Ethan',\n", " 'last_name': 'Swan',\n", " 'alma_mater': 'Notre Dame',\n", " 'employer': 'Eighty Four Fifty One',\n", " 'zip_code': 45208}" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ethan['employer'] = 'Eighty Four Fifty One'\n", "ethan" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Dictionaries, like lists, are very flexible.\n", "Keys are generally strings (though some other types are allowed), and values can be anything -- including lists or other dictionaries!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Your Turn\n", "\n", "1. Create a list of the first 10 even numbers. Use indexing to find the 4th even number. *Remember that the 4th element is at location 3 because of zero-indexing!*\n", "2. Imagine you need a way to quickly determine a company's CEO given the company name. You could use a dictionary such that `ceos['Apple'] = 'Tim Cook'`. Try to add a few more keys to this starter dictionary. For example, Bob Iger is the CEO of Disney.\n", "\n", "```python\n", "ceos = {'Apple': 'Tim Cook',\n", " 'Microsoft': 'Satya Nadella'}\n", "```\n", "\n", "

\n", "\n", "
**Question**\n", "\n", "How might you approach #2 if you needed to look up both the CEO and the CFO? What data structure would you use? There are several possible solutions.\n", "
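The earlier point that dictionary values can be lists or other dictionaries can be sketched with a small example. The `order` dictionary and everything in it are hypothetical, invented only for illustration; it is meant as a hint about nesting, not as the answer to the exercise.

```python
# A made-up order record: values can be strings, lists, or other dictionaries.
order = {'customer': 'Ada',
         'items': ['pen', 'notebook'],
         'address': {'city': 'Cincinnati', 'zip_code': 45208}}

# Chain brackets to reach inside nested structures.
first_item = order['items'][0]
zip_code = order['address']['zip_code']

# Assigning to a new key adds it to the dictionary.
order['status'] = 'shipped'

print(first_item, zip_code, order['status'])  # prints: pen 45208 shipped
```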
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## DataFrames\n", "In data science, the most important complex data structure is the **DataFrame**.\n", "DataFrames are a collection of tabular data -- you might think of them as *tables* or *datasets*, depending on your background." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's take a look at one." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Don't worry about this \"boilerplate\" code for now.\n", "import pandas as pd\n", "planes = pd.read_csv('../data/planes.csv')" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>tailnum</th><th>year</th><th>type</th><th>manufacturer</th><th>model</th><th>engines</th><th>seats</th><th>speed</th><th>engine</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>N10156</td><td>2004.0</td><td>Fixed wing multi engine</td><td>EMBRAER</td><td>EMB-145XR</td><td>2</td><td>55</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>1</th><td>N102UW</td><td>1998.0</td><td>Fixed wing multi engine</td><td>AIRBUS INDUSTRIE</td><td>A320-214</td><td>2</td><td>182</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>2</th><td>N103US</td><td>1999.0</td><td>Fixed wing multi engine</td><td>AIRBUS INDUSTRIE</td><td>A320-214</td><td>2</td><td>182</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>3</th><td>N104UW</td><td>1999.0</td><td>Fixed wing multi engine</td><td>AIRBUS INDUSTRIE</td><td>A320-214</td><td>2</td><td>182</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>4</th><td>N10575</td><td>2002.0</td><td>Fixed wing multi engine</td><td>EMBRAER</td><td>EMB-145LR</td><td>2</td><td>55</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " tailnum year type manufacturer model \\\n", "0 N10156 2004.0 Fixed wing multi engine EMBRAER EMB-145XR \n", "1 N102UW 1998.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "2 N103US 1999.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "3 N104UW 1999.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "4 N10575 2002.0 Fixed wing multi engine EMBRAER EMB-145LR \n", "\n", " engines seats speed engine \n", "0 2 55 NaN Turbo-fan \n", "1 2 182 NaN Turbo-fan \n", "2 2 182 NaN Turbo-fan \n", "3 2 182 NaN Turbo-fan \n", "4 2 55 NaN Turbo-fan " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Asking for the \"head\" of a DataFrame will show you the first 5 rows.\n", "planes.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "DataFrames have **column names** (tailnum, year, type, etc) and **row indexes** (the bold numbers on the left, starting at zero)." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>tailnum</th><th>year</th><th>type</th><th>manufacturer</th><th>model</th><th>engines</th><th>seats</th><th>speed</th><th>engine</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>N10156</td><td>2004.0</td><td>Fixed wing multi engine</td><td>EMBRAER</td><td>EMB-145XR</td><td>2</td><td>55</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>1</th><td>N102UW</td><td>1998.0</td><td>Fixed wing multi engine</td><td>AIRBUS INDUSTRIE</td><td>A320-214</td><td>2</td><td>182</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>2</th><td>N103US</td><td>1999.0</td><td>Fixed wing multi engine</td><td>AIRBUS INDUSTRIE</td><td>A320-214</td><td>2</td><td>182</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>3</th><td>N104UW</td><td>1999.0</td><td>Fixed wing multi engine</td><td>AIRBUS INDUSTRIE</td><td>A320-214</td><td>2</td><td>182</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"    <tr><th>4</th><td>N10575</td><td>2002.0</td><td>Fixed wing multi engine</td><td>EMBRAER</td><td>EMB-145LR</td><td>2</td><td>55</td><td>NaN</td><td>Turbo-fan</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " tailnum year type manufacturer model \\\n", "0 N10156 2004.0 Fixed wing multi engine EMBRAER EMB-145XR \n", "1 N102UW 1998.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "2 N103US 1999.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "3 N104UW 1999.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "4 N10575 2002.0 Fixed wing multi engine EMBRAER EMB-145LR \n", "\n", " engines seats speed engine \n", "0 2 55 NaN Turbo-fan \n", "1 2 182 NaN Turbo-fan \n", "2 2 182 NaN Turbo-fan \n", "3 2 182 NaN Turbo-fan \n", "4 2 55 NaN Turbo-fan " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Asking for the \"head\" of a DataFrame will show you the first 5 rows.\n", "planes.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The values (elements) within the DataFrame are the Python types we covered above: integers, floats, strings, and booleans." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
**Question**\n", "\n", "Which of these columns are strings?\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Because DataFrames can hold almost any kind of data and support powerful *data wrangling* features, they have become the basic unit of data science work." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We will have a whole lesson later on DataFrames, so for now we'll move on." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Determining What Type of Data Structure Something Is" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "How can you determine the type of a variable?\n", "Pass it to the `type` function (we'll talk more about functions later)." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 5\n", "type(x)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(planes)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "You can even pass values directly to the `type` function." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(7.2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Questions\n", "\n", "Are there any questions before we move on?" 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "rise": { "autolaunch": true, "transition": "none" } }, "nbformat": 4, "nbformat_minor": 4 }