{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Importing Data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "> There is no data science without data.\n", ">\n", "> \\- A wise person" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Applied Review" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Fundamentals and Data in Python\n", "\n", "* Python stores its data in **variables** - names that you choose to represent values you've stored\n", "* This is done using **assignment** - you assign data to a variable" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Packages/Modules and Data in Python\n", "\n", "* Data is frequently represented inside a **DataFrame** - a class from the pandas library\n", "* The pandas library has **functions** for importing different types of files into DataFrames, such as `pd.read_csv()` for CSV files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## General Model for Importing Data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Memory and Size" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "* Python stores its data in **memory** - this makes it quick to access, but it limits how much data you can work with at once, which matters in fields that use very large datasets." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "* That said, you are unlikely to run into memory limitations anytime soon." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "* Python memory is session-specific, so quitting Python (e.g., shutting down JupyterLab) removes the data from memory."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### General Framework" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "A general way to think about importing data into Python and using it there:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "1. Data sits on the computer/server - this is frequently called \"disk\"\n", "2. Python code can be used to copy a data file from disk into the Python session's memory\n", "3. The data then sits in Python's memory, ready to be used by other Python code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Here is a visualization of this process:\n", "\n", "
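The three steps of the general framework (data on disk, copied into memory, then used by other code) can also be traced in a short sketch. It assumes a small CSV file written to a temporary directory stands in for a real data file on disk; the file name `planes_sample.csv` is hypothetical, and its rows are taken from the planes table shown later in this lesson.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Step 1: data sits on disk. A small, hypothetical CSV file is created in a
# temporary directory to stand in for a real data file.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "planes_sample.csv"  # hypothetical file name
csv_path.write_text(
    "tailnum,year,seats\n"
    "N10156,2004,55\n"
    "N102UW,1998,182\n"
)

# Step 2: Python code copies the file's contents from disk into memory,
# and assignment stores the resulting DataFrame in a variable.
planes = pd.read_csv(csv_path)

# Step 3: the DataFrame now lives in this session's memory. Even after the
# file on disk is removed, the in-memory copy is still usable.
csv_path.unlink()
tmp_dir.rmdir()
total_seats = planes["seats"].sum()
print(total_seats)  # 237
```

Deleting the file before using `planes` is only there to make the point that, once imported, the data no longer depends on the file; in normal work you would leave the file in place.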
**Note**

Recall that the pandas library is the primary library for representing and working with tabular data in Python.
| | tailnum | year | type | manufacturer | model | engines | seats | speed | engine |
|---|---|---|---|---|---|---|---|---|---|
| 0 | N10156 | 2004.0 | Fixed wing multi engine | EMBRAER | EMB-145XR | 2 | 55 | NaN | Turbo-fan |
| 1 | N102UW | 1998.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 2 | N103US | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 3 | N104UW | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 4 | N10575 | 2002.0 | Fixed wing multi engine | EMBRAER | EMB-145LR | 2 | 55 | NaN | Turbo-fan |
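A table like the one above is how pandas displays the first few rows of a DataFrame. As a sketch, the five rows shown can be rebuilt with `pd.read_csv()`; wrapping the CSV text in `io.StringIO` instead of pointing at a real file path is an assumption made so the example runs without the original data file.

```python
import io

import pandas as pd

# The five rows displayed above, written out as CSV text. The speed fields
# are left empty, and read_csv turns empty fields into missing values (NaN).
csv_text = """tailnum,year,type,manufacturer,model,engines,seats,speed,engine
N10156,2004,Fixed wing multi engine,EMBRAER,EMB-145XR,2,55,,Turbo-fan
N102UW,1998,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,,Turbo-fan
N103US,1999,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,,Turbo-fan
N104UW,1999,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,,Turbo-fan
N10575,2002,Fixed wing multi engine,EMBRAER,EMB-145LR,2,55,,Turbo-fan
"""

# io.StringIO wraps the string so read_csv can treat it like a file.
planes = pd.read_csv(io.StringIO(csv_text))

print(planes.head())
print(planes.shape)  # (5, 9)
```

In this small sample `year` is read as integers. The `.0` in the displayed table likely means the full dataset has some missing years, since pandas stores an integer column that contains NaN as floats.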
**Question**

Does this JSON data structure remind you of a Python data structure?
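One way to explore this question is to parse a small JSON string with Python's built-in `json` module and inspect the types it produces. The JSON below is hypothetical: it reuses two rows from the planes table as a list of objects, and the JSON shown in the lesson itself may be laid out differently.

```python
import json

import pandas as pd

# A hypothetical JSON array of objects, reusing two rows from the planes table.
json_text = """
[
  {"tailnum": "N10156", "year": 2004, "manufacturer": "EMBRAER", "seats": 55},
  {"tailnum": "N102UW", "year": 1998, "manufacturer": "AIRBUS INDUSTRIE", "seats": 182}
]
"""

# json.loads converts JSON text into Python objects:
# JSON arrays become lists and JSON objects become dictionaries.
records = json.loads(json_text)
print(type(records))     # <class 'list'>
print(type(records[0]))  # <class 'dict'>

# A list of dictionaries can be turned directly into a DataFrame,
# with one row per dictionary and one column per key.
planes = pd.DataFrame(records)
print(planes.shape)  # (2, 4)
```

For JSON stored in a file, pandas also provides `pd.read_json()`, which reads it into a DataFrame directly.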