{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Importing Data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "> There is no data science without data.\n", ">\n", "> \\- A wise person" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Applied Review" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Fundamentals and Data in Python\n", "\n", "* Python stores its data in **variables** - words that you choose to represent values you've stored\n", "* This is done using **assignment** - you assign data to a variable" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Packages/Modules and Data in Python\n", "\n", "* Data is frequently represented inside a **DataFrame** - a class from the pandas library\n", "* The pandas library has **functions** for importing different types of files into DataFrames - operations that import data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## General Model for Importing Data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Memory and Size" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "* Python stores its data in **memory** - this makes it quick to access, but it can cause size limitations when datasets get very large." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "* With that being said, you are likely not going to run into space limitations anytime soon." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "* Python memory is session-specific, so quitting Python (e.g. shutting down JupyterLab) removes the data from memory." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### General Framework" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "A general way to conceptualize how data is imported into and used within Python:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "1. Data sits on the computer/server - this is frequently called \"disk\"\n", "2. Python code can be used to copy a data file from disk to the Python session's memory\n", "3. Python data then sits within Python's memory ready to be used by other Python code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Here is a visualization of this process:\n", "\n", "
<center>\n", "<img src=\"images/import-framework.png\" alt=\"import-framework.png\">\n", "</center>
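" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As a small illustration of steps 1 and 2, we can ask the operating system about a data file sitting on disk before anything is copied into memory. This is a minimal sketch using Python's built-in `os` module and the `../data/planes.csv` file imported below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import os\n", "\n", "# The file exists on disk, but nothing has been loaded into Python's memory yet\n", "os.path.getsize('../data/planes.csv')  # size of the file on disk, in bytes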
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Importing Tabular Data\n", "\n", "For much of data science, tabular data -- again, think 2-dimensional datasets -- is the most common format of data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Importing pandas\n", "\n", "This data format can be imported into Python using the pandas library. We can load pandas with the below code:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
<div class=\"admonition note alert alert-info\">\n", "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Note</p></b>\n", "    <p>Recall that the pandas library is the primary library for representing and working with tabular data in Python.</p>\n", "</div>
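" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As a quick, optional sanity check, we can confirm the import worked by asking pandas for its version (the exact number will vary by environment):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "pd.__version__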
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Importing Tabular Data with Pandas\n", "\n", "pandas is preferred because it imports the data directly into a DataFrame -- the data structure of choice for tabular data in Python." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `read_csv` function is used to import a tabular data file, a CSV, into a DataFrame:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "planes = pd.read_csv('../data/planes.csv')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "And recall we can visualize the first few rows of our DataFrame using the `head()` method:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>tailnum</th>\n",
"      <th>year</th>\n",
"      <th>type</th>\n",
"      <th>manufacturer</th>\n",
"      <th>model</th>\n",
"      <th>engines</th>\n",
"      <th>seats</th>\n",
"      <th>speed</th>\n",
"      <th>engine</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>N10156</td>\n",
"      <td>2004.0</td>\n",
"      <td>Fixed wing multi engine</td>\n",
"      <td>EMBRAER</td>\n",
"      <td>EMB-145XR</td>\n",
"      <td>2</td>\n",
"      <td>55</td>\n",
"      <td>NaN</td>\n",
"      <td>Turbo-fan</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>N102UW</td>\n",
"      <td>1998.0</td>\n",
"      <td>Fixed wing multi engine</td>\n",
"      <td>AIRBUS INDUSTRIE</td>\n",
"      <td>A320-214</td>\n",
"      <td>2</td>\n",
"      <td>182</td>\n",
"      <td>NaN</td>\n",
"      <td>Turbo-fan</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>N103US</td>\n",
"      <td>1999.0</td>\n",
"      <td>Fixed wing multi engine</td>\n",
"      <td>AIRBUS INDUSTRIE</td>\n",
"      <td>A320-214</td>\n",
"      <td>2</td>\n",
"      <td>182</td>\n",
"      <td>NaN</td>\n",
"      <td>Turbo-fan</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>N104UW</td>\n",
"      <td>1999.0</td>\n",
"      <td>Fixed wing multi engine</td>\n",
"      <td>AIRBUS INDUSTRIE</td>\n",
"      <td>A320-214</td>\n",
"      <td>2</td>\n",
"      <td>182</td>\n",
"      <td>NaN</td>\n",
"      <td>Turbo-fan</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>N10575</td>\n",
"      <td>2002.0</td>\n",
"      <td>Fixed wing multi engine</td>\n",
"      <td>EMBRAER</td>\n",
"      <td>EMB-145LR</td>\n",
"      <td>2</td>\n",
"      <td>55</td>\n",
"      <td>NaN</td>\n",
"      <td>Turbo-fan</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " tailnum year type manufacturer model \\\n", "0 N10156 2004.0 Fixed wing multi engine EMBRAER EMB-145XR \n", "1 N102UW 1998.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "2 N103US 1999.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "3 N104UW 1999.0 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 \n", "4 N10575 2002.0 Fixed wing multi engine EMBRAER EMB-145LR \n", "\n", " engines seats speed engine \n", "0 2 55 NaN Turbo-fan \n", "1 2 182 NaN Turbo-fan \n", "2 2 182 NaN Turbo-fan \n", "3 2 182 NaN Turbo-fan \n", "4 2 55 NaN Turbo-fan " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "planes.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The `read_csv()` function has many parameters for importing data. A few examples:\n", "\n", "* `sep` - the data's delimiter\n", "* `header` - the row number containing the column names (use `header=None` if the file has no header row)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Full documentation can be pulled up by running the function name followed by a question mark:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0;31mSignature:\u001b[0m\n", "\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msep\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None | lib.NoDefault'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m<\u001b[0m\u001b[0mno_default\u001b[0m\u001b[0;34m>\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdelimiter\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None | lib.NoDefault'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mheader\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m\"int | Sequence[int] | None | Literal['infer']\"\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'infer'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mnames\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'Sequence[Hashable] | None | lib.NoDefault'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m<\u001b[0m\u001b[0mno_default\u001b[0m\u001b[0;34m>\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mindex_col\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'IndexLabel | Literal[False] | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0musecols\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'DtypeArg | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mengine\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'CSVEngine | None'\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mconverters\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mtrue_values\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mfalse_values\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mskipinitialspace\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mskiprows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mskipfooter\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'int'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'int | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mna_values\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mkeep_default_na\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mna_filter\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mverbose\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mskip_blank_lines\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mparse_dates\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool | Sequence[Hashable] | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0minfer_datetime_format\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool | lib.NoDefault'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m<\u001b[0m\u001b[0mno_default\u001b[0m\u001b[0;34m>\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mkeep_date_col\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdate_parser\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m<\u001b[0m\u001b[0mno_default\u001b[0m\u001b[0;34m>\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdate_format\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdayfirst\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m 
\u001b[0mcache_dates\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0miterator\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mchunksize\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'int | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcompression\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'CompressionOptions'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'infer'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mthousands\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdecimal\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'.'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mlineterminator\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mquotechar\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'\"'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mquoting\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'int'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdoublequote\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mescapechar\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mcomment\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mencoding\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mencoding_errors\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'strict'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdialect\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str | csv.Dialect | None'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mon_bad_lines\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'error'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdelim_whitespace\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m 
\u001b[0mlow_memory\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mmemory_map\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mfloat_precision\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m\"Literal['high', 'legacy'] | None\"\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mstorage_options\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'StorageOptions'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mdtype_backend\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'DtypeBackend | lib.NoDefault'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m<\u001b[0m\u001b[0mno_default\u001b[0m\u001b[0;34m>\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0;34m'DataFrame | TextFileReader'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDocstring:\u001b[0m\n", "Read a comma-separated values (csv) file into DataFrame.\n", "\n", "Also supports optionally iterating or breaking of the file\n", "into chunks.\n", "\n", "Additional help can be found in the online docs for\n", "`IO Tools `_.\n", "\n", "Parameters\n", "----------\n", "filepath_or_buffer : str, path object or file-like object\n", " Any valid string path is acceptable. The string could be a URL. Valid\n", " URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is\n", " expected. A local file could be: file://localhost/path/to/table.csv.\n", "\n", " If you want to pass in a path object, pandas accepts any ``os.PathLike``.\n", "\n", " By file-like object, we refer to objects with a ``read()`` method, such as\n", " a file handle (e.g. via builtin ``open`` function) or ``StringIO``.\n", "sep : str, default ','\n", " Delimiter to use. If sep is None, the C engine cannot automatically detect\n", " the separator, but the Python parsing engine can, meaning the latter will\n", " be used and automatically detect the separator by Python's builtin sniffer\n", " tool, ``csv.Sniffer``. In addition, separators longer than 1 character and\n", " different from ``'\\s+'`` will be interpreted as regular expressions and\n", " will also force the use of the Python parsing engine. Note that regex\n", " delimiters are prone to ignoring quoted data. Regex example: ``'\\r\\t'``.\n", "delimiter : str, default ``None``\n", " Alias for sep.\n", "header : int, list of int, None, default 'infer'\n", " Row number(s) to use as the column names, and the start of the\n", " data. Default behavior is to infer the column names: if no names\n", " are passed the behavior is identical to ``header=0`` and column\n", " names are inferred from the first line of the file, if column\n", " names are passed explicitly then the behavior is identical to\n", " ``header=None``. Explicitly pass ``header=0`` to be able to\n", " replace existing names. The header can be a list of integers that\n", " specify row locations for a multi-index on the columns\n", " e.g. [0,1,3]. Intervening rows that are not specified will be\n", " skipped (e.g. 2 in this example is skipped). 
Note that this\n", " parameter ignores commented lines and empty lines if\n", " ``skip_blank_lines=True``, so ``header=0`` denotes the first line of\n", " data rather than the first line of the file.\n", "names : array-like, optional\n", " List of column names to use. If the file contains a header row,\n", " then you should explicitly pass ``header=0`` to override the column names.\n", " Duplicates in this list are not allowed.\n", "index_col : int, str, sequence of int / str, or False, optional, default ``None``\n", " Column(s) to use as the row labels of the ``DataFrame``, either given as\n", " string name or column index. If a sequence of int / str is given, a\n", " MultiIndex is used.\n", "\n", " Note: ``index_col=False`` can be used to force pandas to *not* use the first\n", " column as the index, e.g. when you have a malformed file with delimiters at\n", " the end of each line.\n", "usecols : list-like or callable, optional\n", " Return a subset of the columns. If list-like, all elements must either\n", " be positional (i.e. integer indices into the document columns) or strings\n", " that correspond to column names provided either by the user in `names` or\n", " inferred from the document header row(s). If ``names`` are given, the document\n", " header row(s) are not taken into account. For example, a valid list-like\n", " `usecols` parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.\n", " Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.\n", " To instantiate a DataFrame from ``data`` with element order preserved use\n", " ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns\n", " in ``['foo', 'bar']`` order or\n", " ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]``\n", " for ``['bar', 'foo']`` order.\n", "\n", " If callable, the callable function will be evaluated against the column\n", " names, returning names where the callable function evaluates to True. An\n", " example of a valid callable argument would be ``lambda x: x.upper() in\n", " ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster\n", " parsing time and lower memory usage.\n", "dtype : Type name or dict of column -> type, optional\n", " Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32,\n", " 'c': 'Int64'}\n", " Use `str` or `object` together with suitable `na_values` settings\n", " to preserve and not interpret dtype.\n", " If converters are specified, they will be applied INSTEAD\n", " of dtype conversion.\n", "\n", " .. versionadded:: 1.5.0\n", "\n", " Support for defaultdict was added. Specify a defaultdict as input where\n", " the default determines the dtype of the columns which are not explicitly\n", " listed.\n", "engine : {'c', 'python', 'pyarrow'}, optional\n", " Parser engine to use. The C and pyarrow engines are faster, while the python engine\n", " is currently more feature-complete. Multithreading is currently only supported by\n", " the pyarrow engine.\n", "\n", " .. versionadded:: 1.4.0\n", "\n", " The \"pyarrow\" engine was added as an *experimental* engine, and some features\n", " are unsupported, or may not work correctly, with this engine.\n", "converters : dict, optional\n", " Dict of functions for converting values in certain columns. 
Keys can either\n", " be integers or column labels.\n", "true_values : list, optional\n", " Values to consider as True in addition to case-insensitive variants of \"True\".\n", "false_values : list, optional\n", " Values to consider as False in addition to case-insensitive variants of \"False\".\n", "skipinitialspace : bool, default False\n", " Skip spaces after delimiter.\n", "skiprows : list-like, int or callable, optional\n", " Line numbers to skip (0-indexed) or number of lines to skip (int)\n", " at the start of the file.\n", "\n", " If callable, the callable function will be evaluated against the row\n", " indices, returning True if the row should be skipped and False otherwise.\n", " An example of a valid callable argument would be ``lambda x: x in [0, 2]``.\n", "skipfooter : int, default 0\n", " Number of lines at bottom of file to skip (Unsupported with engine='c').\n", "nrows : int, optional\n", " Number of rows of file to read. Useful for reading pieces of large files.\n", "na_values : scalar, str, list-like, or dict, optional\n", " Additional strings to recognize as NA/NaN. If dict passed, specific\n", " per-column NA values. By default the following values are interpreted as\n", " NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',\n", " '1.#IND', '1.#QNAN', '', 'N/A', 'NA', 'NULL', 'NaN', 'None',\n", " 'n/a', 'nan', 'null'.\n", "keep_default_na : bool, default True\n", " Whether or not to include the default NaN values when parsing the data.\n", " Depending on whether `na_values` is passed in, the behavior is as follows:\n", "\n", " * If `keep_default_na` is True, and `na_values` are specified, `na_values`\n", " is appended to the default NaN values used for parsing.\n", " * If `keep_default_na` is True, and `na_values` are not specified, only\n", " the default NaN values are used for parsing.\n", " * If `keep_default_na` is False, and `na_values` are specified, only\n", " the NaN values specified `na_values` are used for parsing.\n", " * If `keep_default_na` is False, and `na_values` are not specified, no\n", " strings will be parsed as NaN.\n", "\n", " Note that if `na_filter` is passed in as False, the `keep_default_na` and\n", " `na_values` parameters will be ignored.\n", "na_filter : bool, default True\n", " Detect missing value markers (empty strings and the value of na_values). In\n", " data without any NAs, passing na_filter=False can improve the performance\n", " of reading a large file.\n", "verbose : bool, default False\n", " Indicate number of NA values placed in non-numeric columns.\n", "skip_blank_lines : bool, default True\n", " If True, skip over blank lines rather than interpreting as NaN values.\n", "parse_dates : bool or list of int or names or list of lists or dict, default False\n", " The behavior is as follows:\n", "\n", " * boolean. If True -> try parsing the index.\n", " * list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3\n", " each as a separate date column.\n", " * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as\n", " a single date column.\n", " * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call\n", " result 'foo'\n", "\n", " If a column or index cannot be represented as an array of datetimes,\n", " say because of an unparsable value or a mixture of timezones, the column\n", " or index will be returned unaltered as an object data type. 
For\n", " non-standard datetime parsing, use ``pd.to_datetime`` after\n", " ``pd.read_csv``.\n", "\n", " Note: A fast-path exists for iso8601-formatted dates.\n", "infer_datetime_format : bool, default False\n", " If True and `parse_dates` is enabled, pandas will attempt to infer the\n", " format of the datetime strings in the columns, and if it can be inferred,\n", " switch to a faster method of parsing them. In some cases this can increase\n", " the parsing speed by 5-10x.\n", "\n", " .. deprecated:: 2.0.0\n", " A strict version of this argument is now the default, passing it has no effect.\n", "\n", "keep_date_col : bool, default False\n", " If True and `parse_dates` specifies combining multiple columns then\n", " keep the original columns.\n", "date_parser : function, optional\n", " Function to use for converting a sequence of string columns to an array of\n", " datetime instances. The default uses ``dateutil.parser.parser`` to do the\n", " conversion. Pandas will try to call `date_parser` in three different ways,\n", " advancing to the next if an exception occurs: 1) Pass one or more arrays\n", " (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the\n", " string values from the columns defined by `parse_dates` into a single array\n", " and pass that; and 3) call `date_parser` once for each row using one or\n", " more strings (corresponding to the columns defined by `parse_dates`) as\n", " arguments.\n", "\n", " .. deprecated:: 2.0.0\n", " Use ``date_format`` instead, or read in as ``object`` and then apply\n", " :func:`to_datetime` as-needed.\n", "date_format : str or dict of column -> format, default ``None``\n", " If used in conjunction with ``parse_dates``, will parse dates according to this\n", " format. For anything more complex,\n", " please read in as ``object`` and then apply :func:`to_datetime` as-needed.\n", "\n", " .. versionadded:: 2.0.0\n", "dayfirst : bool, default False\n", " DD/MM format dates, international and European format.\n", "cache_dates : bool, default True\n", " If True, use a cache of unique, converted dates to apply the datetime\n", " conversion. May produce significant speed-up when parsing duplicate\n", " date strings, especially ones with timezone offsets.\n", "\n", "iterator : bool, default False\n", " Return TextFileReader object for iteration or getting chunks with\n", " ``get_chunk()``.\n", "\n", " .. versionchanged:: 1.2\n", "\n", " ``TextFileReader`` is a context manager.\n", "chunksize : int, optional\n", " Return TextFileReader object for iteration.\n", " See the `IO Tools docs\n", " `_\n", " for more information on ``iterator`` and ``chunksize``.\n", "\n", " .. versionchanged:: 1.2\n", "\n", " ``TextFileReader`` is a context manager.\n", "compression : str or dict, default 'infer'\n", " For on-the-fly decompression of on-disk data. 
If 'infer' and 'filepath_or_buffer' is\n", " path-like, then detect compression from the following extensions: '.gz',\n", " '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2'\n", " (otherwise no compression).\n", " If using 'zip' or 'tar', the ZIP file must contain only one data file to be read in.\n", " Set to ``None`` for no decompression.\n", " Can also be a dict with key ``'method'`` set\n", " to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``, ``'tar'``} and other\n", " key-value pairs are forwarded to\n", " ``zipfile.ZipFile``, ``gzip.GzipFile``,\n", " ``bz2.BZ2File``, ``zstandard.ZstdDecompressor`` or\n", " ``tarfile.TarFile``, respectively.\n", " As an example, the following could be passed for Zstandard decompression using a\n", " custom compression dictionary:\n", " ``compression={'method': 'zstd', 'dict_data': my_compression_dict}``.\n", "\n", " .. versionadded:: 1.5.0\n", " Added support for `.tar` files.\n", "\n", " .. versionchanged:: 1.4.0 Zstandard support.\n", "\n", "thousands : str, optional\n", " Thousands separator.\n", "decimal : str, default '.'\n", " Character to recognize as decimal point (e.g. use ',' for European data).\n", "lineterminator : str (length 1), optional\n", " Character to break file into lines. Only valid with C parser.\n", "quotechar : str (length 1), optional\n", " The character used to denote the start and end of a quoted item. Quoted\n", " items can include the delimiter and it will be ignored.\n", "quoting : int or csv.QUOTE_* instance, default 0\n", " Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of\n", " QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).\n", "doublequote : bool, default ``True``\n", " When quotechar is specified and quoting is not ``QUOTE_NONE``, indicate\n", " whether or not to interpret two consecutive quotechar elements INSIDE a\n", " field as a single ``quotechar`` element.\n", "escapechar : str (length 1), optional\n", " One-character string used to escape other characters.\n", "comment : str, optional\n", " Indicates remainder of line should not be parsed. If found at the beginning\n", " of a line, the line will be ignored altogether. This parameter must be a\n", " single character. Like empty lines (as long as ``skip_blank_lines=True``),\n", " fully commented lines are ignored by the parameter `header` but not by\n", " `skiprows`. For example, if ``comment='#'``, parsing\n", " ``#empty\\na,b,c\\n1,2,3`` with ``header=0`` will result in 'a,b,c' being\n", " treated as the header.\n", "encoding : str, optional, default \"utf-8\"\n", " Encoding to use for UTF when reading/writing (ex. 'utf-8'). `List of Python\n", " standard encodings\n", " `_ .\n", "\n", " .. versionchanged:: 1.2\n", "\n", " When ``encoding`` is ``None``, ``errors=\"replace\"`` is passed to\n", " ``open()``. Otherwise, ``errors=\"strict\"`` is passed to ``open()``.\n", " This behavior was previously only the case for ``engine=\"python\"``.\n", "\n", " .. versionchanged:: 1.3.0\n", "\n", " ``encoding_errors`` is a new argument. ``encoding`` has no longer an\n", " influence on how encoding errors are handled.\n", "\n", "encoding_errors : str, optional, default \"strict\"\n", " How encoding errors are treated. `List of possible values\n", " `_ .\n", "\n", " .. 
versionadded:: 1.3.0\n", "\n", "dialect : str or csv.Dialect, optional\n", " If provided, this parameter will override values (default or not) for the\n", " following parameters: `delimiter`, `doublequote`, `escapechar`,\n", " `skipinitialspace`, `quotechar`, and `quoting`. If it is necessary to\n", " override values, a ParserWarning will be issued. See csv.Dialect\n", " documentation for more details.\n", "on_bad_lines : {'error', 'warn', 'skip'} or callable, default 'error'\n", " Specifies what to do upon encountering a bad line (a line with too many fields).\n", " Allowed values are :\n", "\n", " - 'error', raise an Exception when a bad line is encountered.\n", " - 'warn', raise a warning when a bad line is encountered and skip that line.\n", " - 'skip', skip bad lines without raising or warning when they are encountered.\n", "\n", " .. versionadded:: 1.3.0\n", "\n", " .. versionadded:: 1.4.0\n", "\n", " - callable, function with signature\n", " ``(bad_line: list[str]) -> list[str] | None`` that will process a single\n", " bad line. ``bad_line`` is a list of strings split by the ``sep``.\n", " If the function returns ``None``, the bad line will be ignored.\n", " If the function returns a new list of strings with more elements than\n", " expected, a ``ParserWarning`` will be emitted while dropping extra elements.\n", " Only supported when ``engine=\"python\"``\n", "\n", "delim_whitespace : bool, default False\n", " Specifies whether or not whitespace (e.g. ``' '`` or ``' '``) will be\n", " used as the sep. Equivalent to setting ``sep='\\s+'``. If this option\n", " is set to True, nothing should be passed in for the ``delimiter``\n", " parameter.\n", "low_memory : bool, default True\n", " Internally process the file in chunks, resulting in lower memory use\n", " while parsing, but possibly mixed type inference. To ensure no mixed\n", " types either set False, or specify the type with the `dtype` parameter.\n", " Note that the entire file is read into a single DataFrame regardless,\n", " use the `chunksize` or `iterator` parameter to return the data in chunks.\n", " (Only valid with C parser).\n", "memory_map : bool, default False\n", " If a filepath is provided for `filepath_or_buffer`, map the file object\n", " directly onto memory and access the data directly from there. Using this\n", " option can improve performance because there is no longer any I/O overhead.\n", "float_precision : str, optional\n", " Specifies which converter the C engine should use for floating-point\n", " values. The options are ``None`` or 'high' for the ordinary converter,\n", " 'legacy' for the original lower precision pandas converter, and\n", " 'round_trip' for the round-trip converter.\n", "\n", " .. versionchanged:: 1.2\n", "\n", "storage_options : dict, optional\n", " Extra options that make sense for a particular storage connection, e.g.\n", " host, port, username, password, etc. For HTTP(S) URLs the key-value pairs\n", " are forwarded to ``urllib.request.Request`` as header options. For other\n", " URLs (e.g. starting with \"s3://\", and \"gcs://\") the key-value pairs are\n", " forwarded to ``fsspec.open``. Please see ``fsspec`` and ``urllib`` for more\n", " details, and for more examples on storage options refer `here\n", " `_.\n", "\n", " .. versionadded:: 1.2\n", "\n", "dtype_backend : {\"numpy_nullable\", \"pyarrow\"}, defaults to NumPy backed DataFrames\n", " Which dtype_backend to use, e.g. 
whether a DataFrame should have NumPy\n", " arrays, nullable dtypes are used for all dtypes that have a nullable\n", " implementation when \"numpy_nullable\" is set, pyarrow is used for all\n", " dtypes if \"pyarrow\" is set.\n", "\n", " The dtype_backends are still experimential.\n", "\n", " .. versionadded:: 2.0\n", "\n", "Returns\n", "-------\n", "DataFrame or TextFileReader\n", " A comma-separated values (csv) file is returned as two-dimensional\n", " data structure with labeled axes.\n", "\n", "See Also\n", "--------\n", "DataFrame.to_csv : Write DataFrame to a comma-separated values (csv) file.\n", "read_csv : Read a comma-separated values (csv) file into DataFrame.\n", "read_fwf : Read a table of fixed-width formatted lines into DataFrame.\n", "\n", "Examples\n", "--------\n", ">>> pd.read_csv('data.csv')  # doctest: +SKIP\n", "\u001b[0;31mFile:\u001b[0m /usr/local/anaconda3/envs/uc-python/lib/python3.11/site-packages/pandas/io/parsers/readers.py\n", "\u001b[0;31mType:\u001b[0m function" ] } ], "source": [ "pd.read_csv?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Your Turn\n", "\n", "1. Python stores its data in ____________. \n", "2. What happens to Python's data when the Python session is terminated?\n", "3. Load the `../data/flights.csv` data file into Python using the `pandas` library." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Importing Other Files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* While tabular data is the most popular in data science, other types of data are used as well." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* These are *not* as important as the pandas DataFrame, but it *is* good to be exposed to them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* These additional data formats are going to be more common in a general-purpose programming language like Python." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### JSON Files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A common example is a [JSON](https://en.wikipedia.org/wiki/JSON) file -- these are non-tabular data files that are popular in data engineering due to their space efficiency and flexibility." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Here is an example JSON file:\n", "\n", "```json\n", "{\n", "    \"planeId\": \"1xc2345g\",\n", "    \"manufacturerDetails\": {\n", "        \"manufacturer\": \"Airbus\",\n", "        \"model\": \"A330\",\n", "        \"year\": 1999\n", "    },\n", "    \"airlineDetails\": {\n", "        \"currentAirline\": \"Southwest\",\n", "        \"previousAirlines\": {\n", "            \"1st\": \"Delta\"\n", "        },\n", "        \"lastPurchased\": 2013\n", "    },\n", "    \"numberOfFlights\": 4654\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
<div class=\"admonition question alert alert-warning\">\n", "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Question</p></b>\n", "    <p>Does this JSON data structure remind you of a Python data structure?</p>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The JSON file bears a striking resemblance to the Python `dict` structure due to the key-value pairings." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Importing JSON Files\n", "\n", "JSON files can be imported using the `json` library paired with the `with` statement and the `open()` function:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import json\n", "\n", "with open('../data/json_example.json', 'r') as f:\n", "    imported_json = json.load(f)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can then verify that `imported_json` is a `dict`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "dict" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(imported_json)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "And we can view the data:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'planeId': '1xc2345g',\n", " 'manufacturerDetails': {'manufacturer': 'Airbus',\n", " 'model': 'A330',\n", " 'year': 1999},\n", " 'airlineDetails': {'currentAirline': 'Southwest',\n", " 'previousAirlines': {'1st': 'Delta'},\n", " 'lastPurchased': 2013},\n", " 'numberOfFlights': 4654}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "imported_json" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Pickle Files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So far, we've seen that tabular data files can be imported and represented as DataFrames and JSON files can be imported and represented as dicts, but what about other, more complex data?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python's native data files are known as **Pickle** files:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Pickle files conventionally use the `.pickle` (or `.pkl`) extension" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Pickle files are great for saving native Python data that can't easily be represented by other file types such as:\n", "    * pre-processed data,\n", "    * models,\n", "    * any other Python object... (see the sketch below)"
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Importing Pickle Files\n", "\n", "Pickle files can be imported using the `pickle` library paired with the `with` statement and the `open()` function:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import pickle\n", "\n", "with open('../data/pickle_example.pickle', 'rb') as f:\n", "    imported_pickle = pickle.load(f)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can view this file and see it contains the same data as the JSON example:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "{'planeId': '1xc2345g',\n", " 'manufacturerDetails': {'manufacturer': 'Airbus',\n", " 'model': 'A330',\n", " 'year': 1999},\n", " 'airlineDetails': {'currentAirline': 'Southwest',\n", " 'previousAirlines': {'1st': 'Delta'},\n", " 'lastPurchased': 2013},\n", " 'numberOfFlights': 4654}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "imported_pickle" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And that it was loaded directly as a `dict`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "dict" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(imported_pickle)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Questions\n", "\n", "Are there any questions before we move on?" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "rise": { "autolaunch": true, "transition": "none" } }, "nbformat": 4, "nbformat_minor": 4 }