{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "title: \"Load a CSV File With Python and Pandas\"\n",
    "author: \"Andrew Bancroft\"\n",
    "date:   2019-05-22\n",
    "description: \"Several examples of how to load a csv file into a Pandas dataframe with Python\"\n",
    "type: technical_note\n",
    "draft: false\n",
    "comments: true\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"resources\" class=\"jump-target\"></a>\n",
    "<div class=\"resources\">\n",
    "  <div class=\"resources-header\">\n",
    "    Resources\n",
    "  </div>\n",
    "    <div class=\"resources-download-instructions\">\n",
    "    Right-click -> Save as...\n",
    "  </div>\n",
    "  <ul class=\"resources-content\">\n",
    "    <li>\n",
    "        <i class=\"fas fa-file-csv\"></i> <a href=\"https://github.com/andrewcbancroft/datadaylife-blog/raw/master/datasets/Car%20Sales.csv\">Car Sales.csv</a>\n",
    "    </li>\n",
    "    <li>\n",
    "        <i class=\"fas fa-book\"></i> <a href=\"https://raw.githubusercontent.com/andrewcbancroft/datadaylife-blog/master/content/notes/data-engineering-python/load-csv-file-with-python-pandas.ipynb\">load-csv-file-with-python-pandas.ipynb</a>\n",
    "    </li>\n",
    "  </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Install the [Pandas](https://pandas.pydata.org/) library for your Python environment\n",
    "* The cells below expect the <a href=\"https://github.com/andrewcbancroft/datadaylife-blog/raw/master/datasets/Car%20Sales.csv\">Car Sales.csv</a> file to be in specific locations; each example states where it should be\n",
    "* [Resources](#resources) to help you practice"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## First Things First"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Data From a CSV File"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### File is in the same directory as your Jupyter Notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>DealershipName</th>\n",
       "      <th>RedCars</th>\n",
       "      <th>SilverCars</th>\n",
       "      <th>BlackCars</th>\n",
       "      <th>BlueCars</th>\n",
       "      <th>MonthSold</th>\n",
       "      <th>YearSold</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>902.0</td>\n",
       "      <td>650.0</td>\n",
       "      <td>754.0</td>\n",
       "      <td>792.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>710.0</td>\n",
       "      <td>476.0</td>\n",
       "      <td>518.0</td>\n",
       "      <td>492.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>248.0</td>\n",
       "      <td>912.0</td>\n",
       "      <td>606.0</td>\n",
       "      <td>350.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>782.0</td>\n",
       "      <td>912.0</td>\n",
       "      <td>858.0</td>\n",
       "      <td>446.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>278.0</td>\n",
       "      <td>354.0</td>\n",
       "      <td>482.0</td>\n",
       "      <td>752.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     DealershipName  RedCars  SilverCars  BlackCars  BlueCars  MonthSold  \\\n",
       "0  Clyde's Clunkers    902.0       650.0      754.0     792.0        1.0   \n",
       "1  Clyde's Clunkers    710.0       476.0      518.0     492.0        2.0   \n",
       "2  Clyde's Clunkers    248.0       912.0      606.0     350.0        3.0   \n",
       "3  Clyde's Clunkers    782.0       912.0      858.0     446.0        4.0   \n",
       "4  Clyde's Clunkers    278.0       354.0      482.0     752.0        5.0   \n",
       "\n",
       "   YearSold  \n",
       "0    2018.0  \n",
       "1    2018.0  \n",
       "2    2018.0  \n",
       "3    2018.0  \n",
       "4    2018.0  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read the CSV file\n",
    "car_sales_data = pd.read_csv(\"Car Sales.csv\")\n",
    "\n",
    "# Show the first 5 rows\n",
    "car_sales_data.head(5)"
   ]
  },
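  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A relative file name such as `Car Sales.csv` is resolved against the kernel's current working directory, which Jupyter sets to the notebook's folder by default. The minimal sketch below uses only the standard library to show where `pd.read_csv` will look and whether the file is actually there; `expected_csv_location` is a hypothetical helper, not part of pandas.\n",
    "\n",
    "```python\n",
    "import os\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def expected_csv_location(file_name):\n",
    "    # Hypothetical helper: a relative name is joined onto the current\n",
    "    # working directory, which is where pd.read_csv looks for it\n",
    "    location = Path(os.getcwd()) / file_name\n",
    "    return location, location.is_file()\n",
    "\n",
    "\n",
    "location, found = expected_csv_location('Car Sales.csv')\n",
    "print(location, 'exists' if found else 'is missing')\n",
    "```"
   ]
  },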
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### File is in a different directory than your Jupyter Notebook\n",
    "This example builds the path from your home directory so that it works on any operating system, but you can use any path as long as the file actually exists there."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "from os.path import expanduser\n",
    "\n",
    "# \"~\" is shorthand for the current user's home directory\n",
    "user_home_directory = expanduser(\"~\")"
   ]
  },
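  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you prefer the standard library's `pathlib` module, `Path.home()` normally resolves to the same directory as `expanduser('~')`, and the `/` operator joins path parts with the separator your operating system expects. A minimal sketch, reusing the example folder names from the next cell:\n",
    "\n",
    "```python\n",
    "import os\n",
    "from pathlib import Path\n",
    "\n",
    "# Path.home() normally matches what expanduser('~') returns\n",
    "home_path = Path.home()\n",
    "\n",
    "# The / operator inserts the correct separator for your OS\n",
    "csv_path = home_path / 'Path' / 'To' / 'CSV' / 'File' / 'Car Sales.csv'\n",
    "\n",
    "print(home_path == Path(os.path.expanduser('~')))\n",
    "print(csv_path)\n",
    "```\n",
    "\n",
    "`pd.read_csv` accepts a `Path` object directly, so `pd.read_csv(csv_path)` works without first converting it to a string."
   ]
  },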
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>DealershipName</th>\n",
       "      <th>RedCars</th>\n",
       "      <th>SilverCars</th>\n",
       "      <th>BlackCars</th>\n",
       "      <th>BlueCars</th>\n",
       "      <th>MonthSold</th>\n",
       "      <th>YearSold</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>902.0</td>\n",
       "      <td>650.0</td>\n",
       "      <td>754.0</td>\n",
       "      <td>792.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>710.0</td>\n",
       "      <td>476.0</td>\n",
       "      <td>518.0</td>\n",
       "      <td>492.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>248.0</td>\n",
       "      <td>912.0</td>\n",
       "      <td>606.0</td>\n",
       "      <td>350.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>782.0</td>\n",
       "      <td>912.0</td>\n",
       "      <td>858.0</td>\n",
       "      <td>446.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>278.0</td>\n",
       "      <td>354.0</td>\n",
       "      <td>482.0</td>\n",
       "      <td>752.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     DealershipName  RedCars  SilverCars  BlackCars  BlueCars  MonthSold  \\\n",
       "0  Clyde's Clunkers    902.0       650.0      754.0     792.0        1.0   \n",
       "1  Clyde's Clunkers    710.0       476.0      518.0     492.0        2.0   \n",
       "2  Clyde's Clunkers    248.0       912.0      606.0     350.0        3.0   \n",
       "3  Clyde's Clunkers    782.0       912.0      858.0     446.0        4.0   \n",
       "4  Clyde's Clunkers    278.0       354.0      482.0     752.0        5.0   \n",
       "\n",
       "   YearSold  \n",
       "0    2018.0  \n",
       "1    2018.0  \n",
       "2    2018.0  \n",
       "3    2018.0  \n",
       "4    2018.0  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from os.path import join\n",
    "\n",
    "# os.path.join inserts the right path separator for your operating system\n",
    "# There must actually be folders named \"Path\", \"To\", \"CSV\" and \"File\"\n",
    "# in your home directory (the \"~\" means \"home directory\") for this cell to work\n",
    "csv_file_path = join(user_home_directory, \"Path\", \"To\", \"CSV\", \"File\", \"Car Sales.csv\")\n",
    "\n",
    "other_path_car_sales_data = pd.read_csv(csv_file_path)\n",
    "\n",
    "# Show the first 5 rows\n",
    "other_path_car_sales_data.head(5)"
   ]
  },
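  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If any folder along that path is missing, `pd.read_csv` raises a `FileNotFoundError`. A minimal sketch of pinpointing the first folder (or the file itself) that does not exist yet, so you know exactly what to create; `first_missing_part` is a hypothetical helper built on the standard library's `pathlib`, not part of pandas:\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def first_missing_part(csv_path):\n",
    "    # Hypothetical helper: walk from the top of the path downward and\n",
    "    # return the first folder that does not exist, or the path itself\n",
    "    # when every folder exists but the CSV file does not\n",
    "    csv_path = Path(csv_path)\n",
    "    for candidate in reversed(csv_path.parents):\n",
    "        if not candidate.exists():\n",
    "            return candidate\n",
    "    return None if csv_path.is_file() else csv_path\n",
    "\n",
    "\n",
    "missing = first_missing_part(Path.home() / 'Path' / 'To' / 'CSV' / 'File' / 'Car Sales.csv')\n",
    "if missing is None:\n",
    "    print('Found the CSV file; pd.read_csv can load it')\n",
    "else:\n",
    "    print('Create this before reading the CSV:', missing)\n",
    "```"
   ]
  },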
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### From a URL"
   ]
  },
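  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A URL cannot contain a literal space, so the file name `Car Sales.csv` appears as `Car%20Sales.csv`. Rather than encoding it by hand, you can let the standard library's `urllib.parse.quote` do it. A minimal sketch that rebuilds the URL used in the next cell:\n",
    "\n",
    "```python\n",
    "from urllib.parse import quote\n",
    "\n",
    "base_url = 'https://github.com/andrewcbancroft/datadaylife-blog/raw/master/datasets/'\n",
    "file_name = 'Car Sales.csv'\n",
    "\n",
    "# quote() percent-encodes unsafe characters; a space becomes %20\n",
    "url_to_csv_file = base_url + quote(file_name)\n",
    "print(url_to_csv_file)\n",
    "```"
   ]
  },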
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>DealershipName</th>\n",
       "      <th>RedCars</th>\n",
       "      <th>SilverCars</th>\n",
       "      <th>BlackCars</th>\n",
       "      <th>BlueCars</th>\n",
       "      <th>MonthSold</th>\n",
       "      <th>YearSold</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>902.0</td>\n",
       "      <td>650.0</td>\n",
       "      <td>754.0</td>\n",
       "      <td>792.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>710.0</td>\n",
       "      <td>476.0</td>\n",
       "      <td>518.0</td>\n",
       "      <td>492.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>248.0</td>\n",
       "      <td>912.0</td>\n",
       "      <td>606.0</td>\n",
       "      <td>350.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>782.0</td>\n",
       "      <td>912.0</td>\n",
       "      <td>858.0</td>\n",
       "      <td>446.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Clyde's Clunkers</td>\n",
       "      <td>278.0</td>\n",
       "      <td>354.0</td>\n",
       "      <td>482.0</td>\n",
       "      <td>752.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     DealershipName  RedCars  SilverCars  BlackCars  BlueCars  MonthSold  \\\n",
       "0  Clyde's Clunkers    902.0       650.0      754.0     792.0        1.0   \n",
       "1  Clyde's Clunkers    710.0       476.0      518.0     492.0        2.0   \n",
       "2  Clyde's Clunkers    248.0       912.0      606.0     350.0        3.0   \n",
       "3  Clyde's Clunkers    782.0       912.0      858.0     446.0        4.0   \n",
       "4  Clyde's Clunkers    278.0       354.0      482.0     752.0        5.0   \n",
       "\n",
       "   YearSold  \n",
       "0    2018.0  \n",
       "1    2018.0  \n",
       "2    2018.0  \n",
       "3    2018.0  \n",
       "4    2018.0  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Spaces in the file name are URL-encoded as \"%20\"\n",
    "url_to_csv_file = \"https://github.com/andrewcbancroft/datadaylife-blog/raw/master/datasets/Car%20Sales.csv\"\n",
    "\n",
    "# Read the CSV file\n",
    "url_car_sales_data = pd.read_csv(url_to_csv_file)\n",
    "\n",
    "# Show the first 5 rows\n",
    "url_car_sales_data.head(5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}