{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"title: \"Load an Excel File With Python and Pandas\"\n",
"author: \"Andrew Bancroft\"\n",
"date: 2019-05-22\n",
"description: \"Several examples of how to load an excel file into a Pandas dataframe with Python\"\n",
"type: technical_note\n",
"draft: false\n",
"comments: true\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"
\n",
" \n",
"
\n",
" Right-click -> Save as...\n",
"
\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Install the [Pandas](https://pandas.pydata.org/) library for your Python environment\n",
"* Cells in this notebook expect the Car Sales.xlsx file to be in certain locations; specifics are in the cell itself\n",
"* [Resources](#resources) to help you practice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## First Things First"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Data From Excel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### File is in the same directory as your Jupyter Notebook"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" DealershipName | \n",
" RedCars | \n",
" SilverCars | \n",
" BlackCars | \n",
" BlueCars | \n",
" MonthSold | \n",
" YearSold | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Clyde's Clunkers | \n",
" 902 | \n",
" 650 | \n",
" 754 | \n",
" 792 | \n",
" 1 | \n",
" 2018 | \n",
"
\n",
" \n",
" 1 | \n",
" Clyde's Clunkers | \n",
" 710 | \n",
" 476 | \n",
" 518 | \n",
" 492 | \n",
" 2 | \n",
" 2018 | \n",
"
\n",
" \n",
" 2 | \n",
" Clyde's Clunkers | \n",
" 248 | \n",
" 912 | \n",
" 606 | \n",
" 350 | \n",
" 3 | \n",
" 2018 | \n",
"
\n",
" \n",
" 3 | \n",
" Clyde's Clunkers | \n",
" 782 | \n",
" 912 | \n",
" 858 | \n",
" 446 | \n",
" 4 | \n",
" 2018 | \n",
"
\n",
" \n",
" 4 | \n",
" Clyde's Clunkers | \n",
" 278 | \n",
" 354 | \n",
" 482 | \n",
" 752 | \n",
" 5 | \n",
" 2018 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" DealershipName RedCars SilverCars BlackCars BlueCars MonthSold \\\n",
"0 Clyde's Clunkers 902 650 754 792 1 \n",
"1 Clyde's Clunkers 710 476 518 492 2 \n",
"2 Clyde's Clunkers 248 912 606 350 3 \n",
"3 Clyde's Clunkers 782 912 858 446 4 \n",
"4 Clyde's Clunkers 278 354 482 752 5 \n",
"\n",
" YearSold \n",
"0 2018 \n",
"1 2018 \n",
"2 2018 \n",
"3 2018 \n",
"4 2018 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the Excel file\n",
"car_sales_data = pd.read_excel(\"Car Sales.xlsx\")\n",
"\n",
"# Show the first 5 rows\n",
"car_sales_data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### File is in a different directory than your Jupyter Notebook\n",
"The example will use your \"home directory\" to make this example applicable across operating systems, but you can use any path as long as the file exists there..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Find the home directory using Python "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"from os.path import expanduser as ospath\n",
"\n",
"user_home_directory = ospath(\"~\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" DealershipName | \n",
" RedCars | \n",
" SilverCars | \n",
" BlackCars | \n",
" BlueCars | \n",
" MonthSold | \n",
" YearSold | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Clyde's Clunkers | \n",
" 902 | \n",
" 650 | \n",
" 754 | \n",
" 792 | \n",
" 1 | \n",
" 2018 | \n",
"
\n",
" \n",
" 1 | \n",
" Clyde's Clunkers | \n",
" 710 | \n",
" 476 | \n",
" 518 | \n",
" 492 | \n",
" 2 | \n",
" 2018 | \n",
"
\n",
" \n",
" 2 | \n",
" Clyde's Clunkers | \n",
" 248 | \n",
" 912 | \n",
" 606 | \n",
" 350 | \n",
" 3 | \n",
" 2018 | \n",
"
\n",
" \n",
" 3 | \n",
" Clyde's Clunkers | \n",
" 782 | \n",
" 912 | \n",
" 858 | \n",
" 446 | \n",
" 4 | \n",
" 2018 | \n",
"
\n",
" \n",
" 4 | \n",
" Clyde's Clunkers | \n",
" 278 | \n",
" 354 | \n",
" 482 | \n",
" 752 | \n",
" 5 | \n",
" 2018 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" DealershipName RedCars SilverCars BlackCars BlueCars MonthSold \\\n",
"0 Clyde's Clunkers 902 650 754 792 1 \n",
"1 Clyde's Clunkers 710 476 518 492 2 \n",
"2 Clyde's Clunkers 248 912 606 350 3 \n",
"3 Clyde's Clunkers 782 912 858 446 4 \n",
"4 Clyde's Clunkers 278 354 482 752 5 \n",
"\n",
" YearSold \n",
"0 2018 \n",
"1 2018 \n",
"2 2018 \n",
"3 2018 \n",
"4 2018 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Make sure to use \"/\" slashes and not \"\\\" slashes\n",
"# There actually needs to be folders named \"Path\" and \"To\" and \"Excel\" and \"File\"\n",
"# in your home directory (the \"~\" means \"home directory\") for this cell to work\n",
"excel_file_path = user_home_directory + \"/Path/To/Excel/File/Car Sales.xlsx\"\n",
"\n",
"other_path_car_sales_data = pd.read_excel(excel_file_path)\n",
"\n",
"# Show the first 5 rows\n",
"other_path_car_sales_data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### From a URL"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" DealershipName | \n",
" RedCars | \n",
" SilverCars | \n",
" BlackCars | \n",
" BlueCars | \n",
" MonthSold | \n",
" YearSold | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Clyde's Clunkers | \n",
" 902 | \n",
" 650 | \n",
" 754 | \n",
" 792 | \n",
" 1 | \n",
" 2018 | \n",
"
\n",
" \n",
" 1 | \n",
" Clyde's Clunkers | \n",
" 710 | \n",
" 476 | \n",
" 518 | \n",
" 492 | \n",
" 2 | \n",
" 2018 | \n",
"
\n",
" \n",
" 2 | \n",
" Clyde's Clunkers | \n",
" 248 | \n",
" 912 | \n",
" 606 | \n",
" 350 | \n",
" 3 | \n",
" 2018 | \n",
"
\n",
" \n",
" 3 | \n",
" Clyde's Clunkers | \n",
" 782 | \n",
" 912 | \n",
" 858 | \n",
" 446 | \n",
" 4 | \n",
" 2018 | \n",
"
\n",
" \n",
" 4 | \n",
" Clyde's Clunkers | \n",
" 278 | \n",
" 354 | \n",
" 482 | \n",
" 752 | \n",
" 5 | \n",
" 2018 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" DealershipName RedCars SilverCars BlackCars BlueCars MonthSold \\\n",
"0 Clyde's Clunkers 902 650 754 792 1 \n",
"1 Clyde's Clunkers 710 476 518 492 2 \n",
"2 Clyde's Clunkers 248 912 606 350 3 \n",
"3 Clyde's Clunkers 782 912 858 446 4 \n",
"4 Clyde's Clunkers 278 354 482 752 5 \n",
"\n",
" YearSold \n",
"0 2018 \n",
"1 2018 \n",
"2 2018 \n",
"3 2018 \n",
"4 2018 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Note the URL Encoding with \"%20\" for spaces\n",
"url_to_excel_file = \"https://github.com/andrewcbancroft/datadaylife-blog/raw/master/datasets/Car%20Sales.xlsx\"\n",
"\n",
"# Read the Excel file\n",
"url_car_sales_data = pd.read_excel(url_to_excel_file)\n",
"\n",
"# Show the first 5 rows\n",
"url_car_sales_data.head(5)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}