{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Working with CSV and XSLX files in Python\n",
    "\n",
    "<a href=\"http://creativecommons.org/licenses/by-nc/4.0/\" rel=\"license\"><img style=\"border-width: 0;\" src=\"https://i.creativecommons.org/l/by-nc/4.0/88x31.png\" alt=\"Creative Commons License\" /></a>\n",
    "\n",
    "This tutorial is licensed under a <a href=\"http://creativecommons.org/licenses/by-nc/4.0/\" rel=\"license\">Creative Commons Attribution-NonCommercial 4.0 International License</a>.\n",
    "\n",
    "## Lab Goals\n",
    "\n",
    "This lab covers various methods for reading structured data into Python using the `csv` and `openpyxl`. It also covers other types of delimiters and escape characters. The lab also overs writing data from Python to a structured data file.\n",
    "\n",
    "By the end of this lab, students will be able to:\n",
    "- Describe the structure and components of a `.csv` file\n",
    "- Read a `.csv` file into Python using the `csv` module and a `for` loop\n",
    "- Understand how to approach other types of delimited files\n",
    "- Understand how to work with escape characters when loading structured data\n",
    "- Understand how Python dictionaries work as a type of structured data\n",
    "- Write data from Python to a `.csv` file\n",
    "- Understand the basic of working with `.xlsx` files in Python\n",
    "\n",
    "## Acknowledgements\n",
    "\n",
    "Information and exercises in this lab are adapted from Al Sweigart's [*Automate the Boring Stuff With Python*](https://nostarch.com/automatestuff2) (No Starch Press, 2020).\n",
    "- Chapter 13, \"Working With Excel Spreadsheets\" (302-328)\n",
    "- Chapter 16 \"Working With CSV Files and JSON Data\" (371-388)\n",
    "\n",
    "# Table of Contents\n",
    "\n",
    "- [Data](#data)\n",
    "- [`.csv` data in Python](#csv-data-in-python)\n",
    "  * [What is a `.csv` file?](#what-is-a-csv-file)\n",
    "  * [Reading a `.csv` file into Python](#reading-a-csv-file-into-Python)\n",
    "    * [Reading `.csv` data using a `for` loop](#reading-csv-data-using-a-for-loop)\n",
    "  * [Other delimiters](#other-delimiters)\n",
    "  * [Escape characters](#escape-characters)\n",
    "- [Reading in `.csv` files using dictionaries](#reading-in-csv-files-using-dictionaries)\n",
    "- [Writing to a `.csv` file](#writing-to-a-csv-file)\n",
    "  * [Writing from a dictionary to a `.csv` file](#writing-from-a-dictionary-to-a-csv-file)\n",
    "- [Working with `.xlsx` files](#working-with-xlsx-files)\n",
    "- [Project Prompts](#project-prompts)\n",
    "- [Lab notebook questions](#lab-notebook-questions)\n",
    "\n",
    "# Data\n",
    "\n",
    "You'll need four data files for this lab.\n",
    "- `example.csv`\n",
    "- `example.txt`\n",
    "- `example.xlsx`\n",
    "- `exampleWithHeader.csv`\n",
    "\n",
    "They can all be downloaded from [this GitHub repository](https://github.com/kwaldenphd/tabular-data-python/tree/main/data) as individual files or a zip folder.\n",
    "\n",
    "You can also access them [via Google Drive](https://drive.google.com/drive/folders/1Sp_N34753ONJRU2AFKcocQ2DhCEhyL-m?usp=sharing) (ND users only).\n",
    "\n",
    "# `.csv` data in Python\n",
    "\n",
    "## What is a `.csv` file? \n",
    "\n",
    "1. CSV stands for \"comma-separated values.\" CSV files are tabular data structures (i.e. a spreadsheet), stored in a plain-text format.\n",
    "\n",
    "2. Python includes a built-in `csv` module that allows us to read in data from a `.csv` file (or other type of delimited plain-text file), as well as write data to a `.csv` file.\n",
    "\n",
    "3. In `.csv` files, each line represents a row in the spreadsheet, and commas separate cells in each row (thus the file format name comma-separated values).\n",
    "\n",
    "4. At first glance, `.csv` files look similar to proprietary spreadsheet program file formats, such as files saved in Microsoft Excel or Apple Numbers. \n",
    "\n",
    "5. However, file formats like `.xls` or `.xlsx` (Microsoft Excel) or `.numbers` (Apple Numbers) are NOT plain-text formats. Try to open these file types in a text editor and you'll quickly see the additional content and markup added by the spreadsheet program. More on this later.\n",
    "\n",
    "6. A few characteristics that distinguish `.csv` files (or other plain-text structured data formats) from proprietary spreadsheet file types:\n",
    "- Columns in a `.csv` file don't have a value type. Everything is a string.\n",
    "- Values in a `.csv` file don't have font or color formatting\n",
    "- `.csv` files only contain single worksheets\n",
    "- `.csv` files don't store formatting information like cell width/height\n",
    "- `.csv` files don't recognize merged cells or other kinds of special formatting (frozen or hidden rows/columns, embedded images, etc.)\n",
    "\n",
    "7. Given these limitations, especially compared to the way we often interact with spreadsheet programs like Microsoft Excel or Google Sheets, what's the advantage of working with `.csv` files?\n",
    "\n",
    "8. One key advantage of `.csv` files is their simplicity. You can load or open a `.csv` file in a text editor and be able to quickly see the values in the file. \n",
    "\n",
    "9. When working with data in a programming environment, `.csv` files as a plain-text format simplify the process of loading structured data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q1: Open the <code>example.xlsx</code> file in a text editor. Describe what you see.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q2: How does your answer to Q1 compare to what you see when you open the <code>example.csv</code> file in a text editor?</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q3: Open the <code>example.xlsx</code> file in a spreadsheet program. Save the file as a <code>.csv</code> format. What happens? Or what happens when you open the newly-created <code>.csv</code> file in a spreadsheet program or text editor?</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading a `.csv` file into Python\n",
    "\n",
    "10. To read data from a `.csv` file into Python, we will use the `csv` module.\n",
    "\n",
    "<blockquote>Check out <a href = \"https://docs.python.org/3/library/csv.html#module-csv\">Python's documentation</a> to learn more about the <code>csv</code> module.</blockquote>\n",
    "\n",
    "11. The `csv` module allows us to create a `reader` object that iterates over lines in a `.csv` file.\n",
    "\n",
    "12. What does this workflow look like? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# open csv file\n",
    "exampleFile = open('data/example.csv')\n",
    "# NOTE: If you are on Windows, you may have to change 'data/example.csv' to 'data\\example.csv'\n",
    "\n",
    "# create reader object from lines of data in example.csv file using csv.reader function\n",
    "exampleReader = csv.reader(exampleFile)\n",
    "\n",
    "# create list with rows of data from example.csv file\n",
    "exampleData = list(exampleReader)\n",
    "\n",
    "# output list rows\n",
    "exampleData"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "13. You'll notice that the `exampleData` output is a list of lists, or a list that contains sub-lists. \n",
    "\n",
    "14. Each row of data from the original `example.csv` file is a sub-list (with field values separated by commas) within the `exampleData` list.\n",
    "\n",
    "15. Now we can access the value at a particular row and column using the expression `exampleData[row][col]`, where `row` is the index position of one of the lists in `exampleData`, and `col` is the index position of the item located in that list.\n",
    "\n",
    "16. For example, `exampleData[0][0]` would give us the first string from the first list. `exampleData[0][1]` would give us the second string from the first list."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q4: Create a list of sublists and use the index positions to access specific values.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q5: Read in small CSV and access specific values. Include code + comments.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading `.csv` data using a `for` loop\n",
    "\n",
    "17. The method we just used to read data from a `.csv` file into Python loads the entire file into memory at once.\n",
    "\n",
    "18. If we use this method on a large `.csv` file, Python is going to try to load the entire file into memory at once. This does not bode well for Python or your computer's performance.\n",
    "\n",
    "19. We can use a `reader` object as part of a `for` loop to iterate through the lines in a `.csv` file and load the file line-by-line.\n",
    "\n",
    "<blockquote>Remember <code>for</code> loops iterate through each item in a series or list of items and performs the content of the loop on each item.</blockquote>\n",
    "\n",
    "20. What does this workflow look like?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# open .csv file\n",
    "exampleFile = open('data/example.csv')\n",
    "# NOTE: If you are on Windows, you may have to change 'data/example.csv' to 'data\\example.csv'\n",
    "\n",
    "# create reader object from .csv file\n",
    "exampleReader = csv.reader(exampleFile)\n",
    "\n",
    "# iterate through each row in .csv file and print out row content and number\n",
    "for row in exampleReader:\n",
    "  print('Row #' + str(exampleReader.line_num) + ' ' + str(row))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "21. This program imports the `csv` module, makes a `reader` object from the `example.csv` file, and loops through each of the rows in the `reader` object.\n",
    "\n",
    "22. Each row is a list of values, and each value represents a cell.\n",
    "\n",
    "23. The `print()` function prints the current row number and that row's contents. \n",
    "\n",
    "24. The `reader` object includes a `line_num` variable, which contains the number of the current line.\n",
    "\n",
    "25. NOTE: The `reader` object can only be looped over once. If you need to re-read the same `.csv` file, you'll use `csv.reader` to create a new `reader` object."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q6: Read in a small CSV file using a <code>for</code> loop and access specific values. Include code + comments.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Other delimiters\n",
    "\n",
    "26. But what happens if you need to load in structured data that uses another delimiter, not a comma? \n",
    "\n",
    "27. Remember when we opened a `.csv` file in a plain-text editor, the value fields are separated by a comma.\n",
    "\n",
    "28. But commas are not the only possible delimiter. Tabs, spaces, pipes, or other characters can be used to separate or delimit fields in a dataset.\n",
    "\n",
    "29. The `csv` module includes a range of formatting parameters, known as a `Dialect` class. \n",
    "\n",
    "30. The `Dialect` class includes a range of methods you can use to specify alternate delimiters and (as we'll discover shortly), handle situations like special characters, line breaks, etc.\n",
    "\n",
    "31. The `delimiter` attribute in the `Dialect` class lets us specify what delimiter is being used in the data we want to load."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# load tab-separated value file\n",
    "tsv_file = open('data/example.txt')\n",
    "\n",
    "# create a reader object and specify the new delimiter\n",
    "read_tsv = csv.reader(tsv_file, delimiter=\"\\t\")\n",
    "\n",
    "# use a for loop to read in the data\n",
    "for row in read_tsv:\n",
    "  print(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q7: Modify code to load in example.txt file.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Escape characters\n",
    "\n",
    "32. But what happens if the values in your dataset include the same character that's being used as a delimiter?\n",
    "\n",
    "33. For example, let's say you have address data in the following structure:\n",
    "<table>\n",
    "    <tr>\n",
    "        <td>Name</td>\n",
    "        <td>Age</td>\n",
    "        <td>Address</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>Jerry</td>\n",
    "        <td>10</td>\n",
    "        <td>2776 McDowell Street, Nashville, Tennessee</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>Tom</td>\n",
    "        <td>20</td>\n",
    "        <td>3171 Jessie Street, Westerville, Ohio</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>Mike</td>\n",
    "        <td>30</td>\n",
    "        <td>1818 Sherman Street, Hope, Kansas</td>\n",
    "    </tr>\n",
    "</table>\n",
    "\n",
    "34. In this example, we want to keep `Address` as an intact field and not separate based on the commas located within the address.\n",
    "\n",
    "35. In order to do this, we need to specify how Python parses fields that include the delimiter character.\n",
    "\n",
    "36. The `quotechar` attribute in the `Dialect` class specifies what character will be used to enclose fields that should be treated as distinct entities and not be split into columns or fields based on the presence of the delimiter character within the field.\n",
    "\n",
    "37. The default for `quotechar` is `\"` (double quotation marks).\n",
    "\n",
    "38. So what does that mean? We put double quotation marks around the field that includes the delimiter character.\n",
    "\n",
    "39. Modified data structure:\n",
    "<table>\n",
    "    <tr>\n",
    "        <td>Name</td>\n",
    "        <td>Age</td>\n",
    "        <td>Address</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>Jerry</td>\n",
    "        <td>10</td>\n",
    "        <td>\"2776 McDowell Street, Nashville, Tennessee\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>Tom</td>\n",
    "        <td>20</td>\n",
    "        <td>\"3171 Jessie Street, Westerville, Ohio\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>Mike</td>\n",
    "        <td>30</td>\n",
    "        <td>\"1818 Sherman Street, Hope, Kansas\"</td>\n",
    "    </tr>\n",
    "</table>\n",
    "\n",
    "40. Then we can read the modified data into Python.\n",
    "\n",
    "41. But what happens if we have quotation marks within a field that needs to be treated as a distinct entity?\n",
    "\n",
    "42. For example, the following data structure would run into problems when read into Python.\n",
    "<table>\n",
    "    <tr>\n",
    "        <td>Id</td>\n",
    "        <td>User</td>\n",
    "        <td>Comment</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>1</td>\n",
    "        <td>Bob</td>\n",
    "        <td>\"John said \"Hello World\"\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>2</td>\n",
    "        <td>Tom</td>\n",
    "        <td>\"\"The Magician\"\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>3</td>\n",
    "        <td>Harry</td>\n",
    "        <td>\"\"walk around the corner\" she explained to the child\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>4</td>\n",
    "        <td>Louis</td>\n",
    "        <td>\"He said, \"stop pulling the dog's tail\"\"</td>\n",
    "    </tr>\n",
    "</table>\n",
    "\n",
    "43. See our problem? The `Comment` field is enclosed with double quotation marks but also includes quotation marks in the field. \n",
    "\n",
    "44. We need Python to understand the enclosing double quotation marks serve a different purpose than the double quotation marks contained within the `Comment` field.\n",
    "\n",
    "45. We can use a blackslash `\\` character to escape the embedded double quotes.\n",
    "\n",
    "46. Modified data structure:\n",
    "<table>\n",
    "    <tr>\n",
    "        <td>Id</td>\n",
    "        <td>User</td>\n",
    "        <td>Comment</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>1</td>\n",
    "        <td>Bob</td>\n",
    "        <td>\"John said \\\"Hello World\\\"\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>2</td>\n",
    "        <td>Tom</td>\n",
    "        <td>\"\\\"The Magician\\\"\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>3</td>\n",
    "        <td>Harry</td>\n",
    "        <td>\"\\\"walk around the corner\\\" she explained to the child\"</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>4</td>\n",
    "        <td>Louis</td>\n",
    "        <td>\"He said, \\\"stop pulling the dog's tail\\\"\"</td>\n",
    "    </tr>\n",
    "</table>\n",
    "\n",
    "47. Since the default for `quotechar` is `\"`, we need to modify that default to reflect the new data structure.\n",
    "\n",
    "Create a file called `data.csv` to match the table above before running the code below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# read csv using quote quotechar\n",
    "with open('data.csv', 'rt') as f:\n",
    "  csv_reader = csv.reader(f, skipinitialspace=True, quotechar='\\\\')\n",
    "  \n",
    "  for line in csv_reader:\n",
    "    print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q8: Define escape characters in your own words. Describe a situation in which escape characters would be needed, and how you would address that challenge using Python syntax.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading in `.csv` files using dictionaries\n",
    "\n",
    "48. For `.csv` files that contain header rows, we might want to connect the header row values with subsequent row values.\n",
    "\n",
    "49. We can do this by reading the `.csv` file as a dictionary, rather than a list containing row sub-lists.\n",
    "\n",
    "50. Remember dictionaries have key-value pairs, where we can access a value by using its key name.\n",
    "\n",
    "51. For tabular data, we can think of the key as the field name contained in the header row and the value as column or field values.\n",
    "\n",
    "52. We read a `.csv` file to a dictionary using a `DictReader` object (versus the `csv.reader` object)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# open csv file\n",
    "exampleFile = open('data/exampleWithHeader.csv')\n",
    "# NOTE: modify the forward slash to a backslash if you are on Windows.\n",
    "\n",
    "# reading exampleFile to a dictionary\n",
    "exampleDictReader = csv.DictReader(exampleFile)\n",
    "\n",
    "# set keys for key-value pairs\n",
    "for row in exampleDictReader:\n",
    "  print(row['Timestamp'], row['Fruit'], row['Quantity'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "53. Within the `for` loop, the `DictReader` object sets `row` to a dictionary object with keys derived from the headers in the first row.\n",
    "\n",
    "54. The `DictReader` object means we don't have to separate the header information from the rest of the data contained in the file, because the `DictReader` object does this for us.\n",
    "\n",
    "<blockquote>Q9: Load a small CSV file that includes header info. Include code + comments.</blockquote>\n",
    "\n",
    "55. But what can we do if we want to read to a dictionary a `.csv` file that doesn't incldue a header row?\n",
    "\n",
    "56. We can pass a second argument to the `DictReader()` function to manually set header names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# open csv file\n",
    "exampleFile = open('data/example.csv')\n",
    "# NOTE: modify the forward slash to a backslash if you are on Windows.\n",
    "\n",
    "# reading exampleFile to a dictionary with added field names\n",
    "exampleDictReader = csv.DictReader(exampleFile, ['time', 'name', 'amount'])\n",
    "\n",
    "# set keys for key-value pairs\n",
    "for row in exampleDictReader:\n",
    "  print(row['time'], row['name'], row['amount'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q10: Load a small CSV that does not include headers and manually set the headers. Include code + comments.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Writing to a `.csv` file\n",
    "\n",
    "57. We'll do more with writing to a `.csv` file later in the semester.\n",
    "\n",
    "58. But for now, we can create a `writer` object using the `csv.writer()` function to write data to a `.csv` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# create and open output.csv file in write mode\n",
    "outputFile = open('output.csv', 'w', newline='')\n",
    "\n",
    "# create writer object\n",
    "outputWriter = csv.writer(outputFile)\n",
    "\n",
    "# write first row\n",
    "outputWriter.writerow(['spam', 'eggs', 'bacon', 'ham'])\n",
    "\n",
    "# write another row\n",
    "outputWriter.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])\n",
    "\n",
    "# write a third row\n",
    "outputWriter.writerow([1, 2, 3.141592, 4])\n",
    "\n",
    "# close the output file\n",
    "outputFile.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "59. The `writerow()` method takes a list argument and writes that to a new row in the `writer` object, that is added to the `.csv` file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q11: Create your own small data structure and write it to CSV file. Describe what you expect to see and what actually happens when you look at the output. Include code + comments.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing from a dictionary to a `.csv` file\n",
    "\n",
    "60. We can use the `DictWriter` object to write data in a dictionary to a `.csv` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import csv module\n",
    "import csv\n",
    "\n",
    "# create and open output.csv file in write mode\n",
    "outputFile = open('output.csv', 'w', newline='')\n",
    "\n",
    "# create writer object\n",
    "outputDictWriter = csv.DictWriter(outputFile, fieldnames = ['Name', 'Pet', 'Phone'])\n",
    "\n",
    "# create header row\n",
    "outputDictWriter.writeheader()\n",
    "\n",
    "# write first row\n",
    "outputDictWriter.writerow({'Name': 'Alice', 'Pet': 'cat', 'Phone': '555-1234'})\n",
    "\n",
    "# write another row\n",
    "outputDictWriter.writerow({'Name': 'Bob', 'Phone': '555-9999'})\n",
    "\n",
    "# write a third row\n",
    "outputDictWriter.writerow({'Phone': '555-5555', 'Name': 'Carol', 'Pet': 'dog'})\n",
    "\n",
    "# close the output file\n",
    "outputFile.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "61. Note that the order of the key-value pairs in the dictionaries created manually using `writerow()` doesn't matter.\n",
    "\n",
    "62. Python writes the dictionaries to the `.csv` file using the order of the keys given to `DictWriter()`.\n",
    "\n",
    "63. Missing keys will be empty in the newly-created `.csv` file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q12: Create a small dictionary and write it to a CSV file. Include code + comments.</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Working with `.xlsx` files\n",
    "\n",
    "64. As noted earlier, file formats like `.xls` or `.xlsx` (Microsoft Excel) or `.numbers` (Apple Numbers) are NOT plain-text formats. \n",
    "\n",
    "65. Try to open these file types in a text editor and you'll quickly see the additional content and markup added by the spreadsheet program. More on this later.\n",
    "\n",
    "66. A few characteristics that distinguish Excel workbooks from `.csv` files (or other plain-text structured data formats):\n",
    "- Columns in an Excel workbook do have a value type, either determined by Excel or set manually.\n",
    "- Values in an Excel workbook can include font or color formatting\n",
    "- Excel workbooks can contain multiple sheets\n",
    "- Excel workbooks store formatting information like cell width/height\n",
    "- Excel files can include merged cells or other kinds of special formatting (frozen or hidden rows/columns, embedded images, formulas, etc.)\n",
    "\n",
    "67. In most cases, the most reliable option is to save each sheet of an Excel workbook to an individual `.csv` file which you can then read into Python.\n",
    "\n",
    "68. But if you need to load an Excel workbook directly into Python, the `openpyxl` module can let us read and modify Excel files.\n",
    "\n",
    "<blockquote><a href=\"https://openpyxl.readthedocs.io/en/stable/\">Click here</a> to learn more about the openpyxl module.</blockquote>\n",
    "\n",
    "69. First thing we need to do is install the `openpyxl` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install openpyxl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "70. We can load the `example.xlsx` workbook into Python using `openpyxl`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import openpyxl\n",
    "import openpyxl\n",
    "\n",
    "# load workbook\n",
    "wb = openpyxl.load_workbook('data/example.xlsx')\n",
    "# NOTE: modify the forward slash to a backslash if you are on Windows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "71. Note that if `example.xlsx` is not in the current working directory, otherwise you will want to provide the full file path.\n",
    "\n",
    "72. We can use the `sheetnames` attribute to see what sheets are in the workbook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import openpyxl\n",
    "import openpyxl\n",
    "\n",
    "# load workbook\n",
    "wb = openpyxl.load_workbook('data/example.xlsx')\n",
    "# NOTE: modify the forward slash to a backslash if you are on Windows.\n",
    "\n",
    "# get workbook sheet names\n",
    "wb.sheetnames\n",
    "\n",
    "# isolate a specific sheet from the workbook using its name\n",
    "sheet = wb['Sheet3']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<blockquote>Q13: How would you compare these two formats? When would you want one format vs the other?</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "73. To learn more about working with Excel workbooks in Python:\n",
    "- Al Sweigart, \"Chapter 13, Working With Excel Spreadsheets,\" in *Automate the Boring Stuff With Python* (No Starch Press, 2020): 302-328.\n",
    "- [`openpyxl` documentation](https://openpyxl.readthedocs.io/en/stable/)\n",
    "\n",
    "# Project Prompts\n",
    "\n",
    "## Project #1:\n",
    "\n",
    "Create an automated program that removes first row of files for everything in `removeCsvheader.zip` folder. \n",
    "\n",
    "Include code + comments. \n",
    "\n",
    "Describe a scenario in which you would want or need to do this. \n",
    "\n",
    "## Project #2:\n",
    "\n",
    "Navigate to an open data portal and download a `.csv` or `.xlsx` file. \n",
    "\n",
    "A few places to start: \n",
    "- [Data.gov](https://www.data.gov/) \n",
    "- [City of Chicago Data Portal](https://data.cityofchicago.org/) \n",
    "- [City of South Bend Open Data](https://data-southbend.opendata.arcgis.com/) \n",
    "\n",
    "Open the data in a spreadsheet program and/or text editor\n",
    "- What do you see?\n",
    "- How can we make sense of the data based on available documentation?\n",
    "\n",
    "Load the data in Python as list/sublists and as dictionary. What challenges do you encounter and how do you address/solve them. When would you want one format vs the other.\n",
    "\n",
    "## Project #3:\n",
    "\n",
    "Manually create a small dataset and write to a CSV file.\n",
    "\n",
    "# Lab Notebook Questions\n",
    "\n",
    "Q1: Open the `example.xlsx` file in a text editor. Describe what you see.\n",
    "\n",
    "Q2: How does your answer to Q1 compare to what you see when you open the `example.csv` file in a text editor?\n",
    "\n",
    "Q3: Open the `example.xlsx` file in a spreadsheet program. Save the file as a `.csv` format. What happens? Or what happens when you open the newly-created `.csv` file in a spreadsheet program or text editor?\n",
    "\n",
    "Q4: Create a list of sublists and use the index positions to access specific values. Include code + comments.\n",
    "\n",
    "Q5: Read in a small CSV file and access specific values. Include code + comments.\n",
    "\n",
    "Q6: Read in a small CSV file using a `for` loop and access specific values. Include code + comments.\n",
    "\n",
    "Q8: Modify code to load in `example.txt` file.\n",
    "\n",
    "Q8: Define escape characters in your own words. Describe a situation in which escape characters would be needed, and how you would address that challenge using Python syntax.\n",
    "\n",
    "Q9: Load a small CSV file that includes header info. Include code + comments.\n",
    "\n",
    "Q10: Load a small CSV that does include headers and manually set the headers. Include code + comments.\n",
    "\n",
    "Q11: Create your own small data structure and write it to CSV file. Describe what you expect to see and what actually happens when you look at the output. Include code + comments.\n",
    "\n",
    "Q12: Create a small dictionary and write it to a CSV file. Include code + comments. \n",
    "\n",
    "Q13: How would you compare these two formats? When would you want one format vs the other?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}