{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# [CptS 111 Introduction to Algorithmic Problem Solving](https://github.com/gsprint23/cpts111)\n",
"[Washington State University](https://wsu.edu)\n",
"\n",
"[Gina Sprint](http://eecs.wsu.edu/~gsprint/)\n",
"# Files\n",
"\n",
"Learner objectives for this lesson\n",
"* Understand file system paths\n",
"* Open, read, write, and close files\n",
"* Use `while` and `for` loops to read in and write data to files"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Storing Data\n",
"Many applications require storing and retrieving data outside of a program. Think of the many different applications you regularly use, do they utilize information that has been saved in some way?\n",
"* PC: Your operating system stores all of your settings, files, and machine state for you.\n",
"* Banks: Every customer's transaction history is stored in massive databases in data warehouses. When you make a debit purchase, your account balance is retrieved from one of these servers to make sure you have enough money for your transaction. (`if balance >= purchase`)\n",
"* Games: Your progress in a game is stored in a file so that when you turn off your console (or laptop), your progress isn't lost.\n",
"\n",
"* Search history: Websites save your recent searches in order to try and learn about your preferences and better predict what you will search for in the future.\n",
"* Authentication: When you authorize an app to \"keep you logged in\", a token is being persistently stored by the app as your authentication so you don't have to type your username and password in each time.\n",
"* Obviously many more.\n",
"\n",
"## Text Files\n",
"A simple way to store data is in a *text file*, such as this simple text file, [transactions.txt](https://raw.githubusercontent.com/gsprint23/cpts111/master/lessons/files/transactions.txt), that stores an individual's credit card transaction history. Each line in the file represents a transaction price.\n",
"\n",
"To process data in a file, we typically take the following approach:\n",
"1. Open the file\n",
"1. Process the file\n",
" * Read data (doesn't modify the file) or\n",
" * Write data (overwrite existing file) or\n",
" * Append data (retains existing information and adds new data)\n",
"1. Close the file\n",
"\n",
"### Opening a File\n",
"Before we can read from a file or write to a file, we first need to open the file and get a file object (AKA handle). We do this with the built-in function `open()`:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# in_file is our variable connecting our program to transactions.txt\n",
"# transactions.txt is a file I have in the *same folder* as this running Python file\n",
"in_file = open(\"transactions.txt\", \"r\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### File Modes\n",
"The first argument to `open()` is a string representing the path to the file and the second argument is the file opening *mode*: \n",
"1. \"r\" for reading\n",
" 1. File must exist or you will get an error\n",
"1. \"w\" for writing\n",
" 1. If the file does not exist, it is created\n",
" 1. If the file does exist, it is cleared!\n",
"1. \"a\" is for appending\n",
" 1. If the file does not exist, it is created\n",
" 1. If the file does exist, new data written to the file is added at the end of the file\n",
"\n",
"You can read more about modes [here](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). \n",
"\n",
"`open()` returns an object that represents the connection between our program and transactions.txt.\n",
"\n",
"#### Paths\n",
"The directory (or folder) where your Python script is running is called the *current directory*. When you open a file, Python looks for it in the current directory. \n",
"\n",
"If a file you want to open is in a directory other than the current directory, you will have to specify its path. The location of a file is represented by its path, the sequence of folders that the file is stored in, plus the file's name. There are two ways to specify a path:\n",
"1. Relative path: a path to a file or directory relative to the current directory. For example: \"files\\transactions.txt\" refers to a file (\"transactions.txt\") in a directory (\"files\") in the current directory.\n",
"1. Absolute path: a path to a file or directory specified by its exact location on your file system. For example: \"C:\\Users\\gsprint\\cpts111\\lessons\\files\\transactions.txt\" refers to a file (\"transactions.txt\") in the folder \"C:\\Users\\gsprint\\cpts111\\lessons\\files\" on my C:\\ drive.\n",
"\n",
"Note: On a windows machine, folders and file names in a path are separated by backslashes \"\\\". We know the backslash has a special purpose in Python, to escape certain characters, such as a newline \"\\n\"; therefore, you will have to escape a backslash: \"`\\\\`\" in your path to a file: `\"files\\\\transactions.txt\"`. Alternatively, you can specify your path as a raw string: `r\"files\\transactions.txt\"`. On a Unix-based machine (e.g. Mac, Linux distributions), the forward slash \"/\" is used in paths and you don't have to worry about this issue.\n",
"\n",
"### Closing a File\n",
"When we are done with a file, we should close it with `close()`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"in_file.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Processing a File\n",
"Once a file is open, we want to process the data inside the file (reading) or save data to file (writing). Consider the example [transactions.txt](https://raw.githubusercontent.com/gsprint23/cpts111/master/lessons/files/transactions.txt) we opened earlier.\n",
"\n",
"#### Reading from a File\n",
"We will use the `readline()` function to read in a *single* line in the file (in transactions.txt this is the purchase price as a **string including the newline character \\n**):"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9.98\n",
" '9.98\\n' \n",
"9.98 \n"
]
}
],
"source": [
"transaction = in_file.readline()\n",
"# note the newline printed!! repr() shows non-printable characters like \\n\n",
"print(transaction, repr(transaction), type(transaction))\n",
"transaction = float(transaction)\n",
"print(transaction, type(transaction))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Writing to a File\n",
"Now, let's use use the `write()` function to write the transaction price we just read in to an output file called single_transaction.txt:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# creates the file if it does not exist\n",
"# overwrites the file contents if it does exist\n",
"out_file = open(\"single_transaction.txt\", \"w\")\n",
"# save the value of transaction as string\n",
"out_file.write(\"%.2f\" %(transaction))\n",
"\n",
"# close file because we are done with out_file\n",
"out_file.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example Problem\n",
"On average, how much money do I spend per credit card transaction?\n",
"\n",
"Algorithm:\n",
"1. For each transaction\n",
" 1. Read in the purchase price from file\n",
" 1. Accumulate the total money spent so far\n",
"1. Divide total money spent by total number of transactions\n",
"1. Write the average transaction to file"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"13.42\n",
"\n",
"27.19\n",
"\n",
"9.98\n",
"\n",
"48.56\n",
"\n",
"33.71\n",
"On average, you spend 26.57 per transaction\n"
]
}
],
"source": [
"def read_transaction_price(in_file):\n",
" '''\n",
" \n",
" '''\n",
" # readline() returns a string, including the newline character\n",
" price = in_file.readline()\n",
" # we need to convert the string returned by readline() to a numeric value\n",
" return float(price)\n",
"\n",
"def compute_total_spent():\n",
" '''\n",
" \n",
" '''\n",
" total_spent = 0.0\n",
" \n",
" in_file = open(\"transactions.txt\", \"r\")\n",
"\n",
" # read in all 5 transactions in the file\n",
" for i in range(5):\n",
" total_spent += read_transaction_price(in_file)\n",
" \n",
" # close the file before in_file goes out of scope\n",
" in_file.close()\n",
" \n",
" return total_spent\n",
"\n",
"total_spent = compute_total_spent()\n",
"\n",
"avg_spent_per_transaction = total_spent / 5.0\n",
"\n",
"out_file = open(\"avg_transaction.txt\", \"w\")\n",
"out_file.write(\"On average, you spend %.2f per transaction\" %(avg_spent_per_transaction))\n",
"out_file.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## File Reading\n",
"### `for` Loops\n",
"Let's rewrite our transaction code to read in as many transactions as there are in the file (instead of the hard-coded 5). Using a `for` loop, `` will be all of the lines in the input file, which we can get with a call to `in_file.readlines()`. Our `for` loop will walk through each line one at time with a loop control variable called `line`."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"13.42\n",
"\n",
"27.19\n",
"\n",
"9.98\n",
"\n",
"48.56\n",
"\n",
"33.71\n",
"On average, you spend 26.57 per transaction\n"
]
}
],
"source": [
"def compute_avg_spent():\n",
" '''\n",
" \n",
" '''\n",
" # accumulator variable\n",
" total_spent = 0.0\n",
" # count the transactions\n",
" num_transactions = 0\n",
"\n",
" # the input file contains lines that we will iterate through as our items\n",
" for line in in_file.readlines():\n",
" print(line)\n",
" total_spent += float(line)\n",
" num_transactions += 1\n",
" \n",
" # close the file before in_file goes out of scope\n",
" in_file.close()\n",
" \n",
" return total_spent / num_transactions\n",
"\n",
"avg_spent_per_transaction = compute_avg_spent()\n",
"\n",
"print(\"On average, you spend %.2f per transaction\" %(avg_spent_per_transaction))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `while` Loops \n",
"Let's rewrite our transaction processing code to use a `while` loop. `readline()` will return an empty string when the end of the file is reached. This can be used in our Boolean condition:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"13.42\n",
"\n",
"27.19\n",
"\n",
"9.98\n",
"\n",
"48.56\n",
"\n",
"33.71\n",
"On average, you spend 26.57 per transaction\n"
]
}
],
"source": [
"def compute_avg_spent():\n",
" '''\n",
" \n",
" '''\n",
" # accumulator variable\n",
" total_spent = 0.0\n",
" # count the transactions\n",
" num_transactions = 0\n",
" \n",
" in_file = open(\"transactions.txt\", \"r\")\n",
"\n",
" # read the first line in the file\n",
" spent = in_file.readline()\n",
" # test if this line is the empty string, meaning the end of file has been reached\n",
" while spent != \"\":\n",
" # not end of file, process this transaction\n",
" print(spent)\n",
" total_spent += float(spent)\n",
" num_transactions += 1\n",
" # progress toward Boolean condition being False here is progress through the file\n",
" spent = in_file.readline()\n",
" \n",
" # close the file before in_file goes out of scope\n",
" in_file.close()\n",
" \n",
" return total_spent / num_transactions\n",
"\n",
"avg_spent_per_transaction = compute_avg_spent()\n",
"\n",
"print(\"On average, you spend %.2f per transaction\" %(avg_spent_per_transaction))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MA13 Practice Problem\n",
"On a blank sheet of paper, write the following:\n",
"1. Your full name\n",
"1. Your TA name\n",
"1. MA #13\n",
"\n",
"Individually, solve the following problems as if they were exam questions. Each student needs to turn in their own paper to get credit for MA13.\n",
"1. (2 pts) Evaluate to `True` or `False`:\n",
" 1. `7 % 2 == 3 or 12 // 10 == 2.2`\n",
" 1. `9 / 2 > 4 or -3 >= 0 and -5`\n",
"1. (1 pt) Besides their name, what are the differences between a `while` loop and a `for` loop?\n",
"1. (3 pts) Construct a `while` loop that displays numbers in the range [3, 30] inclusive that are multiples of 3. \n",
"1. (4 pts) Construct a `for` loop that sums 10 randomly generated numbers in the range [4, 8] inclusive. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TODO\n",
"1. Read the chapters on files in the optional textbook.\n",
"1. Keep working on PA4.\n",
"1. Have a great spring break!\n",
"\n",
"## Next Lesson\n",
"1. More practice with File I/O."
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}