{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [CptS 111 Introduction to Algorithmic Problem Solving](https://github.com/gsprint23/cpts111)\n", "[Washington State University](https://wsu.edu)\n", "\n", "[Gina Sprint](http://eecs.wsu.edu/~gsprint/)\n", "# Files\n", "\n", "Learner objectives for this lesson\n", "* Understand file system paths\n", "* Open, read, write, and close files\n", "* Use `while` and `for` loops to read in and write data to files" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Storing Data\n", "Many applications require storing and retrieving data outside of a program. Think of the many different applications you regularly use. Do they utilize information that has been saved in some way?\n", "* PC: Your operating system stores all of your settings, files, and machine state for you.\n", "* Banks: Every customer's transaction history is stored in massive databases in data warehouses. When you make a debit purchase, your account balance is retrieved from one of these servers to make sure you have enough money for your transaction. (`if balance >= purchase`)\n", "* Games: Your progress in a game is stored in a file so that when you turn off your console (or laptop), your progress isn't lost.\n", "* Search history: Websites save your recent searches to learn about your preferences and better predict what you will search for in the future.\n", "* Authentication: When you authorize an app to \"keep you logged in\", the app persistently stores a token as your authentication so you don't have to type your username and password each time.\n", "* And many more.\n", "\n", "## Text Files\n", "A simple way to store data is in a *text file*, such as [transactions.txt](https://raw.githubusercontent.com/gsprint23/cpts111/master/lessons/files/transactions.txt), which stores an individual's credit card transaction history. 
Each line in the file represents a transaction price.\n", "\n", "To process data in a file, we typically take the following approach:\n", "1. Open the file\n", "1. Process the file\n", "    * Read data (doesn't modify the file) or\n", "    * Write data (overwrite existing file) or\n", "    * Append data (retains existing information and adds new data)\n", "1. Close the file\n", "\n", "### Opening a File\n", "Before we can read from a file or write to a file, we first need to open the file and get a file object (AKA handle). We do this with the built-in function `open()`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# in_file is our variable connecting our program to transactions.txt\n", "# transactions.txt is a file I have in the *same folder* as this running Python file\n", "in_file = open(\"transactions.txt\", \"r\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### File Modes\n", "The first argument to `open()` is a string representing the path to the file and the second argument is the file opening *mode*: \n", "1. \"r\" for reading\n", "    1. File must exist or you will get an error\n", "1. \"w\" for writing\n", "    1. If the file does not exist, it is created\n", "    1. If the file does exist, it is cleared!\n", "1. \"a\" is for appending\n", "    1. If the file does not exist, it is created\n", "    1. If the file does exist, new data written to the file is added at the end of the file\n", "\n", "You can read more about modes [here](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). \n", "\n", "`open()` returns an object that represents the connection between our program and transactions.txt.\n", "\n", "#### Paths\n", "The directory (or folder) where your Python script is running is called the *current directory*. When you open a file, Python looks for it in the current directory. 
\n", "\n", "If a file you want to open is in a directory other than the current directory, you will have to specify its path. The location of a file is represented by its path, the sequence of folders that the file is stored in, plus the file's name. There are two ways to specify a path:\n", "1. Relative path: a path to a file or directory relative to the current directory. For example: \"files\transactions.txt\" refers to a file (\"transactions.txt\") in a directory (\"files\") in the current directory.\n", "1. Absolute path: a path to a file or directory specified by its exact location on your file system. For example: \"C:\Users\gsprint\cpts111\lessons\files\transactions.txt\" refers to a file (\"transactions.txt\") in the folder \"C:\Users\gsprint\cpts111\lessons\files\" on my C:\ drive.\n", "\n", "Note: On a Windows machine, folders and file names in a path are separated by backslashes \"\\\". We know the backslash has a special purpose in Python, to escape certain characters, such as a newline \"\\n\"; therefore, you will have to escape a backslash: \"`\\`\" in your path to a file: `\"files\\transactions.txt\"`. Alternatively, you can specify your path as a raw string: `r\"files\transactions.txt\"`. On a Unix-based machine (e.g. Mac, Linux distributions), the forward slash \"/\" is used in paths and you don't have to worry about this issue.\n", "\n", "### Closing a File\n", "When we are done with a file, we should close it with `close()`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "in_file.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Processing a File\n", "Once a file is open, we want to process the data inside the file (reading) or save data to the file (writing). 
Consider the example [transactions.txt](https://raw.githubusercontent.com/gsprint23/cpts111/master/lessons/files/transactions.txt) we opened earlier.\n", "\n", "#### Reading from a File\n", "We will use the `readline()` method to read in a *single* line in the file (in transactions.txt this is the purchase price as a **string including the newline character \n**):" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9.98\n", " '9.98\\n' <class 'str'>\n", "9.98 <class 'float'>\n" ] } ], "source": [ "transaction = in_file.readline()\n", "# note the newline printed!! repr() shows non-printable characters like \n\n", "print(transaction, repr(transaction), type(transaction))\n", "transaction = float(transaction)\n", "print(transaction, type(transaction))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Writing to a File\n", "Now, let's use the `write()` method to write the transaction price we just read in to an output file called single_transaction.txt:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# creates the file if it does not exist\n", "# overwrites the file contents if it does exist\n", "out_file = open(\"single_transaction.txt\", \"w\")\n", "# save the value of transaction as a string\n", "out_file.write(\"%.2f\" %(transaction))\n", "\n", "# close file because we are done with out_file\n", "out_file.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example Problem\n", "On average, how much money do I spend per credit card transaction?\n", "\n", "Algorithm:\n", "1. For each transaction\n", "    1. Read in the purchase price from file\n", "    1. Accumulate the total money spent so far\n", "1. Divide total money spent by total number of transactions\n", "1. 
Write the average transaction to file" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13.42\n", "\n", "27.19\n", "\n", "9.98\n", "\n", "48.56\n", "\n", "33.71\n", "On average, you spend 26.57 per transaction\n" ] } ], "source": [ "def read_transaction_price(in_file):\n", "    '''\n", "    Read one line from in_file and return the price on it as a float.\n", "    '''\n", "    # readline() returns a string, including the newline character\n", "    price = in_file.readline()\n", "    # we need to convert the string returned by readline() to a numeric value\n", "    return float(price)\n", "\n", "def compute_total_spent():\n", "    '''\n", "    Read the 5 transactions in transactions.txt and return their sum.\n", "    '''\n", "    total_spent = 0.0\n", "    \n", "    in_file = open(\"transactions.txt\", \"r\")\n", "\n", "    # read in all 5 transactions in the file\n", "    for i in range(5):\n", "        total_spent += read_transaction_price(in_file)\n", "    \n", "    # close the file before in_file goes out of scope\n", "    in_file.close()\n", "    \n", "    return total_spent\n", "\n", "total_spent = compute_total_spent()\n", "\n", "avg_spent_per_transaction = total_spent / 5.0\n", "\n", "out_file = open(\"avg_transaction.txt\", \"w\")\n", "out_file.write(\"On average, you spend %.2f per transaction\" %(avg_spent_per_transaction))\n", "out_file.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## File Reading\n", "### `for` Loops\n", "Let's rewrite our transaction code to read in as many transactions as there are in the file (instead of the hard-coded 5). Using a `for` loop, the sequence we iterate over will be all of the lines in the input file, which we can get with a call to `in_file.readlines()`. Our `for` loop will walk through each line one at a time with a loop control variable called `line`." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13.42\n", "\n", "27.19\n", "\n", "9.98\n", "\n", "48.56\n", "\n", "33.71\n", "On average, you spend 26.57 per transaction\n" ] } ], "source": [ "def compute_avg_spent():\n", "    '''\n", "    Read every transaction in transactions.txt with a for loop and return the average.\n", "    '''\n", "    # accumulator variable\n", "    total_spent = 0.0\n", "    # count the transactions\n", "    num_transactions = 0\n", "\n", "    # open the file here so the function does not depend on a previously opened file\n", "    in_file = open(\"transactions.txt\", \"r\")\n", "\n", "    # the input file contains lines that we will iterate through as our items\n", "    for line in in_file.readlines():\n", "        print(line)\n", "        total_spent += float(line)\n", "        num_transactions += 1\n", "    \n", "    # close the file before in_file goes out of scope\n", "    in_file.close()\n", "    \n", "    return total_spent / num_transactions\n", "\n", "avg_spent_per_transaction = compute_avg_spent()\n", "\n", "print(\"On average, you spend %.2f per transaction\" %(avg_spent_per_transaction))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `while` Loops \n", "Let's rewrite our transaction processing code to use a `while` loop. `readline()` returns an empty string when the end of the file is reached. 
This can be used in our Boolean condition:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13.42\n", "\n", "27.19\n", "\n", "9.98\n", "\n", "48.56\n", "\n", "33.71\n", "On average, you spend 26.57 per transaction\n" ] } ], "source": [ "def compute_avg_spent():\n", "    '''\n", "    Read every transaction in transactions.txt with a while loop and return the average.\n", "    '''\n", "    # accumulator variable\n", "    total_spent = 0.0\n", "    # count the transactions\n", "    num_transactions = 0\n", "    \n", "    in_file = open(\"transactions.txt\", \"r\")\n", "\n", "    # read the first line in the file\n", "    spent = in_file.readline()\n", "    # test if this line is the empty string, meaning the end of file has been reached\n", "    while spent != \"\":\n", "        # not end of file, process this transaction\n", "        print(spent)\n", "        total_spent += float(spent)\n", "        num_transactions += 1\n", "        # read the next line; moving through the file is our progress toward the condition being False\n", "        spent = in_file.readline()\n", "    \n", "    # close the file before in_file goes out of scope\n", "    in_file.close()\n", "    \n", "    return total_spent / num_transactions\n", "\n", "avg_spent_per_transaction = compute_avg_spent()\n", "\n", "print(\"On average, you spend %.2f per transaction\" %(avg_spent_per_transaction))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MA13 Practice Problem\n", "On a blank sheet of paper, write the following:\n", "1. Your full name\n", "1. Your TA name\n", "1. MA #13\n", "\n", "Individually, solve the following problems as if they were exam questions. Each student needs to turn in their own paper to get credit for MA13.\n", "1. (2 pts) Evaluate to `True` or `False`:\n", "    1. `7 % 2 == 3 or 12 // 10 == 2.2`\n", "    1. `9 / 2 > 4 or -3 >= 0 and -5`\n", "1. (1 pt) Besides their name, what are the differences between a `while` loop and a `for` loop?\n", "1. (3 pts) Construct a `while` loop that displays numbers in the range [3, 30] inclusive that are multiples of 3. \n", "1. 
(4 pts) Construct a `for` loop that sums 10 randomly generated numbers in the range [4, 8] inclusive. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TODO\n", "1. Read the chapters on files in the optional textbook.\n", "1. Keep working on PA4.\n", "1. Have a great spring break!\n", "\n", "## Next Lesson\n", "1. More practice with File I/O." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 1 }