{ "cells": [ { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "# Basic Input and Output\n", "## What is Input and Output?\n", "\n", "With apologies to John Donne, no program is an island. Python programs run in a larger computing environment. They may **input** information from the computer or across the Internet. Or they may **output** information displayed to the screen or written to a file. These are examples of input and output or \"I/O\" in computer science parlance. In this notebook, we will cover the basics of Python I/O by examining string formatting, reading, and writing information to files. Finally, we will delve into a real world example involving snow depth data.\n", "\n", "## Strings and String Formatting\n", "\n", "### Strings\n", "\n", "In computer programming, a string is a sequence of letters or other characters. The \"Graupel\" string is a sequence of the characters `G`, `r`, `a`, `u`, `p`, `e`, `l`. You can print strings to the screen with the [print()](https://docs.python.org/3/library/functions.html#print) function. For example:\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Graupel\n" ] } ], "source": [ "print(\"Graupel\")" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "Python strings can either be enclosed in single or double quotes. They can hold not just a single word, but any amount of text such as a paragraph of prose or web page content. They can be assigned to variables just like any other Python data type:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [], "source": [ "precip = \"Graupel\"" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "Strings are immutable so they cannot be changed in-place similar to tuples. (See the [data structures](https://github.com/Unidata/online-python-training/blob/master/notebooks/Basic%20Data%20Structures.ipynb) notebook for a discussion on immutability.) Indeed, strings can be manipulated in a similar manner to tuples. Python strings are [str](https://docs.python.org/3/library/stdtypes.html#str) objects and support [many methods](https://docs.python.org/3/library/stdtypes.html#str) to act upon them such as [split()](https://docs.python.org/3/library/stdtypes.html#str.split), [join()](https://docs.python.org/3/library/stdtypes.html#str.join) and [replace()](https://docs.python.org/3/library/stdtypes.html#str.replace). Remember, strings are immutable so any method that \"changes\" a string really returns a new string. The original string remains the same.\n", "### String Formatting\n", "\n", "In any realistic program, you will typically incorporate variable information into a string. Imagine you have a Python program that analyzes radar reflectivity data from thunderstorms and will convey that information to the user. For example, consider this string:\n", "\n", "> The peak reflectivity of the thunderstorm cell is 50 dBZ.\n", "\n", "In this string the number \"50\" is variable depending on the data analyzed within the program, while the rest of the string is constant. Python offers several possibilities to print such strings but the best and most powerful is [Python 3 positional formatting](https://docs.python.org/3/library/stdtypes.html#str.format).\n", "\n", "#### Python 3 Positional Formatting\n", "\n", "In positional formatting, there is a \"literal\" or unchanging part of the string and zero or more replacement fields denoted by the `{}` curly braces. For example," ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The peak reflectivity of the thunderstorm cell is 50 dBZ.\n" ] } ], "source": [ "print(\"The peak reflectivity of the thunderstorm cell is {} dBZ.\".format(50))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "The curly braces are swapped out with the arguments of the `format` method which can take zero or more arguments that will match the curly braces.\n", "[Formatting syntax can be quite elaborate](https://docs.python.org/3/library/string.html#formatstrings) and is a [mini-language](https://docs.python.org/3/library/string.html#format-specification-mini-language) within Python. For brevity, we will not cover this topic in any depth, but we will look at a few examples that examine formatting numbers, a common concern in scientific programming.\n", "\n", "##### Decimal Numbers\n", "\n", "We will look more closely at positional formatting involving decimal numbers." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "unity is 1, e is 2.72 and pi is 3.142\n" ] } ], "source": [ "print(\"unity is {}, e is {:.2f} and pi is {:.3f}\".format(1, 2.71828, 3.14159))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "Let's study the `{:.2f}` field. The `:` signifies the start of the string formatting, the `.2` describes the precision of the number after the decimal place (in this case two places) . The `f` denotes that we want a decimal number with a fixed number of digits after the decimal point. (Note, the formatted numbers have been properly rounded.)\n", "##### Scientific Notation\n", "\n", "Another common concern in scientific programming is the display of numbers in scientific notation:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The universal gas constant is 8.31e+03 J K-1 mol-1\n" ] } ], "source": [ "print('The universal gas constant is {:.2e} J K-1 mol-1'.format(8314.5))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "The `{:.2e}` field is largely the same as the field we described earlier except the `e` denotes that we wish to display the number using scientific notation format.\n", "## Reading and Writing Files\n", "\n", "Imagine you wish to share the results of your data analysis with the broader scientific community. Your program may have to **write** data to a file so that it can be uploaded to a data archive, for example. Or perhaps there are data files vital to your research that you want to **read** into your program so that they can be visualized. In these scenarios, it is essential you learn how to write to, and read from files.\n", "\n", "### `open()` built-in Function\n", "\n", "The first order of business is to understand the [open()](https://docs.python.org/3/library/functions.html#open) built-in Python function in conjunction with the `with...as` Python [idiom](https://docs.python.org/3.1/howto/doanddont.html). Imagine you have some data you wish to share with colleagues. You can write the data contained within a hypothetical `data` variable to the `data.txt` file in this manner:\n", "\n", "```python\n", "with open(\"data.txt\", 'w') as f:\n", " f.write(data)\n", "```\n", "\n", "Or maybe you have some data you want to analyze. You can read the contents from the `data.txt` file into the `data` variable.\n", "\n", "```python\n", "with open(\"data.txt\", 'r') as f:\n", " data = f.read()\n", "```\n", "\n", "The first parameter in the `open()` function, in this case `data.txt`, is the file on your computer you wish to read. The second, known as the `mode`, describes how you want to open the file and, in particular, if you want to read its contents, or if you aim to modify them. The options are `r` read only, `w` write only, `a` append, and `r+` read and write. The `mode` parameter can be omitted in which case it will default to `r`, read only mode. Be careful with write modes as you can erase files that are already present with the same name.\n", "\n", "Here we are using the `open()` functions with the `with ... as` Python idiom. The purpose of this idiom is to ensure the file is properly closed when you are finished with it. Otherwise, the responsibility of closing the file is left to the programmer with the [close()](https://docs.python.org/3/library/io.html#io.IOBase.close) method. The file object, which you will need to get your work done, appears after the `as` keyword. In this case, the file object is `f` and will only be available to you in the indented code block following the `with ... as` idiom. Keeping with its batteries-included philosophy, Python will close the file for you when that code block is done executing.\n", "\n", "### Snow Depth Data Exercise: Reading and Writing in Practice\n", "\n", "You are assigned to analyze [National Water and Climate Center](http://www.wcc.nrcs.usda.gov/) snow depth data from [SNOTEL](http://www.wcc.nrcs.usda.gov/snow/) [Site 936](http://wcc.sc.egov.usda.gov/nwcc/site?sitenum=936), Echo Lake, Colorado, USA. Here is a snippet from a file describing the snow data:\n", "\n", " Site Id,Date,Time,WTEQ.I-1 (in) ,PREC.I-1 (in) ,TOBS.I-1 (degC) ,TMAX.D-1 (degC) ,TMIN.D-1 (degC) ,TAVG.D-1 (degC) ,SNWD.I-1 (in) ,\n", " \n", " 936,2016-04-27,, 10.7, 16.7, -3.9, 1.7, -5.8, -2.7, 34,\n", " 936,2016-04-28,, 10.8, 16.8, -4.2, 4.3, -5.3, -1.7, 36,\n", " 936,2016-04-29,, 10.9, 17.0, -4.8, -2.7, -5.1, -4.3, 37,\n", " 936,2016-04-30,, 11.4, 17.5, -6.0, -2.4, -6.0, -4.6, 43,\n", " 936,2016-05-01,, 11.8, 18.0, -7.4, -3.1, -7.5, -5.6, 48,\n", "\n", "The SNOTEL data are expressed in comma-separated values (CSV) format [with the column headers describing the data](http://wcc.sc.egov.usda.gov/nwcc/sensorhistory?sitenum=936). For example, `WTEQ.I-1` is \"Snow Water Equivalent\" in inches, and `SNWD.I-1` is \"Snow Depth\" in inches. (Note, the data are in a mixture of English and metric units.)\n", "\n", "#### Read the Data File\n", "\n", "Let's examine the `snow.csv` data file by reading it with Python. (Python has a [library for handling CSV files](https://docs.python.org/3/library/csv.html), but for the purposes of this exercise, we will ignore it as it is not really needed.)\n", "\n", "We are going to open our CSV snow data file with the `with...as` Python idiom followed by a nested list comprehension to extract the data:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [], "source": [ "with open(\"data/snow.csv\", 'r') as file:\n", " snowdata = [entries for line in file for entries in [line.split(\",\")]\n", " if (len(entries) > 0 and entries[0].isdigit())]" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "##### List Comprehension Explanation\n", "To read the data into the `snowdata` variable, we are using [nested list comprehension](https://docs.python.org/3/tutorial/datastructures.html#nested-list-comprehensions). In Python, list comprehension is a way of processing sequential data structures including lists, tuples and dictionaries. They take getting used to, but it will be worth your time to understand them as they make code clear and concise especially to other Pythonistas. We will deconstruct this nested list comprehension to better understand it.\n", "The entire list comprehension statement is enclosed in brackets: `[entries ... entries[0].isdigit())]`. The first part of the list comprehension, `for line in file`, loops through every line of the file. Each line is processed sequentially into the `line` string.\n", "\n", "The second part of the list comprehension, `for entries in [line.split(\",\")]`, takes the `line` we obtained from the first list comprehension and splits it (according to the commas) into the `entries` list. For example, this line:\n", "\n", " 936,2016-04-27,, 10.7, 16.7, -3.9, 1.7, -5.8, -2.7, 34,\n", "\n", "will be split into the `entries` list:\n", "\n", "```ipython\n", "[\"936\", \"2016-04-27\", \"\", \"10.7\", \"16.7\", \"-3.9\", \"1.7\", \"-5.8\", \"-2.7\", \"34\"]\n", "```\n", "\n", "The `if (len(entries) > 0 and entries[0].isdigit())` denotes that we only want lines with more than zero entries and that start with a number. This construct helps us get only the lines that contain data and will prevent us from grabbing the header, and will also avoid blank lines.\n", "\n", "The end results is a `snowdata` variable that looks like this:\n", "\n", " [['936' '2016-04-26' '' ..., ' 3.5' ' 34' '\\n']\n", " ['936' '2016-04-27' '' ..., ' -2.7' ' 34' '\\n']\n", " ['936' '2016-04-28' '' ..., ' -1.7' ' 36' '\\n']\n", " ..., \n", " ['936' '2016-05-24' '' ..., ' 3.3' ' 24' '\\n']\n", " ['936' '2016-05-25' '' ..., ' 4.9' ' 24' '\\n']\n", " ['936' '2016-05-26' '' ..., ' 6.1' ' 23' '\\n']]\n", "\n", "(The list has been abbreviated with `...` for clarity.) `snowdata` is a two-dimensional data structure (a list of lists) with the first dimension representing a row of data from a certain date, and the second dimension representing the individual entries as strings for a row of data. The `\\n` is a newline character that is invisible in the CSV file, but we can see in the `snowdata` list. It tells the computer to display subsequent characters on the next line when printing out to the screen or writing to a file.\n", "\n", "Now that we have our SNOTEL data in the `snowdata` variable, let's plot it, and write that plot to a file.\n", "\n", "#### Writing Results to File\n", "\n", "In this part of the exercise, we will fetch the data in the `snowdata` variable and create a histogram of snow depth over the month long interval. From the header information we examined earlier, we know the snow depth field is in the second to last column (remember the `\\n` newline character is in the last column). We have not yet learned about [matplotlib](http://matplotlib.org/) so we will rely upon our newly acquired knowledge of strings to make a text-based histogram. Each bin of the histogram, built by repeating the `-` character according to the snow depth, will be on a separate horizontal line of text.\n", "\n", "To create our histogram, we will gradually build up a lengthy string as we loop through the snow depth data building our histogram bins. Remember, strings are immutable so you cannot append to them without creating a new string as you loop through the data which is inefficient and frowned upon. Instead, there is a preferred (a.k.a. idiomatic) approach to build such strings with Python. Create an empty list (which is mutable), and append strings to that list. Finally, call the string [join()](https://docs.python.org/3/library/stdtypes.html#str.join) method to link all the strings contained in the list together into one big string. For example,\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hurricane, cyclone, typhoon\n" ] } ], "source": [ "# create an empty list\n", "storms = []\n", "# append strings to that list\n", "storms.append('hurricane')\n", "storms.append('cyclone')\n", "storms.append('typhoon')\n", "# join the strings together separated by commas\n", "s = ', '.join(storms)\n", "print(s)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "With this knowledge of programatically building long strings, we can create our histogram. As we just described, we will create an empty list called `lines` and we will first append the header information. Then we will write a `for` loop to iterate through the snow data into the `d` variable (which is a list of the individual row entries) to grab the date (at index 1) and the depth in the second to last column which we will obtain with the [negative index trick](https://docs.python.org/3/faq/programming.html#what-s-a-negative-index) (`d[-2]`). Finally, we will create our histogram bins by repeating the \"-\" character according to the snow depth. Python allows strings to be \"multiplied\" to repeat them. For example, \"Z\" \\* 4 results in \"ZZZZ\". We will use this tactic to build the bin. Note, we have to convert the snow depth string to an integer with the [int()](https://docs.python.org/3/library/functions.html#int) function." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [], "source": [ "# lines empty list\n", "lines = []\n", "# append the header\n", "lines.append(\"SNOTEL Site 936, Echo Lake, Colorado, USA\")\n", "lines.append(\"\")\n", "lines.append(\"{:<12}{:<4}\".format(\"date\", \"snow depth (inches)\"))\n", "# append the snow depth bins\n", "for d in snowdata:\n", " lines.append(\"{:<12}{:<4}{}\".format(d[1], d[-2].strip(), \"-\" * int(d[-2])))\n", "# join on newline so that each string in the lines list appears on a new line\n", "histogram = \"\\n\".join(lines)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "We also take advantage of more positional formatting features (e.g., `{:<12}` and `{:<4}`) to consistently pad the strings so that the header and data align.\n", "We can now print our histogram!\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SNOTEL Site 936, Echo Lake, Colorado, USA\n", "\n", "date snow depth (inches)\n", "2016-04-26 34 ----------------------------------\n", "2016-04-27 34 ----------------------------------\n", "2016-04-28 36 ------------------------------------\n", "2016-04-29 37 -------------------------------------\n", "2016-04-30 43 -------------------------------------------\n", "2016-05-01 48 ------------------------------------------------\n", "2016-05-02 49 -------------------------------------------------\n", "2016-05-03 43 -------------------------------------------\n", "2016-05-04 40 ----------------------------------------\n", "2016-05-05 38 --------------------------------------\n", "2016-05-06 36 ------------------------------------\n", "2016-05-07 34 ----------------------------------\n", "2016-05-08 35 -----------------------------------\n", "2016-05-09 35 -----------------------------------\n", "2016-05-10 34 ----------------------------------\n", "2016-05-11 33 ---------------------------------\n", "2016-05-12 33 ---------------------------------\n", "2016-05-13 31 -------------------------------\n", "2016-05-14 30 ------------------------------\n", "2016-05-15 29 -----------------------------\n", "2016-05-16 26 --------------------------\n", "2016-05-17 34 ----------------------------------\n", "2016-05-18 33 ---------------------------------\n", "2016-05-19 32 --------------------------------\n", "2016-05-20 30 ------------------------------\n", "2016-05-21 28 ----------------------------\n", "2016-05-22 26 --------------------------\n", "2016-05-23 25 -------------------------\n", "2016-05-24 24 ------------------------\n", "2016-05-25 24 ------------------------\n", "2016-05-26 23 -----------------------\n" ] } ], "source": [ "print(histogram)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ] }, "source": [ "What does this visual representation of the snow depth data tell you? What conclusions can you draw or what additional questions do these data produce? What happened on the 17th of May?\n", "We will finally write the histogram to a file, completing the exercise." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "autoscroll": "json-false", "collapsed": false, "ein.tags": [ "worksheet-0" ] }, "outputs": [], "source": [ "with open(\"data/histogram.txt\", 'w') as f:\n", " f.write(histogram)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" }, "name": "Basic Input and Output.ipynb", "widgets": { "state": {}, "version": "1.1.1" } }, "nbformat": 4, "nbformat_minor": 0 }