{ "metadata": { "name": "", "signature": "sha256:e1b8445e6b0f20b65fe6791875006d68e64ce2430647cbe93e2d9231ae75ae87" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Reading and Writing files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We have already talked about built-in Python types, but there are more types that we have not covered yet. One of these is the file object, returned by the built-in ``open()`` function, which can be used to read or write files." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Reading files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's try to get the contents of a file into IPython. We start off by creating a file object:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = open('data/data.txt', 'r')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``open`` function takes the [data/data.txt](data/data.txt) file, opens it, and returns an object (which we call ``f``) that can then be used to access the data."
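, "\n", "\n", "The second argument to ``open`` is the mode: ``'r'`` opens the file for reading. The returned object also records how it was opened. Here is a minimal sketch, assuming a throwaway file (``demo.txt`` is a made-up name, not part of the course data) so that it can run anywhere:

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory (not the course data)
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
f = open(path, 'w')   # 'w' creates the file and opens it for writing
f.close()

f = open(path, 'r')   # 'r' opens an existing file for reading
print(f.name)         # the path that was passed to open()
print(f.mode)         # the mode the file was opened with
print(f.closed)       # False while the file is still open
f.close()
print(f.closed)       # True once close() has been called
```

None of these attributes contain the data itself; they only describe the open file."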
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Note that ``f`` is not the data in the file; it is what is called a *file handle*, which points to the file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "type(f)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 2, "text": [ "_io.TextIOWrapper" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now, simply type:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.read()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 3, "text": [ "'RAJ DEJ Jmag e_Jmag\\n2000 (deg) 2000 (deg) 2MASS (mag) (mag) \\n---------- ---------- ----------------- ------ ------\\n010.684737 +41.269035 00424433+4116085 9.453 0.052\\n010.683469 +41.268585 00424403+4116069 9.321 0.022\\n010.685657 +41.269550 00424455+4116103 10.773 0.069\\n010.686026 +41.269226 00424464+4116092 9.299 0.063\\n010.683465 +41.269676 00424403+4116108 11.507 0.056\\n010.686015 +41.269630 00424464+4116106 9.399 0.045\\n010.685270 +41.267124 00424446+4116016 12.070 0.035\\n'" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The ``read()`` method reads the whole file and returns its contents as a single string." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's try this again:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.read()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 4, "text": [ "''" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened? The first call read the whole file, so the file 'pointer' is now sitting at the end of the file, and there is nothing left to read."
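, "\n", "\n", "The pointer does not have to stay at the end: the ``seek()`` method of a file object moves it, and ``seek(0)`` puts it back at the start. The sketch below illustrates this under some assumptions: it uses a small temporary file (``demo.txt`` and its two lines are made up for illustration) instead of the course data, and it creates that file with ``open(..., 'w')``, which is covered further down.

```python
import os
import tempfile

# Hypothetical demo file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
f = open(path, 'w')
print('first line', file=f)
print('second line', file=f)
f.close()

f = open(path, 'r')
first = f.read()       # reads everything; the pointer is now at the end
second = f.read()      # nothing left to read, so this is an empty string
f.seek(0)              # move the pointer back to the start of the file
third = f.read()       # the full contents again
f.close()

print(repr(first))
print(repr(second))
print(third == first)
```

Re-opening the file, as done in the rest of this section, has the same effect as ``seek(0)`` here, because a newly opened file always starts with the pointer at the beginning."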
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's now try and do something more useful, and capture the contents of the file in a string:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = open('data/data.txt', 'r')\n", "data = f.read()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now ``data`` should contain a string with the contents of the file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 6, "text": [ "'RAJ DEJ Jmag e_Jmag\\n2000 (deg) 2000 (deg) 2MASS (mag) (mag) \\n---------- ---------- ----------------- ------ ------\\n010.684737 +41.269035 00424433+4116085 9.453 0.052\\n010.683469 +41.268585 00424403+4116069 9.321 0.022\\n010.685657 +41.269550 00424455+4116103 10.773 0.069\\n010.686026 +41.269226 00424464+4116092 9.299 0.063\\n010.683465 +41.269676 00424403+4116108 11.507 0.056\\n010.686015 +41.269630 00424464+4116106 9.399 0.045\\n010.685270 +41.267124 00424446+4116016 12.070 0.035\\n'" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "But what we'd really like to do is read the file line by line. 
There are several ways to do this, the simplest of which is to use a ``for`` loop in the following way:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = open('data/data.txt', 'r')\n", "for line in f:\n", "    print(repr(line))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "'RAJ DEJ Jmag e_Jmag\\n'\n", "'2000 (deg) 2000 (deg) 2MASS (mag) (mag) \\n'\n", "'---------- ---------- ----------------- ------ ------\\n'\n", "'010.684737 +41.269035 00424433+4116085 9.453 0.052\\n'\n", "'010.683469 +41.268585 00424403+4116069 9.321 0.022\\n'\n", "'010.685657 +41.269550 00424455+4116103 10.773 0.069\\n'\n", "'010.686026 +41.269226 00424464+4116092 9.299 0.063\\n'\n", "'010.683465 +41.269676 00424403+4116108 11.507 0.056\\n'\n", "'010.686015 +41.269630 00424464+4116106 9.399 0.045\\n'\n", "'010.685270 +41.267124 00424446+4116016 12.070 0.035\\n'\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Note that we are using ``repr()`` to show any invisible characters (this will be useful in a minute). Also note that we are now looping over a file rather than a list, and this automatically reads in the next line at each iteration. Each line is returned as a string. Notice the ``\\n`` at the end of each line: this is a newline character, which marks the end of a line." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now that we're reading the file line by line, it would be nice to get some values out of it. Let's examine the last line in detail. 
If we just type ``line`` we should see the last line that was printed in the loop:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 8, "text": [ "'010.685270 +41.267124 00424446+4116016 12.070 0.035\\n'" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can first get rid of the trailing ``\\n`` character (and any other leading or trailing whitespace) with:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "line = line.strip()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 10, "text": [ "'010.685270 +41.267124 00424446+4116016 12.070 0.035'" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Next, we can use what we learned about strings and lists to split the line into a list of columns:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "columns = line.split()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "columns" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 12, "text": [ "['010.685270', '+41.267124', '00424446+4116016', '12.070', '0.035']" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Finally, let's say we care about the object name (the 2MASS column) and the J band magnitude (the Jmag column):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "name = columns[2]\n", "jmag = columns[3]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "name" ], "language": "python", 
"metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 14, "text": [ "'00424446+4116016'" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "jmag" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 15, "text": [ "'12.070'" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Note that ``jmag`` is a string, but if we want a floating point number, we can instead do:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "jmag = float(columns[3])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "jmag" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 17, "text": [ "12.07" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "One last piece of information we need about files is how we can read a single line. This is done using:\n", "\n", "    line = f.readline()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can put all this together to write a little script to read the data from the file and display the columns we care about to the screen! 
Here it is:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Open file\n", "f = open('data/data.txt', 'r')\n", "\n", "# Read and ignore header lines\n", "header1 = f.readline()\n", "header2 = f.readline()\n", "header3 = f.readline()\n", "\n", "# Loop over lines and extract variables of interest\n", "for line in f:\n", "    line = line.strip()\n", "    columns = line.split()\n", "    name = columns[2]\n", "    jmag = float(columns[3])\n", "    print(name, jmag)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "00424433+4116085 9.453\n", "00424403+4116069 9.321\n", "00424455+4116103 10.773\n", "00424464+4116092 9.299\n", "00424403+4116108 11.507\n", "00424464+4116106 9.399\n", "00424446+4116016 12.07\n" ] } ], "prompt_number": 18 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a copy of the above code to read in the file. Modify this code so as to create a dictionary which gives ``Jmag`` for a given ``2MASS`` name, i.e.\n", "\n", "    >>> jmag['00424455+4116103']\n", "    10.773\n", "    \n", "Then loop over the items in the dictionary and print out for each the source name and the ``Jmag`` value." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# EDIT THE CODE BELOW\n", "\n", "# Open file\n", "f = open('data/data.txt', 'r')\n", "\n", "# Read and ignore header lines\n", "header1 = f.readline()\n", "header2 = f.readline()\n", "header3 = f.readline()\n", "\n", "# Loop over lines and extract variables of interest\n", "for line in f:\n", "    line = line.strip()\n", "    columns = line.split()\n", "    name = columns[2]\n", "    jmag = float(columns[3])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Bonus:** can you figure out a way to make sure that you loop over the source names in alphabetical order?" 
] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Writing files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To open a file for writing, use:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = open('data_new.txt', 'w')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Then simply use ``f.write()`` to write any content to the file, for example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.write(\"Hello, World!\\n\")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 21, "text": [ "14" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "If you want to write multiple lines, you can either give a list of strings to the ``writelines()`` method:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.writelines(['spam\\n', 'egg\\n', 'spam\\n'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "or you can write them as a single string:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.write('spam\\negg\\nspam\\n')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 23, "text": [ "14" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Once you have finished writing data to a file, you need to close it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "(this also applies to reading files)" ] }, { "cell_type": "heading", "level": 2, "metadata": 
{}, "source": [ "The with-statement" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we have seen above, files should not just be opened but also properly closed afterwards, to make sure their contents are actually written to disk before they are used somewhere else. Python buffers writes to files to minimize actual writing to disk, which is comparatively slow. Closing a file ensures that any buffered changes are actually written.\n", "\n", "To avoid forgetting to close a file, there is the ``with`` statement." ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('data/data_new.txt', 'w') as f:\n", "    f.write('spam\\n')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This opens the specified file and binds the file object to ``f``, then closes the file automatically when the ``with`` block ends, even if an error occurs inside the block. Afterwards, the name ``f`` still exists, but the file is closed and can no longer be read from or written to." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Continuing from the example in the 'Reading files' section, can you figure out a way to write out a new file containing two columns: the name of the source and the ``Jmag`` value?" 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# EDIT THE CODE BELOW\n", "\n", "# Open file\n", "f = open('data/data.txt', 'r')\n", "\n", "# Read and ignore header lines\n", "header1 = f.readline()\n", "header2 = f.readline()\n", "header3 = f.readline()\n", "\n", "# Loop over lines and extract variables of interest\n", "for line in f:\n", "    line = line.strip()\n", "    columns = line.split()\n", "    name = columns[2]\n", "    jmag = float(columns[3])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 26 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Notes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The above shows you how you can read and write any data file. Of course, there are also functions that help you read in data in certain formats (for example, ``numpy`` contains a function ``numpy.loadtxt`` to read in arrays from files), but the key is that with the above you can read any file, so any other function just makes your life easier." ] } ], "metadata": {} } ] }