{ "metadata": { "name": "", "signature": "sha256:0d1ad18c9460f964f5652b2f9b50238491d06f57390afc34113257990e3b74aa" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599_2014/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2014_fall_ASTR599/)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%run talktools.py" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Advanced String Manipulation\n", "# & File I/O\n", "\n", "One area where Python has a distinct (and huge) advantage over lower-level languages like C is string manipulation. Operations that are downright painful in other languages can be accomplished very straightforwardly in Python." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The ``string`` module\n", "\n", "The standard-library ``string`` module provides some useful constants (such as ``ascii_letters``, ``digits``, and ``punctuation``) and a few helper functions and classes. We can get a preview of what's available with ``dir()``. Note that most of the operations in this notebook are methods of the built-in ``str`` type itself; ``dir(str)`` lists those." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import string\n", "dir(string)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "['ChainMap',\n", " 'Formatter',\n", " 'Template',\n", " '_TemplateMetaclass',\n", " '__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__initializing__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '_re',\n", " '_string',\n", " 'ascii_letters',\n", " 'ascii_lowercase',\n", " 'ascii_uppercase',\n", " 'capwords',\n", " 'digits',\n", " 'hexdigits',\n", " 'octdigits',\n", " 'printable',\n", " 'punctuation',\n", " 'whitespace']" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Modifying Case\n", "#### ``lower()``, ``upper()``, ``title()``, ``capitalize()``, ``swapcase()``" ] }, { "cell_type": "code", "collapsed": false, "input": [ "s = \"HeLLo tHEre MY FriEND\"" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "s.upper()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "'HELLO THERE MY FRIEND'" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "s.lower()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "'hello there my friend'" ] } ], "prompt_number": 6 }, { "cell_type": 
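"markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A related method beyond the list above: for case-insensitive *comparison*, Python 3.3+ also offers ``casefold()``, a more aggressive version of ``lower()`` intended for caseless matching. Here is a minimal sketch with two hypothetical example words. The first comparison prints ``False`` and the second prints ``True``, because ``casefold()`` maps the German ``ß`` to ``ss`` while ``lower()`` leaves it alone:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# hypothetical example words, chosen only to illustrate casefold()\n", "word1 = 'Straße'\n", "word2 = 'STRASSE'\n", "print(word1.lower() == word2.lower())        # lower() keeps the 'ß'\n", "print(word1.casefold() == word2.casefold())  # casefold() turns 'ß' into 'ss'" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [] }, { "cell_type": 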
"code", "collapsed": false, "input": [ "s.title()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "'Hello There My Friend'" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "s.capitalize()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "'Hello there my friend'" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "s.swapcase()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "'hEllO TheRE my fRIend'" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Splitting, Cleaning, and Joining\n", "#### ``split()``, ``strip()``, ``join()``, ``replace()``" ] }, { "cell_type": "code", "collapsed": false, "input": [ "s.split()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "['HeLLo', 'tHEre', 'MY', 'FriEND']" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "L = s.capitalize().split()\n", "L" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "['Hello', 'there', 'my', 'friend']" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "s = '_'.join(L)\n", "s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "'Hello_there_my_friend'" ] } ], "prompt_number": 13 }, 
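{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A small aside that goes slightly beyond the methods listed above: ``split()`` also accepts an optional ``maxsplit`` argument, ``rsplit()`` does the same while counting from the right, and ``partition()`` splits once around the first separator and keeps it. Here is a minimal sketch on a hypothetical ``name`` string that matches ``s`` above. It prints ``['Hello', 'there_my_friend']``, then ``['Hello_there_my', 'friend']``, then ``('Hello', '_', 'there_my_friend')``:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "name = 'Hello_there_my_friend'  # hypothetical example string\n", "print(name.split('_', 1))   # split at the first '_' only\n", "print(name.rsplit('_', 1))  # split at the last '_' only\n", "print(name.partition('_'))  # (before, separator, after)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [] }, 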
{ "cell_type": "code", "collapsed": false, "input": [ "s.split('_')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "['Hello', 'there', 'my', 'friend']" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "''.join(s.split('_'))" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "'Hellotheremyfriend'" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "s = \" Too many spaces! \"\n", "s.strip()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "'Too many spaces!'" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "s = \"*~*~*~*Super!!**~*~**~*~**~\"\n", "s.strip('*~')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "'Super!!'" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "s.rstrip('*~')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "'*~*~*~*Super!!'" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "s.lstrip('*~')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "'Super!!**~*~**~*~**~'" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "s.replace('*', '')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, 
"outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "'~~~Super!!~~~~~'" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "s.replace('*', '').replace('~', '')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "'Super!!'" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Finding substrings\n", "#### ``find()``, ``startswith()``, ``endswith()``" ] }, { "cell_type": "code", "collapsed": false, "input": [ "s = \"The quick brown fox jumped\"\n", "s.find(\"fox\")" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ "16" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "s[16:]" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "'fox jumped'" ] } ], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "s.find('booyah')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "-1" ] } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "s.startswith('The')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 25, "text": [ "True" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "s.endswith('jumped')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", 
"prompt_number": 26, "text": [ "True" ] } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "s.endswith('fox')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "False" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Checking a string's contents\n", "#### ``isdigit()``, ``isalpha()``, ``islower()``, ``isupper()``, ``isspace()``, etc." ] }, { "cell_type": "code", "collapsed": false, "input": [ "'1234'.isdigit()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 28, "text": [ "True" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "'123.45'.isdigit()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "False" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "'ABC'.isalpha()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "True" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "'ABC123'.isalpha()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 31, "text": [ "False" ] } ], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "\"ABC123\".isalnum()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 32, "text": [ "True" ] } ], "prompt_number": 32 }, { "cell_type": 
"code", "collapsed": false, "input": [ "'ABC easy as 123'.isalnum()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 33, "text": [ "False" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "'hello'.islower()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 34, "text": [ "True" ] } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "'HELLO'.isupper()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 35, "text": [ "True" ] } ], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "'Hello'.istitle()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 36, "text": [ "True" ] } ], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "' '.isspace()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 37, "text": [ "True" ] } ], "prompt_number": 37 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## String Formatting\n", "\n", "### The old way\n", "\n", "The old-style string formatting operations will look familiar to those who have used C. 
Each ``%`` code in the string (such as ``%d`` or ``%f``) marks a field that is replaced by a value.\n", "\n", "The basic interface is\n", "\n", "    \"... %<format> ...\" % values\n", "\n", "where ``values`` is a single value, a tuple with one entry per replacement field, or a dictionary for named fields.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from math import pi\n", "\"my favorite integer is %d, but my favorite float is %f.\" % (42, pi)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 38, "text": [ "'my favorite integer is 42, but my favorite float is 3.141593.'" ] } ], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "\"in exponential notation it's %e\" % pi" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 39, "text": [ "\"in exponential notation it's 3.141593e+00\"" ] } ], "prompt_number": 39 }, { "cell_type": "code", "collapsed": false, "input": [ "\"to choose smartly if exponential is needed: %g\" % pi" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 40, "text": [ "'to choose smartly if exponential is needed: 3.14159'" ] } ], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "\"or with a bigger number: %g\" % 123456787654321.0" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 41, "text": [ "'or with a bigger number: 1.23457e+14'" ] } ], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "\"rounded to three decimal places it's %.3f\" % pi" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 42, "text": [ "\"rounded to three decimal places it's 3.142\"" ] } ], "prompt_number": 42 }, { "cell_type": "code", 
"collapsed": false, "input": [ "\"an integer padded with spaces: %10d\" % 42" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 43, "text": [ "'an integer padded with spaces:         42'" ] } ], "prompt_number": 43 }, { "cell_type": "code", "collapsed": false, "input": [ "\"an integer padded on the right: %-10d\" % 42" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 44, "text": [ "'an integer padded on the right: 42        '" ] } ], "prompt_number": 44 }, { "cell_type": "code", "collapsed": false, "input": [ "\"an integer padded with zeros: %010d\" % 42" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 45, "text": [ "'an integer padded with zeros: 0000000042'" ] } ], "prompt_number": 45 }, { "cell_type": "code", "collapsed": false, "input": [ "\"we can also name our arguments: %(value)d\" % dict(value=3)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 46, "text": [ "'we can also name our arguments: 3'" ] } ], "prompt_number": 46 }, { "cell_type": "code", "collapsed": false, "input": [ "\"Escape the percent sign with an extra symbol: the %d%%\" % 99" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 47, "text": [ "'Escape the percent sign with an extra symbol: the 99%'" ] } ], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Read more about formats in the [Python docs](http://docs.python.org/release/2.7.2/library/stdtypes.html#string-formatting-operations)" ] }, { "cell_type": 
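"markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "One pitfall worth knowing with old-style formatting: a tuple on the right of ``%`` is unpacked into one argument per replacement field. To print a tuple as a *single* value, wrap it in a one-element tuple. Here is a minimal sketch with a hypothetical ``point`` variable. The first ``print`` raises a ``TypeError`` (two values for one ``%s``), which is caught and reported, and the second prints ``point: (3, 4)``:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "point = (3, 4)  # hypothetical example value\n", "try:\n", "    print('point: %s' % point)  # the tuple supplies two arguments for one %s\n", "except TypeError as err:\n", "    print('TypeError:', err)\n", "print('point: %s' % (point,))  # a one-element tuple fills the single %s" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [] }, { "cell_type": 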
"markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Formatting: the new way\n", "\n", "New-style string formatting uses curly braces ``{}`` to mark replacement fields, which can refer to the arguments of ``format()`` by position or by name (empty braces are filled by the arguments in order):\n", "\n", "    \"{0} {name}\".format(first, name=second)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "\"{}{}\".format(\"ABC\", 123)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 48, "text": [ "'ABC123'" ] } ], "prompt_number": 48 }, { "cell_type": "code", "collapsed": false, "input": [ "\"{0}{1}\".format(\"ABC\", 123)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 49, "text": [ "'ABC123'" ] } ], "prompt_number": 49 }, { "cell_type": "code", "collapsed": false, "input": [ "\"{0}{0}\".format(\"ABC\", 123)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 50, "text": [ "'ABCABC'" ] } ], "prompt_number": 50 }, { "cell_type": "code", "collapsed": false, "input": [ "\"{1}{0}\".format(\"ABC\", 123)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 51, "text": [ "'123ABC'" ] } ], "prompt_number": 51 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Inside the braces, a format specification (with codes much like the old-style ``%`` codes) comes after a colon ``:``" ] }, { "cell_type": "code", "collapsed": false, "input": [ "(\"%.2f\" % 3.14159) == \"{:.2f}\".format(3.14159)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 52, "text": [ "True" ] } ], "prompt_number": 52 }, { "cell_type": "code", "collapsed": 
false, "input": [ "\"{0:d} is an integer; {1:.3f} is a float\".format(42, pi)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 53, "text": [ "'42 is an integer; 3.142 is a float'" ] } ], "prompt_number": 53 }, { "cell_type": "code", "collapsed": false, "input": [ "\"{the_answer:010d} is an integer; {pi:.5g} is a float\".format(the_answer=42,\n", "                                                              pi=pi)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 54, "text": [ "'0000000042 is an integer; 3.1416 is a float'" ] } ], "prompt_number": 54 }, { "cell_type": "code", "collapsed": false, "input": [ "'{desire} to {place}'.format(desire='Fly me',\n", "                             place='The Moon')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 55, "text": [ "'Fly me to The Moon'" ] } ], "prompt_number": 55 }, { "cell_type": "code", "collapsed": false, "input": [ "# using a pre-defined dictionary\n", "f = {\"desire\": \"Won't you take me\",\n", "     \"place\": \"funky town?\"}\n", "\n", "'{desire} to {place}'.format(**f)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 56, "text": [ "\"Won't you take me to funky town?\"" ] } ], "prompt_number": 56 }, { "cell_type": "code", "collapsed": false, "input": [ "# format also supports hexadecimal, octal, and binary output\n", "\"int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}\".format(42)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 57, "text": [ "'int: 42; hex: 2a; oct: 52; bin: 101010'" ] } ], "prompt_number": 57 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } 
}, "source": [ "## File I/O\n", "\n", "Let's create a file to read, using IPython's ``%%file`` cell magic (it writes the rest of the cell to the named file):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%file inout.dat\n", "Here is a nice file\n", "with a couple lines of text\n", "it is a haiku" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Writing inout.dat\n" ] } ], "prompt_number": 58 }, { "cell_type": "code", "collapsed": false, "input": [ "f = open('inout.dat')\n", "print(f.read())\n", "f.close()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Here is a nice file\n", "with a couple lines of text\n", "it is a haiku\n" ] } ], "prompt_number": 60 }, { "cell_type": "code", "collapsed": false, "input": [ "f = open('inout.dat')\n", "print(f.readlines())\n", "f.close()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['Here is a nice file\\n', 'with a couple lines of text\\n', 'it is a haiku']\n" ] } ], "prompt_number": 62 }, { "cell_type": "code", "collapsed": false, "input": [ "# a with-block closes the file automatically when the block ends\n", "with open('inout.dat') as f:\n", "    for line in f:\n", "        print(line.split())" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['Here', 'is', 'a', 'nice', 'file']\n", "['with', 'a', 'couple', 'lines', 'of', 'text']\n", "['it', 'is', 'a', 'haiku']\n" ] } ], "prompt_number": 64 }, { "cell_type": "code", "collapsed": false, "input": [ "# write() is the opposite of read()\n", "with open('inout.dat') as f:\n", "    contents = f.read()\n", "with open('my_output.dat', 'w') as out:\n", "    out.write(contents.replace(' ', '_'))" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 65 }, { "cell_type": "code", 
"collapsed": false, "input": [ "!cat my_output.dat" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Here_is_a_nice_file\r\n", "with_a_couple_lines_of_text\r\n", "it_is_a_haiku" ] } ], "prompt_number": 66 }, { "cell_type": "code", "collapsed": false, "input": [ "# writelines() is the opposite of readlines()\n", "with open('inout.dat') as f:\n", "    lines = f.readlines()\n", "with open('my_output.dat', 'w') as out:\n", "    out.writelines(lines)" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 67 }, { "cell_type": "code", "collapsed": false, "input": [ "!cat my_output.dat" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Here is a nice file\r\n", "with a couple lines of text\r\n", "it is a haiku" ] } ], "prompt_number": 68 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Breakout: cleaning up some output\n", "\n", "Here is some code that creates a comma-delimited file of numbers with random precision, random leading spaces, and inconsistent zero-padding:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Don't modify this: it simply writes the example file\n", "f = open('messy_data.dat', 'w')\n", "import random\n", "for i in range(100):\n", "    for j in range(5):\n", "        f.write(' ' * random.randint(0, 6))\n", "        f.write('%0*.*g' % (random.randint(8, 12),\n", "                            random.randint(5, 10),\n", "                            100 * random.random()))\n", "        if j != 4:\n", "            f.write(',')\n", "    f.write('\\n')\n", "f.close()" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 69 }, { "cell_type": "code", "collapsed": false, "input": [ "# Look at the first four lines of the file:\n", "!head -4 messy_data.dat" ], "language": "python", "metadata": { 
"slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 069.40604687, 0094.5912, 96.79042884, 0000055.655, 0023.7310709\r\n", " 10.52260323, 000032.757,00033.982631, 0090.194719, 43.57646106\r\n", " 040.527913, 00065.72179, 000086.8327,00011.0367,99.36526435\r\n", " 00000074.411, 3.816226122, 00047.43759, 000079.62696, 040.8001\r\n" ] } ], "prompt_number": 70 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Your task:** Write a program that reads in the contents of ``\"messy_data.dat\"`` and extracts the numbers from each line, using the string manipulations we used above (remember that ``float()`` will convert a suitable string to a floating-point number). \n", "\n", "Next write out a new file named ``\"clean_data.dat\"``. The new file should contain the same data as the old file, but with uniform formatting and aligned columns." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# your solution here\n", "\n" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 71 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The numpy solution\n", "\n", "What you did above with text wrangling, ``numpy`` can do much more easily:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "data = np.loadtxt(\"messy_data.dat\", delimiter=',')\n", "np.savetxt(\"clean_data.dat\", data,\n", " delimiter=',', fmt=\"%8.4f\")" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 72 }, { "cell_type": "code", "collapsed": false, "input": [ "!head -5 clean_data.dat" ], "language": "python", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 69.4060, 94.5912, 96.7904, 55.6550, 
23.7311\r\n", " 10.5226, 32.7570, 33.9826, 90.1947, 43.5765\r\n", " 40.5279, 65.7218, 86.8327, 11.0367, 99.3653\r\n", " 74.4110, 3.8162, 47.4376, 79.6270, 40.8001\r\n", " 77.2510, 79.3929, 36.7943, 71.0619, 74.8516\r\n" ] } ], "prompt_number": 73 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Still, text manipulation is a very good skill to have under your belt!" ] } ], "metadata": {} } ] }