{
"metadata": {
"name": "",
"signature": "sha256:0d1ad18c9460f964f5652b2f9b50238491d06f57390afc34113257990e3b74aa"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599_2014/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2014_fall_ASTR599/)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%run talktools.py"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
""
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Advanced String Manipulation\n",
"# & File I/O\n",
"\n",
"String manipulation is one area where Python has a distinct (and huge) advantage over lower-level languages like C. Operations that are downright painful in those languages can be accomplished very straightforwardly in Python."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
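},
"source": [
"## The ``string`` module\n",
"\n",
"We can get a preview of what's available by examining the built-in ``string`` module. (In Python 3 it holds mostly constants and a few helpers; the operations covered in the rest of this section are methods of ``str`` objects themselves.)"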
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import string\n",
"dir(string)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"['ChainMap',\n",
" 'Formatter',\n",
" 'Template',\n",
" '_TemplateMetaclass',\n",
" '__builtins__',\n",
" '__cached__',\n",
" '__doc__',\n",
" '__file__',\n",
" '__initializing__',\n",
" '__loader__',\n",
" '__name__',\n",
" '__package__',\n",
" '_re',\n",
" '_string',\n",
" 'ascii_letters',\n",
" 'ascii_lowercase',\n",
" 'ascii_uppercase',\n",
" 'capwords',\n",
" 'digits',\n",
" 'hexdigits',\n",
" 'octdigits',\n",
" 'printable',\n",
" 'punctuation',\n",
" 'whitespace']"
]
}
],
"prompt_number": 3
},
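{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The constants in this listing are ordinary strings, so they can be used in membership tests. A minimal sketch (the helper name ``strip_punctuation`` is made up for this example and is not part of the module):\n",
"\n",
"```python\n",
"import string\n",
"\n",
"def strip_punctuation(text):\n",
"    # Keep every character that is not listed in string.punctuation\n",
"    return ''.join(ch for ch in text if ch not in string.punctuation)\n",
"\n",
"cleaned = strip_punctuation('Hello, there... my friend!')\n",
"print(cleaned)  # Hello there my friend\n",
"```"
]
},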
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Modifying Case:\n",
"#### ``lower()``, ``upper()``, ``title()``, ``capitalize()``, ``swapcase()``"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = \"HeLLo tHEre MY FriEND\""
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.upper()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"'HELLO THERE MY FRIEND'"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.lower()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"'hello there my friend'"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.title()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"'Hello There My Friend'"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.capitalize()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"'Hello there my friend'"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.swapcase()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"'hEllO TheRE my fRIend'"
]
}
],
"prompt_number": 9
},
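{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Two practical notes on the case methods, as a small sketch: ``title()`` treats any non-letter as a word boundary, which can surprise you with apostrophes, and lower-casing both sides is a simple way to compare strings without regard to case. The sample phrase is invented for illustration.\n",
"\n",
"```python\n",
"import string\n",
"\n",
"phrase = \"they're here\"\n",
"# title() starts a new word after any non-letter, including apostrophes\n",
"print(phrase.title())            # They'Re Here\n",
"# string.capwords() splits on whitespace only\n",
"print(string.capwords(phrase))   # They're Here\n",
"\n",
"# Lower-casing both sides gives a simple case-insensitive comparison\n",
"same = 'HeLLo'.lower() == 'hello'.lower()\n",
"print(same)                      # True\n",
"```"
]
},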
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Splitting, Cleaning, and Joining\n",
"#### ``split()``, ``strip()``, ``join()``, ``replace()``"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.split()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"['HeLLo', 'tHEre', 'MY', 'FriEND']"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"L = s.capitalize().split()\n",
"L"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"['Hello', 'there', 'my', 'friend']"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = '_'.join(L)\n",
"s"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"'Hello_there_my_friend'"
]
}
],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.split('_')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"['Hello', 'there', 'my', 'friend']"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"''.join(s.split('_'))"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"'Hellotheremyfriend'"
]
}
],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = \" Too many spaces! \"\n",
"s.strip()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
"'Too many spaces!'"
]
}
],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = \"*~*~*~*Super!!**~*~**~*~**~\"\n",
"s.strip('*~')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
"'Super!!'"
]
}
],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.rstrip('*~')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
"'*~*~*~*Super!!'"
]
}
],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.lstrip('*~')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"'Super!!**~*~**~*~**~'"
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.replace('*', '')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"'~~~Super!!~~~~~'"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.replace('*', '').replace('~', '')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
"'Super!!'"
]
}
],
"prompt_number": 21
},
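{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"These methods are usually combined. The sketch below, with a made-up ``record`` string, splits a comma-separated line, strips each piece, and joins the result with a uniform separator. It also shows that the argument to ``strip()`` is a set of characters to remove, not a substring.\n",
"\n",
"```python\n",
"record = '  alpha ,beta,  gamma  , delta '\n",
"\n",
"# Split on commas, then strip the whitespace around each piece\n",
"fields = [piece.strip() for piece in record.split(',')]\n",
"print(fields)  # ['alpha', 'beta', 'gamma', 'delta']\n",
"\n",
"# Join the cleaned pieces back together with a uniform separator\n",
"tidy = ', '.join(fields)\n",
"print(tidy)    # alpha, beta, gamma, delta\n",
"\n",
"# strip('xy') removes any mix of 'x' and 'y' from both ends\n",
"core = 'xyxhixy'.strip('xy')\n",
"print(core)    # hi\n",
"```"
]
},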
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Finding substrings\n",
"#### ``find()``, ``startswith()``, ``endswith()``"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = \"The quick brown fox jumped\"\n",
"s.find(\"fox\")"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"16"
]
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s[16:]"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 23,
"text": [
"'fox jumped'"
]
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.find('booyah')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 24,
"text": [
"-1"
]
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.startswith('The')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 25,
"text": [
"True"
]
}
],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.endswith('jumped')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
"True"
]
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.endswith('fox')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
"False"
]
}
],
"prompt_number": 27
},
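{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Because ``find()`` signals failure with ``-1``, which is itself a valid index, it is worth checking the result before slicing. A short sketch of that check, along with two related tools (the ``in`` operator and ``index()``):\n",
"\n",
"```python\n",
"s = 'The quick brown fox jumped'\n",
"\n",
"# Test the result of find() before using it as a slice index\n",
"pos = s.find('fox')\n",
"tail = s[pos:] if pos != -1 else ''\n",
"print(tail)  # fox jumped\n",
"\n",
"# The 'in' operator answers the yes/no question directly\n",
"print('fox' in s)     # True\n",
"print('booyah' in s)  # False\n",
"\n",
"# index() works like find() but raises ValueError when nothing is found\n",
"try:\n",
"    s.index('booyah')\n",
"    found = True\n",
"except ValueError:\n",
"    found = False\n",
"print(found)  # False\n",
"```"
]
},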
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Checking a string's contents\n",
"#### ``isdigit()``, ``isalpha()``, ``islower()``, ``isupper()``, ``isspace()``, etc."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'1234'.isdigit()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 28,
"text": [
"True"
]
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'123.45'.isdigit()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"False"
]
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'ABC'.isalpha()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 30,
"text": [
"True"
]
}
],
"prompt_number": 30
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'ABC123'.isalpha()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 31,
"text": [
"False"
]
}
],
"prompt_number": 31
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"ABC123\".isalnum()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
"True"
]
}
],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'ABC easy as 123'.isalnum()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
"False"
]
}
],
"prompt_number": 33
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'hello'.islower()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 34,
"text": [
"True"
]
}
],
"prompt_number": 34
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'HELLO'.isupper()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 35,
"text": [
"True"
]
}
],
"prompt_number": 35
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'Hello'.istitle()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 36,
"text": [
"True"
]
}
],
"prompt_number": 36
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"' '.isspace()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": [
"True"
]
}
],
"prompt_number": 37
},
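{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As the ``'123.45'`` example shows, ``isdigit()`` is not a test for \"is this a number?\". One common alternative is simply to attempt the conversion. A minimal sketch, where ``is_number`` is a hypothetical helper name:\n",
"\n",
"```python\n",
"def is_number(text):\n",
"    # float() accepts signs, decimal points, and exponents,\n",
"    # all of which make isdigit() return False\n",
"    try:\n",
"        float(text)\n",
"        return True\n",
"    except ValueError:\n",
"        return False\n",
"\n",
"# Compare the two tests on a few sample strings\n",
"for candidate in ['1234', '123.45', '-7', '1e3', 'ABC']:\n",
"    print(candidate, candidate.isdigit(), is_number(candidate))\n",
"```"
]
},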
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## String Formatting\n",
"\n",
"### The old way\n",
"\n",
"The old-style string formatting operations will look familiar to those who have used ``printf`` in C. Each ``%`` in the string starts a format specifier that is replaced by the corresponding value.\n",
"\n",
"The basic interface is\n",
"\n",
"    \"%<format>\" % value\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from math import pi\n",
"\"my favorite integer is %d, but my favorite float is %f.\" % (42, pi)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 38,
"text": [
"'my favorite integer is 42, but my favorite float is 3.141593.'"
]
}
],
"prompt_number": 38
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"in exponential notation it's %e\" % pi"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 39,
"text": [
"\"in exponential notation it's 3.141593e+00\""
]
}
],
"prompt_number": 39
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"to choose smartly if exponential is needed: %g\" % pi"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 40,
"text": [
"'to choose smartly if exponential is needed: 3.14159'"
]
}
],
"prompt_number": 40
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"or with a bigger number: %g\" % 123456787654321.0"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 41,
"text": [
"'or with a bigger number: 1.23457e+14'"
]
}
],
"prompt_number": 41
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"rounded to three decimal places it's %.3f\" % pi"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 42,
"text": [
"\"rounded to three decimal places it's 3.142\""
]
}
],
"prompt_number": 42
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"an integer padded with spaces: %10d\" % 42"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": [
"'an integer padded with spaces: 42'"
]
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"an integer padded on the right: %-10d\" % 42"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 44,
"text": [
"'an integer padded on the right: 42 '"
]
}
],
"prompt_number": 44
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"an integer padded with zeros: %010d\" % 42"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 45,
"text": [
"'an integer padded with zeros: 0000000042'"
]
}
],
"prompt_number": 45
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"we can also name our arguments: %(value)d\" % dict(value=3)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 46,
"text": [
"'we can also name our arguments: 3'"
]
}
],
"prompt_number": 46
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"Escape the percent sign with an extra symbol: the %d%%\" % 99"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 47,
"text": [
"'Escape the percent sign with an extra symbol: the 99%'"
]
}
],
"prompt_number": 47
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Read more about formats in the [Python docs](http://docs.python.org/release/2.7.2/library/stdtypes.html#string-formatting-operations)."
]
},
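{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"One more feature of the old style is worth a quick sketch, because the breakout at the end of this notebook relies on it: a ``*`` in place of the width or precision takes that number from the argument tuple. The variable names below are chosen just for this example.\n",
"\n",
"```python\n",
"from math import pi\n",
"\n",
"# '*' pulls the field width from the tuple, ahead of the value itself\n",
"padded = '%*d' % (6, 42)\n",
"print(repr(padded))   # '    42'\n",
"\n",
"# '.*' does the same for the precision\n",
"rounded = '%.*f' % (3, pi)\n",
"print(repr(rounded))  # '3.142'\n",
"\n",
"# Both can be combined with zero-padding\n",
"both = '%0*.*f' % (10, 4, pi)\n",
"print(repr(both))     # '00003.1416'\n",
"```"
]
},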
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Formatting: the new way\n",
"\n",
"New-style string formatting uses curly braces ``{}`` to mark the replacement fields, which can refer to arguments by position or by name:\n",
"\n",
"    \"{0} {name}\".format(first, name=second)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"{}{}\".format(\"ABC\", 123)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 48,
"text": [
"'ABC123'"
]
}
],
"prompt_number": 48
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"{0}{1}\".format(\"ABC\", 123)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 49,
"text": [
"'ABC123'"
]
}
],
"prompt_number": 49
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"{0}{0}\".format(\"ABC\", 123)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 50,
"text": [
"'ABCABC'"
]
}
],
"prompt_number": 50
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"{1}{0}\".format(\"ABC\", 123)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 51,
"text": [
"'123ABC'"
]
}
],
"prompt_number": 51
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The format specification comes after the ``:`` inside the braces:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"(\"%.2f\" % 3.14159) == \"{:.2f}\".format(3.14159)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 52,
"text": [
"True"
]
}
],
"prompt_number": 52
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"{0:d} is an integer; {1:.3f} is a float\".format(42, pi)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 53,
"text": [
"'42 is an integer; 3.142 is a float'"
]
}
],
"prompt_number": 53
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\"{the_answer:010d} is an integer; {pi:.5g} is a float\".format(the_answer=42,\n",
" pi=pi)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 54,
"text": [
"'0000000042 is an integer; 3.1416 is a float'"
]
}
],
"prompt_number": 54
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'{desire} to {place}'.format(desire='Fly me',\n",
" place='The Moon')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 55,
"text": [
"'Fly me to The Moon'"
]
}
],
"prompt_number": 55
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# using a pre-defined dictionary\n",
"f = {\"desire\": \"Won't you take me\",\n",
" \"place\": \"funky town?\"}\n",
"\n",
"'{desire} to {place}'.format(**f)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 56,
"text": [
"\"Won't you take me to funky town?\""
]
}
],
"prompt_number": 56
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# format also supports binary numbers\n",
"\"int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}\".format(42)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 57,
"text": [
"'int: 42; hex: 2a; oct: 52; bin: 101010'"
]
}
],
"prompt_number": 57
},
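{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The format specification also controls alignment, which is useful for lining up columns. A brief sketch with illustrative variable names:\n",
"\n",
"```python\n",
"# '<', '>' and '^' align left, right and center within the given width\n",
"left = '[{:<9}]'.format('abc')\n",
"right = '[{:>9}]'.format('abc')\n",
"center = '[{:^9}]'.format('abc')\n",
"print(left)    # [abc      ]\n",
"print(right)   # [      abc]\n",
"print(center)  # [   abc   ]\n",
"\n",
"# A fill character can precede the alignment symbol\n",
"filled = '[{:*^9}]'.format('abc')\n",
"print(filled)  # [***abc***]\n",
"\n",
"# Width and precision can themselves be replacement fields\n",
"nested = '{:{width}.{prec}f}'.format(3.14159, width=8, prec=2)\n",
"print(repr(nested))  # '    3.14'\n",
"```"
]
},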
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## File I/O\n",
"\n",
"Let's create a file for us to read:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file inout.dat\n",
"Here is a nice file\n",
"with a couple lines of text\n",
"it is a haiku"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Writing inout.dat\n"
]
}
],
"prompt_number": 58
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f = open('inout.dat')\n",
"print(f.read())\n",
"f.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Here is a nice file\n",
"with a couple lines of text\n",
"it is a haiku\n"
]
}
],
"prompt_number": 60
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f = open('inout.dat')\n",
"print(f.readlines())\n",
"f.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"['Here is a nice file\\n', 'with a couple lines of text\\n', 'it is a haiku']\n"
]
}
],
"prompt_number": 62
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for line in open('inout.dat'):\n",
" print(line.split())"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"['Here', 'is', 'a', 'nice', 'file']\n",
"['with', 'a', 'couple', 'lines', 'of', 'text']\n",
"['it', 'is', 'a', 'haiku']\n"
]
}
],
"prompt_number": 64
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# write() is the opposite of read()\n",
"contents = open('inout.dat').read()\n",
"out = open('my_output.dat', 'w')\n",
"out.write(contents.replace(' ', '_'))\n",
"out.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 65
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat my_output.dat"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Here_is_a_nice_file\r\n",
"with_a_couple_lines_of_text\r\n",
"it_is_a_haiku"
]
}
],
"prompt_number": 66
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# writelines() is the opposite of readlines()\n",
"lines = open('inout.dat').readlines()\n",
"out = open('my_output.dat', 'w')\n",
"out.writelines(lines)\n",
"out.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 67
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat my_output.dat"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Here is a nice file\r\n",
"with a couple lines of text\r\n",
"it is a haiku"
]
}
],
"prompt_number": 68
},
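{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The examples above call ``close()`` by hand. A common alternative is the ``with`` statement, which closes the file automatically when the block ends. A minimal sketch, using a made-up file name ``with_demo.dat``:\n",
"\n",
"```python\n",
"# The with statement closes the file when the block ends,\n",
"# even if an exception is raised inside it\n",
"with open('with_demo.dat', 'w') as out:\n",
"    out.write('first line\\n')\n",
"    out.write('second line\\n')\n",
"\n",
"# Iterate over the lines, dropping the trailing newline from each\n",
"with open('with_demo.dat') as f:\n",
"    lines = [line.rstrip('\\n') for line in f]\n",
"\n",
"print(lines)     # ['first line', 'second line']\n",
"print(f.closed)  # True\n",
"```"
]
},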
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Breakout: cleaning up some output\n",
"\n",
"Here is some code that creates a comma-delimited file of numbers with random precision, a random number of leading spaces, and random zero-padding:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Don't modify this: it simply writes the example file\n",
"f = open('messy_data.dat', 'w')\n",
"import random\n",
"for i in range(100):\n",
" for j in range(5):\n",
" f.write(' ' * random.randint(0, 6))\n",
" f.write('%0*.*g' % (random.randint(8, 12),\n",
" random.randint(5, 10),\n",
" 100 * random.random()))\n",
" if j != 4:\n",
" f.write(',')\n",
" f.write('\\n')\n",
"f.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 69
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Look at the first four lines of the file:\n",
"!head -4 messy_data.dat"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 069.40604687, 0094.5912, 96.79042884, 0000055.655, 0023.7310709\r\n",
" 10.52260323, 000032.757,00033.982631, 0090.194719, 43.57646106\r\n",
" 040.527913, 00065.72179, 000086.8327,00011.0367,99.36526435\r\n",
" 00000074.411, 3.816226122, 00047.43759, 000079.62696, 040.8001\r\n"
]
}
],
"prompt_number": 70
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**Your task:** Write a program that reads in the contents of ``\"messy_data.dat\"`` and extracts the numbers from each line, using the string methods covered above (remember that ``float()`` will convert a suitable string to a floating-point number).\n",
"\n",
"Next, write out a new file named ``\"clean_data.dat\"``. The new file should contain the same data as the old file, but with uniform formatting and aligned columns."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# your solution here\n",
"\n"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 71
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The numpy solution\n",
"\n",
"What you did above with text wrangling, ``numpy`` can do much more easily:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"data = np.loadtxt(\"messy_data.dat\", delimiter=',')\n",
"np.savetxt(\"clean_data.dat\", data,\n",
" delimiter=',', fmt=\"%8.4f\")"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 72
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -5 clean_data.dat"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 69.4060, 94.5912, 96.7904, 55.6550, 23.7311\r\n",
" 10.5226, 32.7570, 33.9826, 90.1947, 43.5765\r\n",
" 40.5279, 65.7218, 86.8327, 11.0367, 99.3653\r\n",
" 74.4110, 3.8162, 47.4376, 79.6270, 40.8001\r\n",
" 77.2510, 79.3929, 36.7943, 71.0619, 74.8516\r\n"
]
}
],
"prompt_number": 73
},
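{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For comparison, the same cleanup can be done with only the string methods from earlier in this notebook. The following is one possible sketch, not the only solution; the helper names ``clean_line`` and ``clean_file`` and the ``{:8.4f}`` format are choices made for this example, and the sample line is copied from the ``head`` output above.\n",
"\n",
"```python\n",
"def clean_line(line, fmt='{:8.4f}'):\n",
"    # Split on commas, strip the stray spaces, and let float() absorb\n",
"    # the leading zeros; then format every value the same way\n",
"    values = [float(field.strip()) for field in line.split(',')]\n",
"    return ','.join(fmt.format(v) for v in values)\n",
"\n",
"def clean_file(infile, outfile):\n",
"    # Hypothetical helper: rewrite infile line by line into outfile\n",
"    with open(infile) as src, open(outfile, 'w') as dst:\n",
"        for line in src:\n",
"            if line.strip():  # skip blank lines\n",
"                dst.write(clean_line(line) + '\\n')\n",
"\n",
"sample = '   069.40604687,  0094.5912, 96.79042884,    0000055.655, 0023.7310709'\n",
"print(clean_line(sample))\n",
"```\n",
"\n",
"Calling ``clean_file('messy_data.dat', 'clean_data.dat')`` would then process the whole file."
]
},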
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Still, text manipulation is a very good skill to have under your belt!"
]
}
],
"metadata": {}
}
]
}