{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# String manipulation\n", "## Week 3\n", "Through Guttag chapter 7.\n", "\n", "One of the things you'll often do when computing with texts is manipulate strings of characters and words. At the very least, many techniques will require you to split a string into its constituent words, perhaps converting them to lowercase and removing punctuation in the process.\n", "\n", "These exercises will help you get a handle on basic string manipulation functions and methods. Guttag's text has a helpful list of (some) string methods on page 67. You should also consult the (fuller) [Python documentation](https://docs.python.org/3.4/library/stdtypes.html#string-methods) of same.\n", "\n", "Note that none of the problems below should require more than 5-10 lines of code.\n", "\n", "### Some text\n", "\n", "First, consider the opening paragraph of *Moby-Dick*:\n", "\n", "> Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. 
If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.\n", "\n", "### 1. Enter a string\n", "\n", "This is the string with which we'll work for most of the exercise. Enter it as a variable below. **NB**. The string contains quotation marks (well, one apostrophe, anyway). How do you include those in a string?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Store the string above in a variable. I'd suggest 's', but it's up to you.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. String basics\n", "\n", "Nice. OK, so we have the paragraph in computable form. How many characters does it contain? What's the 100th character?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Counting spaces provides a rough proxy for the number of words in a string. How many spaces are in our string?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. String methods\n", "\n", "Write a function that takes one argument of type `string` and returns a transformation of that string into lowercase with all punctuation marks replaced with spaces. For example, your code should render \"`Moby-Dick is a long book! It's rough.`\" as \"`moby dick is a long book it s rough`\".\n", "\n", "Hint: Use the `lower()` and `replace()` methods. You can define your own list of punctuation marks, or `import` the `string` module to use Python's [built-in list](https://docs.python.org/3.4/library/string.html#string-constants)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Counting words\n", "\n", "OK, grand finale. Write a function that takes a string and returns a dictionary of word counts, keyed to the unique words it contains. For example, \"`the cat is a good cat`\" should return `{'good': 1, 'the': 1, 'cat': 2, 'is': 1, 'a': 1}`, though not necessarily in that order, since key order in dictionaries is arbitrary.\n", "\n", "Hint: use the `split()` method and `try/except` blocks for flow control. Review `dict`s on Guttag p. 67ff. Note that `split()` returns a list over which you can iterate.\n", "\n", "Use the function you just wrote to produce a list of wordcounts in the (lowercase, punctuation-free) *Moby-Dick* string. How many times does the word \"the\" occur?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Bonus\n", "\n", "Ingest the full [plain-text file of *Moby-Dick*](https://raw.githubusercontent.com/wilkens/course-exercises-f15/master/mobydick.txt) from Project Gutenberg. Pre-process it as above, count the words in it, and print a list of the 25 most frequently occurring words (with counts).\n", "\n", "Hint: You'll need to figure out how to sort a dictionary by values. There are several ways to do this, none of them entirely straightforward. I'd suggest looking into the `sorted()` function and the `.get` method." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }