{ "metadata": { "name": "setdict-storage-learner" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Sets and Dictionaries in Python: Storage (Learner Version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objectives\n", "\n", "* Draw a diagram showing how hash tables are implemented, and correctly label the main parts.\n", "* Explain the purpose of a hash function.\n", "* Explain why using mutable values as keys in a hash table can cause problems.\n", "* Correctly identify the error messages Python produces when programs try to put mutable values in hashed data structures.\n", "* Explain the similarities and differences between tuples and lists.\n", "* Explain why using tuples is better than concatenating values when storing multi-part data in hashed data structures." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lesson\n", "\n", "* Can add a string to a set:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "things = set()\n", "things.add(\"lithium\")\n", "print things" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "set(['lithium'])\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "* But can't add list:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "things.add([1, 2, 3])" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'list'", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mthings\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Computer allocates a block of memory to store set\n", "* Then uses a *hash function* to determine where to store each item" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Storing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* If we put a reference to a list into a set, then change the list, the reference is in the wrong place" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Storing\n", "\"Contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Can only happen with *mutable* values (like lists)\n", "* Can't affect *immutable* values (like numbers and strings)\n", "* Use *tuples* to store multi-part values" ] }, { "cell_type": "code", "collapsed": false, "input": [ "full_name = ('Charles', 'Darwin')\n", "print full_name[0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Charles\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "print len(full_name)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "2\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Cannot modify tuple after creation" ] }, { "cell_type": "code", "collapsed": false, "input": [ "full_name[0] = 'Erasmus'" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'tuple' object does not support item assignment", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfull_name\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'Erasmus'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "* So tuples can safely be added to sets" ] }, { "cell_type": "code", "collapsed": false, "input": [ "names = set()\n", "names.add(('Charles', 'Darwin'))\n", "print names" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "set([('Charles', 'Darwin')])\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Key Points\n", "\n", "* Sets are stored in hash tables, which guarantee fast access for arbitrary values.\n", "* The values in sets must be immutable to prevent hash tables misplacing them.\n", "* Use tuples to store multi-part values in sets." ] } ], "metadata": {} } ] }