{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The collections module: more data structures" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import collections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Counter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Counter(seq)` counts how many times each element occurs in a sequence.\n", "\n", "For example, we can count the words in a piece of text and the number of times each one appears:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Counter({'two': 2, 'one': 2, 'from': 1, 'i': 1, 'tree': 1, 'three': 1, 'china': 1, 'come': 1})\n" ] } ], "source": [ "from string import punctuation\n", "\n", "sentence = \"One, two, three, one, two, tree, I come from China.\"\n", "\n", "# Strip punctuation, lowercase the text, and split on whitespace before counting.\n", "words_count = collections.Counter(sentence.translate(None, punctuation).lower().split())\n", "\n", "print words_count" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deque (double-ended queue)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A deque supports adding and removing elements at both the head and the tail:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", "9 8 7 6 5 4 3 2 1 0\n", "deque([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])\n", "9 8 7 6 5 4 3 2 1 0\n" ] } ], "source": [ "dq = collections.deque()\n", "\n", "# Append at the tail, then pop from the tail.\n", "for i in xrange(10):\n", "    dq.append(i)\n", "\n", "print dq\n", "\n", "for i in xrange(10):\n", "    print dq.pop(),\n", "\n", "print\n", "\n", "# Append at the head, then pop from the head.\n", "for i in xrange(10):\n", "    dq.appendleft(i)\n", "\n", "print dq\n", "\n", "for i in xrange(10):\n", "    print dq.popleft()," ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compared with a list, a deque is faster for operations at the head: `lst.insert(0, x)` has to shift every existing element, while `dq.appendleft(x)` runs in constant time:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100 loops, best of 3: 598 ns per loop\n", "100 loops, 
best of 3: 291 ns per loop\n" ] } ], "source": [ "lst = []\n", "dq = collections.deque()\n", "\n", "%timeit -n100 lst.insert(0, 10)\n", "%timeit -n100 dq.appendleft(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ordered dictionary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An `OrderedDict` remembers the order in which its keys were inserted, whereas a regular `dict` in Python 2 iterates over its keys in an arbitrary order:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Regular Dict:\n", "A 1\n", "C 3\n", "B 2\n", "Ordered Dict:\n", "A 1\n", "B 2\n", "C 3\n" ] } ], "source": [ "items = (\n", "    ('A', 1),\n", "    ('B', 2),\n", "    ('C', 3)\n", ")\n", "\n", "regular_dict = dict(items)\n", "ordered_dict = collections.OrderedDict(items)\n", "\n", "print 'Regular Dict:'\n", "for k, v in regular_dict.items():\n", "    print k, v\n", "\n", "print 'Ordered Dict:'\n", "for k, v in ordered_dict.items():\n", "    print k, v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dictionary with default values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a built-in `Python` dictionary `d`, accessing `d[key]` raises a `KeyError` when `key` does not exist. A `defaultdict` instead supplies a default value for such a `key`: pass a factory such as a type when creating it, and a missing `key` is inserted with the value that factory returns:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "0\n", "0.0\n" ] } ], "source": [ "dd = collections.defaultdict(list)\n", "\n", "print dd[\"foo\"]\n", "\n", "dd = collections.defaultdict(int)\n", "\n", "print dd[\"foo\"]\n", "\n", "dd = collections.defaultdict(float)\n", "\n", "print dd[\"foo\"]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }