{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#集合(set)与字典(dict)\" data-toc-modified-id=\"集合(set)与字典(dict)-1\">集合(set)与字典(dict)</a></span></li><li><span><a href=\"#1.-集合(set)\" data-toc-modified-id=\"1.-集合(set)-2\">1. 集合(set)</a></span><ul class=\"toc-item\"><li><span><a href=\"#1.1-集合概述\" data-toc-modified-id=\"1.1-集合概述-2.1\">1.1 集合概述</a></span></li><li><span><a href=\"#1.2-集合常见用法\" data-toc-modified-id=\"1.2-集合常见用法-2.2\">1.2 集合常见用法</a></span></li><li><span><a href=\"#1.3-集合的创建与遍历\" data-toc-modified-id=\"1.3-集合的创建与遍历-2.3\">1.3 集合的创建与遍历</a></span><ul class=\"toc-item\"><li><span><a href=\"#1.3.1-集合的创建\" data-toc-modified-id=\"1.3.1-集合的创建-2.3.1\">1.3.1 集合的创建</a></span></li><li><span><a href=\"#1.3.2-集合的遍历\" data-toc-modified-id=\"1.3.2-集合的遍历-2.3.2\">1.3.2 集合的遍历</a></span></li></ul></li><li><span><a href=\"#1.4-集合常用方法\" data-toc-modified-id=\"1.4-集合常用方法-2.4\">1.4 集合常用方法</a></span></li><li><span><a href=\"#1.5-集合的数学运算\" data-toc-modified-id=\"1.5-集合的数学运算-2.5\">1.5 集合的数学运算</a></span></li><li><span><a href=\"#1.6-集合的应用\" data-toc-modified-id=\"1.6-集合的应用-2.6\">1.6 集合的应用</a></span><ul class=\"toc-item\"><li><span><a href=\"#1.6.1-英文词汇统计\" data-toc-modified-id=\"1.6.1-英文词汇统计-2.6.1\">1.6.1 英文词汇统计</a></span></li><li><span><a href=\"#1.6.2-任务随机分配\" data-toc-modified-id=\"1.6.2-任务随机分配-2.6.2\">1.6.2 任务随机分配</a></span></li><li><span><a href=\"#1.6.3-学生选修统计\" data-toc-modified-id=\"1.6.3-学生选修统计-2.6.3\">1.6.3 学生选修统计</a></span></li></ul></li></ul></li><li><span><a href=\"#2.-字典(dict)\" data-toc-modified-id=\"2.-字典(dict)-3\">2. 
字典(dict)</a></span><ul class=\"toc-item\"><li><span><a href=\"#2.1-字典概述\" data-toc-modified-id=\"2.1-字典概述-3.1\">2.1 字典概述</a></span></li><li><span><a href=\"#2.2-字典常见用法\" data-toc-modified-id=\"2.2-字典常见用法-3.2\">2.2 字典常见用法</a></span></li><li><span><a href=\"#2.3-字典的创建与遍历\" data-toc-modified-id=\"2.3-字典的创建与遍历-3.3\">2.3 字典的创建与遍历</a></span><ul class=\"toc-item\"><li><span><a href=\"#2.3.1-字典的创建\" data-toc-modified-id=\"2.3.1-字典的创建-3.3.1\">2.3.1 字典的创建</a></span></li><li><span><a href=\"#2.3.2-字典的遍历\" data-toc-modified-id=\"2.3.2-字典的遍历-3.3.2\">2.3.2 字典的遍历</a></span></li></ul></li><li><span><a href=\"#2.4-字典常用方法\" data-toc-modified-id=\"2.4-字典常用方法-3.4\">2.4 字典常用方法</a></span><ul class=\"toc-item\"><li><span><a href=\"#2.4.1-字典中的查找、新增操作\" data-toc-modified-id=\"2.4.1-字典中的查找、新增操作-3.4.1\">2.4.1 字典中的查找、新增操作</a></span></li><li><span><a href=\"#2.4.2-字典中的更新操作\" data-toc-modified-id=\"2.4.2-字典中的更新操作-3.4.2\">2.4.2 字典中的更新操作</a></span></li><li><span><a href=\"#2.4.3-字典中的删除操作\" data-toc-modified-id=\"2.4.3-字典中的删除操作-3.4.3\">2.4.3 字典中的删除操作</a></span></li><li><span><a href=\"#2.4.4-字典的视图对象\" data-toc-modified-id=\"2.4.4-字典的视图对象-3.4.4\">2.4.4 字典的视图对象</a></span></li><li><span><a href=\"#2.4.5-字典的其他方法\" data-toc-modified-id=\"2.4.5-字典的其他方法-3.4.5\">2.4.5 字典的其他方法</a></span></li></ul></li><li><span><a href=\"#2.5-字典应用案例\" data-toc-modified-id=\"2.5-字典应用案例-3.5\">2.5 字典应用案例</a></span><ul class=\"toc-item\"><li><span><a href=\"#2.5.1-英文单词统计\" data-toc-modified-id=\"2.5.1-英文单词统计-3.5.1\">2.5.1 英文单词统计</a></span></li><li><span><a href=\"#2.5.2-行政区划查询\" data-toc-modified-id=\"2.5.2-行政区划查询-3.5.2\">2.5.2 行政区划查询</a></span></li></ul></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 集合(set)与字典(dict)\n",
    "**作者：** 郑如滨\n",
    "\n",
    "**主要参考资料:** [Python 3.7.5官方中文文档](https://docs.python.org/zh-cn/3.7/) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**列表**和**元组**均是一种容器对象，并且其中的元素均为**有序**排列。  \n",
    "其他**容器对象**：集合(set)、字典(dict)。他们有一些更加特别的功能。\n",
    "\n",
    "**集合:**  \n",
    "集合内的元素\n",
    "- **不可重复**  \n",
    "- **无序排列**\n",
    "    \n",
    "**字典:**  \n",
    "- 存储了**键-值**对。  \n",
    "- 通过键**快速查找**值。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. 集合(set)\n",
    "## 1.1 集合概述\n",
    "包含0个或多个数据元素的**无序**组合，且元素**不可重复**。\n",
    "\n",
    "集合内元素用**{}**括起来。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'lasa', 361000, 'xiamen', 'beijing', 'wuxi'}\n"
     ]
    }
   ],
   "source": [
    "city = {\"beijing\", \"xiamen\",  361000, \"lasa\", \"wuxi\"}\n",
    "print(city)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "集合内元素无序，所以打印出来的元素顺序与创建时不一致。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**集合主要用途:**\n",
    "1. 成员检测\n",
    "2. 消除重复元素\n",
    "3. 集合的并集、交集、叉集等数学运算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{361000, 'xiamen', 'beijing'} 集合内元素个数： 3\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "city = {\"beijing\", \"xiamen\", 361000, \"xiamen\", 361000}\n",
    "print(city, \"集合内元素个数：\", len(city))\n",
    "print(\"shang\" in city)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 集合常见用法\n",
    "使用列表(list)创建集合，并去除了重复元素。  \n",
    "然后使用`in`与`not in`判断指定值是否在集合中。  \n",
    "最后使用`remove`方法删除set中指定元素。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "li zhang zhao chen wang \n",
      "namelist中元素个数为6\n",
      "nameset 中元素个数为5,内容为{'li', 'zhang', 'zhao', 'chen', 'wang'}\n",
      "'zhang'  在nameset中吗? True\n",
      "'zhang'不在nameset中吗? False\n",
      "'张'在nameset中吗? False\n",
      "删除'zhang'后nameset的长度4，内容{'li', 'zhao', 'chen', 'wang'}\n"
     ]
    }
   ],
   "source": [
    "namelist = [\"zhang\",\"wang\",\"zhao\",\"li\",\"wang\",\"chen\"]\n",
    "nameset = set(namelist)\n",
    "for e in nameset:       #遍历nameset\n",
    "    print(e, end = \" \")\n",
    "print()\n",
    "print(\"namelist中元素个数为{}\".format(len(namelist)))\n",
    "print(\"nameset 中元素个数为{},内容为{}\".format(len(nameset),nameset))\n",
    "print(\"'zhang'  在nameset中吗?\", \"zhang\" in nameset)\n",
    "print(\"'zhang'不在nameset中吗?\", \"zhang\" not in nameset)\n",
    "print(\"'张'在nameset中吗?\", \"王\" in nameset)\n",
    "nameset.remove(\"zhang\")\n",
    "print(\"删除'zhang'后nameset的长度{}，内容{}\".format(len(nameset),nameset))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**集合的并交叉运算**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "并集为: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}\n",
      "交集为: {1, 10, 5, 7}\n",
      "在xset但不在yset的元素: {8, 2, 3, 6}\n",
      "在xset或yset，但不在他们的较集中的元素: {2, 3, 4, 6, 8, 9}\n"
     ]
    }
   ],
   "source": [
    "xset = set([1,2,3,5,6,7,8,10]) #创建set\n",
    "yset = set([1,4,5,7,9,10])\n",
    "print(\"并集为:\",xset|yset)   \n",
    "print(\"交集为:\",xset & yset)\n",
    "print(\"在xset但不在yset的元素:\", xset - yset)\n",
    "print(\"在xset或yset，但不在他们的较集中的元素:\", xset^yset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.3 集合的创建与遍历\n",
    "### 1.3.1 集合的创建\n",
    "可使用{}或set函数进行创建。\n",
    "\n",
    "**1.使用{}可以直接创建**  \n",
    "元素之间用**,**分隔。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3} {1, 'b', 2, 'a'}\n"
     ]
    }
   ],
   "source": [
    "xset = {3, 1, 2, 1, 2, 3}\n",
    "yset = {'a','b', 1, 2, 'a'}\n",
    "print(xset, yset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**空集合的创建：**\n",
    "\n",
    "不能使用**{}**代表空集合，因为其代表空字典。  \n",
    "需使用set()函数创建空集合。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'dict'>\n"
     ]
    }
   ],
   "source": [
    "print(type({}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2.使用set()函数创建** \n",
    "\n",
    "该函数接受任何可迭代(iterable)对象(如，str, list, tuple, range等)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'b', 'a', 'c'} {1, 2, 3}\n",
      "{'y', 'z', 'x'} {0, 1, 2, 3, 4}\n",
      "<class 'set'> 0\n"
     ]
    }
   ],
   "source": [
    "xset = set(\"abca\")\n",
    "yset = set([1, 2, 3])\n",
    "print(xset, yset) # {'b', 'c', 'a'} {1, 2, 3}\n",
    "zset = set((\"x\", \"y\", \"z\"))\n",
    "rset = set(range(5))\n",
    "print(zset, rset) # {'z', 'x', 'y'} {0, 1, 2, 3, 4}\n",
    "emptyset = set() # 创建空集合\n",
    "print(type(emptyset), len(emptyset))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**小技巧：**如何判定一个对象是否是可迭代的？ \n",
    "\n",
    "使用**collections**模块中的**Iterable**类型判断"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "from collections import Iterable\n",
    "print(isinstance(\"abc\", Iterable)) \n",
    "print(isinstance([1, 2, 3], Iterable))\n",
    "print(isinstance((1, 2), Iterable))\n",
    "print(isinstance({1, 2, 3}, Iterable))\n",
    "print(isinstance((range(5)), Iterable))\n",
    "# 返回True, zip对象可迭代\n",
    "print(isinstance(zip(['one', 'two', 'three'], [1, 2, 3]), Iterable)) \n",
    "# 返回True, 字典也是可迭代的\n",
    "print(isinstance({\"思明区\":361000}, Iterable)) \n",
    "\n",
    "# 返回False, 数值类型不可迭代\n",
    "print(isinstance(3.14, Iterable))   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.使用集合推导式创建**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'set'> {0, 64, 4, 36, 16}\n"
     ]
    }
   ],
   "source": [
    "xset = {x**2 for x in range(10) if x % 2 == 0}\n",
    "print(type(xset), xset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3.2 集合的遍历"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "集合是可迭代对象，因此可直接使用for循环进行遍历。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "li zhao wang zhang chen "
     ]
    }
   ],
   "source": [
    "nameset = set([\"zhang\", \"wang\", \"zhao\", \"li\", \"wang\", \"chen\"])\n",
    "for e in nameset:       #遍历nameset\n",
    "    print(e, end = \" \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用**迭代器**进行遍历。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "li zhao wang zhang chen "
     ]
    }
   ],
   "source": [
    "nameIterator = iter(nameset)\n",
    "for i in range(len(nameset)):\n",
    "    print(next(nameIterator), end = \" \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意：**\n",
    "\n",
    "集合中的元素是无序排列的。因此将上述遍历代码存于文件，并多次运行，返回的结果可能不一致。如下图所示：  \n",
    "![set内元素无序排列](set内元素无序排列.png)\n",
    "\n",
    "**根本原因：**\n",
    "\n",
    "集合内部插入添加元素时使用了随机种子。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.4 集合常用方法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以将集合的方法按照**增、删、查、其他**进行分类。如下表所示。  \n",
    "\n",
    "**注:** s为所要操作的集合： \n",
    "<escape>\n",
    "<table>\n",
    "  <tr>\n",
    "    <th>类别</th>\n",
    "    <th>方法</th>\n",
    "    <th>说明</th>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"2\">增</td>\n",
    "    <td>s.add(x)</td>\n",
    "    <td>如果数据项x不在集合s中，将x加入s</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>s.update(xset)</td>\n",
    "    <td>在s中添加来自xset中的元素</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"4\">删</td>\n",
    "    <td>s.remove(x)</td>\n",
    "    <td>如x在集合s中，则移除；否则，引发<font color=\"#FF0000\">KeyError异常</font></td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>s.discard(x)</td>\n",
    "    <td>如果x在集合s中，移除该元素；如果x不存在，不报错</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>s.pop()</td>\n",
    "      <td>随机返回集合s中的一个元素，如集合为空，引发<font color=\"#FF0000\">KeyError异常</font></td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>s.clear()</td>\n",
    "    <td>移除s中所有数据项</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"2\">查</td>\n",
    "    <td>x in S</td>\n",
    "    <td>如果x是S的元素，返回True，否则返回False</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>x not in S</td>\n",
    "    <td>如果x不是S的元素，返回True，否则返回False</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"3\">其他</td>\n",
    "    <td>len(s)</td>\n",
    "    <td>返回集合s元素个数</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>s.copy()</td>\n",
    "    <td>返回集合S的一个浅拷贝</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>isdisjoint(), issubset(), issuperset()</td>\n",
    "    <td>判断两个集合之间的关系的方法</td>\n",
    "  </tr>\n",
    "</table>\n",
    "</escape>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3, 4}\n",
      "1 2 3 \n",
      "True False\n"
     ]
    }
   ],
   "source": [
    "s = set()\n",
    "s.add(1)\n",
    "s.update({2, 3, 4})\n",
    "print(s)\n",
    "if 4 in s:   #因为remove方法可能会引发KeyError，所以应先判断\n",
    "    s.remove(4)\n",
    "s.discard(5) #5不存在，也不会报错\n",
    "while len(s) != 0: #因s为空时pop会引发KeyError，所以需判断\n",
    "    print(s.pop(), end = \" \")\n",
    "print()\n",
    "s = {1, 2, 3, 4, 5}\n",
    "x = s.copy()\n",
    "print(s == x, id(s) == id(x)) # 输出True, Flase。x与s内容相同，但不是同一个对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**判断两个集合之间的关系的方法与运算符：** \n",
    "\n",
    "方法或运算符|说明\n",
    "-|-\n",
    "s.isdisjoint(xset)|如果集合s没有与xset有共同元素返回`True`\n",
    "s.issubset(xset)|如果s是xset的子集返回`True`\n",
    "s <= xset|判断s是否xset的子集\n",
    "s < xset|判断s是否xset的**真子集**\n",
    "s.issuperset(xset)|判断s是否是xset的超集\n",
    "s >= xset|判断xset是否是s的子集\n",
    "s > xset|判断xset是否是s的**真子集**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "True\n",
      "True False\n",
      "True False\n"
     ]
    }
   ],
   "source": [
    "xset = {1, 2, 3}\n",
    "yset = {1, 2, 3, 4, 5}\n",
    "zset = {4, 5, 6}\n",
    "print(xset.isdisjoint(zset))    #True\n",
    "print(xset.issubset(yset))      #True\n",
    "print(yset.issuperset(xset))    #True\n",
    "print(xset<=yset, xset<{1,2,3}) #True, False\n",
    "print(yset>=xset, xset>{1,2,3}) #True, False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.5 集合的数学运算\n",
    "集合支持并集(`|`)、交集(`&`)、差集(`-`)、对称差集(`^`)等数学运算。  \n",
    "集中集合运算的含义如下图所示：\n",
    "\n",
    "![集合的数学运算](集合的数学运算.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "个数: 10 , A|B: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}\n",
      "A&B {1, 10, 3, 9}\n",
      "A-B {5, 6, 7}\n",
      "A^B {2, 4, 5, 6, 7, 8}\n"
     ]
    }
   ],
   "source": [
    "A = {1, 3, 5, 6, 7, 9, 10}\n",
    "B = {1, 2, 3, 4, 8, 9, 10}\n",
    "print(\"个数:\", len(A|B), \", A|B:\", A|B)\n",
    "print(\"A&B\", A&B) \n",
    "print(\"A-B\", A-B)\n",
    "print(\"A^B\", A^B)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "并集(`|`)、交集(`&`)、差集(`-`)、对称差集(`^`)是运算符版本的集合运算。  \n",
    "其对对应的非运算符版本的运算方法分别是:  \n",
    "`union, intersection, difference, symmetric_difference`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}\n",
      "{1, 10, 3, 9}\n",
      "{5, 6, 7}\n",
      "{2, 4, 5, 6, 7, 8}\n"
     ]
    }
   ],
   "source": [
    "A = {1, 3, 5, 6, 7, 9, 10}\n",
    "B = {1, 2, 3, 4, 8, 9, 10}\n",
    "print(A.union(B))\n",
    "print(A.intersection(B))\n",
    "print(A.difference(B))\n",
    "print(A.symmetric_difference(B))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意：**不管是运算符或者非运算符版本的集合运算都不会改变原来的集合。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.6 集合的应用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6.1 英文词汇统计\n",
    "给定一段英文，将其中的单词抽取出来。抽取之前先进行预处理：将其中的所有标点符号替换为一个空格。  \n",
    "预处理后，统计这段英文：\n",
    "1. 有多少个单词。\n",
    "2. 有多少不重复的单词。\n",
    "3. 将这些不重复的单词按升序输出。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "原文单词个数： 45\n",
      "不重复的单词个数： 36\n",
      "他们是： ['again', 'alice', 'all', 'and', 'been', 'but', 'door', 'doors', 'down', 'ever', 'every', 'get', 'had', 'hall', 'how', 'locked', 'middle', 'one', 'other', 'out', 'round', 'sadly', 'she', 'side', 'the', 'there', 'they', 'to', 'trying', 'up', 'walked', 'was', 'way', 'were', 'when', 'wondering']\n"
     ]
    }
   ],
   "source": [
    "article = '''\n",
    "There were doors all round the hall, but they were all locked;\n",
    "and when Alice had been all the way down one side and up the\n",
    "other, trying every door, ALICE walked sadly down the middle,\n",
    "wondering how she was ever to get out again.\n",
    "'''\n",
    "article = article.lower()\n",
    "for e in \",;.\":\n",
    "   article = article.replace(e, \" \")\n",
    "words = article.split()\n",
    "print(\"原文单词个数：\", len(words))\n",
    "wordset = set(words)\n",
    "print(\"不重复的单词个数：\", len(wordset))\n",
    "print(\"他们是：\", sorted(wordset))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6.2 任务随机分配\n",
    "假设有A、B、C、D**四项任务**需要完成。现在有一份成员名单，名单上有n个成员，**每个成员都可以做这4项任务**。  \n",
    "希望编写一个算法，每回**从名单随机抽取**一个成员分配某个任务，并将该成员标注为已分配，下回抽取就不应再  \n",
    "抽到该成员。且分配过程中**不能删除名单中的成员**。 希望整个分配过程尽量保证每个任务分配到的人数尽量均衡。  \n",
    "请编写程序打印出每个任务分配到的成员情况。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "任务A分配了10个人，他们是[32, 0, 28, 24, 36, 16, 20, 4, 8, 12]\n",
      "任务B分配了10个人，他们是[29, 37, 17, 21, 9, 25, 5, 33, 13, 1]\n",
      "任务C分配了10个人，他们是[30, 10, 38, 34, 18, 22, 6, 14, 2, 26]\n",
      "任务D分配了9个人，他们是[31, 3, 27, 15, 35, 7, 23, 19, 11]\n",
      "没有交集\n"
     ]
    }
   ],
   "source": [
    "import random\n",
    "tasks = [\"A\", \"B\", \"C\", \"D\"]\n",
    "members = list(range(39))        #假设有39人\n",
    "result = [[] for i in range(4)]  #列表中存放4个成员队列 \n",
    "mset = set()                     #用来存放已分配任务的成员\n",
    "for i in range(len(members)):\n",
    "    while(True): #从名单中随机选取一个成员，可能选到已分配过的成员\n",
    "        i = random.randint(0, len(members)-1)\n",
    "        p = members[i]\n",
    "        if p not in mset:     #该成员未分配任务，因此可进行分配\n",
    "            break\n",
    "    mset.add(p)\n",
    "    result[i%4].append(p)     #按照ABCDA..顺序依次选取任务分配\n",
    "    \n",
    "for i in range(len(result)):\n",
    "    print(\"任务{}分配了{}个人，他们是{}\".\\\n",
    "          format(tasks[i],len(result[i]), result[i]))\n",
    "\n",
    "def isJoint(result):\n",
    "    '''\n",
    "    判断result中几个队列是否有交集\n",
    "    '''\n",
    "    joint = True  #假设他们有交集\n",
    "    for i in range(len(result)):\n",
    "        for j in range(i+1,len(result)):\n",
    "            joint = not set(result[i]).isdisjoint(result[j]) \n",
    "            if joint == True:\n",
    "                return True\n",
    "    return False\n",
    "\n",
    "print(\"有交集\" if isJoint(result) else \"没有交集\")\n",
    "'''\n",
    "joint = True  #假设他们有交集\n",
    "for i in range(len(result)):\n",
    "    for j in range(i+1,len(result)):\n",
    "        joint = not set(result[i]).isdisjoint(result[j]) \n",
    "        if joint == True:\n",
    "            break\n",
    "    if joint == True:\n",
    "        break\n",
    "print(\"有交集\" if joint else \"没有交集\")\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6.3 学生选修统计\n",
    "\n",
    "有A、B两个班。有3门课程供两个班的学生进行选修。尝试编写程序，针对选修情况进行统计。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "同时选修3门课程的学生： {1, 3}\n",
      "A班中没有进行任何选修的: {4}\n",
      "两个班未进行任何选修的： {10, 4}\n",
      "有且仅有选修1门的学生: {8, 5, 6}\n",
      "有且仅有选修2门的学生: {9, 2, 7}\n",
      "1: os java badminton\n",
      "2: java badminton\n",
      "3: os java badminton\n",
      "4:\n",
      "5: os\n"
     ]
    }
   ],
   "source": [
    "classA = set([1, 2, 3, 4, 5])\n",
    "classB = set([6, 7, 8, 9, 10])\n",
    "os = set([1, 3, 5, 6, 9])       #选修操作系统\n",
    "java = set([1, 2, 3, 7, 8, 9])   #选修java\n",
    "badminton = set([1, 2, 3, 7])  #选修羽毛球\n",
    "threeset = os & java & badminton\n",
    "print(\"同时选修3门课程的学生：\", threeset)\n",
    "print(\"A班中没有进行任何选修的:\", classA - (os | java | badminton))\n",
    "print(\"两个班未进行任何选修的：\", (classA|classB) - (os | java | badminton))\n",
    "oneset = (os|java|badminton)-(os&java)-(os&badminton)-(java&badminton)\n",
    "print(\"有且仅有选修1门的学生:\", oneset)\n",
    "print(\"有且仅有选修2门的学生:\", (os|java|badminton) - oneset - threeset)\n",
    "#打印A班每个人的选修情况\n",
    "for e in classA:\n",
    "    result = str(e)+\":\"\n",
    "    if e in os:\n",
    "        result += \" os\"\n",
    "    if e in java:\n",
    "        result += \" java\"\n",
    "    if e in badminton:\n",
    "        result += \" badminton\"\n",
    "    print(result)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. 字典(dict)\n",
    "## 2.1 字典概述\n",
    "我们使用字典查询某个字时，通常是先通过拼音、偏旁找到要查的**字**。  \n",
    "然后根据该字所在页码，找到该字的**详细释义**。  \n",
    "\n",
    "Python中的字典(dict)：通过键(key)快速找到对应的值(value)。  \n",
    "类比字典，**“字”**就相当于键，**“详细释义”**相当于值。  \n",
    "\n",
    "字典特点与用法：\n",
    "- **主要用法：**通过键可**快速**找到值\n",
    "- 存储了**键:值对**(key:value)组合\n",
    "- 键值必须唯一，且为不可变类型（通常为数、字符串）\n",
    "- 字典中的键对会保持插入顺序（Python 3.7版本）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 字典常见用法\n",
    "**包含：**创建、查找、键成员判断、删除。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "zhang对应: 张\n",
      "字典内容: {'zhang': '章', 'wang': '王', 'zhao': '赵', 'li': '李', 'chen': '陈'} 键个数: 5\n",
      "删除后'zhao'后字典内容: {'zhang': '章', 'wang': '王', 'li': '李', 'chen': '陈'} 键个数: 4\n",
      "字典内容: {'zhang': '章', 'wang': '王', 'li': '李', 'chen': '晨'}\n"
     ]
    }
   ],
   "source": [
    "#创建字典\n",
    "namedict = {\"zhang\":\"张\", \"wang\":\"王\", \"zhao\":\"赵\", \"li\":\"李\", \"chen\":\"陈\"}\n",
    "#字典查找\n",
    "print(\"zhang对应:\", namedict[\"zhang\"])\n",
    "namedict[\"zhang\"] = \"章\" #更新字典\n",
    "print(\"字典内容:\", namedict, \"键个数:\", len(namedict))\n",
    "#删除键\n",
    "del namedict[\"zhao\"]\n",
    "print(\"删除后'zhao'后字典内容:\", namedict, \"键个数:\", len(namedict))\n",
    "#成员判断\n",
    "name = \"chen\"\n",
    "if name in namedict:\n",
    "    namedict[name] = \"晨\"\n",
    "print(\"字典内容:\", namedict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3 字典的创建与遍历\n",
    "### 2.3.1 字典的创建\n",
    "可使用dict()函数或**{}**来进行创建。\n",
    "\n",
    "**1. 使用{}直接创建**  \n",
    "\n",
    "使用{}将一系列**键值对**括起来。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'北京': 100000, '厦门': 361000, '上海': 200000, '深圳': 518000, '杭州': 310000}\n",
      "键个数: 5 内容: ['北京', '厦门', '上海', '深圳', '杭州']\n"
     ]
    }
   ],
   "source": [
    "# 创建空字典\n",
    "emptyDict = {}\n",
    "# 创建有内容的字典\n",
    "cityDict = { \"北京\":100000, \"厦门\":111111,\"上海\":200000, \"深圳\":518000, \"杭州\":310000, \"厦门\":361000}\n",
    "print(cityDict)        #后出现的key对应的值会覆盖先出现的相同key对应的值\n",
    "print(\"键个数:\",len(cityDict), \"内容:\",list(cityDict))  #显示字典的键列表，按插入顺序排列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2. 使用dict()函数创建**\n",
    "\n",
    "第4行代码：判定两个字典是否相同是根据他们的内容判断。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'一': 1, '二': 2, '三': 3} {'二': 2, '三': 3, '一': 1}\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "dict1 = dict(一 = 1, 二 = 2, 三 = 3) #这里的键“一”为字符串\n",
    "dict2 = {\"二\":2, \"三\":3, \"一\":1}\n",
    "print(dict1, dict2)\n",
    "print(dict1 == dict2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "dict函数接受可迭代(iterable)对象作为参数。  \n",
    "但该对象中的每一项必须为包含两个元素。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "{'北京': 100000, '厦门': 361000, '上海': 200000, '深圳': 518000, '杭州': 310000}\n",
      "{'a': 97, 'b': 98, 'c': 99, 'd': 100, 'e': 101}\n"
     ]
    }
   ],
   "source": [
    "dict1 = dict(一 = 1, 二 = 2, 三 = 3)\n",
    "dict2 = dict([('一',1), ('二',2), ('三',3)])\n",
    "dict3 = dict([['一',1], ['二',2], ['三',3]])\n",
    "print(dict1 == dict2 == dict3)  #上面3个dict都一样\n",
    "dict4 = dict({\"北京\":100000, \"厦门\":111111, \"上海\":200000, \"深圳\":518000, \"杭州\":310000, \"厦门\":361000})\n",
    "print(dict4)  #字典本身也是可迭代对象\n",
    "\n",
    "#以下为使用zip函数创建字典的例子\n",
    "charlist = list(\"abcde\")\n",
    "ordlist = [97, 98, 99, 100, 101]\n",
    "dict5 = dict(zip(charlist, ordlist)) #zip是可迭代对象，将两个可迭代对象绑定起来\n",
    "print(dict5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3. 使用字典推导式创建**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8*9=72\n"
     ]
    }
   ],
   "source": [
    "#创建一个9*9乘法表，键为元组类型\n",
    "mydict = {(i,j):str(i) + \"*\" + str(j) + \"=\" + str(i*j) for i in range(1, 10) for j in range(1, 10)}\n",
    "print(mydict[(8,9)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**4.使用dict的类方法fromkeys创建**  \n",
    "\n",
    "fromkeys方法可以接收可迭代对象参数作为参数，生成一个新的字典。  \n",
    "该字典的键是列表中的元素，值为None或者自己预先设定的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['A', 'B', 'C', 'D', 'E']\n",
      "{'A': None, 'B': None, 'C': None, 'D': None, 'E': None}\n",
      "{'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 0}\n"
     ]
    }
   ],
   "source": [
    "keyList = [chr(ord('A') + x) for x in range(5)]\n",
    "print(keyList)\n",
    "xdict = dict.fromkeys(keyList)    #键为keyList中的元素，值为None\n",
    "ydict = dict.fromkeys(keyList, 0) #值为0\n",
    "print(xdict)\n",
    "print(ydict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.2 字典的遍历\n",
    "可以对字典的以下几个方法返回的视图对象进行迭代遍历。\n",
    "- keys()：返回键视图\n",
    "- values()：返回值视图\n",
    "- items()：返回(键, 值)对视图"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "北京:100000\n",
      "厦门:361000\n",
      "上海:200000\n",
      "深圳:518000\n",
      "杭州:310000\n",
      "\n",
      "打印键列表、值列表\n",
      "['北京', '厦门', '上海', '深圳', '杭州']\n",
      "[100000, 361000, 200000, 518000, 310000]\n",
      "\n",
      "遍历值\n",
      "100000 361000 200000 518000 310000 \n",
      "\n",
      "打印items()对象\n",
      "dict_items([('北京', 100000), ('厦门', 361000), ('上海', 200000), ('深圳', 518000), ('杭州', 310000)])\n",
      "\n",
      "将items()转化为列表\n",
      "[('北京', 100000), ('厦门', 361000), ('上海', 200000), ('深圳', 518000), ('杭州', 310000)]\n"
     ]
    }
   ],
   "source": [
    "citys = {\"北京\":100000, \"厦门\":111111, \"上海\":200000,\\\n",
    "         \"深圳\":518000, \"杭州\":310000, \"厦门\":361000}\n",
    "for e in citys:  # 按键遍历。同for e in citys.keys()\n",
    "    print(\"{}:{}\".format(e, citys[e]))\n",
    "    \n",
    "print(\"\\n打印键列表、值列表\")\n",
    "print(list(citys.keys()),list(citys.values()),sep = \"\\n\")\n",
    "\n",
    "print(\"\\n遍历值\")\n",
    "for e in citys.values():  #遍历值\n",
    "    print(e, end = \" \")\n",
    "print()\n",
    "\n",
    "print(\"\\n打印items()对象\")\n",
    "print(citys.items())\n",
    "\n",
    "print(\"\\n将items()转化为列表\")\n",
    "print(list(citys.items()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4 字典常用方法\n",
    "\n",
    "可以将字典相关的方法按照**查、增、改、删、其他**进行分类。如下表所示：\n",
    "\n",
    "**注:** d为所要操作的集合。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<escape>\n",
    "<table border=\"5\" >\n",
    "  <tr>\n",
    "    <th align=\"left\">类别</th>\n",
    "    <th align=\"left\">方法</th>\n",
    "    <th align=\"left\">说明</th>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"3\">查</td>\n",
    "    <td align=\"left\">d[key]</td> \n",
    "\t<td align=\"left\">根据key返回值。如指定key不存在，引发<font color=\"#FF0000\">KeyError</font>异常</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.get(key[,default])</td> <td>在d中查找key对应值，找不到返回None或返回预先设定的default</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>key in d, key not in d</td>\n",
    "    <td>查看key是否在字典d的键集。如果存在，in返回True, not in返回True</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td >增</td>\n",
    "    <td>d[key] = value</td>\n",
    "    <td>如字典d中不存在key，则新增key:value对</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"3\">更新</td>\n",
    "    <td>d[key] = value</td>\n",
    "    <td>如字典d中存在key，则更新该其对应的值为value</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.update(items)</td>\n",
    "    <td>使用字典、键值对 items来更新字典</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.setdefault(key[,value])</td>\n",
    "    <td>如字典中已有key，则不更新,且返回已有值;如没有key则更新为None或value，且返回None或value</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"4\">删</td>\n",
    "    <td>del d[key]</td>\n",
    "    <td>删除key对应的键值对，如key不存在会引发<font color=\"#FF0000\">KeyError</font>异常</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.pop(key[,default])</td>\n",
    "    <td>弹出key对应的键值对。不存在，则返回default或引发<font color=\"#FF0000\">KeyError</font>异常</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.popitem()</td>\n",
    "    <td>从字典返回一个键值对元组，并从字典删除该键值对</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.clear()</td>\n",
    "    <td>移除字典中所有元素</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"3\">视图</td>\n",
    "    <td>d.items()</td>\n",
    "    <td>返回字典的键值对视图对象。视图对象会动态反映d字典中内容的变化</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.keys()</td>\n",
    "    <td>返回字典的键视图对象</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>d.values()</td>\n",
    "    <td>返回字典的值视图对象</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td rowspan=\"3\">其他</td>\n",
    "    <td>d.copy()</td>\n",
    "    <td>对字典d进行浅拷贝</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>==</td>\n",
    "    <td>判断两个字典内容是否相同</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>sorted(d)</td>\n",
    "    <td> 对字典d的键集排序，并返回键列表</td>\n",
    "  </tr>\n",
    "</table>\n",
    "</escape>"
   ]
  },
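  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.1 Lookup and Insertion\n",
    "\n",
    "`d[key]` raises `KeyError` when the key is missing, so use it only when the key is known to exist; otherwise check with `in` first or use `d.get`."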
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "张 张\n",
      "None\n",
      "-1\n",
      "False\n"
     ]
    },
    {
     "ename": "KeyError",
     "evalue": "'qian'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-31-fe34074609d6>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m     10\u001b[0m \u001b[1;32mif\u001b[0m \u001b[1;34m\"cao\"\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mnameDict\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     11\u001b[0m     \u001b[0mnameDict\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"cao\"\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m\"曹\"\u001b[0m \u001b[1;31m#d[key]可能产生异常，所以要先判断\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 12\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnameDict\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"qian\"\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m    \u001b[1;31m#\"qian\"不在字典，会产生异常\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mKeyError\u001b[0m: 'qian'"
     ]
    }
   ],
   "source": [
    "nameDict = {\"zhang\":\"张\", \"wang\":\"王\", \"zhao\":\"赵\",\n",
    "            \"li\":\"李\", \"chen\":\"陈\"}\n",
    "'''Lookup'''\n",
    "# \"zhang\" is in the dict, so neither form raises KeyError.\n",
    "print(nameDict[\"zhang\"], nameDict.get(\"zhang\"))\n",
    "print(nameDict.get(\"cao\"))      # not found: returns None\n",
    "print(nameDict.get(\"cao\", -1))  # not found: returns the default, -1\n",
    "print(\"cao\" in nameDict)        # membership test: is \"cao\" a key of nameDict?\n",
    "\n",
    "'''Insertion'''\n",
    "if \"cao\" not in nameDict:\n",
    "    nameDict[\"cao\"] = \"曹\" # assignment never raises KeyError; the check only avoids overwriting an existing key\n",
    "print(nameDict[\"qian\"])    # \"qian\" is not in the dict, so this raises KeyError"
   ]
  },
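  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.2 Updating a Dictionary\n",
    "\n",
    "`d.setdefault` may update the dictionary and always **returns a value**.     \n",
    "If the key is **absent**, it **inserts** the key with `None` or the caller's value and returns that value.  \n",
    "If the key is **present**, it returns the existing value and leaves the dictionary **unchanged**."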
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'zhang': '章', 'wang': '王', 'zhao': '朝', 'li': '李', 'chen': '陈', 'sun': '吴', 'yan': '闫', 'lu': '鲁', 'guo': '郭', 'geng': '耿'}\n",
      "None 王 王 孔\n"
     ]
    }
   ],
   "source": [
    "nameDict = {\"zhang\":\"张\", \"wang\":\"王\", \"zhao\":\"赵\", \"li\":\"李\", \"chen\":\"陈\"}\n",
    "nameDict[\"zhang\"] = \"章\"       # \"zhang\" exists, so its value is replaced; a missing key would be added\n",
    "nameDict.update({\"zhao\":\"朝\"}) # update an existing key\n",
    "nameDict.update({\"sun\":\"吴\"})  # a key that is not present is simply added\n",
    "nameDict.update([(\"yan\", \"闫\"),(\"lu\",\"鲁\")]) # update from an iterable of key-value pairs\n",
    "nameDict.update(guo=\"郭\", geng=\"耿\")         # update from keyword arguments\n",
    "print(nameDict)\n",
    "r1 = nameDict.setdefault(\"xie\")       # no \"xie\": adds \"xie\": None and returns None\n",
    "r2 = nameDict.setdefault(\"wang\")      # \"wang\" exists: returns \"王\", value unchanged\n",
    "r3 = nameDict.setdefault(\"wang\", \"汪\")# \"wang\" exists: returns \"王\"; the supplied \"汪\" is ignored\n",
    "r4 = nameDict.setdefault(\"kong\",\"孔\") # no \"kong\": adds \"kong\": \"孔\" and returns \"孔\"\n",
    "print(r1, r2, r3, r4)"
   ]
  },
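  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise: character count**  \n",
    "Count how many times each character occurs in an English passage, ignoring case, and print the counts sorted by key.  \n",
    "The code uses the `sorted` function, which builds and returns a new list."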
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('\\n', 5), (' ', 41), (',', 4), ('.', 1), (';', 1)]\n",
      "[('b', 2), ('c', 3), ('d', 15), ('e', 29)]\n"
     ]
    }
   ],
   "source": [
    "article = '''\n",
    "There were doors all round the hall, but they were all locked;\n",
    "and when Alice had been all the way down one side and up the\n",
    "other, trying every door, ALICE walked sadly down the middle,\n",
    "wondering how she was ever to get out again.\n",
    "'''\n",
    "article = article.lower()\n",
    "charCounter = {}\n",
    "for e in article:\n",
    "    charCounter[e] = charCounter.get(e, 0) + 1\n",
    "xlist = sorted(list(charCounter.items()), key = lambda x:x[0])\n",
    "print(xlist[0:5], xlist[6:10], sep =\"\\n\")"
   ]
  },
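  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Follow-up:**  \n",
    "The output shows that newlines, spaces, and punctuation are counted as well.  \n",
    "Modify the program so that the newline (`\\n`), spaces, commas, semicolons, and periods are left out of the count."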
   ]
  },
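  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to approach the follow-up above, offered as a sketch rather than the intended answer: keep a character only when `str.isalpha()` is true, so newlines, spaces, and punctuation never reach the counter. The shortened `sample` text and the name `letter_counter` are assumptions made for this illustration.\n",
    "\n",
    "```python\n",
    "# Count letters only; whitespace and punctuation are skipped.\n",
    "# sample is a shortened, hypothetical stand-in for the article text.\n",
    "sample = 'Alice had been all the way down, and up; ALICE walked.'\n",
    "letter_counter = {}\n",
    "for ch in sample.lower():\n",
    "    if ch.isalpha():  # True only for letters\n",
    "        letter_counter[ch] = letter_counter.get(ch, 0) + 1\n",
    "print(sorted(letter_counter.items())[:3])\n",
    "```\n",
    "\n",
    "Filtering with `isalpha()` also drops digits; if digits should be counted, `ch.isalnum()` is one alternative."
   ]
  },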
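  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.3 Deleting from a Dictionary\n",
    "\n",
    "- `del d[key]` removes the pair for key; raises KeyError if key is missing\n",
    "- `pop(key[,default])` removes key and returns its value; if key is missing, returns default when given, otherwise raises KeyError  \n",
    "- `popitem()` removes one key-value pair and returns it as a tuple\n",
    "- `clear()` removes every item"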
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'wang': '王', 'zhao': '赵', 'li': '李', 'chen': '陈'}\n",
      "None\n",
      "{0: 10, 1: 11, 2: 12, 3: 13, 4: 14}\n",
      "{}\n"
     ]
    }
   ],
   "source": [
    "'''del demo'''\n",
    "nameDict = {\"zhang\":\"张\", \"wang\":\"王\", \"zhao\":\"赵\", \"li\":\"李\", \"chen\":\"陈\"}\n",
    "# del raises KeyError for a missing key, so check with in first\n",
    "if \"zhang\" in nameDict:  \n",
    "    del nameDict[\"zhang\"]\n",
    "print(nameDict)\n",
    "'''pop demo'''\n",
    "# \"zhang\" was deleted above; pop without a default would raise KeyError, so check with in first\n",
    "if \"zhang\" in nameDict:  \n",
    "    x = nameDict.pop(\"zhang\")   # not executed\n",
    "x = nameDict.pop(\"zhang\", None) # key is missing, so the supplied default None is returned\n",
    "print(x)\n",
    "'''popitem demo'''\n",
    "while len(nameDict) != 0:\n",
    "    item = nameDict.popitem()   # returns a (key, value) tuple\n",
    "'''clear demo'''    \n",
    "xdict = {x:x+10 for x in range(5)}\n",
    "print(xdict)\n",
    "xdict.clear()\n",
    "print(xdict)"
   ]
  },
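  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.4 Dictionary View Objects\n",
    "- `items()` returns a view of the key-value pairs\n",
    "- `keys()` returns a view of the keys\n",
    "- `values()` returns a view of the values\n",
    "\n",
    "These **view objects** give access to the dictionary's contents.  \n",
    "They are live: any change to the dictionary shows up in the views."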
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict_items([('zhang', '张'), ('wang', '王'), ('zhao', '赵'), ('li', '李'), ('chen', '陈')])\n",
      "字典内容修改后：\n",
      "dict_items([('wang', '汪'), ('zhao', '赵'), ('li', '李'), ('chen', '陈')])\n"
     ]
    }
   ],
   "source": [
    "nameDict = {\"zhang\":\"张\", \"wang\":\"王\", \"zhao\":\"赵\", \"li\":\"李\", \"chen\":\"陈\"}\n",
    "items = nameDict.items()\n",
    "print(items)\n",
    "del nameDict[\"zhang\"]\n",
    "nameDict[\"wang\"] = \"汪\"\n",
    "print(\"字典内容修改后：\")\n",
    "print(items)"
   ]
  },
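  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The example above exercises only `items()`. As a small complementary sketch, the same live behaviour can be observed with `keys()` and `values()`; the two-entry `name_dict` is an assumption made to keep the example short.\n",
    "\n",
    "```python\n",
    "name_dict = {'zhang': '张', 'wang': '王'}\n",
    "keys = name_dict.keys()\n",
    "values = name_dict.values()\n",
    "name_dict['li'] = '李'   # change the dict after the views were created\n",
    "print(list(keys))        # the key view now includes 'li'\n",
    "print(list(values))      # the value view now includes '李'\n",
    "print('li' in keys)      # key views support membership tests\n",
    "```\n",
    "\n",
    "Wrapping a view in `list()` takes a snapshot that no longer follows the dictionary."
   ]
  },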
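  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.5 Other Dictionary Operations\n",
    "- **d.copy()** makes a shallow copy of d\n",
    "\n",
    "**Note:**\n",
    "\n",
    "The copy is shallow. If a value in d is a mutable object such as a list, the copy refers to the same object, so changing that list in place through the original dictionary also changes what the copy shows."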
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "False\n",
      "nameDict {'zhang': ['章', '张'], 'wang': ['王', '汪']}\n",
      "cloneDict {'zhang': ['章', '张'], 'wang': ['王', '汪']}\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "nameDict = {\"zhang\":[\"章\",\"张\"], \"wang\":[\"王\"]}\n",
    "cloneDict = nameDict.copy()\n",
    "print(id(nameDict) == id(cloneDict)) # False: different ids, so the two dicts are distinct objects\n",
    "nameDict[\"wang\"].append(\"汪\")         # mutates the list that both dicts share\n",
    "print(\"nameDict\",nameDict)\n",
    "print(\"cloneDict\",cloneDict) \n",
    "print(nameDict == cloneDict) # True: both dicts hold the same key-value pairs"
   ]
  },
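  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the copy must not follow changes to nested values, the standard library's `copy.deepcopy` is one option. The sketch below contrasts it with `d.copy()`; the names `shallow` and `deep` are chosen only for this illustration.\n",
    "\n",
    "```python\n",
    "import copy\n",
    "\n",
    "name_dict = {'wang': ['王']}\n",
    "shallow = name_dict.copy()        # shares the inner list with name_dict\n",
    "deep = copy.deepcopy(name_dict)   # gets its own copy of the inner list\n",
    "name_dict['wang'].append('汪')\n",
    "print(shallow['wang'])            # follows the change\n",
    "print(deep['wang'])               # unaffected by the change\n",
    "```"
   ]
  },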
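  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Use `==` to test whether two dictionaries have the same contents\n",
    "\n",
    "xdict and ydict print differently because their items were inserted in a different order, but their contents are the same.  \n",
    "The `==` operator ignores insertion order, so it still reports them as equal.\n",
    "\n",
    "- **sorted(d)** sorts the dictionary's keys and returns them as a list"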
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1: 'a', 2: 'b', 3: 'c'}\n",
      "{3: 'c', 1: 'a', 2: 'b'}\n",
      "True\n",
      "<class 'list'> [1, 2, 3]\n"
     ]
    }
   ],
   "source": [
    "xdict = {1:\"a\", 2:\"b\", 3:\"c\"}\n",
    "ydict = {3:\"c\", 1:\"a\", 2:\"b\"}\n",
    "print(xdict)\n",
    "print(ydict)\n",
    "print(xdict == ydict)\n",
    "x = sorted(ydict)\n",
    "print(type(x), x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 Dictionary Case Studies"
   ]
  },
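  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5.1 English Word Count\n",
    "Count how many times each word occurs in an English passage and print the words in descending order of frequency.   \n",
    "The code uses the list's own `sort` method, which sorts in place and does not create a new list."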
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('the', 4)\n",
      "('all', 3)\n",
      "('were', 2)\n",
      "('and', 2)\n",
      "('alice', 2)\n",
      "('down', 2)\n",
      "('there', 1)\n",
      "('doors', 1)\n",
      "('round', 1)\n",
      "('hall', 1)\n"
     ]
    }
   ],
   "source": [
    "article = '''\n",
    "There were doors all round the hall, but they were all locked;\n",
    "and when Alice had been all the way down one side and up the\n",
    "other, trying every door, ALICE walked sadly down the middle,\n",
    "wondering how she was ever to get out again.\n",
    "'''\n",
    "article = article.lower()\n",
    "for e in \",;.\":\n",
    "    article = article.replace(e, \" \")\n",
    "wordCounter = {}\n",
    "for word in article.split():\n",
    "    wordCounter[word] = wordCounter.get(word, 0) + 1\n",
    "xlist = list(wordCounter.items())\n",
    "# xlist.sort sorts in place; no new list is created\n",
    "xlist.sort(key = lambda x:x[1], reverse = True)\n",
    "# Slicing is safe even when xlist has fewer than 10 items\n",
    "for e in xlist[:10]:\n",
    "    print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5.2 Administrative Division Lookup"
   ]
  },
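  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Design a lookup system for administrative divisions: given the name of a division, it returns all the information recorded for it.  \n",
    "For example, entering “思明区” returns:  \n",
    "**思明,3,592,361001,中国-福建省-厦门市-思明区,118.08233,24.44543,Siming**  \n",
    "The fields are: short name, level, area code, postal code, full name, longitude, latitude, and pinyin.  \n",
    "\n",
    "The system will be queried often, so lookups should be as fast as possible. A dictionary is therefore used to store the  \n",
    "division data. Try to build the system.\n",
    "\n",
    "The data is stored in the file \"行政区划数据库.csv\". The screenshot below shows its contents:\n",
    "![](行政区划.png)"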
   ]
  },
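  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**System analysis:**\n",
    "\n",
    "Building the system raises three main questions:\n",
    "1. What should the dictionary's keys and values hold?\n",
    "2. How is the .csv file read?\n",
    "3. How can the output be made easier to read?"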
   ]
  },
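  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1. What the keys and values hold**\n",
    "\n",
    "The system looks up **all the information about a division** from its **name**.   \n",
    "So the first column of the file, **Name, is the dictionary key**, and the remaining columns (all other information about the division) are the **value**."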
   ]
  },
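  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2. Reading the CSV file**\n",
    "\n",
    "A CSV file is a **text file** with a very simple format. It is a widely supported format that can be opened with  \n",
    "Notepad or Excel. The values in each line are separated by commas (','). Opened in Notepad, a line of the CSV  \n",
    "file looks like this:  \n",
    "**北京市,北京,2,10,100000,中国-北京-北京市,116.405285,39.904989,Beijing**\n",
    "\n",
    "**Opening the file:**\n",
    "\n",
    "Python's built-in open() function opens the file, which can then be read line by line.  \n",
    "**Note:** the first line of the file, \"Name, ShortName, ...\", is a header and is not needed for now.  \n",
    "\n",
    "**The program has the following stages:**\n",
    "\n",
    "\n",
    "read the file and build the division dictionary areaDict -> query the dictionary -> print the result"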
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('中国', ['中国', '0', '', '', '中国', '116.3683244', '39.915085', 'China'])\n",
      "('北京', ['北京', '1', '', '', '中国-北京', '116.405285', '39.904989', 'Beijing'])\n",
      "('北京市', ['北京', '2', '10', '100000', '中国-北京-北京市', '116.405285', '39.904989', 'Beijing'])\n",
      "('东城区', ['东城', '3', '10', '100010', '中国-北京-北京市-东城区', '116.41005', '39.93157', 'Dongcheng'])\n",
      "('西城区', ['西城', '3', '10', '100032', '中国-北京-北京市-西城区', '116.36003', '39.9305', 'Xicheng'])\n",
      "('朝阳区', ['朝阳', '3', '431', '130012', '中国-吉林省-长春市-朝阳区', '125.2883', '43.83339', 'Chaoyang'])\n",
      "('丰台区', ['丰台', '3', '10', '100071', '中国-北京-北京市-丰台区', '116.28625', '39.8585', 'Fengtai'])\n",
      "('石景山区', ['石景山', '3', '10', '100043', '中国-北京-北京市-石景山区', '116.2229', '39.90564', 'Shijingshan'])\n",
      "('海淀区', ['海淀', '3', '10', '100089', '中国-北京-北京市-海淀区', '116.29812', '39.95931', 'Haidian'])\n",
      "('门头沟区', ['门头沟', '3', '10', '102300', '中国-北京-北京市-门头沟区', '116.10137', '39.94043', 'Mentougou'])\n",
      "请输入要查询的区域名称：思明区\n",
      "您所要查询的区域详细信息如下： ['思明', '3', '592', '361001', '中国-福建省-厦门市-思明区', '118.08233', '24.44543', 'Siming']\n"
     ]
    }
   ],
   "source": [
    "'''Read the file, build the division dictionary areaDict, then test it (query and print)'''\n",
    "def readFileAndCreateDict(filename):\n",
    "    areaDict = {}\n",
    "    with open(filename, 'r') as f: # 'r' opens the file read-only; f is the open file object\n",
    "        f.readline()                # the first line is the header, skip it\n",
    "        for line in f:              # every remaining line\n",
    "            xlist = line.strip().split(\",\")\n",
    "            areaDict[xlist[0]] = xlist[1:]\n",
    "    return areaDict\n",
    "\n",
    "def print10Lines(xdict): # print the first 10 items of the dict (fewer if it is shorter)\n",
    "    for i, e in enumerate(xdict.items()):\n",
    "        if i == 10:      # stop once 10 items have been printed\n",
    "            break\n",
    "        print(e)\n",
    "            \n",
    "fileName = \"行政区划数据库.csv\"\n",
    "areaDict = readFileAndCreateDict(fileName)\n",
    "print10Lines(areaDict)\n",
    "name = input(\"请输入要查询的区域名称：\")  # prompt: enter the name of the division to look up\n",
    "print(\"您所要查询的区域详细信息如下：\", areaDict[name])  # raises KeyError if name is not in the dict"
  },
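  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Improving the code:**\n",
    "\n",
    "The code above handles a basic query but has two problems:\n",
    "1. Each run answers only one query, so every further query rereads the whole file, which is wasteful;\n",
    "2. The output is hard to read.\n",
    "\n",
    "The program is therefore improved, mainly by adding two functions:\n",
    "- showInfo(), the output function: prints each field on its own line with a label\n",
    "- menu(), the interface function: shows the menu and returns the user's choice"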
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "读取数据库完毕\n",
      "1. 查询\n",
      "2. 退出\n",
      "请输入:1\n",
      "请输入要查询的区域:xxx\n",
      "查无此区域\n",
      "1. 查询\n",
      "2. 退出\n",
      "请输入:1\n",
      "请输入要查询的区域:思明区\n",
      "简称: 思明\n",
      "等级: 3\n",
      "城市代码: 592\n",
      "邮编: 361001\n",
      "全称: 中国-福建省-厦门市-思明区\n",
      "经度: 118.08233\n",
      "纬度: 24.44543\n",
      "拼音: Siming\n",
      "1. 查询\n",
      "2. 退出\n",
      "请输入:2\n"
     ]
    }
   ],
   "source": [
    "def readFileAndCreateDict(filename):\n",
    "    areaDict = {}\n",
    "    with open(filename, 'r') as f: \n",
    "        f.readline()                # skip the header line\n",
    "        for line in f:              \n",
    "            xlist = line.strip().split(\",\")\n",
    "            areaDict[xlist[0]] = xlist[1:]\n",
    "    return areaDict\n",
    "\n",
    "def showInfo(x): # x is a list holding the remaining fields of one division\n",
    "    # labels: short name, level, city code, postal code, full name, longitude, latitude, pinyin\n",
    "    t = [\"简称\", \"等级\", \"城市代码\", \"邮编\", \"全称\", \"经度\", \"纬度\", \"拼音\"]\n",
    "    for title, value in zip(t, x):  # one labelled field per line\n",
    "        print(\"{}: {}\".format(title, value))\n",
    "    \n",
    "def menu(): # show the menu (1. query, 2. quit) and return the user's choice\n",
    "    return input(\"1. 查询\\n2. 退出\\n请输入:\")\n",
    "\n",
    "# Runs only when the file is executed directly, not when it is imported as a module.\n",
    "if __name__ == \"__main__\": \n",
    "    fileName = \"行政区划数据库.csv\"\n",
    "    areaDict = readFileAndCreateDict(fileName)  # the file is read once, before the query loop\n",
    "    print(\"读取数据库完毕\")\n",
    "    while True:\n",
    "        choice = menu()\n",
    "        if choice == \"1\": # 1 queries; anything else quits\n",
    "            name = input(\"请输入要查询的区域:\")\n",
    "            result = areaDict.get(name)\n",
    "            if result is None:\n",
    "                print(\"查无此区域\")\n",
    "            else:\n",
    "                showInfo(result)\n",
    "        else:\n",
    "            break"
   ]
  },
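  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercises (improvements):**\n",
    "1. Write a function find(name, title) that returns the field named title for the division name. For example, find(\"思明区\",\"拼音\") returns the pinyin of 思明区.\n",
    "2. Divisions that belong to the same city share the same city code. Write a function findCity(name) that returns the city the division name belongs to.\n",
    "3. Lookups currently require an exact match: to find “思明区” you must type “思明区” in full. How could fuzzy lookup be implemented, so that typing “思明” finds every division whose name contains “思明”? Can a dictionary still be used?"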
   ]
  }
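  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the third exercise, one possible direction (a sketch, not the only answer): the dictionary can still hold the data, but a substring match cannot use hashing, so every key has to be scanned. The function `fuzzy_find` and the two-row `area_dict`, whose rows are shortened from the output shown earlier, are hypothetical names introduced for this illustration.\n",
    "\n",
    "```python\n",
    "# Two sample rows taken from the earlier output, with the value lists shortened.\n",
    "area_dict = {\n",
    "    '思明区': ['思明', '3', '592'],\n",
    "    '东城区': ['东城', '3', '10'],\n",
    "}\n",
    "\n",
    "def fuzzy_find(area_dict, keyword):\n",
    "    '''Return every (name, info) pair whose name contains keyword.'''\n",
    "    return [(name, info) for name, info in area_dict.items() if keyword in name]\n",
    "\n",
    "print(fuzzy_find(area_dict, '思明'))\n",
    "```\n",
    "\n",
    "The scan visits every key, so its cost grows with the size of the dictionary, whereas an exact lookup with `d.get` does not depend on it in the average case."
   ]
  }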
 ],
 "metadata": {
  "file_extension": ".py",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "mimetype": "text/x-python",
  "name": "python",
  "npconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "165px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  },
  "version": 3
 },
 "nbformat": 4,
 "nbformat_minor": 2
}