{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Data Structures, Functions, Conditionals, and Loops (1)\n", "\n", "## Course outline\n", "\n", "* Data structures\n", "    * Booleans\n", "    * Strings\n", "    * Lists\n", "\n", "* Calling built-in functions\n", "    * type( )\n", "    * int( ), float( ), str( )\n", "    * len( )\n", "    * range( )\n", "    * max( ), min( )\n", "\n", "* Conditional statements\n", "    * Boolean expressions\n", "    * Logical operators\n", "    * Conditional execution\n", "    * Alternative execution\n", "    * Chained conditionals\n", "\n", "* for loops\n", "\n", "* Case study: word frequency in a text\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data structures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Booleans (bool)\n", "* A Boolean expression is an expression that evaluates to one of two values: True or False\n", "* You can think of a Boolean as \"yes\" or \"no\"\n", "* Boolean variables will be very useful later in the course: they let us filter data, run code conditionally, and so on." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "z = 1 < 100\n", "print(z)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False\n" ] } ], "source": [ "z = 1 > 100\n", "print(z)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "# We can also assign True or False to a variable directly\n", "z = True\n", "print(z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Strings (str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A string is a sequence of characters" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "football\n" ] } ], "source": [ "# Define a string\n", "# = means assignment: it assigns the string 'football' to the variable sport\n", "sport = 'football'\n", "print(sport)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the bracket operator to access individual characters" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"o\n" ] } ], "source": [ "# 把sport第二个字母赋值给letter\n", "letter = sport[1] # 我们已经定义过变量sport\n", "print(letter)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ot\n" ] } ], "source": [ "letters = sport[2:4]\n", "print(letters)# 第三和第四个字母" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 注意:\n", "* 在Python中,索引是从字符串头部算起的一个偏移量,第一个字母的位置为0。所以sport[0]才会返回字母f。\n", "* 第一个索引值为0是python的惯例,另外一些软件比如R的第一个索引值为1。\n", "* 方括号运算符不光可以对字符串取其中的一部分,也可以使用到其他的数据类型。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 列表 (list)\n", "### 存储多个数据点,是有若干个值组成的一个序列" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['father', 1.78, 'mother', 1.68, 'son', 1.8, 'daughter', 1.65]\n" ] } ], "source": [ "# 目前为止每个我们定义的每个变量都只存有一个数据点,列表帮助我们在一个变量中存贮多个数据点\n", "# 比如存储一家人的身高数据\n", "fam_height = [\"father\", 1.78, \"mother\", 1.68, \"son\", 1.80, \"daughter\", 1.65]\n", "print(fam_height)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 一个列表可以存储不同的数据类型" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 3, ['b', 3.1415]]\n" ] } ], "source": [ "mix = ['a', 3, ['b', 3.1415]]\n", "print(mix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 列表的分割\n", "#### 注意:python索引是从0开始\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.78" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 父亲的身高\n", "fam_height[1]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.65" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 女儿的身高\n", "fam_height[7]" ] }, { "cell_type": "code", "execution_count": 11, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.65" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 我们也可以通过负数索引的到女儿的身高\n", "fam_height[-1]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.8" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 儿子的身高\n", "fam_height[-3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 列表切片操作(slicing)\n", "#### 注意切片操作的格式是\n", "[始索引:终索引] (包括始索引,不包括终索引)\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['mother', 1.68]" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fam_height[2:4]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['father', 1.78, 'mother', 1.68]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 默认从0开始\n", "fam_height[:4] " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['son', 1.8, 'daughter', 1.65]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 默认到最后一项\n", "fam_height[4:]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['father', 1.78, 'mother', 1.68, 'son', 1.8, 'daughter', 1.65]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 取所有值\n", "fam_height[:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 作业 1:\n", "下方是一个嵌套的列表,通过对x分割,打印“e”和“g”。\n", "\n", "提示:使用 x[ ][ ]的形式" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x = [[\"a\", \"b\", \"c\"],\n", " [\"d\", \"e\", \"f\"],\n", " [\"g\", \"h\", \"i\"]]" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## 列表的操作\n", "* 改变一个元素(element)\n", "* 增加一个元素\n", "* 删除一个元素\n", "* 查找某个元素对应的位置\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['father', 1.78, 'mother', 1.68, 'son', 1.8, 'daughter', 1.68]\n" ] } ], "source": [ "# 改女儿的身高的身高(女儿长高啦)\n", "fam_height[-1] = 1.68\n", "print(fam_height)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['lao wang', 1.77, 'mother', 1.68, 'son', 1.8, 'daughter', 1.68]\n" ] } ], "source": [ "# 可以同时修改多个\n", "fam_height[0:2] = ['lao wang', 1.77]\n", "print(fam_height)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['lao wang', 1.77, 'mother', 1.68, 'son', 1.8, 'daughter', 1.68, 'second_son', 0.51]\n" ] } ], "source": [ "# 增加元素(又生了个儿子哟)\n", "fam_height = fam_height + ['second_son', 0.51]\n", "print(fam_height)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['lao wang', 1.77, 'mother', 1.68, 'son', 1.8, 'second_son', 0.51]\n" ] } ], "source": [ "# 删除元素 (女儿嫁人啦)\n", "del(fam_height[6:8])\n", "print(fam_height)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### 查找某个元素对应的列表索引:\n", "* 我们可以使用python内置的列表函数来处理列表,列表本身自带的函数称为方法(method)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "print(fam_height.index(1.77))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n" ] } ], "source": [ "print(fam_height.index('lao wang'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 内置函数的调用" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "### 函数是一段可重复使用的代码,往往是为了解决某个特定的任务。\n", "\n", "\n", "\n", "#### Python提供了很多重要的内置函数,不需要我们预先定义就可以使用。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### type函数:查看变量类型" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "x = 10\n", "# x是整数型\n", "print(type(x))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "y = 'python'\n", "# y是字符串\n", "print(type(y))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "z = False\n", "print(type(z))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mix = ['a', 3, ['b', 3.1415]]\n", "type(mix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### int 和 float 函数" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "# int函数能将浮点型转换成整数,但不进行四舍五入,而是直接截断\n", "print(int(3.99))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 将字符串转换成浮点型\n", "x = float('3.1415926')\n", "type(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### str函数" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "150\n", "\n" ] } ], "source": [ "# str函数可以把数值变量转换为字符串:对于一些打印命令会有用\n", "x = 150\n", "x = str(x)\n", "print(x)\n", "print(type(x))" ] }, { 
"cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "lao wang is 55 years old\n" ] } ], "source": [ "# 同时打印多个变量\n", "name = 'lao wang'\n", "age = 55\n", "print(name + ' is ' + str(age) + ' years old')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### len函数,获取列表元素个数" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fam_height = [\"father\", 1.78, \"mother\", 1.68, \"son\", 1.80, \"daughter\", 1.65]\n", "len(fam_height) #总共8个元素" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 获取字符串长度\n", "len('Python')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### range函数:生成整数序列\n", "* 在循环中非常有用\n", "* range(a, b) a和b为整数满足b>a,生成的一个列表[a, a+1, ..., b-1]" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "range(0, 9)\n", "[0, 1, 2, 3, 4, 5, 6, 7, 8]\n" ] } ], "source": [ "seq = range(0,9)\n", "print(type(seq))\n", "print(seq)\n", "print(list(seq))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### max和min函数:找出一组数中最大值和最小值" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "max(6, 5, 7, 2, 12, 9)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 参数可以是列表\n", "num = [6, 5, 7, 2, 12, 9]\n", "min(num)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": 
true }, "source": [ "# Conditional statements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Boolean expressions\n", "\n", "Boolean expressions are essential for conditional execution. Here are some common comparison operators:\n", "\n", "* x == y (is x equal to y? note that this must be ==, not =)\n", "* x != y (is x not equal to y?)\n", "* x < y (is x less than y?)\n", "* x > y (is x greater than y?)\n", "* x <= y (is x less than or equal to y?)\n", "* x >= y (is x greater than or equal to y?)\n", "* x is y (are x and y the same object? this tests identity, not equality, so use == to compare values)\n", "* x is not y (are x and y different objects?)\n", "\n", "Each of these returns a Boolean: True or False\n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 != 5" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# An int and a float are never the same object, even though 3 == 3.0 is True\n", "3 is 3.0" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = \"banana\"\n", "y = \"orange\"\n", "x is not y" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x == y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logical operators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### There are three logical operators: and, or, and not." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### and\n", "* True and True returns True\n", "* True and False returns False\n", "* False and False returns False" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# x > 0 is True and x < 10 is True\n", "x = 5\n", "x > 0 and x < 10" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 也可以用 & 代替and\n", "x = 5\n", "(x > 0) & (x < 10)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 15\n", "x > 0 and x < 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### or (或)\n", "* True or Ture 返回 True\n", "* True or False 返回 True\n", "* False or False 返回 False" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# x > 0返回True,x < 10返回False\n", "x = 15\n", "x > 0 or x < 10" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 也可以用 | 代替or\n", "x = 15\n", "(x > 0) | (x < 10)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 15\n", "x > 100 or x < 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### not (非)\n", "\n", "* not True 返回 False\n", "* not False 返回 True\n" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x=15\n", "not (x > 100 )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 条件执行 \n", "* 最简单的条件执行是 **if** 语句\n", "* if 语句后的布尔表达式称为条件,只有当条件返回 True 才会执行缩进的语句\n", "* if 语句的末尾用冒号 :\n", "* if 语句之后的语句要缩进,使用一个tab键或者四个空格" ] }, { "cell_type": "code", "execution_count": 48, "metadata": 
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x is negative\n" ] } ], "source": [ "# Is x negative?\n", "x = -1\n", "if x < 0:\n", "    print('x is negative')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "# If the condition is not met, the indented statement does not run\n", "x = 1\n", "if x < 0:\n", "    print('x is negative')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alternative execution\n", "* When there are two possibilities, we can use an **if else** statement" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x is positive\n" ] } ], "source": [ "# Check whether x is positive or not\n", "x = 5\n", "if x > 0:\n", "    print('x is positive')\n", "else:\n", "    print('x is not positive')" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x is not positive\n" ] } ], "source": [ "x = -5\n", "if x > 0:\n", "    print('x is positive')\n", "else:\n", "    print('x is not positive')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chained conditionals\n", "* When there are more than two possibilities, we can use an **if elif** statement\n", "* elif is short for else if" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x is 0\n" ] } ], "source": [ "x = 0\n", "if x > 0:\n", "    print('x is positive')\n", "elif x < 0:\n", "    print('x is negative')\n", "else:\n", "    print('x is 0')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# for loops\n", "* Sometimes we need to iterate, for example to print every element of a list; the for statement builds such a loop\n", "* Almost every programming language has a for loop\n", "\n", "``` \n", "for variable in sequence:\n", "    expression\n", "```" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.78\n", "1.68\n", "1.8\n", "1.65\n" ] } ], "source": [ "# Family heights\n", "family = 
[1.78, 1.68, 1.80, 1.65]\n", "# To print each height in family one by one\n", "print(family[0])\n", "print(family[1])\n", "print(family[2])\n", "print(family[3])" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.78\n", "1.68\n", "1.8\n", "1.65\n" ] } ], "source": [ "# A for loop makes this much simpler\n", "for height in family:  # the variable name height is our choice; any name works\n", "    print(height)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8\n" ] } ], "source": [ "for i in range(0, 9):\n", "    print(i)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "f\n", "a\n", "m\n", "i\n", "l\n", "y\n" ] } ], "source": [ "for c in 'family':\n", "    print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The enumerate function: get the index and the value together" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "index 0: 1.78\n", "index 1: 1.68\n", "index 2: 1.8\n", "index 3: 1.65\n" ] } ], "source": [ "for index, height in enumerate(family):\n", "    print('index ' + str(index) + ': ' + str(height))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Case study: find the most frequent word in a text\n", "### This uses the built-in functions, data structures, conditional statements, and for loops covered in this lesson" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['the', 'clown', 'ran', 'after', 'the', 'car', 'and', 'the', 'car', 'ran', 'into', 'the', 'tent', 'and', 'the', 'tent', 'fell', 'down', 'on', 'the', 'clown', 'and', 'the', 'car']\n" ] } ], "source": [ "text = 'the clown ran after the car and the car ran into the tent and the tent fell down on the clown and the car'\n", "words = text.split()  # get the list of words\n", "print(words)" ] }, { "cell_type": "code", 
"execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the\n", "clown\n", "ran\n", "after\n", "the\n", "car\n", "and\n", "the\n", "car\n", "ran\n", "into\n", "the\n", "tent\n", "and\n", "the\n", "tent\n", "fell\n", "down\n", "on\n", "the\n", "clown\n", "and\n", "the\n", "car\n" ] } ], "source": [ "# 用for循环查看words里的单词\n", "for word in words:\n", " print(word)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 策略:\n", "* 首先使用获取文本中所有出现过得单词\n", "* 然后初始化一个全部为零的词频列表\n", "* 通过for循环,某个单词出现一次,词频列表相应的位置加1\n", "* 比较每个单词的词频,得到最高频单词" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 步骤一:获得单词列表" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['the', 'clown', 'ran', 'after', 'car', 'and', 'into', 'tent', 'fell', 'down', 'on']\n" ] } ], "source": [ "unique_words = list() # 初始化空列表\n", "\n", "for word in words :\n", " if (word not in unique_words) : # 使用in判断某个元素是否在列表里\n", " unique_words.append(word)\n", " \n", "print(unique_words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 步骤二:初始化词频列表" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n" ] } ], "source": [ "# [e]*n 快速初始化\n", "counts = [0] * len(unique_words)\n", "print(counts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 步骤三:统计词频" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[7, 2, 2, 1, 3, 3, 1, 2, 1, 1, 1]\n" ] } ], "source": [ "for word in words :\n", " index = unique_words.index(word)\n", " counts[index] = counts[index] + 1\n", " \n", "print(counts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 步骤四:找出最高词频和其对应的单词" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the 7\n" ] } ], "source": [ "bigcount = None # None为空值,初始化bigcount\n", "bigword = None\n", "\n", "for i in range(len(counts)): \n", " if bigcount is None or counts[i] > bigcount :\n", " bigword = unique_words[i]\n", " bigcount = counts[i]\n", " \n", "print(bigword, bigcount)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " 上述案例融合了本节课所有的知识点。下节课我们讲解python字典以后,可以适当简化。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 作业2:找最大值和最小值\n", "\n", "以下给出一个团队中成员名和他们各自的微信好友数量,请模仿上述词频的例子,通过for循环找出团队中好友数最多和最少的两个人。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "names = ['小赵','小钱','小孙','小李','小王','小张']\n", "friends = [45, 100, 67, 136, 77, 17]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }