{
 "cells": [
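  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Processing 03: Pandas, the Python Data Analysis Library\n",
    "Pandas is one of the most widely used data analysis libraries for Python. It is built on top of NumPy and offers a rich feature set, good performance, and a convenient API. Project website: http://pandas.pydata.org/\n",
    "![03_pandas.png](https://upload-images.jianshu.io/upload_images/10829283-cd5d3dfd9847679d.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\n",
    "\n",
    "Pandas ships with Anaconda. By convention, the module is imported under the alias pd:"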
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'0.24.2'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "pd.__version__  # check the installed version"
   ]
  },
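  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas provides two main object types: Series and DataFrame. A Series resembles a one-dimensional array and a DataFrame resembles a two-dimensional array. The key difference from plain arrays is that both carry a customizable Index of labels, similar to the keys of a dictionary. Each column of a DataFrame is a Series: every column has its own data type, but all columns share the same Index. Let's start by calling the constructor to create a Series:"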
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    北京\n",
       "1    上海\n",
       "2    广州\n",
       "3    深圳\n",
       "dtype: object"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The index attribute of a Series object refers to the index it uses:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=4, step=1)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Out[2].index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, create a new Series that uses pinyin abbreviations of the city names as a custom index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['bj', 'sh', 'gz', 'sz'], dtype='object')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cityname = pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\"], index=[\"bj\", \"sh\", \"gz\", \"sz\"])\n",
    "cityname.index"
   ]
  },
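  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the default positional index still works alongside the custom one: the custom labels are called the explicit index and the default positions the implicit index. This can cause confusion when the explicit index itself consists of integers, so you can use the indexer attributes loc (label-based) and iloc (position-based) to state which kind of indexing you mean. In the pandas version used here (0.24.2), plain square brackets fall back to positions when given an integer and the index holds strings; newer pandas releases deprecate that fallback, so iloc is the safer choice for positional access:"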
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'深圳'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cityname[\"sz\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'深圳'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cityname[-1]  # positional fallback via []; newer pandas deprecates this, prefer cityname.iloc[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'北京'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cityname.loc[\"bj\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'北京'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cityname.iloc[0]"
   ]
  },
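  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Integer labels are where explicit and implicit indexing are most easily confused. The following is a minimal sketch rather than part of the original data: the Series s and its labels are made up for illustration. With integer labels, loc looks up the label while iloc counts positions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical Series whose integer labels do not match the positions\n",
    "s = pd.Series([\"a\", \"b\", \"c\"], index=[2, 0, 1])\n",
    "\n",
    "by_label = s.loc[0]      # the element labelled 0\n",
    "by_position = s.iloc[0]  # the first element, whatever its label\n",
    "print(by_label, by_position)"
   ]
  },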
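  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's create a DataFrame by passing the constructor a dictionary of indexable objects. The dictionary's values and keys become the column data and column labels of the resulting DataFrame (here 名称 means name and 人口 means population). The row index is a single Index shared by all columns, and the column index is an Index made up of all the column labels. The inputs are aligned by label, so the row order of the result may differ from the order in cityname:"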
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>名称</th>\n",
       "      <th>人口</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>bj</th>\n",
       "      <td>北京</td>\n",
       "      <td>1877.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gz</th>\n",
       "      <td>广州</td>\n",
       "      <td>1246.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sh</th>\n",
       "      <td>上海</td>\n",
       "      <td>2115.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sz</th>\n",
       "      <td>深圳</td>\n",
       "      <td>1137.89</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    名称       人口\n",
       "bj  北京  1877.70\n",
       "gz  广州  1246.83\n",
       "sh  上海  2115.00\n",
       "sz  深圳  1137.89"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "citypop = {\"bj\": 1877.7, \"sh\": 2115, \"gz\": 1246.83, \"sz\": 1137.89}\n",
    "df = pd.DataFrame({\"名称\": cityname, \"人口\": citypop})\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['bj', 'gz', 'sh', 'sz'], dtype='object')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index  # the index attribute refers to the row index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['名称', '人口'], dtype='object')"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns  # the columns attribute refers to the column index"
   ]
  },
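  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the constructor aligns its inputs by label, the inputs do not have to list the same labels. The sketch below uses made-up inputs (the names names, pops and aligned are illustrative and not from the original): a label that is missing from one input shows up as NaN in that column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "names = pd.Series([\"北京\", \"上海\"], index=[\"bj\", \"sh\"])\n",
    "pops = {\"sh\": 2115, \"gz\": 1246.83}  # no entry for bj, extra entry for gz\n",
    "\n",
    "aligned = pd.DataFrame({\"名称\": names, \"人口\": pops})\n",
    "print(aligned)  # one row per label found in either input\n",
    "print(aligned[\"人口\"].isna().sum())  # number of rows without a population value"
   ]
  },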
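  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use the indexing syntax, attributes, and methods of Series and DataFrame, together with module-level functions, to perform all kinds of operations on the data, including reading, updating, and computing:"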
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bj    北京\n",
       "gz    广州\n",
       "sh    上海\n",
       "sz    深圳\n",
       "Name: 名称, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"名称\"]  # get a column by indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bj    1877.70\n",
       "gz    1246.83\n",
       "sh    2115.00\n",
       "sz    1137.89\n",
       "Name: 人口, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.人口  # get a column as an attribute (works when the label is a valid identifier)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>名称</th>\n",
       "      <th>人口</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sh</th>\n",
       "      <td>上海</td>\n",
       "      <td>2115.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sz</th>\n",
       "      <td>深圳</td>\n",
       "      <td>1137.89</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    名称       人口\n",
       "sh  上海  2115.00\n",
       "sz  深圳  1137.89"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[2:]  # get rows by position"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6377.42"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"人口\"].sum()  # sum the values of a column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>名称</th>\n",
       "      <th>人口</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sh</th>\n",
       "      <td>上海</td>\n",
       "      <td>2115.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bj</th>\n",
       "      <td>北京</td>\n",
       "      <td>1877.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gz</th>\n",
       "      <td>广州</td>\n",
       "      <td>1246.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sz</th>\n",
       "      <td>深圳</td>\n",
       "      <td>1137.89</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    名称       人口\n",
       "sh  上海  2115.00\n",
       "bj  北京  1877.70\n",
       "gz  广州  1246.83\n",
       "sz  深圳  1137.89"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(\"人口\", ascending=False)  # sort by the 人口 column in descending order"
   ]
  },
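  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that methods and functions return a new object by default, whereas assignment through indexing modifies the object in place:"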
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>名称</th>\n",
       "      <th>人口</th>\n",
       "      <th>区号</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>bj</th>\n",
       "      <td>北京</td>\n",
       "      <td>1877.70</td>\n",
       "      <td>010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gz</th>\n",
       "      <td>广州</td>\n",
       "      <td>1246.83</td>\n",
       "      <td>020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sh</th>\n",
       "      <td>上海</td>\n",
       "      <td>2115.00</td>\n",
       "      <td>021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sz</th>\n",
       "      <td>深圳</td>\n",
       "      <td>1137.89</td>\n",
       "      <td>0755</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    名称       人口    区号\n",
       "bj  北京  1877.70   010\n",
       "gz  广州  1246.83   020\n",
       "sh  上海  2115.00   021\n",
       "sz  深圳  1137.89  0755"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"区号\"] = [\"010\", \"020\", \"021\", \"0755\"]  # add a new column (区号 = area code)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>名称</th>\n",
       "      <th>人口</th>\n",
       "      <th>区号</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>bj</th>\n",
       "      <td>北京</td>\n",
       "      <td>1877.70</td>\n",
       "      <td>010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gz</th>\n",
       "      <td>广州</td>\n",
       "      <td>1246.83</td>\n",
       "      <td>020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sh</th>\n",
       "      <td>上海</td>\n",
       "      <td>2115.00</td>\n",
       "      <td>021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sz</th>\n",
       "      <td>深圳</td>\n",
       "      <td>1137.89</td>\n",
       "      <td>0755</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tj</th>\n",
       "      <td>天津</td>\n",
       "      <td>875.24</td>\n",
       "      <td>022</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    名称       人口    区号\n",
       "bj  北京  1877.70   010\n",
       "gz  广州  1246.83   020\n",
       "sh  上海  2115.00   021\n",
       "sz  深圳  1137.89  0755\n",
       "tj  天津   875.24   022"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\"tj\"] = [\"天津\", 875.24, \"022\"]  # add a new row\n",
    "df"
   ]
  },
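  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between methods that return a new object and assignment that modifies in place can be checked on a throwaway DataFrame. This is a minimal sketch with made-up values (the names demo and sorted_demo are illustrative): sort_values returns a new, sorted object and leaves demo untouched, while assignment through indexing changes demo itself."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "demo = pd.DataFrame({\"x\": [3, 1, 2]}, index=[\"a\", \"b\", \"c\"])\n",
    "\n",
    "sorted_demo = demo.sort_values(\"x\")  # returns a new DataFrame\n",
    "print(list(demo[\"x\"]))               # demo keeps its original order\n",
    "print(list(sorted_demo[\"x\"]))        # the returned copy is sorted\n",
    "\n",
    "demo[\"y\"] = demo[\"x\"] * 10           # assignment modifies demo in place\n",
    "print(list(demo.columns))"
   ]
  },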
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>名称</th>\n",
       "      <th>人口</th>\n",
       "      <th>区号</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>bj</th>\n",
       "      <td>北京</td>\n",
       "      <td>1877.70</td>\n",
       "      <td>010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gz</th>\n",
       "      <td>广州</td>\n",
       "      <td>1246.83</td>\n",
       "      <td>020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sh</th>\n",
       "      <td>上海</td>\n",
       "      <td>2115.00</td>\n",
       "      <td>021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sz</th>\n",
       "      <td>深圳</td>\n",
       "      <td>1137.89</td>\n",
       "      <td>0755</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    名称       人口    区号\n",
       "bj  北京  1877.70   010\n",
       "gz  广州  1246.83   020\n",
       "sh  上海  2115.00   021\n",
       "sz  深圳  1137.89  0755"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df.人口 >= 1000]  # filter rows by a condition"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bj     True\n",
       "gz     True\n",
       "sh     True\n",
       "sz     True\n",
       "tj    False\n",
       "Name: 人口, dtype: bool"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.人口 >= 1000  # filtering works by indexing with a boolean Series"
   ]
  },
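  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Boolean Series can also be combined element by element before they are used for indexing. The sketch below works on a small made-up DataFrame (the names demo and mask are illustrative): & means and, | means or, ~ means not, and each comparison is wrapped in parentheses because these operators bind more tightly than comparisons."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "demo = pd.DataFrame({\"人口\": [1877.7, 1246.83, 875.24]}, index=[\"bj\", \"gz\", \"tj\"])\n",
    "\n",
    "# Both conditions must hold for a row to be selected\n",
    "mask = (demo[\"人口\"] >= 1000) & (demo[\"人口\"] < 1500)\n",
    "print(demo[mask])\n",
    "print(demo[~mask].index.tolist())  # labels of the rows that fail the mask"
   ]
  },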
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>名称</th>\n",
       "      <th>人口</th>\n",
       "      <th>区号</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>bj</th>\n",
       "      <td>北京</td>\n",
       "      <td>1877.70</td>\n",
       "      <td>010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gz</th>\n",
       "      <td>广州</td>\n",
       "      <td>1246.83</td>\n",
       "      <td>020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sh</th>\n",
       "      <td>上海</td>\n",
       "      <td>2115.00</td>\n",
       "      <td>021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sz</th>\n",
       "      <td>深圳</td>\n",
       "      <td>1137.89</td>\n",
       "      <td>0755</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tj</th>\n",
       "      <td>天津</td>\n",
       "      <td>875.24</td>\n",
       "      <td>022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cq</th>\n",
       "      <td>重庆</td>\n",
       "      <td>851.80</td>\n",
       "      <td>023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>nj</th>\n",
       "      <td>南京</td>\n",
       "      <td>617.82</td>\n",
       "      <td>025</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    名称       人口    区号\n",
       "bj  北京  1877.70   010\n",
       "gz  广州  1246.83   020\n",
       "sh  上海  2115.00   021\n",
       "sz  深圳  1137.89  0755\n",
       "tj  天津   875.24   022\n",
       "cq  重庆   851.80   023\n",
       "nj  南京   617.82   025"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.DataFrame([[\"重庆\", 851.8, \"023\"], [\"南京\", 617.82, \"025\"]], columns=[\"名称\", \"人口\", \"区号\"], index=[\"cq\", \"nj\"])\n",
    "pd.concat([df, df2])  # concatenate two DataFrames"
   ]
  },
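  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next example analyzes the lifespans of emperors in Chinese history, this time building the DataFrame from existing data. Pandas can read many kinds of sources, for example comma-separated text files (CSV):"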
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "数据网格形状: (302, 5)\n",
      "各列数据类型:\n",
      "num         int64\n",
      "name       object\n",
      "age         int64\n",
      "year       object\n",
      "dynasty    object\n",
      "dtype: object\n"
     ]
    }
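   ],
   "source": [
    "# Original file behind the short URL:\n",
    "# https://gitee.com/freesand/pyStudy/raw/master/data/emperor.csv\n",
    "# Alternative: read directly from the short URL\n",
    "# df = pd.read_csv(\"http://t.cn/EMl0NtB\")\n",
    "df = pd.read_csv(\"emperor.csv\")  # read a local copy of the file\n",
    "print(\"数据网格形状:\", df.shape)  # label: DataFrame shape\n",
    "print(\"各列数据类型:\")  # label: data type of each column\n",
    "print(df.dtypes)"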
   ]
  },
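  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the file is not at hand, read_csv can be tried on any file-like object. The sketch below feeds it a few lines of made-up CSV text through io.StringIO; the sample rows and the names sample_csv and sample are illustrative and are not taken from emperor.csv."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up CSV text standing in for a real file\n",
    "sample_csv = \"\"\"name,age,dynasty\n",
    "A,50,X\n",
    "B,24,X\n",
    "C,62,Y\n",
    "\"\"\"\n",
    "\n",
    "sample = pd.read_csv(io.StringIO(sample_csv))  # the first line becomes the column labels\n",
    "print(sample.shape)\n",
    "print(sample.dtypes)"
   ]
  },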
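  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a large DataFrame, it is a good idea to first check its shape and column data types with the shape and dtypes attributes. You can also preview the first 5 rows with the head() method; in Jupyter Notebook a DataFrame is rendered as a table:"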
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>num</th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>year</th>\n",
       "      <th>dynasty</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>秦始皇嬴政</td>\n",
       "      <td>50</td>\n",
       "      <td>前259年—前210年</td>\n",
       "      <td>秦</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>秦二世嬴胡亥</td>\n",
       "      <td>24</td>\n",
       "      <td>前230年—前207年</td>\n",
       "      <td>秦</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>汉高帝刘邦</td>\n",
       "      <td>62</td>\n",
       "      <td>前256年—前195年</td>\n",
       "      <td>西汉</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>汉惠帝刘盈</td>\n",
       "      <td>23</td>\n",
       "      <td>前210年—前188年</td>\n",
       "      <td>西汉</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>汉文帝刘恒</td>\n",
       "      <td>46</td>\n",
       "      <td>前202年—前157年</td>\n",
       "      <td>西汉</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   num    name  age         year dynasty\n",
       "0    1   秦始皇嬴政   50  前259年—前210年       秦\n",
       "1    2  秦二世嬴胡亥   24  前230年—前207年       秦\n",
       "2    3   汉高帝刘邦   62  前256年—前195年      西汉\n",
       "3    4   汉惠帝刘盈   23  前210年—前188年      西汉\n",
       "4    5   汉文帝刘恒   46  前202年—前157年      西汉"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
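  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before starting the analysis you can also tidy up the DataFrame (rename column labels, drop redundant content, and so on). For example, after renaming the columns to 序号 (number), 名号 (name and title), 寿命 (lifespan), 生卒 (birth and death years), and 朝代 (dynasty), list the emperors who lived to at least 80:"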
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>序号</th>\n",
       "      <th>名号</th>\n",
       "      <th>寿命</th>\n",
       "      <th>生卒</th>\n",
       "      <th>朝代</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>100</td>\n",
       "      <td>梁武帝萧衍</td>\n",
       "      <td>86</td>\n",
       "      <td>464年—549年</td>\n",
       "      <td>南朝梁</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>149</td>\n",
       "      <td>武则天武瞾</td>\n",
       "      <td>82</td>\n",
       "      <td>624年—705年</td>\n",
       "      <td>唐</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>208</th>\n",
       "      <td>207</td>\n",
       "      <td>宋高宗赵构</td>\n",
       "      <td>81</td>\n",
       "      <td>1107年—1187年</td>\n",
       "      <td>南宋</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>253</th>\n",
       "      <td>252</td>\n",
       "      <td>元世祖孛儿只斤·忽必烈</td>\n",
       "      <td>80</td>\n",
       "      <td>1215年—1294年</td>\n",
       "      <td>元</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>295</th>\n",
       "      <td>296</td>\n",
       "      <td>清高宗(乾隆)爱新觉罗·弘历</td>\n",
       "      <td>89</td>\n",
       "      <td>1711年—1799年</td>\n",
       "      <td>清</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      序号              名号  寿命           生卒   朝代\n",
       "100  100           梁武帝萧衍  86    464年—549年  南朝梁\n",
       "149  149           武则天武瞾  82    624年—705年    唐\n",
       "208  207           宋高宗赵构  81  1107年—1187年   南宋\n",
       "253  252     元世祖孛儿只斤·忽必烈  80  1215年—1294年    元\n",
       "295  296  清高宗(乾隆)爱新觉罗·弘历  89  1711年—1799年    清"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns = [\"序号\", \"名号\", \"寿命\", \"生卒\", \"朝代\"]\n",
    "df[df.寿命 >= 80]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Select the emperors of the Ming (明) and Qing (清) dynasties:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>序号</th>\n",
       "      <th>名号</th>\n",
       "      <th>寿命</th>\n",
       "      <th>生卒</th>\n",
       "      <th>朝代</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>267</th>\n",
       "      <td>266</td>\n",
       "      <td>明太祖(洪武)朱元璋</td>\n",
       "      <td>71</td>\n",
       "      <td>1328年—1398年</td>\n",
       "      <td>明</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268</th>\n",
       "      <td>267</td>\n",
       "      <td>明惠宗(建文)朱允炆</td>\n",
       "      <td>26</td>\n",
       "      <td>1377年—1402年</td>\n",
       "      <td>明</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>269</th>\n",
       "      <td>268</td>\n",
       "      <td>明成祖(永乐)朱棣</td>\n",
       "      <td>65</td>\n",
       "      <td>1360年—1424年</td>\n",
       "      <td>明</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>270</th>\n",
       "      <td>269</td>\n",
       "      <td>明仁宗(洪熙)朱高炽</td>\n",
       "      <td>48</td>\n",
       "      <td>1378年—1425年</td>\n",
       "      <td>明</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>271</th>\n",
       "      <td>270</td>\n",
       "      <td>明宣宗(宣德)朱瞻基</td>\n",
       "      <td>38</td>\n",
       "      <td>1398年—1435年</td>\n",
       "      <td>明</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      序号          名号  寿命           生卒 朝代\n",
       "267  266  明太祖(洪武)朱元璋  71  1328年—1398年  明\n",
       "268  267  明惠宗(建文)朱允炆  26  1377年—1402年  明\n",
       "269  268   明成祖(永乐)朱棣  65  1360年—1424年  明\n",
       "270  269  明仁宗(洪熙)朱高炽  48  1378年—1425年  明\n",
       "271  270  明宣宗(宣德)朱瞻基  38  1398年—1435年  明"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mingqing = df[df.朝代.isin([\"明\", \"清\"])]\n",
    "mingqing.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare emperor lifespans across the two dynasties by aggregating, for each group, the count, minimum, maximum, mean, and median:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>mean</th>\n",
       "      <th>median</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>朝代</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>明</th>\n",
       "      <td>16</td>\n",
       "      <td>23</td>\n",
       "      <td>71</td>\n",
       "      <td>42.187500</td>\n",
       "      <td>38.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>清</th>\n",
       "      <td>12</td>\n",
       "      <td>19</td>\n",
       "      <td>89</td>\n",
       "      <td>53.333333</td>\n",
       "      <td>59.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    count  min  max       mean  median\n",
       "朝代                                    \n",
       "明      16   23   71  42.187500    38.0\n",
       "清      12   19   89  53.333333    59.5"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "compare = mingqing.groupby(\"朝代\").寿命.agg([\"count\", \"min\", \"max\", \"mean\", \"median\"])\n",
    "compare"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also plot a histogram of the lifespans of all emperors to show how the values are distributed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([10., 25., 48., 62., 56., 49., 36., 11.,  5.,  0.]),\n",
       " array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.]),\n",
       " <a list of 10 Patch objects>)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAFKCAYAAADFU4wdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFixJREFUeJzt3V1s0/fdhvE7JIueuYHMtmyS0ahVolaraMkOVlGjtlOdEYJCVatD6qaqB96mVFO1LKQgAVErbRpwsD6I7GSqhaaFSZ3QKHIqslZRTBkodKxbWTmADaES8SKII9sJJOYtxs/Bo2btSrDj2PnazvU5Gm74+9ZPkGuxwz8V6XQ6LQAAsKCWWA8AAGAxIsAAABggwAAAGCDAAAAYIMAAABggwAAAGKhayCcbG7ue1+s5nQ4lEsm8XnOx4Qzzg3OcP85w/jjD+cv3GXo8S2f9byX9FXBVVaX1hJLHGeYH5zh/nOH8cYbzt5BnWNIBBgCgVBFgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMLOgPYwDmKpWSRkYqrGdk5HJZLwBQaggwitrISIV+/utTctQW7094SU449Idda+R0Wi8BUEoIMIqeozapGueU9QwAyCveAwYAwAABBgDAAAEGAMAAAQYAwEBWAb527Zo6OzvV1tam9evX6+TJkxofH1cwGFRra6uCwaAmJiYKvRUAgLKRVYB37NihZ555Rh988IH6+/vV1NSkUCgkn8+nwcFB+Xw+hUKhQm8FAKBsZAzw5OSkPv74Y23cuFGSVF1drWXLlikSiSgQCEiSAoGAhoaGCrsUAIAykvHfAV+8eFEul0vbtm3Tv/71L61cuVI9PT2KxWLyer2SJK/Xq3g8nvHJnE6Hqqoq57/6CzyepXm93mJUzGeYSFgvyF4xn2Op4AznjzOcv4U6w4wBnp6e1unTp/XGG2+oublZv/rVr3J+uTmRyO/djDyepRobu57Xay42xX6G8Xjx34byc8V8jqWg2P8slgLOcP7yfYb3i3nGl6Dr6upUV1en5uZmSVJbW5tOnz4tt9utaDQqSYpGo3JxM1wAALKWMcAej0d1dXX67LPPJEkfffSRmpqa5Pf7FQ6HJUnhcFgtLS2FXQoAQBnJ6l7Qb7zxhjZv3qw7d+6ooaFBu3bt0t27d9XV1aUDBw6ovr5evb29hd4KAEDZyCrAjz32mA4ePPiVx/v6+vI+CACAxYA7YQEAYIAAAwBggAADAGCAAAMAYIAAAwBggAADAGCAAAMAYIAAAwBggAADAGCAAAMAYIAAAwBggAADAGAgqx/GAGB26bvS+fNSPF5hPWVWDz+cVmWl9QoAX0SAgXm6cd2hN0PH5ahNWk+5p+SEQ71bVqmpKW09BcAXEGAgDxy1SdU4p6xnACghvAcMAIABAgwAgAECDACAAQIMAIABAgwAgAECDACAAQIMAIABAgwAgAECDACAAQIMAIABAgwAgAECDACAAQIMAIABAgwAgAECDACAAQIMAIABAgwAgAECDACAAQIMAIABAgwAgAECDACAAQIMAICBqmw+yO/364EHHtCSJUtUWVmpgwcPanx8XJs2bdLly5e1YsUK7dmzR7W1tYXeCwBAWcj6K+C+vj719/fr4MGDkqRQKCSfz6fBwUH5fD6FQqGCjQQAoNzk/BJ0JBJRIBCQJAUCAQ0NDeVtFAAA5S6rl6Al6cc//rEqKir00ksv6aWXXlIsFpPX65Ukeb1exePxjNdwOh2qqqrMfe09eDxL83q9xaiYzzCRsF5QHlyuGnk81isyK+Y/i6WCM5y/hTrDrAL8xz/+UcuXL1csFlMwGFRjY2NOT5ZIJHP6fbPxeJZqbOx6Xq+52BT7GcbjFdYTykI8PqmxsbT1jPsq9j+LpYAznL98n+H9Yp7VS9DLly+XJLndbq1du1an
Tp2S2+1WNBqVJEWjUblcrjxMBQBgccgY4GQyqcnJyZn/PTw8rEceeUR+v1/hcFiSFA6H1dLSUtilAACUkYwvQcdiMb322muSpFQqpQ0bNujZZ5/VE088oa6uLh04cED19fXq7e0t+FgAc5e+K124UPwv5fMiGhabjAFuaGjQe++995XHnU6n+vr6CjIKQP7cuO7Q/+7/VI7a/H4PRj4lJxz6w641cjqtlwALJ+vvggZQuhy1SdU4p6xnAPgCbkUJAIABvgJexFIp6ezZ4v6nPqXw3iUA5IIAL2IjIxX6+a+PF/V7g7FLLrkftF4BAPlHgBe5Yn9vMDnhsJ4AAAXBe8AAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABjIOsCpVEqBQECvvvqqJGl8fFzBYFCtra0KBoOamJgo2EgAAMpN1gHet2+fmpqaZn4dCoXk8/k0ODgon8+nUChUkIEAAJSjrAJ89epVHTlyRBs3bpx5LBKJKBAISJICgYCGhoYKsxAAgDKUVYB37typLVu2aMmS/3x4LBaT1+uVJHm9XsXj8cIsBACgDFVl+oAPP/xQLpdLjz/+uE6cODGvJ3M6HaqqqpzXNf6bx7M0r9dbTBIJ6wXAl/H3ef44w/lbqDPMGOBPPvlEhw8f1tGjR3Xr1i1NTk5q8+bNcrvdikaj8nq9ikajcrlcGZ8skUjmZfTnPJ6lGhu7ntdrLibxeIX1BOBL+Ps8P3xOnL98n+H9Yp7xJejXX39dR48e1eHDh7V792499dRTeuutt+T3+xUOhyVJ4XBYLS0teRsMAEC5y/nfAXd0dGh4eFitra0aHh5WR0dHPncBAFDWMr4E/UWrV6/W6tWrJUlOp1N9fX0FGQUAQLnjTlgAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYmNMPYwCAQkjflc6fL+6fUf3ww2lVVlqvQDkhwADM3bju0Juh43LUJq2n3FNywqHeLavU1JS2noIyQoABFAVHbVI1zinrGcCC4T1gAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMVGX6gFu3bunll1/W7du3lUqltG7dOnV2dmp8fFybNm3S5cuXtWLFCu3Zs0e1tbULsRkAgJKX8Svg6upq9fX16b333lM4HNaxY8f0z3/+U6FQSD6fT4ODg/L5fAqFQguxFwCAspAxwBUVFXrggQckSdPT05qenlZFRYUikYgCgYAkKRAIaGhoqLBLAQAoI1m9B5xKpfTCCy9ozZo1WrNmjZqbmxWLxeT1eiVJXq9X8Xi8oEMBACgnGd8DlqTKykr19/fr2rVreu2113T27NmcnszpdKiqqjKn3zsbj2dpXq+3mCQS1guA0uFy1cjjsV6RGZ8T52+hzjCrAH9u2bJlWr16tY4dOya3261oNCqv16toNCqXy5Xx9ycSyZyH3ovHs1RjY9fzes3FJB6vsJ4AlIx4fFJjY2nrGffF58T5y/cZ3i/mGV+CjsfjunbtmiTp5s2bOn78uBobG+X3+xUOhyVJ4XBYLS0teZoLAED5y/gVcDQa1datW5VKpZROp9XW1qbnnntO3/72t9XV1aUDBw6ovr5evb29C7EXAICykDHA3/rWt2a+0v0ip9Opvr6+gowCAKDccScsAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBg
AAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAxUWQ8oZ6mUNDJSYT1jVhcuFO82ACh3BLiARkYq9PNfn5KjNmk95Z5il1xyP2i9AgAWJwJcYI7apGqcU9Yz7ik54bCeAACLFu8BAwBggAADAGCAAAMAYIAAAwBggAADAGCAAAMAYIAAAwBggAADAGCAAAMAYIAAAwBggAADAGCAAAMAYIAAAwBggAADAGCAAAMAYIAAAwBggAADAGAgY4CvXLmiV155RevXr1d7e7v6+vokSePj4woGg2ptbVUwGNTExETBxwIAUC4yBriyslJbt27V+++/r/379+udd97RuXPnFAqF5PP5NDg4KJ/Pp1AotBB7AQAoCxkD7PV6tXLlSklSTU2NGhsbNTo6qkgkokAgIEkKBAIaGhoq7FIAAMrInN4DvnTpks6cOaPm5mbFYjF5vV5J/x/peDxekIEAAJSjqmw/cGpqSp2dndq+fbtqampyejKn06Gqqsqcfu9sPJ6leb1ePiUS1gsA5IvLVSOPx3pFZsX8ObFULNQZZhXgO3fuqLOzU88//7xaW1slSW63W9FoVF6vV9FoVC6XK+N1Eonk/Nb+F49nqcbGruf1mvkUj1dYTwCQJ/H4pMbG0tYz7qvYPyeWgnyf4f1invEl6HQ6rZ6eHjU2NioYDM487vf7FQ6HJUnhcFgtLS15mAoAwOKQ8Svgf/zjH+rv79ejjz6qF154QZLU3d2tjo4OdXV16cCBA6qvr1dvb2/BxwIAUC4yBvg73/mO/v3vf9/zv33+b4IBAMDccCcsAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAxUWQ8AgGKXvitduFBhPSMjl8t6AeaCAANABjeuO/S/+z+VozZpPWVWyQmH/rBrjZxO6yXIVsYAb9u2TUeOHJHb7dahQ4ckSePj49q0aZMuX76sFStWaM+ePaqtrS34WACw4qhNqsY5ZT0DZSTje8Avvvii9u7d+6XHQqGQfD6fBgcH5fP5FAqFCjYQAIBylDHATz755Fe+uo1EIgoEApKkQCCgoaGhwqwDAKBM5fQecCwWk9frlSR5vV7F4/G8jspGKiWdPSvF48X7jRGl8E0bAAAbC/pNWE6nQ1VVlXm51tmz0ivbjhf1N0XELrnkftB6BYDFxONZaj2h5C3UGeYUYLfbrWg0Kq/Xq2g0KleW3/ueSOQvlvF4RdF/U0RywmE9AcAiMzZ23XpCSfN4lub1DO8X85xuxOH3+xUOhyVJ4XBYLS0tuS0DAGCRyhjg7u5u/eAHP9D58+f17LPP6k9/+pM6Ojo0PDys1tZWDQ8Pq6OjYyG2AgBQNjK+BL179+57Pt7X15f3MQAALBbcCxoAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAA1XWAwAA85e+K50/L8XjFdZTZvXww2lVVlqvKB4EGADKwI3rDr0ZOi5HbdJ6yj0lJxzq3bJKTU1p6ylFgwADQJlw1CZV45yynoEs8R4wAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAYIMAAABggwAAAGCDAAAAbmFeCjR49q3bp1Wrt2rUKhUL42AQBQ9nIOcCqV0i9/+Uvt3btXAwMDOnTokM6dO5fPbQAAlK2cA3zq
1Ck99NBDamhoUHV1tdrb2xWJRPK5DQCAslWV628cHR1VXV3dzK+XL1+uU6dO5WVUtpITjgV9vrm6cf1/rCfcV7Hvk9iYD8W+Tyr+jcW+Tyr+jckJhy5cqLCekZHHs3DPlXOA0+n0Vx6rqLj/4Xo8S3N9untcSzrx7pq8XQ8AACm/rbqfnF+Crqur09WrV2d+PTo6Kq/Xm5dRAACUu5wD/MQTT2hkZEQXL17U7du3NTAwIL/fn89tAACUrZxfgq6qqtKbb76pn/zkJ0qlUvr+97+vRx55JJ/bAAAoWxXpe72ZCwAACoo7YQEAYIAAAwBgoGQDzG0w5+7KlSt65ZVXtH79erW3t6uvr0+SND4+rmAwqNbWVgWDQU1MTBgvLX6pVEqBQECvvvqqJM5wrq5du6bOzk61tbVp/fr1OnnyJGc4R7///e/V3t6uDRs2qLu7W7du3eIMs7Bt2zb5fD5t2LBh5rH7ndvbb7+ttWvXat26dTp27Fhet5RkgLkNZm4qKyu1detWvf/++9q/f7/eeecdnTt3TqFQSD6fT4ODg/L5fPwfmizs27dPTU1NM7/mDOdmx44deuaZZ/TBBx+ov79fTU1NnOEcjI6Oat++fXr33Xd16NAhpVIpDQwMcIZZePHFF7V3794vPTbbuZ07d04DAwMaGBjQ3r179Ytf/EKpVCpvW0oywNwGMzder1crV66UJNXU1KixsVGjo6OKRCIKBAKSpEAgoKGhIcuZRe/q1as6cuSINm7cOPMYZ5i9yclJffzxxzPnV11drWXLlnGGc5RKpXTz5k1NT0/r5s2b8nq9nGEWnnzySdXW1n7psdnOLRKJqL29XdXV1WpoaNBDDz2U1zs+lmSA73UbzNHRUcNFpefSpUs6c+aMmpubFYvFZm6i4vV6FY/HjdcVt507d2rLli1asuQ/f304w+xdvHhRLpdL27ZtUyAQUE9Pj5LJJGc4B8uXL9ePfvQjPffcc3r66adVU1Ojp59+mjPM0WznVujWlGSAc7kNJv5jampKnZ2d2r59u2pqaqznlJQPP/xQLpdLjz/+uPWUkjU9Pa3Tp0/rhz/8ocLhsL7+9a/zUukcTUxMKBKJKBKJ6NixY7px44b6+/utZ5WdQremJAPMbTBzd+fOHXV2dur5559Xa2urJMntdisajUqSotGoXC6X5cSi9sknn+jw4cPy+/3q7u7WX//6V23evJkznIO6ujrV1dWpublZktTW1qbTp09zhnNw/PhxPfjgg3K5XPra176m1tZWnTx5kjPM0WznVujWlGSAuQ1mbtLptHp6etTY2KhgMDjzuN/vVzgcliSFw2G1tLRYTSx6r7/+uo4eParDhw9r9+7deuqpp/TWW29xhnPg8XhUV1enzz77TJL00UcfqampiTOcg29+85v69NNPdePGDaXTac5wnmY7N7/fr4GBAd2+fVsXL17UyMiIVq1albfnLdk7Yf3lL3/Rzp07Z26D+dOf/tR6UtH7+9//rpdfflmPPvrozPuX3d3dWrVqlbq6unTlyhXV19ert7dX3/jGN4zXFr8TJ07od7/7nd5++20lEgnOcA7OnDmjnp4e3blzRw0NDdq1a5fu3r3LGc7Bb37zG/35z39WVVWVHnvsMe3YsUNTU1OcYQbd3d3629/+pkQiIbfbrZ/97Gf63ve+N+u5/fa3v9W7776ryspKbd++Xd/97nfztqVkAwwAQCkryZegAQAodQQYAAADBBgAAAMEGAAAAwQYAAADBBgAAAMEGAAAAwQYAAAD/wf3CEQ7NxguzgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 576x396 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
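   ],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "plt.style.use(\"seaborn\")  # on Matplotlib 3.6+ this style is named \"seaborn-v0_8\"\n",
    "plt.hist(df.寿命, range=(0, 100), edgecolor=\"blue\")  # histogram over the range 0 to 100 (the default is the data's minimum to maximum)"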
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas offers far more than is covered here; for a deeper look, see the official documentation at http://pandas.pydata.org/pandas-docs/stable/\n",
    "\n",
    "So this is what programming is like..."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7",
   "language": "python",
   "name": "python3.7"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}