{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pandas含有使数据分析工作变得更快更简单的高级数据结构和操作工具。pandas基于NumPy构建，让以NumPy为中心的应用变得更加简单。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pandas import Series, DataFrame\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##1. Pandas的数据结构\n",
    "pandas的两个主要数据结构是：Series和DataFrame。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###1.1 Series\n",
    "Series是一种类似于一维数组的对象，它由一组**数据**（各种NumPy数据类型）以及一组与之相关的**数据索引**组成。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####1. Series的构建"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    4\n",
       "1    7\n",
       "2   -5\n",
       "3    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj = Series([4, 7, -5, 3])\n",
    "obj\n",
    "# Series的字符串表现形式为：索引在左边，值在右边。\n",
    "# 由于我们没有为数据指定索引，于是会自动创建一个0到N-1的整数索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 4,  7, -5,  3], dtype=int64)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获取Series的values和index属性\n",
    "obj.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Int64Index([0, 1, 2, 3], dtype='int64')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d    4\n",
       "b    7\n",
       "a   -5\n",
       "c    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 创建Series带有可以对各个数据点进行标记的索引\n",
    "obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])\n",
    "obj2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index([u'd', u'b', u'a', u'c'], dtype='object')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj2.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-5"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj2['a']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####2. NumPy数组运算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d    4\n",
       "b    7\n",
       "a   -5\n",
       "c    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d    4\n",
       "b    7\n",
       "c    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 布尔表达式过滤\n",
    "obj2[obj2 > 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d     8\n",
       "b    14\n",
       "a   -10\n",
       "c     6\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 标量乘法\n",
    "obj2 * 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d      54.598150\n",
       "b    1096.633158\n",
       "a       0.006738\n",
       "c      20.085537\n",
       "dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 应用数学函数\n",
    "np.exp(obj2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将Series看成是一个定长的有序字典，因为它是索引值到数据值的一个映射。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'b' in obj2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'e' in obj2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####3. 通过Python字典创建Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ohio      35000\n",
       "Oregon    16000\n",
       "Texas     71000\n",
       "Utah       5000\n",
       "dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}\n",
    "# 传入Python字典，原字典的键成为Series的索引\n",
    "obj3 = Series(sdata)\n",
    "obj3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California      NaN\n",
       "Ohio          35000\n",
       "Oregon        16000\n",
       "Texas         71000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sindex = ['California', 'Ohio', 'Oregon', 'Texas']\n",
    "obj4 = Series(sdata, index=sindex)\n",
    "# sdata中跟states索引项匹配的值会被找出来并放到相应的位置上\n",
    "obj4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California     True\n",
       "Ohio          False\n",
       "Oregon        False\n",
       "Texas         False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj4.isnull()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####4. Series自动对齐"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "California       NaN\n",
       "Ohio           70000\n",
       "Oregon         32000\n",
       "Texas         142000\n",
       "Utah             NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj3 + obj4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####5. Series的name属性"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state\n",
       "California      NaN\n",
       "Ohio          35000\n",
       "Oregon        16000\n",
       "Texas         71000\n",
       "Name: population, dtype: float64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj4.name = 'population'\n",
    "obj4.index.name = 'state'\n",
    "obj4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####6. 修改Series的索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Bob      4\n",
       "Steve    7\n",
       "Jeff    -5\n",
       "Ryan     3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']\n",
    "obj"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###1.2 DataFrame\n",
    "DataFrame是一个表格型的数据结构，它含有一组有序的列，每列可以是不同的值类型。DataFrame既有行索引也有列索引，它可以被看做由Series组成的字典（功用同一个索引）。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####1. 构建DataFrame\n",
    "最常用是直接传入一个由等长列表或NumPy数组组成的字典"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pop</th>\n",
       "      <th>state</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1.5</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1.7</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 2001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 3.6</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 2002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 2.4</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 2.9</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2002</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   pop   state  year\n",
       "0  1.5    Ohio  2000\n",
       "1  1.7    Ohio  2001\n",
       "2  3.6    Ohio  2002\n",
       "3  2.4  Nevada  2001\n",
       "4  2.9  Nevada  2002"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],\n",
    "        'year': [2000, 2001, 2002, 2001, 2002],\n",
    "        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}\n",
    "frame = DataFrame(data)\n",
    "frame\n",
    "# DataFrame会自动加上索引，且全部被有序排列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>state</th>\n",
       "      <th>pop</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 2000</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 2001</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2002</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 3.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 2001</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 2002</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year   state  pop\n",
       "0  2000    Ohio  1.5\n",
       "1  2001    Ohio  1.7\n",
       "2  2002    Ohio  3.6\n",
       "3  2001  Nevada  2.4\n",
       "4  2002  Nevada  2.9"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 如果指定列序列，则DataFrame的列就会按照指定顺序进行排列\n",
    "DataFrame(data, columns=['year', 'state', 'pop'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>state</th>\n",
       "      <th>pop</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 2000</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.5</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td> 2001</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.7</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 2002</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 3.6</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td> 2001</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.4</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td> 2002</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.9</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year   state  pop debt\n",
       "one    2000    Ohio  1.5  NaN\n",
       "two    2001    Ohio  1.7  NaN\n",
       "three  2002    Ohio  3.6  NaN\n",
       "four   2001  Nevada  2.4  NaN\n",
       "five   2002  Nevada  2.9  NaN"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 如果传入的列在数据中找不到，就会产生NA值\n",
    "frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],\n",
    "         index=['one', 'two', 'three', 'four', 'five'])\n",
    "frame2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####2. 对DataFrame的行和列的操作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过类似字典标记的方式或属性的方式，可以将DataFrame的列获取为一个Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "one        Ohio\n",
       "two        Ohio\n",
       "three      Ohio\n",
       "four     Nevada\n",
       "five     Nevada\n",
       "Name: state, dtype: object"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2['state']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "one      2000\n",
       "two      2001\n",
       "three    2002\n",
       "four     2001\n",
       "five     2002\n",
       "Name: year, dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.year"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "返回的Series拥有原DataFrame相同的索引，且其name属性已经被设置好了"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "用索引字段ix可以获得DataFrame的一行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "year     2002\n",
       "state    Ohio\n",
       "pop       3.6\n",
       "debt      NaN\n",
       "Name: three, dtype: object"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.ix['three']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "列可以通过赋值的方式进行修改"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "frame2['debt'] = 16.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>state</th>\n",
       "      <th>pop</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 2000</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.5</td>\n",
       "      <td> 16.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td> 2001</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.7</td>\n",
       "      <td> 16.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 2002</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 3.6</td>\n",
       "      <td> 16.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td> 2001</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.4</td>\n",
       "      <td> 16.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td> 2002</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.9</td>\n",
       "      <td> 16.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year   state  pop  debt\n",
       "one    2000    Ohio  1.5  16.5\n",
       "two    2001    Ohio  1.7  16.5\n",
       "three  2002    Ohio  3.6  16.5\n",
       "four   2001  Nevada  2.4  16.5\n",
       "five   2002  Nevada  2.9  16.5"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将列表或数组赋值给某个列时，其长度必须跟DataFrame的长度相匹配。如果赋值的是一个Series，就会精确匹配DataFrame的索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>state</th>\n",
       "      <th>pop</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 2000</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.5</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td> 2001</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.7</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 2002</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 3.6</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td> 2001</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.4</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td> 2002</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.9</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year   state  pop  debt\n",
       "one    2000    Ohio  1.5     1\n",
       "two    2001    Ohio  1.7     2\n",
       "three  2002    Ohio  3.6     3\n",
       "four   2001  Nevada  2.4     4\n",
       "five   2002  Nevada  2.9     5"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2['debt'] = [1, 2, 3, 4, 5]\n",
    "frame2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>state</th>\n",
       "      <th>pop</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 2000</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.5</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td> 2001</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.7</td>\n",
       "      <td>-1.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 2002</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 3.6</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td> 2001</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.4</td>\n",
       "      <td>-1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td> 2002</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.9</td>\n",
       "      <td>-1.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year   state  pop  debt\n",
       "one    2000    Ohio  1.5   NaN\n",
       "two    2001    Ohio  1.7  -1.2\n",
       "three  2002    Ohio  3.6   NaN\n",
       "four   2001  Nevada  2.4  -1.5\n",
       "five   2002  Nevada  2.9  -1.7"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])\n",
    "frame2['debt'] = val\n",
    "frame2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为不存在的列赋值会创建出一个新列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "frame2['eastern'] = frame2.state == 'Ohio'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>state</th>\n",
       "      <th>pop</th>\n",
       "      <th>debt</th>\n",
       "      <th>eastern</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 2000</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.5</td>\n",
       "      <td> NaN</td>\n",
       "      <td>  True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td> 2001</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.7</td>\n",
       "      <td>-1.2</td>\n",
       "      <td>  True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 2002</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 3.6</td>\n",
       "      <td> NaN</td>\n",
       "      <td>  True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td> 2001</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.4</td>\n",
       "      <td>-1.5</td>\n",
       "      <td> False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td> 2002</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.9</td>\n",
       "      <td>-1.7</td>\n",
       "      <td> False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year   state  pop  debt eastern\n",
       "one    2000    Ohio  1.5   NaN    True\n",
       "two    2001    Ohio  1.7  -1.2    True\n",
       "three  2002    Ohio  3.6   NaN    True\n",
       "four   2001  Nevada  2.4  -1.5   False\n",
       "five   2002  Nevada  2.9  -1.7   False"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关键字del用于删除列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "del frame2['eastern']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index([u'year', u'state', u'pop', u'debt'], dtype='object')"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>state</th>\n",
       "      <th>pop</th>\n",
       "      <th>debt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 2000</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.5</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td> 2001</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 1.7</td>\n",
       "      <td>-1.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 2002</td>\n",
       "      <td>   Ohio</td>\n",
       "      <td> 3.6</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td> 2001</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.4</td>\n",
       "      <td>-1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td> 2002</td>\n",
       "      <td> Nevada</td>\n",
       "      <td> 2.9</td>\n",
       "      <td>-1.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       year   state  pop  debt\n",
       "one    2000    Ohio  1.5   NaN\n",
       "two    2001    Ohio  1.7  -1.2\n",
       "three  2002    Ohio  3.6   NaN\n",
       "four   2001  Nevada  2.4  -1.5\n",
       "five   2002  Nevada  2.9  -1.7"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过索引方式返回的列只是相应数据的视图而已，并不是副本。对返回的Series所做的任何修改都会反映到原DataFrame上。通过Series的copy方法即可显式地复制列。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "####3. 传给DataFrame嵌套字典\n",
    "如果数据形式是嵌套字典（字典的字典），将它传给DataFrame，它会被解释为：外层的键作为列，内层的键则作为行索引。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nevada</th>\n",
       "      <th>Ohio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td> NaN</td>\n",
       "      <td> 1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001</th>\n",
       "      <td> 2.4</td>\n",
       "      <td> 1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2002</th>\n",
       "      <td> 2.9</td>\n",
       "      <td> 3.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Nevada  Ohio\n",
       "2000     NaN   1.5\n",
       "2001     2.4   1.7\n",
       "2002     2.9   3.6"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pop = {'Nevada': {2001: 2.4, 2002: 2.9},\n",
    "      'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}\n",
    "frame3 = DataFrame(pop)\n",
    "frame3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2000</th>\n",
       "      <th>2001</th>\n",
       "      <th>2002</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Nevada</th>\n",
       "      <td> NaN</td>\n",
       "      <td> 2.4</td>\n",
       "      <td> 2.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 1.5</td>\n",
       "      <td> 1.7</td>\n",
       "      <td> 3.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        2000  2001  2002\n",
       "Nevada   NaN   2.4   2.9\n",
       "Ohio     1.5   1.7   3.6"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 对结果进行转置\n",
    "frame3.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以输入给DataFrame构造器的数据：\n",
    "- 二维ndarray： 数据矩阵\n",
    "- 由数组、列表或元组组成的字典： 每个序列会变成DataFrame的一列。所有序列的长度必须相同\n",
    "- NumPy的结构化/记录数组： 类似于 有数组组成的字典\n",
    "- 由Series组成的字典\n",
    "- 由字典组成的字典\n",
    "- 字典或Series的列表\n",
    "- 由列表或元组组成的列表\n",
    "- 另一个DataFrame\n",
    "- NumPy的MaskedArray"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####4. DataFrame的属性"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>state</th>\n",
       "      <th>Nevada</th>\n",
       "      <th>Ohio</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td> NaN</td>\n",
       "      <td> 1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001</th>\n",
       "      <td> 2.4</td>\n",
       "      <td> 1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2002</th>\n",
       "      <td> 2.9</td>\n",
       "      <td> 3.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state  Nevada  Ohio\n",
       "year               \n",
       "2000      NaN   1.5\n",
       "2001      2.4   1.7\n",
       "2002      2.9   3.6"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 设置DataFrame的index和columns的name属性，并显示出来\n",
    "frame3.index.name = 'year'\n",
    "frame3.columns.name = 'state'\n",
    "frame3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame的values属性会以二维ndarray的形式返回DataFrame中的数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ nan,  1.5],\n",
       "       [ 2.4,  1.7],\n",
       "       [ 2.9,  3.6]])"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame3.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##2. 索引对象\n",
    "pandas的索引对象负责轴标签和其他元数据(比如轴名称等)，构建Series或DataFrame时， 所用到的任何数组或其他序列的标签都会被转换成一个Index。Index对象是"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index([u'a', u'b', u'c'], dtype='object')"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj = Series(range(3), index=['a','b','c'])\n",
    "index = obj.index\n",
    "index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index([u'b', u'c'], dtype='object')"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index[1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}