{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "pandas provides high-level data structures and manipulation tools that make data analysis faster and easier. It is built on top of NumPy and makes NumPy-centric applications simpler to write." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "from pandas import Series, DataFrame\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. pandas data structures\n", "The two main pandas data structures are Series and DataFrame." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1 Series\n", "A Series is a one-dimensional array-like object consisting of a set of **data** (of any NumPy data type) and an associated set of **data labels**, called its index." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. Constructing a Series" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 4\n", "1 7\n", "2 -5\n", "3 3\n", "dtype: int64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj = Series([4, 7, -5, 3])\n", "obj\n", "# The string representation of a Series shows the index on the left and the values on the right.\n", "# Because no index was specified for the data, a default integer index from 0 to N-1 is created." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 4, 7, -5, 3], dtype=int64)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the values and index attributes of the Series\n", "obj.values" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Int64Index([0, 1, 2, 3], dtype='int64')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj.index" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "d 4\n", "b 7\n", "a -5\n", "c 3\n", "dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a Series with an index that labels each data point\n", "obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 
'c'])\n", "obj2" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'd', u'b', u'a', u'c'], dtype='object')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj2.index" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "-5" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj2['a']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. NumPy array operations" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "d 4\n", "b 7\n", "a -5\n", "c 3\n", "dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj2" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "d 4\n", "b 7\n", "c 3\n", "dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter with a boolean expression\n", "obj2[obj2 > 0]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "d 8\n", "b 14\n", "a -10\n", "c 6\n", "dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Scalar multiplication\n", "obj2 * 2" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "d 54.598150\n", "b 1096.633158\n", "a 0.006738\n", "c 20.085537\n", "dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Apply a math function\n", "np.exp(obj2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Series can also be thought of as a fixed-length, ordered dict, since it maps index values to data values." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": 
false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'b' in obj2" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'e' in obj2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Creating a Series from a Python dict" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Ohio 35000\n", "Oregon 16000\n", "Texas 71000\n", "Utah 5000\n", "dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}\n", "# When a Python dict is passed in, its keys become the index of the Series\n", "obj3 = Series(sdata)\n", "obj3" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "California NaN\n", "Ohio 35000\n", "Oregon 16000\n", "Texas 71000\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sindex = ['California', 'Ohio', 'Oregon', 'Texas']\n", "obj4 = Series(sdata, index=sindex)\n", "# Values in sdata whose keys match the labels in sindex are placed at the matching positions; 'California' has no entry in sdata, so its value is NaN\n", "obj4" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "California True\n", "Ohio False\n", "Oregon False\n", "Texas False\n", "dtype: bool" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj4.isnull()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4. Automatic alignment of Series\n", "In arithmetic operations, Series are aligned on their index labels; a label that is missing from either operand yields NaN in the result." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "California NaN\n", "Ohio 70000\n", "Oregon 32000\n", "Texas 142000\n", "Utah NaN\n", "dtype: float64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj3 + obj4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. The name attribute of a Series" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "state\n", "California NaN\n", "Ohio 35000\n", "Oregon 16000\n", "Texas 71000\n", "Name: population, dtype: float64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj4.name = 'population'\n", "obj4.index.name = 'state'\n", "obj4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. Modifying the index of a Series" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Bob 4\n", "Steve 7\n", "Jeff -5\n", "Ryan 3\n", "dtype: int64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']\n", "obj" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 DataFrame\n", "A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type. A DataFrame has both a row index and a column index, and can be thought of as a dict of Series that all share the same index." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. Constructing a DataFrame\n", "The most common way is to pass a dict of equal-length lists or NumPy arrays." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>pop</th><th>state</th><th>year</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1.5</td><td>Ohio</td><td>2000</td></tr>\n",
"<tr><th>1</th><td>1.7</td><td>Ohio</td><td>2001</td></tr>\n",
"<tr><th>2</th><td>3.6</td><td>Ohio</td><td>2002</td></tr>\n",
"<tr><th>3</th><td>2.4</td><td>Nevada</td><td>2001</td></tr>\n",
"<tr><th>4</th><td>2.9</td><td>Nevada</td><td>2002</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " pop state year\n", "0 1.5 Ohio 2000\n", "1 1.7 Ohio 2001\n", "2 3.6 Ohio 2002\n", "3 2.4 Nevada 2001\n", "4 2.9 Nevada 2002" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],\n", "        'year': [2000, 2001, 2002, 2001, 2002],\n", "        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}\n", "frame = DataFrame(data)\n", "frame\n", "# An index is assigned automatically; in the pandas version used here the columns come out in sorted order" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>year</th><th>state</th><th>pop</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>2000</td><td>Ohio</td><td>1.5</td></tr>\n",
"<tr><th>1</th><td>2001</td><td>Ohio</td><td>1.7</td></tr>\n",
"<tr><th>2</th><td>2002</td><td>Ohio</td><td>3.6</td></tr>\n",
"<tr><th>3</th><td>2001</td><td>Nevada</td><td>2.4</td></tr>\n",
"<tr><th>4</th><td>2002</td><td>Nevada</td><td>2.9</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " year state pop\n", "0 2000 Ohio 1.5\n", "1 2001 Ohio 1.7\n", "2 2002 Ohio 3.6\n", "3 2001 Nevada 2.4\n", "4 2002 Nevada 2.9" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# If a sequence of columns is specified, the DataFrame's columns are arranged in that order\n", "DataFrame(data, columns=['year', 'state', 'pop'])" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>year</th><th>state</th><th>pop</th><th>debt</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>one</th><td>2000</td><td>Ohio</td><td>1.5</td><td>NaN</td></tr>\n",
"<tr><th>two</th><td>2001</td><td>Ohio</td><td>1.7</td><td>NaN</td></tr>\n",
"<tr><th>three</th><td>2002</td><td>Ohio</td><td>3.6</td><td>NaN</td></tr>\n",
"<tr><th>four</th><td>2001</td><td>Nevada</td><td>2.4</td><td>NaN</td></tr>\n",
"<tr><th>five</th><td>2002</td><td>Nevada</td><td>2.9</td><td>NaN</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " year state pop debt\n", "one 2000 Ohio 1.5 NaN\n", "two 2001 Ohio 1.7 NaN\n", "three 2002 Ohio 3.6 NaN\n", "four 2001 Nevada 2.4 NaN\n", "five 2002 Nevada 2.9 NaN" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# If a requested column is not found in the data, it is filled with NA values\n", "frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],\n", "                   index=['one', 'two', 'three', 'four', 'five'])\n", "frame2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. Operating on the rows and columns of a DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A column of a DataFrame can be retrieved as a Series either with dict-like notation or as an attribute." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "one Ohio\n", "two Ohio\n", "three Ohio\n", "four Nevada\n", "five Nevada\n", "Name: state, dtype: object" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2['state']" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "one 2000\n", "two 2001\n", "three 2002\n", "four 2001\n", "five 2002\n", "Name: year, dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2.year" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned Series has the same index as the original DataFrame, and its name attribute is already set to the column name." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A row of the DataFrame can be retrieved by label with the loc indexer." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "year 2002\n", "state Ohio\n", "pop 3.6\n", "debt NaN\n", "Name: three, dtype: object" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2.loc['three']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Columns can be modified by assignment." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true 
}, "outputs": [], "source": [ "frame2['debt'] = 16.5" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>year</th><th>state</th><th>pop</th><th>debt</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>one</th><td>2000</td><td>Ohio</td><td>1.5</td><td>16.5</td></tr>\n",
"<tr><th>two</th><td>2001</td><td>Ohio</td><td>1.7</td><td>16.5</td></tr>\n",
"<tr><th>three</th><td>2002</td><td>Ohio</td><td>3.6</td><td>16.5</td></tr>\n",
"<tr><th>four</th><td>2001</td><td>Nevada</td><td>2.4</td><td>16.5</td></tr>\n",
"<tr><th>five</th><td>2002</td><td>Nevada</td><td>2.9</td><td>16.5</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " year state pop debt\n", "one 2000 Ohio 1.5 16.5\n", "two 2001 Ohio 1.7 16.5\n", "three 2002 Ohio 3.6 16.5\n", "four 2001 Nevada 2.4 16.5\n", "five 2002 Nevada 2.9 16.5" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When a list or array is assigned to a column, its length must match the length of the DataFrame. If a Series is assigned, its labels are aligned exactly to the DataFrame's index, and any index label missing from the Series gets a missing value (NaN)." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>year</th><th>state</th><th>pop</th><th>debt</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>one</th><td>2000</td><td>Ohio</td><td>1.5</td><td>1</td></tr>\n",
"<tr><th>two</th><td>2001</td><td>Ohio</td><td>1.7</td><td>2</td></tr>\n",
"<tr><th>three</th><td>2002</td><td>Ohio</td><td>3.6</td><td>3</td></tr>\n",
"<tr><th>four</th><td>2001</td><td>Nevada</td><td>2.4</td><td>4</td></tr>\n",
"<tr><th>five</th><td>2002</td><td>Nevada</td><td>2.9</td><td>5</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>
" ], "text/plain": [ " year state pop debt\n", "one 2000 Ohio 1.5 1\n", "two 2001 Ohio 1.7 2\n", "three 2002 Ohio 3.6 3\n", "four 2001 Nevada 2.4 4\n", "five 2002 Nevada 2.9 5" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2['debt'] = [1, 2, 3, 4, 5]\n", "frame2" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>year</th><th>state</th><th>pop</th><th>debt</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>one</th><td>2000</td><td>Ohio</td><td>1.5</td><td>NaN</td></tr>\n",
"<tr><th>two</th><td>2001</td><td>Ohio</td><td>1.7</td><td>-1.2</td></tr>\n",
"<tr><th>three</th><td>2002</td><td>Ohio</td><td>3.6</td><td>NaN</td></tr>\n",
"<tr><th>four</th><td>2001</td><td>Nevada</td><td>2.4</td><td>-1.5</td></tr>\n",
"<tr><th>five</th><td>2002</td><td>Nevada</td><td>2.9</td><td>-1.7</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " year state pop debt\n", "one 2000 Ohio 1.5 NaN\n", "two 2001 Ohio 1.7 -1.2\n", "three 2002 Ohio 3.6 NaN\n", "four 2001 Nevada 2.4 -1.5\n", "five 2002 Nevada 2.9 -1.7" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])\n", "frame2['debt'] = val\n", "frame2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assigning to a column that does not exist creates a new column." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame2['eastern'] = frame2.state == 'Ohio'" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>year</th><th>state</th><th>pop</th><th>debt</th><th>eastern</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>one</th><td>2000</td><td>Ohio</td><td>1.5</td><td>NaN</td><td>True</td></tr>\n",
"<tr><th>two</th><td>2001</td><td>Ohio</td><td>1.7</td><td>-1.2</td><td>True</td></tr>\n",
"<tr><th>three</th><td>2002</td><td>Ohio</td><td>3.6</td><td>NaN</td><td>True</td></tr>\n",
"<tr><th>four</th><td>2001</td><td>Nevada</td><td>2.4</td><td>-1.5</td><td>False</td></tr>\n",
"<tr><th>five</th><td>2002</td><td>Nevada</td><td>2.9</td><td>-1.7</td><td>False</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " year state pop debt eastern\n", "one 2000 Ohio 1.5 NaN True\n", "two 2001 Ohio 1.7 -1.2 True\n", "three 2002 Ohio 3.6 NaN True\n", "four 2001 Nevada 2.4 -1.5 False\n", "five 2002 Nevada 2.9 -1.7 False" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The del keyword deletes a column." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "del frame2['eastern']" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'year', u'state', u'pop', u'debt'], dtype='object')" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2.columns" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>year</th><th>state</th><th>pop</th><th>debt</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>one</th><td>2000</td><td>Ohio</td><td>1.5</td><td>NaN</td></tr>\n",
"<tr><th>two</th><td>2001</td><td>Ohio</td><td>1.7</td><td>-1.2</td></tr>\n",
"<tr><th>three</th><td>2002</td><td>Ohio</td><td>3.6</td><td>NaN</td></tr>\n",
"<tr><th>four</th><td>2001</td><td>Nevada</td><td>2.4</td><td>-1.5</td></tr>\n",
"<tr><th>five</th><td>2002</td><td>Nevada</td><td>2.9</td><td>-1.7</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " year state pop debt\n", "one 2000 Ohio 1.5 NaN\n", "two 2001 Ohio 1.7 -1.2\n", "three 2002 Ohio 3.6 NaN\n", "four 2001 Nevada 2.4 -1.5\n", "five 2002 Nevada 2.9 -1.7" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A column retrieved by indexing is a view on the underlying data, not a copy, so in the pandas version used here any in-place modification to the returned Series is reflected in the original DataFrame. Use the Series copy method to copy the column explicitly." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### 3. Passing a nested dict to DataFrame\n", "If the data is a nested dict (a dict of dicts) and is passed to DataFrame, the outer keys are interpreted as columns and the inner keys as row index labels." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
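<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>Nevada</th><th>Ohio</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>2000</th><td>NaN</td><td>1.5</td></tr>\n",
"<tr><th>2001</th><td>2.4</td><td>1.7</td></tr>\n",
"<tr><th>2002</th><td>2.9</td><td>3.6</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>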
" ], "text/plain": [ " Nevada Ohio\n", "2000 NaN 1.5\n", "2001 2.4 1.7\n", "2002 2.9 3.6" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop = {'Nevada': {2001: 2.4, 2002: 2.9},\n", " 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}\n", "frame3 = DataFrame(pop)\n", "frame3" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>2000</th><th>2001</th><th>2002</th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>Nevada</th><td>NaN</td><td>2.4</td><td>2.9</td></tr>\n",
"<tr><th>Ohio</th><td>1.5</td><td>1.7</td><td>3.6</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ " 2000 2001 2002\n", "Nevada NaN 2.4 2.9\n", "Ohio 1.5 1.7 3.6" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Transpose the result\n", "frame3.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following kinds of data can be passed to the DataFrame constructor:\n", "- 2D ndarray: a matrix of data\n", "- dict of arrays, lists, or tuples: each sequence becomes a column of the DataFrame; all sequences must have the same length\n", "- NumPy structured/record array: treated like a dict of arrays\n", "- dict of Series\n", "- dict of dicts\n", "- list of dicts or Series\n", "- list of lists or tuples\n", "- another DataFrame\n", "- NumPy MaskedArray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4. DataFrame attributes" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n", "<tr style=\"text-align: right;\"><th>state</th><th>Nevada</th><th>Ohio</th></tr>\n", "<tr><th>year</th><th></th><th></th></tr>\n", "</thead>\n",
"<tbody>\n",
"<tr><th>2000</th><td>NaN</td><td>1.5</td></tr>\n",
"<tr><th>2001</th><td>2.4</td><td>1.7</td></tr>\n",
"<tr><th>2002</th><td>2.9</td><td>3.6</td></tr>\n",
"</tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ "state Nevada Ohio\n", "year \n", "2000 NaN 1.5\n", "2001 2.4 1.7\n", "2002 2.9 3.6" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set the name attributes of the DataFrame's index and columns, then display the result\n", "frame3.index.name = 'year'\n", "frame3.columns.name = 'state'\n", "frame3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The values attribute of a DataFrame returns its data as a two-dimensional ndarray." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ nan, 1.5],\n", " [ 2.4, 1.7],\n", " [ 2.9, 3.6]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame3.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Index objects\n", "pandas Index objects hold the axis labels and other metadata (such as the axis name). Any array or other sequence of labels used when constructing a Series or DataFrame is converted to an Index. Index objects are immutable." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'a', u'b', u'c'], dtype='object')" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj = Series(range(3), index=['a','b','c'])\n", "index = obj.index\n", "index" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index([u'b', u'c'], dtype='object')" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "index[1:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.5" } }, "nbformat": 4, 
"nbformat_minor": 0 }