{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CHAPTER 4 \n", "# NumPy Basics: Arrays and Vectorized Computation\n", "\n", "It is no exaggeration to say that NumPy is the most important Python package for numerical computing. Among other things, NumPy provides:\n", "\n", "- ndarray, an efficient multidimensional array that provides fast array-oriented arithmetic and flexible broadcasting\n", "\n", "- Convenient mathematical functions\n", "\n", "- Convenient tools for reading array data from disk and writing it back\n", "\n", "- Linear algebra, random number generation, and Fourier transform capabilities\n", "\n", "- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN\n", "\n", "Understanding NumPy arrays and array-oriented computing makes it easier to understand tools such as pandas.\n", "\n", "# 4.1 The NumPy ndarray: A Multidimensional Array Object\n", "\n", "The N-dimensional array object, or ndarray, is the key feature of NumPy. To get a first feel for it, generate a small array of random data:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Generate some random data\n", "data = np.random.randn(2, 3)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.35512366, -0.63779545, 0.14137933],\n", " [ 0.36642056, 0.30898139, -0.87040292]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perform some mathematical operations on it:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-3.55123655, -6.37795453, 1.41379333],\n", " [ 3.66420556, 3.0898139 , -8.70402916]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data * 10" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.71024731, -1.27559091, 0.28275867],\n", " [ 0.73284111, 0.61796278, -1.74080583]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data + data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"每一个数组都有一个shape,来表示维度大小。而dtype,用来表示data type:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.shape" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1 Greating ndarrays (创建n维数组)\n", "\n", "最简单的方法使用array函数,输入一个序列即可,比如list:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 6. , 7.5, 8. , 0. , 1. ])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data1 = [6, 7.5, 8, 0, 1]\n", "arr1 = np.array(data1)\n", "arr1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "嵌套序列能被转换为多维数组:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3, 4],\n", " [5, 6, 7, 8]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]\n", "arr2 = np.array(data2)\n", "arr2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "因为data2是一个list of lists, 所以arr2维度为2。我们能用ndim和shape属性来确认一下:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.ndim" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 4)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"除非主动声明,否则np.array会自动给data搭配适合的类型,并保存在dtype里:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr1.dtype" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "除了np.array,还有一些其他函数能创建数组。比如zeros,ones,另外还可以在一个tuple里指定shape:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros(10)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0., 0.]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros((3, 6))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0.00000000e+000, 0.00000000e+000],\n", " [ 2.16538378e-314, 2.16514681e-314],\n", " [ 2.16511832e-314, 2.16072529e-314]],\n", "\n", " [[ 0.00000000e+000, 0.00000000e+000],\n", " [ 2.14037397e-314, 6.36598737e-311],\n", " [ 0.00000000e+000, 0.00000000e+000]]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.empty((2, 3, 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "np.empty并不能保证返回所有是0的数组,某些情况下,会返回为初始化的垃圾数值,比如上面。\n", "\n", "arange是一个数组版的python range函数:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some of the array creation functions:\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.04.36.png)\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.04.36_cn.png)\n", "\n", "# 2 Data Types for ndarrays\n", "\n", "The dtype stores the data type of the array's elements:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr1 = np.array([1, 2, 3], dtype=np.float64)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr2 = np.array([1, 2, 3], dtype=np.int32)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr1.dtype" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int32')" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "dtypes are what give NumPy its flexibility in handling data coming from other systems.\n", "\n", "Table of data types:\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.15.52.png)\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.15.52_cn.png)\n", "\n", "You can convert an array from one dtype to another with astype:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([1, 2, 3, 4, 5])\n", "arr.dtype" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "float_arr = 
arr.astype(np.float64)\n", "float_arr.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example above converts integers to floats. When floats are converted to integers, the fractional part is truncated:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 3.7, -1.2, -2.6, 0.5, 12.9, 10.1])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])\n", "arr" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 3, -1, -2, 0, 12, 10], dtype=int32)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr.astype(np.int32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "astype can also convert an array of strings that represent numbers into numeric form:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([b'1.25', b'-9.6', b'42'], \n", " dtype='|S4')" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.string_)\n", "numeric_strings" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1.25, -9.6 , 42. 
])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numeric_strings.astype(float)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Be careful with the `numpy.string_` type: its length is fixed, so it may silently truncate input without a warning. (The `string_` name is an alias of `numpy.bytes_` and was removed in NumPy 2.0; on newer versions use `numpy.bytes_` instead.)\n", "\n", "If casting fails, for example because a string cannot be converted to a float, a ValueError is raised.\n", "\n", "You can also use another array's dtype attribute to specify the target type:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "int_array = np.arange(10)\n", "\n", "calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "int_array.astype(calibers.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use shorthand type codes; for example, u4 stands for uint32:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "empty_uint32 = np.empty(8, dtype='u4')\n", "empty_uint32" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that astype always returns a new array.\n", "\n", "# 3 Arithmetic with NumPy Arrays\n", "\n", "Arrays are important because they let you express many operations on batches of data without writing any for loops. This is called vectorization. Any arithmetic operation between two equal-size arrays is applied element-wise:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr = np.array([[1., 2., 3.], [4., 5., 6.]])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 3.],\n", " [ 4., 5., 6.]])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"array([[ 1., 4., 9.],\n", " [ 16., 25., 36.]])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr * arr" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0.],\n", " [ 0., 0., 0.]])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr - arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "element-wise 我翻译为点对点,就是指两个数组的运算,在同一位置的元素间才会进行运算。\n", "\n", "这种算数操作如果涉及标量(scalar)的话,会涉及到数组的每一个元素:\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , 0.5 , 0.33333333],\n", " [ 0.25 , 0.2 , 0.16666667]])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 / arr" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , 1.41421356, 1.73205081],\n", " [ 2. 
, 2.23606798, 2.44948974]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr ** 0.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "两个数组的比较会产生布尔数组:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 4., 1.],\n", " [ 7., 2., 12.]])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])\n", "arr2" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[False, True, False],\n", " [ True, False, True]], dtype=bool)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2 > arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4 Basic Indexing and Slicing(基本的索引和切片)\n", "\n", "一维的我们之前已经在list部分用过了,没什么不同:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.arange(10)\n", "arr" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr[5]" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5, 6, 7])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr[5:8]" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr[5:8] = 12" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])" ] }, "execution_count": 55, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这里把12赋给`arr[5:8]`,其实用到了broadcasted(我觉得应该翻译为广式转变)。这里有一个比较重要的概念需要区分,python内建的list与numpy的array有个明显的区别,这里array的切片后的结果只是一个views(视图),用来代表原有array对应的元素,而不是创建了一个新的array。但list里的切片是产生了一个新的list:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([12, 12, 12])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr_slice = arr[5:8]\n", "arr_slice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "如果我们改变arr_slice的值,会反映在原始的数组arr上:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr_slice[1] = 12345" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 12, 12345, 12, 8, 9])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`[:]`这个赋值给所有元素:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr_slice[:] = 64" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 64, 64, 64, 8, 9])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "之所以这样设计是出于性能和内存的考虑,毕竟如果总是复制数据的话,会很影响运算时间。当然如果想要复制,可以使用copy()方法,比如`arr[5:8].copy()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在一个二维数组里,单一的索引指代的是一维的数组:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7, 8, 9])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d = np.array([[1, 2, 3], [4, 5, 
6], [7, 8, 9]])\n", "arr2d[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two equivalent ways to access an individual element:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[0][2]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[0, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the figure below, think of axis 0 as the rows and axis 1 as the columns:\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午2.08.18.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a multidimensional array, if you omit the later indices, the returned object is a lower-dimensional ndarray. Take the following 2 x 2 x 3 array:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 1, 2, 3],\n", " [ 4, 5, 6]],\n", "\n", " [[ 7, 8, 9],\n", " [10, 11, 12]]])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])\n", "arr3d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "arr3d[0] is a 2 x 3 array:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both scalar values and arrays can be assigned to arr3d[0]:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[42, 42, 42],\n", " [42, 42, 42]],\n", "\n", " [[ 7, 8, 9],\n", " [10, 11, 12]]])" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "old_values = arr3d[0].copy()\n", "\n", "arr3d[0] = 42\n", "\n", "arr3d" ] }, { 
"cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 1, 2, 3],\n", " [ 4, 5, 6]],\n", "\n", " [[ 7, 8, 9],\n", " [10, 11, 12]]])" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d[0] = old_values\n", "arr3d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`arr3d[1, 0]`会给你一个(1, 0)的一维数组:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7, 8, 9])" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d[1, 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "上面的一步等于下面的两步:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 7, 8, 9],\n", " [10, 11, 12]])" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = arr3d[1]\n", "x" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7, 8, 9])" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "一定要牢记这些切片后返回的数组都是views\n", "\n", "## Indexing with slices(用切片索引)\n", "\n", "一维的话和python里的list没什么差别:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 64, 64, 64, 8, 9])" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 2, 3, 4, 64])" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr[1:6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "二维的话,数组的切片有点不同:" ] }, { "cell_type": "code", "execution_count": 77, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以看到,切片是沿着axis 0(行)来处理的。所以,数组中的切片,是要沿着设置的axis来处理的。我们可以把arr2d[:2]理解为“选中arr2d的前两行”。\n", "\n", "当然,给定多个索引后,也可以使用复数切片:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2, 3],\n", " [5, 6]])" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2, 1:] # 前两行,第二列之后" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "记住,选中的是array view。通过混合整数和切片,能做低维切片。比如,我们选中第二行的前两列:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4, 5])" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[1, :2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "选中第三列的前两行:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([3, 6])" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "冒号表示提取整个axis(轴):" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1],\n", " [4],\n", " [7]])" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:, :1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "看图示有助于理解:\n", 
"![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午2.41.52.png)\n", "\n", "赋值也很方便:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [4, 0, 0],\n", " [7, 8, 9]])" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2, 1:] = 0\n", "arr2d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5 Boolean Indexing (布尔索引)\n", "\n", "假设我们的数组数据里有一些重复。这里我们用numpy.random里的randn函数来随机生成一些离散数据:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], \n", " dtype='