{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CHAPTER 4 \n", "# NumPy Basics: Arrays and Vectorized Computation\n", "\n", "It is no exaggeration to say that NumPy is the most important Python package for numerical computing. NumPy provides the following:\n", "\n", "- ndarray, an efficient multidimensional array that provides fast array-oriented arithmetic and flexible broadcasting\n", "\n", "- Convenient mathematical functions\n", "\n", "- Convenient tools for reading array data from disk and writing it to disk\n", "\n", "- Linear algebra, random number generation, and Fourier transform capabilities\n", "\n", "- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN\n", "\n", "Understanding NumPy arrays and array-oriented computing will help us understand tools such as pandas.\n", "\n", "# 4.1 The NumPy ndarray: A Multidimensional Array Object\n", "\n", "The N-dimensional array object, or ndarray, is NumPy's key feature. Let's try it out by generating an array of random data:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Generate some random data\n", "data = np.random.randn(2, 3)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.35512366, -0.63779545, 0.14137933],\n", " [ 0.36642056, 0.30898139, -0.87040292]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perform some mathematical operations on data:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-3.55123655, -6.37795453, 1.41379333],\n", " [ 3.66420556, 3.0898139 , -8.70402916]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data * 10" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.71024731, -1.27559091, 0.28275867],\n", " [ 0.73284111, 0.61796278, -1.74080583]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data + data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Every array has a shape, a tuple giving the size of each dimension, and a dtype, an object describing the data type of the array:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.shape" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1 Creating ndarrays\n", "\n", "The simplest way is to use the array function, which accepts any sequence-like object, such as a list:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 6. , 7.5, 8. , 0. , 1. ])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data1 = [6, 7.5, 8, 0, 1]\n", "arr1 = np.array(data1)\n", "arr1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nested sequences are converted into a multidimensional array:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3, 4],\n", " [5, 6, 7, 8]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]\n", "arr2 = np.array(data2)\n", "arr2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because data2 is a list of lists, arr2 has two dimensions. We can confirm this with the ndim and shape attributes:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.ndim" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 4)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Unless a type is specified explicitly, np.array infers a suitable data type for the data and stores it in dtype:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr1.dtype" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides np.array, several other functions create arrays, such as zeros, ones, and empty. To create a higher-dimensional array, pass the shape as a tuple:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros(10)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0., 0., 0.]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros((3, 6))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0.00000000e+000, 0.00000000e+000],\n", " [ 2.16538378e-314, 2.16514681e-314],\n", " [ 2.16511832e-314, 2.16072529e-314]],\n", "\n", " [[ 0.00000000e+000, 0.00000000e+000],\n", " [ 2.14037397e-314, 6.36598737e-311],\n", " [ 0.00000000e+000, 0.00000000e+000]]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.empty((2, 3, 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "np.empty is not guaranteed to return an array of all zeros. In some cases it returns uninitialized garbage values, as in the output above.\n", "\n", "arange is an array-valued version of the built-in Python range function:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some array creation functions:\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.04.36.png)\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.04.36_cn.png)\n", "\n", "# 2 Data Types for ndarrays\n", "\n", "The dtype stores the type of the array's data:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr1 = np.array([1, 2, 3], dtype=np.float64)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr2 = np.array([1, 2, 3], dtype=np.int32)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr1.dtype" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int32')" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "dtypes are what give NumPy the flexibility to work with data coming from other systems.\n", "\n", "Table of data types:\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.15.52.png)\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午1.15.52_cn.png)\n", "\n", "You can convert an array to another type with astype:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([1, 2, 3, 4, 5])\n", "arr.dtype" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "float_arr = 
arr.astype(np.float64)\n", "float_arr.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example above casts int to float. When casting float to int, the fractional part is truncated:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 3.7, -1.2, -2.6, 0.5, 12.9, 10.1])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])\n", "arr" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 3, -1, -2, 0, 12, 10], dtype=int32)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr.astype(np.int32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use astype to convert strings that represent numbers into numeric values:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([b'1.25', b'-9.6', b'42'], \n", " dtype='|S4')" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.string_)\n", "numeric_strings" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1.25, -9.6 , 42. 
])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numeric_strings.astype(float)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Be careful with the `numpy.string_` type (spelled `numpy.bytes_` in NumPy 2.0 and later, where the `numpy.string_` alias was removed): string data in NumPy has a fixed size, so input may be truncated without warning.\n", "\n", "If casting fails, for example because a string cannot be converted to float64, a ValueError is raised.\n", "\n", "You can also specify the type using another array's dtype:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "int_array = np.arange(10)\n", "\n", "calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "int_array.astype(calibers.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use shorthand type codes. For example, u4 stands for uint32:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "empty_uint32 = np.empty(8, dtype='u4')\n", "empty_uint32" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that astype always returns a new array (a copy of the data), even if the new dtype is the same as the old one.\n", "\n", "# 3 Arithmetic with NumPy Arrays\n", "\n", "Arrays are important because they let you express many operations on data without writing for loops. This is called vectorization. Any arithmetic operation between two arrays of equal size is applied element-wise:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr = np.array([[1., 2., 3.], [4., 5., 6.]])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 3.],\n", " [ 4., 5., 6.]])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"array([[ 1., 4., 9.],\n", " [ 16., 25., 36.]])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr * arr" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0.],\n", " [ 0., 0., 0.]])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr - arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Element-wise means that an operation on two arrays is applied only between elements at the same position.\n", "\n", "Arithmetic operations with a scalar apply the scalar to every element of the array:\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , 0.5 , 0.33333333],\n", " [ 0.25 , 0.2 , 0.16666667]])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 / arr" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , 1.41421356, 1.73205081],\n", " [ 2. 
, 2.23606798, 2.44948974]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr ** 0.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing two arrays of the same size produces a boolean array:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 4., 1.],\n", " [ 7., 2., 12.]])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])\n", "arr2" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[False, True, False],\n", " [ True, False, True]], dtype=bool)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2 > arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4 Basic Indexing and Slicing\n", "\n", "One-dimensional arrays behave much like the Python lists we used earlier:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.arange(10)\n", "arr" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr[5]" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5, 6, 7])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr[5:8]" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr[5:8] = 12" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])" ] }, "execution_count": 55, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assigning 12 to `arr[5:8]` propagates (broadcasts) the value to the entire selection. There is an important distinction from Python's built-in lists here: a slice of an array is only a view on the corresponding elements of the original array, not a new array, whereas slicing a list produces a new list:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([12, 12, 12])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr_slice = arr[5:8]\n", "arr_slice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we change values in arr_slice, the changes are reflected in the original array arr:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr_slice[1] = 12345" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 12, 12345, 12, 8, 9])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The bare slice `[:]` assigns to all of the values in the array:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr_slice[:] = 64" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 64, 64, 64, 8, 9])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy is designed this way for performance and memory reasons: always copying data would be costly for large arrays. If you do want a copy rather than a view, call the copy() method explicitly, for example `arr[5:8].copy()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a two-dimensional array, the element at each index is a one-dimensional array rather than a scalar:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7, 8, 9])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d = np.array([[1, 2, 3], [4, 5, 
6], [7, 8, 9]])\n", "arr2d[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two equivalent ways to access an individual element:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[0][2]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[0, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the figure below, think of axis 0 as the rows and axis 1 as the columns:\n", "\n", "![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午2.08.18.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a multidimensional array, if you omit later indices, the returned object is a lower-dimensional ndarray. Take the following 2 x 2 x 3 array:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 1, 2, 3],\n", " [ 4, 5, 6]],\n", "\n", " [[ 7, 8, 9],\n", " [10, 11, 12]]])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])\n", "arr3d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "arr3d[0] is a 2 x 3 array:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both scalar values and arrays can be assigned to arr3d[0]:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[42, 42, 42],\n", " [42, 42, 42]],\n", "\n", " [[ 7, 8, 9],\n", " [10, 11, 12]]])" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "old_values = arr3d[0].copy()\n", "\n", "arr3d[0] = 42\n", "\n", "arr3d" ] }, { 
"cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 1, 2, 3],\n", " [ 4, 5, 6]],\n", "\n", " [[ 7, 8, 9],\n", " [10, 11, 12]]])" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d[0] = old_values\n", "arr3d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`arr3d[1, 0]` gives you all of the values whose indices start with (1, 0), forming a one-dimensional array:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7, 8, 9])" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr3d[1, 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That single step is equivalent to the following two steps:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 7, 8, 9],\n", " [10, 11, 12]])" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = arr3d[1]\n", "x" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7, 8, 9])" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Keep in mind that in all of these cases the returned subarrays are views.\n", "\n", "## Indexing with slices\n", "\n", "One-dimensional arrays can be sliced with the same syntax as Python lists:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 64, 64, 64, 8, 9])" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 2, 3, 4, 64])" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr[1:6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slicing a two-dimensional array works a little differently:" ] }, { "cell_type": "code", "execution_count": 77, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the array was sliced along axis 0 (the rows). A slice therefore selects a range of elements along an axis. You can read arr2d[:2] as “select the first two rows of arr2d”.\n", "\n", "You can also pass multiple slices, just as you can pass multiple indices:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2, 3],\n", " [5, 6]])" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2, 1:] # first two rows, second column onward" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that the selection is an array view. By mixing integer indices and slices, you get lower-dimensional slices. For example, we can select the first two columns of the second row:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4, 5])" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[1, :2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select the first two rows of the third column:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([3, 6])" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A colon by itself means take the entire axis:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1],\n", " [4],\n", " [7]])" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:, :1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The figure below helps illustrate this:\n", 
"![](../MarkdownPhotos/chp04/屏幕快照 2017-10-24 下午2.41.52.png)\n", "\n", "Assigning to a slice expression assigns to the whole selection:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [4, 0, 0],\n", " [7, 8, 9]])" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr2d[:2, 1:] = 0\n", "arr2d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5 Boolean Indexing\n", "\n", "Suppose the data in one of our arrays contains duplicates. Here we use the randn function in numpy.random to generate some random, normally distributed data:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], \n", " dtype='