{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. 什么是numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[numpy](http://numpy.org)是Numerical Python的简写,是python数值计算的基石。它是目前Python数值计算中最为重要的基础包。\n", "Numpy的重要功能:\n", "\n", "1. ndarry,一种高效的多位数组,提供了基于数组的便捷算数操作以及灵活的广播功能(广播功能将会单独进行讲解)。\n", "2. 对所有数据进行快速的矩阵计算,而无需编写循环程序\n", "3. 对硬盘中的数组数据进行读写的工具,并对内存映射文件进行操作\n", "4. 线性代数、随机数生成以及傅立叶变换功能" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于一个100万的数据来说,使用numpy进行操作要比用python直接操作快10到100倍" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 12.5 ms, sys: 9.99 ms, total: 22.5 ms\n", "Wall time: 23.6 ms\n" ] } ], "source": [ "import numpy as np #这是约定俗成的导入numpy的方式,建议你也这样做\n", "list1 = np.arange(1000000)\n", "list2 = list(range(1000000))\n", "%time for _ in range(10):list3 = list1 * 2" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 608 ms, sys: 207 ms, total: 816 ms\n", "Wall time: 838 ms\n" ] } ], "source": [ "%time for _ in range(10):list4 = [x * 2 for x in list2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这个例子可以看出明显的差距" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. ndarray: 多维数组对象" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numpy的核心特征之一就是N-维数组对象-ndarray。ndarry是Python中一个快速、灵活的大型数据集容齐。数组允许你使用列斯与标量的操作语法在整块数据上进行数学计算。\n", "下面是使用Numpy生成小的随机数组" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1.44387396 -0.25442857 1.13255162]\n", " [ 0.53088762 1.39146013 0.56735466]]\n" ] } ], "source": [ "import numpy as np\n", "data = np.random.randn(2,3)\n", "print(data)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[14.43873964 -2.5442857 11.32551616]\n", " [ 5.30887617 13.91460135 5.67354659]]\n" ] } ], "source": [ "print(data * 10)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2.88774793 -0.50885714 2.26510323]\n", " [ 1.06177523 2.78292027 1.13470932]]\n" ] } ], "source": [ "print(data + data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ndarry包含几个常用的属性\n", "- T: 返回数组的转置\n", "- shape: 返回数组每一维度的数量\n", "- dtype: 返回数组的数据类型\n", "- size: 返回数组中的数据数量\n", "- ndim: 返回数组的维数" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1.44387396 0.53088762]\n", " [-0.25442857 1.39146013]\n", " [ 1.13255162 0.56735466]]\n", "(2, 3)\n", "float64\n", "6\n", "2\n" ] } ], "source": [ "print(data.T)\n", "print(data.shape)\n", "print(data.dtype)\n", "print(data.size)\n", "print(data.ndim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1 生成ndarry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "生成ndarry最简单的方式就是使用array函数。\n", "np.array函数会自动推断数组的数据类型" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[6. 7.5 8. 0. 1. ]\n", "float64\n" ] } ], "source": [ "data1 = [6,7.5,8,0,1]\n", "arr1 = np.array(data1)\n", "print(arr1)\n", "print(arr1.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "同等长度的列表,将会自动转换成多维数组" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3 4]\n", " [4 5 6 7]]\n", "int64\n" ] } ], "source": [ "data2 = [[1,2,3,4],[4,5,6,7]]\n", "arr2 = np.array(data2)\n", "print(arr2)\n", "print(arr2.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "还有一些新的函数可以创建新的数组\n", "- zeros(): 生成全是0的数组 \n", "- ones(): 生成全是1的数组" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", "[[0. 0. 0.]\n", " [0. 0. 0.]]\n", "[1. 1. 1. 1. 1.]\n" ] } ], "source": [ "print(np.zeros(10))\n", "print(np.zeros((2,3)))#⚠️注意这里要有括号,传入一个元组\n", "print(np.ones(5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2 ndarray的数据类型" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "类型|类型代码|描述\n", "-|-|-\n", "int8,unit8|i1,u1|有符号和无符号的8位整数\n", "int16,unit16|i2,u2|有符号和无符号的16位整数\n", "int32,unit32|i4,u4|有符号和无符号的32位整数\n", "int64,unit64|i8,u8|有符号和无符号的64位整数\n", "float16|f2|半精度浮点数\n", "float32|f4|标准单精度浮点数,兼容C语言float\n", "float64|f8|标准双精度浮点数,兼容c语言double和python float\n", "float128|f16|拓展精度浮点数\n", "bool|?|布尔值,存储True或False\n", "string_|S|字符串类型" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int64\n" ] } ], "source": [ "import numpy as np\n", "arr = np.array([1,2,3,4,5])\n", "print(arr.dtype)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n" ] } ], "source": [ "float_arr = arr.astype(np.float64)\n", "print(float_arr.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在上面这个例子中,整数被转换为浮点数,使用了`astype()`函数,如果把浮点数转换为整数,则小数点后的部分被消除。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.3 numpy数组运算" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "数组最重要的特征就是允许你进行批量操作而不需要使用for循环。称此种特征为向量化" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 2. 3.]\n", " [4. 5. 6.]]\n", "\n", " [[ 1. 4. 9.]\n", " [16. 25. 36.]]\n", "\n", " [[0. 0. 0.]\n", " [0. 0. 0.]]\n", "\n", " [[1. 0.5 0.33333333]\n", " [0.25 0.2 0.16666667]]\n", "\n", " [[1. 1.41421356 1.73205081]\n", " [2. 2.23606798 2.44948974]]\n" ] } ], "source": [ "import numpy as np\n", "arr = np.array([[1.,2.,3.],[4.,5.,6.]])\n", "print(arr)\n", "print(\"\\n\", arr *arr)\n", "print(\"\\n\", arr -arr)\n", "print(\"\\n\", 1 / arr)\n", "print(\"\\n\", arr ** 0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "还可以比较两个数组,会返回比较后的布尔值" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 4. 1.]\n", " [ 7. 2. 12.]]\n", "\n", " [[False True False]\n", " [ True False True]]\n" ] } ], "source": [ "arr2 = np.array([[0.,4.,1.],[7.,2.,12.]])\n", "print(arr2)\n", "print(\"\\n\", arr2 > arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.4 索引与切片" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7 8 9]\n", "5\n", "[5 6 7]\n", "[ 0 1 2 3 4 10 10 10 8 9]\n" ] } ], "source": [ "import numpy as np\n", "#numpy的arange类似于python的range函数\n", "arr = np.arange(10)\n", "print(arr)\n", "print(arr[5])#称为索引\n", "print(arr[5:8])#称为切片\n", "arr[5:8] = 10\n", "print(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这是在一维数组上的切片操作,需要⚠️注意的是当对数组进行操作时会直接对原数组进行修改。" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]\n", " [7 8 9]]\n", "\n", " [1 2 3]\n", "\n", " 3\n" ] } ], "source": [ "import numpy as np\n", "arr = np.array([[1,2,3],[4,5,6],[7,8,9]])\n", "print(arr)\n", "print(\"\\n\", arr[0])\n", "print(\"\\n\",arr[0][2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于一个多维数组,一次切片会返回一维数组,若想精确的选择某个值需要对其再进行索引" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "数组也可以与python中的列表对象类似,通过切片对其中的数据进行取值" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4]\n", "[0, 1, 2, 3]\n", "[0, 1, 2, 3, 4, 5, 6, 7, 8]\n" ] } ], "source": [ "arr = [0,1,2,3,4,5,6,7,8,9]\n", "print(arr[1:5])\n", "print(arr[:4])\n", "print(arr[:-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于多维数组来说,可以通过多级的索引或切片进行取值,如图所示:\n", "" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2 5 8]\n", "\n", "\n", "[3 6]\n", "\n", "\n", "[4 5]\n", "\n", "\n", "[[1 2]\n", " [4 5]]\n" ] } ], "source": [ "import numpy as np\n", "arr = np.array([[1,2,3],[4,5,6],[7,8,9]])\n", "print(arr[:, 1])\n", "print(\"\\n\")\n", "print(arr[0:2, 2])\n", "print(\"\\n\")\n", "print(arr[1, 0:2])\n", "print(\"\\n\")\n", "print(arr[0:2, 0:2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.5 数组的转置" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "与线性代数中理解的矩阵转置一样,numpy中提供了方便的数组转置方法" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 1 2 3 4]\n", " [ 5 6 7 8 9]\n", " [10 11 12 13 14]]\n", "\n", "\n", "[[ 0 5 10]\n", " [ 1 6 11]\n", " [ 2 7 12]\n", " [ 3 8 13]\n", " [ 4 9 14]]\n" ] } ], "source": [ "arr = np.arange(15).reshape((3,5))\n", "print(arr)\n", "print(\"\\n\")\n", "print(arr.T)#此为数组的转置" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对矩阵进行计算时会涉及一些特殊的操作,比如计算矩阵的内积,我们可以使用np.dot方法" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[125 140 155 170 185]\n", " [140 158 176 194 212]\n", " [155 176 197 218 239]\n", " [170 194 218 242 266]\n", " [185 212 239 266 293]]\n" ] } ], "source": [ "print(np.dot(arr.T, arr))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.6 numpy中的通用函数" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "一元通用函数\n", "\n", "函数名|描述|\n", "-|-|\n", "`abs`,`fabs`|逐元素地计算整数,浮点数或负数的绝对值|\n", "`sqrt`|计算每个元素的平方根|\n", "`square`|计算每个元素的平方|\n", "`exp`|计算每个元素的自然指数值$e^{x}$\n", "`log`,`log10`,`log2`|分别对应自然对数,对数10为底,对数2为底|\n", "`sign`|计算每个元素的符号值|" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4 5 6 7 8 9]\n", "\n", "\n", "[1. 1.41421356 1.73205081 2. 2.23606798 2.44948974\n", " 2.64575131 2.82842712 3. ]\n", "\n", "\n", "[ 1 4 9 16 25 36 49 64 81]\n", "\n", "\n", "[2.71828183e+00 7.38905610e+00 2.00855369e+01 5.45981500e+01\n", " 1.48413159e+02 4.03428793e+02 1.09663316e+03 2.98095799e+03\n", " 8.10308393e+03]\n", "\n", "\n", "[0. 0.30103 0.47712125 0.60205999 0.69897 0.77815125\n", " 0.84509804 0.90308999 0.95424251]\n", "\n", "\n", "[1 1 1 1 1 1 1 1 1]\n" ] } ], "source": [ "arr = np.arange(1, 10)\n", "print(arr)\n", "print(\"\\n\")\n", "print(np.sqrt(arr))\n", "print(\"\\n\")\n", "print(np.square(arr))\n", "print(\"\\n\")\n", "print(np.exp(arr))\n", "print(\"\\n\")\n", "print(np.log10(arr))\n", "print(\"\\n\")\n", "print(np.sign(arr))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "二元通用函数\n", "\n", "函数名|描述|\n", "-|-|\n", "add|将数组对应元素相加|\n", "multiply|将数组对应元素相乘|\n", "divide,floor_fivide|初或整除|\n", "power|将第二个数组元素作为第一个数组元素的幂|\n", "maximum|逐个计算元素的最大值|\n", "minimum|逐个计算元素的最小值|\n", "mode|求元素的模|" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-2.81931681 -1.16594766 1.25560557 -0.53089508 -0.88346508 -0.69976644\n", " 0.18584147 -0.78739778]\n", "\n", "\n", "[ 1.10865854 0.07394118 0.18840718 -0.56140484 0.50711533 -0.60786824\n", " -0.24535454 -2.96499746]\n", "\n", "\n", "[ 1.10865854 0.07394118 1.25560557 -0.53089508 0.50711533 -0.60786824\n", " 0.18584147 -0.78739778]\n", "\n", "\n", "[-3.12565966 -0.08621154 0.23656511 0.29804707 -0.44801869 0.4253658\n", " -0.04559705 2.33463243]\n" ] } ], "source": [ "x = np.random.randn(8)\n", "y = np.random.randn(8)\n", "print(x)\n", "print(\"\\n\")\n", "print(y)\n", "print(\"\\n\")\n", "print(np.maximum(x,y))\n", "print(\"\\n\")\n", "print(np.multiply(x,y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.7 使用numpy进行线性代数的计算" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "numpy中提供了大量的线性代数操作,这些函数是使用标准的线性代数库来实现的,numpy的线性代数操作都存在于linalg中。\n", "\n", "函数名|描述|\n", "-|-|\n", "`diag`|将一个方针的对角元素作为一维数组返回,或将一维数组转换成一个方阵|\n", "`dot`|计算矩阵的点乘|\n", "`trace`|计算对角元素的和|\n", "`det`|计算矩阵的行列式|\n", "`eig`|计算方阵的特征值和特征向量|\n", "`inv`|计算方阵的逆矩阵|" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 2. 3.]\n", " [4. 5. 6.]]\n", "\n", "\n", "[[ 6. 23.]\n", " [-1. 7.]\n", " [ 8. 7.]]\n", "\n", "\n", "[[ 28. 58.]\n", " [ 67. 169.]]\n", "\n", "\n", "[[14. 32.]\n", " [32. 77.]]\n", "\n", "\n", "diag: [14. 77.]\n", "trace: 91.0\n", "det: 54.00000000000001\n", "dig: (array([ 0.59732747, 90.40267253]), array([[-0.92236578, -0.3863177 ],\n", " [ 0.3863177 , -0.92236578]]))\n", "inv: [[ 1.42592593 -0.59259259]\n", " [-0.59259259 0.25925926]]\n" ] } ], "source": [ "import numpy as np\n", "x = np.array([[1.,2.,3.], [4.,5.,6.]])\n", "y = np.array([[6.,23.],[-1,7],[8,7]])\n", "print(x)\n", "print(\"\\n\")\n", "print(y)\n", "print(\"\\n\")\n", "print(x.dot(y))\n", "print(\"\\n\")\n", "z = x.dot(x.T)\n", "print(z)\n", "print(\"\\n\")\n", "print(\"diag: \", np.diag(z))\n", "print(\"trace: \",z.trace())\n", "print(\"det: \", np.linalg.det(z))\n", "print(\"dig: \", np.linalg.eig(z))\n", "print(\"inv: \", np.linalg.inv(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.8 伪随机数生成" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "numpy.random填补了python内建的random的不足,使用numpy可以高效的生成多种概率分布下的完整样本值数组。\n", "\n", "函数名|描述|\n", "-|-|\n", "`seed`|向随机数生成器传递随机状态种子|\n", "`shuffle`|随机排列一个序列|\n", "`rand`|从均匀分布中抽取样本|\n", "`randint`|根据给定的范围抽取随机整数|\n", "`randn`|从均值0方差1的正态分布中抽取样本|\n", "`normal`|从正态分布中抽取样本|" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-0.44935052 1.36724432 -0.31976099 0.90308154]\n", " [ 1.04837402 -0.87623288 1.93919589 1.11932443]\n", " [-0.73634142 -0.71623403 -1.94616623 0.82323579]\n", " [-1.90485977 0.50994768 -0.06533462 -0.65938636]]\n" ] } ], "source": [ "samples = np.random.normal(size=(4,4))\n", "print(samples)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4 5 6 7 8 9]\n" ] } ], "source": [ "sample = np.arange(1, 10)\n", "print(sample)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }