{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n", "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#统计函数\" data-toc-modified-id=\"统计函数-1\"><span class=\"toc-item-num\">1 </span>Statistical functions</a></span><ul class=\"toc-item\"><li><span><a href=\"#从给定数组中的元素沿指定轴返回最小值—numpy.amin()\" data-toc-modified-id=\"从给定数组中的元素沿指定轴返回最小值—numpy.amin()-1.1\"><span class=\"toc-item-num\">1.1 </span>Minimum of an array along a given axis: numpy.amin()</a></span></li><li><span><a href=\"#从给定数组中的元素沿指定轴返回最大值—numpy.amax()\" data-toc-modified-id=\"从给定数组中的元素沿指定轴返回最大值—numpy.amax()-1.2\"><span class=\"toc-item-num\">1.2 </span>Maximum of an array along a given axis: numpy.amax()</a></span></li><li><span><a href=\"#返回沿轴的值的极差(最大值---最小值)—numpy.ptp()\" data-toc-modified-id=\"返回沿轴的值的极差(最大值---最小值)—numpy.ptp()-1.3\"><span class=\"toc-item-num\">1.3 </span>Range (maximum - minimum) along an axis: numpy.ptp()</a></span></li><li><span><a href=\"#百分位数—numpy.percentile()\" data-toc-modified-id=\"百分位数—numpy.percentile()-1.4\"><span class=\"toc-item-num\">1.4 </span>Percentiles: numpy.percentile()</a></span></li><li><span><a href=\"#用于计算数组-a-中元素的中位数(中值)—numpy.median()\" data-toc-modified-id=\"用于计算数组-a-中元素的中位数(中值)—numpy.median()-1.5\"><span class=\"toc-item-num\">1.5 </span>Median of an array: numpy.median()</a></span></li><li><span><a href=\"#返回数组中元素的算术平均值—numpy.mean()\" data-toc-modified-id=\"返回数组中元素的算术平均值—numpy.mean()-1.6\"><span class=\"toc-item-num\">1.6 </span>Arithmetic mean of an array: numpy.mean()</a></span></li><li><span><a href=\"#返回数组中元素的加权平均值—numpy.average()\" data-toc-modified-id=\"返回数组中元素的加权平均值—numpy.average()-1.7\"><span class=\"toc-item-num\">1.7 </span>Weighted average of an array: numpy.average()</a></span></li><li><span><a href=\"#返回数组的标准差—numpy.std()\" data-toc-modified-id=\"返回数组的标准差—numpy.std()-1.8\"><span class=\"toc-item-num\">1.8 </span>Standard deviation of an array: numpy.std()</a></span></li><li><span><a href=\"#返回数组的方差—numpy.var()\" data-toc-modified-id=\"返回数组的方差—numpy.var()-1.9\"><span 
class=\"toc-item-num\">1.9 </span>Variance of an array: numpy.var()</a></span></li></ul></li><li><span><a href=\"#转置\" data-toc-modified-id=\"转置-2\"><span class=\"toc-item-num\">2 </span>Transpose</a></span></li></ul></div>" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Show the result of every expression in a cell, not only the last one\n", "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistical functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy provides many useful statistical functions for finding the minimum, maximum, percentiles, standard deviation, variance, and more from the elements of an array.\n", "\n", "|**Common statistical function** |**Description**|\n", "| ----------- | ------------------------------- |\n", "|numpy.amin() |Minimum of the array elements along a given axis|\n", "|numpy.amax() |Maximum of the array elements along a given axis|\n", "|numpy.ptp() |Range (maximum - minimum) of the values along an axis|\n", "|numpy.percentile()|Percentile along a given axis|\n", "|numpy.median()|Median of the array|\n", "|numpy.mean()|Arithmetic mean of the array|\n", "|numpy.average()|Weighted average of the array|\n", "|numpy.std()|Standard deviation of the array|\n", "|numpy.var()|Variance of the array|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Minimum of an array along a given axis: numpy.amin()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3, 7, 5],\n", " [8, 4, 3],\n", " [2, 4, 9]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([[3,7,5],[8,4,3],[2,4,9]])\n", "a" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "[2 4 3]\n", "[3 3 2]\n" ] } ], "source": [ "print(np.amin(a)) # minimum of the whole array\n", "print(np.amin(a,axis=0)) # minimum of each column\n", "print(np.amin(a,axis=1)) # minimum of each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 
Maximum of an array along a given axis: numpy.amax()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[3 7 5]\n", " [8 4 3]\n", " [2 4 9]]\n" ] } ], "source": [ "a = np.array([[3,7,5],[8,4,3],[2,4,9]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9\n", "[8 7 9]\n", "[7 8 9]\n" ] } ], "source": [ "print(np.amax(a)) # maximum of the whole array\n", "print(np.amax(a,axis=0)) # maximum of each column\n", "print(np.amax(a,axis=1)) # maximum of each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Range (maximum - minimum) along an axis: numpy.ptp()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[3 7 5]\n", " [8 4 3]\n", " [2 4 9]]\n" ] } ], "source": [ "a = np.array([[3,7,5],[8,4,3],[2,4,9]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7\n", "[6 3 6]\n", "[4 5 7]\n" ] } ], "source": [ "print(np.ptp(a)) # range of the whole array\n", "print(np.ptp(a, 0)) # range of each column\n", "print(np.ptp(a, 1)) # range of each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Percentiles: numpy.percentile()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- A percentile is a statistical measure: the value below which a given percentage of the observations fall.\n", "- `numpy.percentile()` takes the following parameters: `numpy.percentile(a, q, axis)`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `a`: the input array\n", "- `q`: the percentile to compute, between 0 and 100\n", "- `axis`: the axis along which the percentile is computed" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[30 40 70]\n", " [80 20 10]\n", " [50 90 60]]\n" ] } ], "source": [ "a = np.array([[30,40,70],[80,20,10],[50,90,60]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "50.0\n", 
"[50. 40. 60.]\n", "[40. 20. 60.]\n" ] } ], "source": [ "print(np.percentile(a,50)) # 50th percentile of the whole array\n", "print(np.percentile(a,50, axis = 0)) # 50th percentile of each column\n", "print(np.percentile(a,50, axis = 1)) # 50th percentile of each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Median of an array: numpy.median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The median is the value that separates the upper half of a data sample from the lower half." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[30 65 70]\n", " [80 95 10]\n", " [50 90 60]]\n" ] } ], "source": [ "a = np.array([[30,65,70],[80,95,10],[50,90,60]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "65.0\n", "[50. 90. 60.]\n", "[65. 80. 60.]\n" ] } ], "source": [ "print(np.median(a)) # median of the whole array\n", "print(np.median(a, axis = 0)) # median of each column\n", "print(np.median(a, axis = 1)) # median of each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arithmetic mean of an array: numpy.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The arithmetic mean is the sum of the elements along an axis divided by the number of elements. `numpy.mean()` returns the arithmetic mean of the array's elements; if an axis is given, the mean is computed along it." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [3 4 5]\n", " [4 5 6]]\n" ] } ], "source": [ "a = np.array([[1,2,3],[3,4,5],[4,5,6]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.6666666666666665\n", "[2.66666667 3.66666667 4.66666667]\n", "[2. 4. 
5.]\n" ] } ], "source": [ "print(np.mean(a)) # mean of the whole array\n", "print(np.mean(a, axis = 0)) # mean of each column\n", "print(np.mean(a, axis = 1)) # mean of each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Weighted average of an array: numpy.average()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- A weighted average is the average obtained by multiplying each component by a factor that reflects its importance.\n", "- `numpy.average()` computes the weighted average of an array's elements using the corresponding weights given in another array.\n", "- The function accepts an axis argument. If no axis is specified, the array is flattened.\n", "- Consider the array [1,2,3,4] with the weights [4,3,2,1]: the weighted average is the sum of the products of corresponding elements divided by the sum of the weights.\n", "- Weighted average = `(1*4+2*3+3*2+4*1)/(4+3+2+1)` = 20/10 = 2.0" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4]\n" ] } ], "source": [ "a = np.array([1,2,3,4])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.5\n" ] } ], "source": [ "# Without weights, average() is equivalent to mean()\n", "print(np.average(a))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.0\n" ] } ], "source": [ "wts = np.array([4,3,2,1])\n", "print(np.average(a,weights = wts))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2.0, 10.0)\n" ] } ], "source": [ "# With returned=True, the sum of the weights is returned as well\n", "print(np.average([1,2,3,4],weights=[4,3,2,1], returned=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standard deviation of an array: numpy.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The standard deviation is the square root of the mean of the squared deviations from the mean: `std = sqrt(mean((x - x.mean())**2))`. By default `numpy.std()` uses `ddof=0`, that is, it divides by the number of elements." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.118033988749895\n" ] } ], "source": [ "print(np.std([1,2,3,4]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variance of an array: numpy.var()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"The variance is the mean of the squared deviations from the mean, i.e. `mean((x - x.mean())**2)`. In other words, the standard deviation is the square root of the variance." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.25\n" ] } ], "source": [ "print(np.var([1,2,3,4]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transpose" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy's transpose rearranges the axes of an array as needed." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 1 2 3 4]\n", " [ 5 6 7 8 9]\n", " [10 11 12 13 14]]\n" ] } ], "source": [ "arr = np.arange(15).reshape((3, 5))\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 5 10]\n", " [ 1 6 11]\n", " [ 2 7 12]\n", " [ 3 8 13]\n", " [ 4 9 14]]\n" ] } ], "source": [ "print(arr.T)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14]])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " - Note that transposing only has an effect on arrays with two or more dimensions. A one-dimensional array has a single axis, so `a.T` returns it unchanged.\n", " - To transpose a one-dimensional array, first make it two-dimensional with `a.reshape(1, -1)`, then transpose the result." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7,)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "1" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(7)\n", "a.shape\n", "a.ndim\n", "a" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "array([0, 1, 2, 3, 4, 5, 6])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.T" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2, 3, 4, 5, 6]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = a.reshape(1,-1) # make it two-dimensional, shape (1, 7)\n", "b" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4],\n", " [5],\n", " [6]])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.T" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7,)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4],\n", " [5],\n", " [6]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.reshape(7,1) # or reshape directly into a column" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "523.537px", "left": "0px", "top": "110.284px", "width": "420.219px" }, "toc_section_display": true, 
"toc_window_display": true }, "toc-autonumbering": true }, "nbformat": 4, "nbformat_minor": 2 }