{ "cells": [ { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "# Numerical operations with NumPy\n", "\n", "## 3.1 **Broadcasting**\n", "## 3.2 **Array shape manipulation**\n", "## 3.3 **Sorting data**\n", "## **Summary**\n", "## **Exercises**\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "## 3.1 Broadcasting Operations\n", "\n", "- We just covered basic operations (add, multiply, square, etc.). These are element-wise, so they work on arrays of the same size.\n", "- **Broadcasting** comes in handy when we are dealing with arrays of different shapes. This time, we'll explore this more advanced NumPy concept. \n", "\n", "- The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, *the smaller array is \"broadcast\" across the larger array so that they have compatible shapes*. \n", "- Broadcasting provides a means of **vectorizing array operations** so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are also cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.\n", "- In this little tutorial we will provide a gentle introduction to broadcasting with numerous examples ranging from simple to involved. 
\n", "- We will also go through a few examples of when to and when not to use broadcasting.\n", "\n", "\n", "\n", "#### This example below shows how broadcasting works\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "### So, let's start taking baby steps...\n", "\n", "Here an element-wise multiplication occurs, since the two arrays have the same shape." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([2., 4., 6.])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "e = np.array([1.0, 2.0, 3.0])\n", "f = np.array([2.0, 2.0, 2.0])\n", "e*f" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "##### Hint / Try it?\n", "\n", "What would have happened if `f = np.array([2.0, 2.0])`? Would it still multiply?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([2., 4., 6.])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# But what if f were a scalar, like this?\n", "\n", "e = np.array([1.0, 2.0, 3.0])\n", "f = 2.0\n", "e*f" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "##### What happened here?\n", "\n", "This is the simplest example of NumPy broadcasting: an array and a scalar value are combined in an operation.\n", "\n", "So, in effect, the scalar got *stretched in the row direction*! 
The scalar **f** is stretched to become an array with the same shape as **e**, so the shapes are compatible for element-by-element multiplication. This stretching is only conceptual: NumPy does not actually make copies of `f` in memory.\n", "\n", "\n", "\n", "**So what are the rules then?**\n", "\n", "Compare the two shapes dimension by dimension, starting from the trailing (rightmost) dimension. Two dimensions are compatible when\n", "- they are equal (same size)\n", "OR\n", "- one of them is 1 (missing leading dimensions count as 1, which is why the scalar `f` above works)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0.],\n", " [10., 10., 10.],\n", " [20., 20., 20.],\n", " [30., 30., 30.]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Typical broadcasting in practice\n", "g = np.array([[ 0.0, 0.0, 0.0], [10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]])\n", "g " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([1., 2., 3.])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h = np.array([1.0, 2.0, 3.0])\n", "h" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 3.],\n", " [11., 12., 13.],\n", " [21., 22., 23.],\n", " [31., 32., 33.]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g + h" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "### What happened above?\n", "\n", "A 2-D (two-dimensional) array of shape (4, 3) was added to a 1-D (one-dimensional) array of shape (3,). The 1-D array `h` was stretched down the rows, i.e. 
repeated for each of the 4 rows of `g`, so that it matches the 3 columns of the 2-D array.\n", "\n", "\n", "Would the same be possible for different shapes? 
Does broadcasting magically understand and fix our assumptions?\n", "\n", "Let's take a look...\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "ename": "ValueError", "evalue": "operands could not be broadcast together with shapes (4,3) (4,) ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mg\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m10.0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m10.0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m10.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m20.0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m20.0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m20.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m30.0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m30.0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m30.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mi\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2.0\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;36m3.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mg\u001b[0m\u001b[0;34m+\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (4,3) (4,) " ] } ], "source": [ "g = np.array([[ 0.0, 0.0, 0.0], [10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]])\n", "i = np.array([0.0, 1.0, 2.0, 3.0])\n", "g+i " ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "### We had a mismatch...\n", "\n", "\n", "\n", "Explanation: the trailing dimensions of `g` (shape (4, 3)) and `i` (shape (4,)) are 3 and 4. Because they are neither equal nor 1, broadcasting fails: there is no way to align the values in the rows of the first array with the elements of the second array for an **element-by-element** addition or multiplication.\n", "\n", "### Also, is there a way to build an array like `g` in one line of code?\n", "\n", "Tip: look up `np.tile` and `np.arange`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 0, 0],\n", " [10, 10, 10],\n", " [20, 20, 20],\n", " [30, 30, 30]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.tile(np.arange(0, 40, 10), (3, 1))\n", "a = a.T # transpose this\n", "a" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([0, 1, 2])\n", "b" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "##### Now, we add these two" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hideCode": false, 
"hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2],\n", " [10, 11, 12],\n", " [20, 21, 22],\n", " [30, 31, 32]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a + b" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "##### So you see that broadcasting was applied automatically...\n", "\n", "Ask yourself: why couldn't we add the original `a` and `b`? (Hint: compare their trailing dimensions.)\n", "\n", "Note that the original `a` was:\n", "```python\n", "array([[ 0, 10, 20, 30],\n", " [ 0, 10, 20, 30],\n", " [ 0, 10, 20, 30]])\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1., 1.]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = np.ones((5, 6))\n", "c" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "##### Let's assign an array of dimension 0 (the scalar 2) to an array of dimension 1 (the first row of `c`)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[2., 2., 2., 2., 2., 2.],\n", " [1., 1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1., 1.]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c[0] = 2\n", "c" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 10, 20])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = np.arange(0, 30, 
10)\n", "d" ] }, { "cell_type": "code", "execution_count": 14, 
"metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "(3,)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.shape" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "(3, 1)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = d[:, np.newaxis] # Here we add a new axis and make it a 2D array\n", "d.shape" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "ename": "ValueError", "evalue": "operands could not be broadcast together with shapes (4,3) (3,1) ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0md\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (4,3) (3,1) " ] } ], "source": [ "a + d" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "#### Another example on broadcasting\n", "\n", "Let’s construct an array of distances (in miles) between cities of Route 66: Chicago, Springfield, Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff and Los Angeles." 
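, "

Before building the full table, here is a minimal sketch of the shape arithmetic behind it. It uses only the first three mileposts (Chicago, Springfield and Saint-Louis) as an illustrative subset. A 1-D array of shape (3,) is treated as a row of shape (1, 3). Indexing the same values with `np.newaxis` turns them into a column of shape (3, 1). Broadcasting then stretches both to (3, 3), so entry `[i, j]` holds the distance between city `i` and city `j`.

```python
import numpy as np

# Illustrative subset: the first three mileposts only
posts = np.array([0, 198, 303])

row = posts                 # shape (3,), broadcast as if it were (1, 3)
col = posts[:, np.newaxis]  # shape (3, 1)

# (1, 3) combined with (3, 1): both are stretched to (3, 3)
table = np.abs(row - col)   # table[i, j] = |posts[j] - posts[i]|

print(row.shape, col.shape, table.shape)  # (3,) (3, 1) (3, 3)
print(table)
```

Applying the same expression to all ten mileposts gives the full 10 x 10 table. That table is symmetric with zeros on the diagonal, because the distance from city i to city j equals the distance from j to i.
"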
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448],\n", " [ 198, 0, 105, 538, 673, 977, 1277, 1346, 1715, 2250],\n", " [ 303, 105, 0, 433, 568, 872, 1172, 1241, 1610, 2145],\n", " [ 736, 538, 433, 0, 135, 439, 739, 808, 1177, 1712],\n", " [ 871, 673, 568, 135, 0, 304, 604, 673, 1042, 1577],\n", " [1175, 977, 872, 439, 304, 0, 300, 369, 738, 1273],\n", " [1475, 1277, 1172, 739, 604, 300, 0, 69, 438, 973],\n", " [1544, 1346, 1241, 808, 673, 369, 69, 0, 369, 904],\n", " [1913, 1715, 1610, 1177, 1042, 738, 438, 369, 0, 535],\n", " [2448, 2250, 2145, 1712, 1577, 1273, 973, 904, 535, 0]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mileposts = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448])\n", "distance_array = np.abs(mileposts - mileposts[:, np.newaxis])\n", "distance_array" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "#### Another example\n", "\n", "A lot of grid-based or network-based problems can also use broadcasting. For instance, if we want to compute the distance from the origin of points on a 5x5 grid, we can do:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[0. , 1. , 2. , 3. , 4. ],\n", " [1. , 1.41421356, 2.23606798, 3.16227766, 4.12310563],\n", " [2. , 2.23606798, 2.82842712, 3.60555128, 4.47213595],\n", " [3. , 3.16227766, 3.60555128, 4.24264069, 5. ],\n", " [4. , 4.12310563, 4.47213595, 5. 
, 5.65685425]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x, y = np.arange(5), np.arange(5)[:, np.newaxis]\n", "distance = np.sqrt(x**2 + y**2)\n", "distance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Or in color... " ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "hideCode": false, "hidePrompt": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW0AAAD8CAYAAAC8TPVwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAK20lEQVR4nO3c0aufB33H8c+3J5mxaa0Ts1JtWYeMDhFWt+Bg3QYrTjoVt0sFvRJys43KBjIv/QfEm10sqGxDp4haGI65dWjRglqbGrVt1DnXsVIhK05sFJ2p312cX5fSnPj71Z5fnvNtXi845Jzk6emHh/SdX5/f86S6OwDMcNXSAwDYnGgDDCLaAIOINsAgog0wiGgDDHJok4Oq6pEkTyR5Msn57j6+zVEA7G2jaK/8fnc/vrUlAKzl8gjAILXJE5FV9R9J/idJJ/nr7j65xzEnkpxIkp0jh3/z6C//4j5PneeqeNr0KS/YOb/0hAPj2qt+tPSEA+O6q3669IQD49RXf/x4dx9bd9ym0X5Zdz9WVb+U5O4kf9bdn73U8df92vX92yff/KwGPx9dfegnS084MF5x9L+XnnBg/O6131h6woHxhqv9AfaUnRv+7dQm7xdudHmkux9b/Xg2yV1JXvPc5gHw81gb7ao6WlXXPvV5ktcleXDbwwC42CZ3j1yf5K6qeur4v+/uT211FQB7Whvt7v52kl+/DFsAWMMtfwCDiDbAIKINMIhoAwwi2gCDiDbAIKINMIhoAwwi2gCDiDbAIKINMIhoAwwi2gCDiDbAIKINMIhoAwwi2gCDiDbAIKINMIhoAwwi2gCDiDbAIKINMIhoAwwi2gCDiDbAIKINMIhoAwwi2gCDiDbAIKINMIhoAwwi2gCDiDbAIKINMMjG0a6qnar6clV9cpuDALi0Z/NK+84kZ7Y1BID1Nop2Vd2Y5A1J3rfdOQD8LIc2PO69Sd6Z5NpLHVBVJ5KcSJKdl7w4Zx6+6bmvm+7Ik0svODC++ZJjS084ML5z/XVLTzg4jt239IJx1r7Srqo3Jjnb3ad+1nHdfbK7j3f38Z1rrtm3gQBcsMnlkduSvKmqHknykSS3V9UHt7oKgD2tjXZ3v6u7b+zum5O8Ocmnu/utW18GwEXcpw0wyKZvRCZJuvueJPdsZQkAa3mlDTCIaAMMItoAg4g2wCCiDTCIaAMMItoAg4g2wCCiDTCIaAMMItoAg4g2wCCiDTCIaAMMItoAg4g2wCCiDTCIaAMMItoAg4g2wCCiDTCIaAMMItoAg4g2wCCiDTCIaAMMItoAg4g2wCCiDTCIaAMMItoAg4g2wCCiDTCIaAMMsjbaVXWkqu6rqq9U1UNV9e7LMQyAix3a4JgfJ7m9u89V1eEk91bVP3X3F7a8DYBnWBvt7u4k51ZfHl
599DZHAbC3ja5pV9VOVZ1OcjbJ3d39xT2OOVFV91fV/U+eO3fxNwHgOdvk8ki6+8kkt1bVi5PcVVWv6u4Hn3HMySQnk+Tq62/q6x7e2fex05y/2jl4yg9ettFvtSvC6aUHHCA3HLll6QkHyNc2OupZ3T3S3d9Lck+SO579IACeq03uHjm2eoWdqnphktcm+fq2hwFwsU3+n/WGJH9bVTvZjfxHu/uT250FwF42uXvkq0lefRm2ALCGJyIBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2CQtdGuqpuq6jNVdaaqHqqqOy/HMAAudmiDY84n+YvufqCqrk1yqqru7u6Ht7wNgGdY+0q7u7/T3Q+sPn8iyZkkL9/2MAAutskr7f9XVTcneXWSL+7xayeSnEiSIy+4LsdO/3Af5s32o2MvWHrCAbKz9IAD44kXXb30hAPj3196bOkJ42z8RmRVXZPk40ne0d3ff+avd/fJ7j7e3ccPHz66nxsBWNko2lV1OLvB/lB3f2K7kwC4lE3uHqkk709yprvfs/1JAFzKJq+0b0vytiS3V9Xp1cfrt7wLgD2sfSOyu+9NUpdhCwBreCISYBDRBhhEtAEGEW2AQUQbYBDRBhhEtAEGEW2AQUQbYBDRBhhEtAEGEW2AQUQbYBDRBhhEtAEGEW2AQUQbYBDRBhhEtAEGEW2AQUQbYBDRBhhEtAEGEW2AQUQbYBDRBhhEtAEGEW2AQUQbYBDRBhhEtAEGEW2AQUQbYBDRBhhkbbSr6gNVdbaqHrwcgwC4tE1eaf9Nkju2vAOADayNdnd/Nsl3L8MWANbYt2vaVXWiqu6vqvt/8pMf7Ne3BeBpDu3XN+ruk0lOJsmL6iVd957er2891pHfuXXpCQfGoZdfvfSEg+NHO0svODB+eP7w0hPGcfcIwCCiDTDIJrf8fTjJ55PcUlWPVtXbtz8LgL2svabd3W+5HEMAWM/lEYBBRBtgENEGGES0AQYRbYBBRBtgENEGGES0AQYRbYBBRBtgENEGGES0AQYRbYBBRBtgENEGGES0AQYRbYBBRBtgENEGGES0AQYRbYBBRBtgENEGGES0AQYRbYBBRBtgENEGGES0AQYRbYBBRBtgENEGGES0AQYRbYBBRBtgENEGGGSjaFfVHVX1jar6VlX95bZHAbC3tdGuqp0kf5XkD5O8MslbquqV2x4GwMU2eaX9miTf6u5vd/f/JvlIkj/a7iwA9nJog2NenuS/nvb1o0l+65kHVdWJJCdWX/74X/tjDz73ecN97mNJ8tIkjy+8ZHmfcx6exrlY+U/n4ulu2eSgTaJde/xcX/QT3SeTnEySqrq/u49vMuD5zrnY5Txc4Fxc4FxcUFX3b3LcJpdHHk1y09O+vjHJYz/PKACem02i/aUkv1pVv1JVv5DkzUn+YbuzANjL2ssj3X2+qv40yT8n2Unyge5+aM0/dnI/xj1POBe7nIcLnIsLnIsLNjoX1X3R5WkADihPRAIMItoAg+xrtD3uvquqPlBVZ6vqir9XvapuqqrPVNWZqnqoqu5cetNSqupIVd1XVV9ZnYt3L71paVW1U1VfrqpPLr1lSVX1SFV9rapOr7v1b9+uaa8ed/9mkj/I7m2CX0rylu5+eF/+BYNU1e8lOZfk77r7VUvvWVJV3ZDkhu5+oKquTXIqyR9fob8vKsnR7j5XVYeT3Jvkzu7+wsLTFlNVf57keJIXdfcbl96zlKp6JMnx7l77oNF+vtL2uPtKd382yXeX3nEQdPd3uvuB1edPJDmT3adsrzi969zqy8Orjyv2ToCqujHJG5K8b+ktk+xntPd63P2K/I+TvVXVzUleneSLyy5ZzupywOkkZ5Pc3d
1X7LlI8t4k70zy06WHHACd5F+q6tTqrwS5pP2M9kaPu3Nlqqprknw8yTu6+/tL71lKdz/Z3bdm98ni11TVFXn5rKremORsd59aessBcVt3/0Z2/zbVP1ldYt3Tfkbb4+7saXX99uNJPtTdn1h6z0HQ3d9Lck+SOxaespTbkrxpdS33I0lur6oPLjtpOd392OrHs0nuyu7l5j3tZ7Q97s5FVm++vT/Jme5+z9J7llRVx6rqxavPX5jktUm+vuyqZXT3u7r7xu6+Obut+HR3v3XhWYuoqqOrN+lTVUeTvC7JJe8827dod/f5JE897n4myUc3eNz9eamqPpzk80luqapHq+rtS29a0G1J3pbdV1KnVx+vX3rUQm5I8pmq+mp2X+Tc3d1X9K1uJEmuT3JvVX0lyX1J/rG7P3Wpgz3GDjCIJyIBBhFtgEFEG2AQ0QYYRLQBBhFtgEFEG2CQ/wPQ5UE7R3OP2AAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.pcolor(distance)\n", "plt.colorbar()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]),\n", " array([[0, 1, 2, 3, 4]]))" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note: np.ogrid directly creates open-grid vectors like x and y of the previous example\n", "# (here x is the (5, 1) column and y the (1, 5) row, the reverse of before)\n", "x, y = np.ogrid[0:5, 0:5]\n", "x, y" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((5, 1), (1, 5))" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.shape, y.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`np.ogrid` is quite useful as soon as we have to handle computations on a grid. On the other hand, `np.mgrid`\n", "directly provides matrices full of indices, for cases where we can't (or don't want to) benefit from broadcasting." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0, 0],\n", " [1, 1, 1, 1],\n", " [2, 2, 2, 2],\n", " [3, 3, 3, 3]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x, y = np.mgrid[0:4, 0:4]\n", "x" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2, 3],\n", " [0, 1, 2, 3],\n", " [0, 1, 2, 3],\n", " [0, 1, 2, 3]])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A bit on vector quantization (VQ)\n", "\n", "A simple way to understand broadcasting is with this real-world situation. 
The basic operation in VQ finds the closest point in a set of points, called `codes` in VQ speak, to a given point, called the `observation`.\n", "\n", "In the 2D example below, the values in an `observation` describe the weight and height of an athlete to be classified. The `codes` represent different classes of athletes such as dancer, runner, swimmer and so on.\n", "\n", "Finding the closest point requires calculating the distance between `observation` and each of the `codes`.\n", "\n", "The shortest distance provides the best match. Here in this example, `codes[0]` is the closest class, indicating that the athlete is likely a basketball player.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# observation: weight and height of the athlete to classify\n", "observation = np.array([111.0, 188.0])\n", "# codes: one row of (weight, height) per known class\n", "codes = np.array([[102.0, 203.0],\n", " [132.0, 193.0],\n", " [45.0, 155.0],\n", " [57.0, 173.0]])" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Broadcasting: codes (4, 2) minus observation (2,) subtracts the observation from every row\n", "difference = codes - observation\n", "distance = np.sqrt(np.sum(difference**2, axis=-1))\n", "nearest = np.argmin(distance)\n", "nearest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The basic operation of vector quantization calculates the distance between an object to be classified, the black square, and multiple known codes, the gray circles. In the very basic case, the codes represent classes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A more advanced example \n", "\n", "`@article{scikit-learn,\n", " title={Scikit-learn: Machine Learning in {P}ython},\n", " author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. 
and Michel, V.\n", " and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.\n", " and Weiss, R. and Dubourg, V. and Vanderplas, J. 
and Passos, A. and\n", " Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},\n", " journal={Journal of Machine Learning Research},\n", " volume={12},\n", " pages={2825--2830},\n", " year={2011}\n", "}`" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAOUAAAC6CAYAAABREYo0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAATd0lEQVR4nO2df5AW9X3H3x84Du8Oj8M7FBDiAQYmwUkOo4yYhlCPpsAgBlFIxahpO0lN09ZUMq2TMZMb20uTPuk4ncmYsanUFmNAwRQzgToyOZhWo55yIhTPH0G5s4d64HGwB9wdfPvH7iN7z+0+9zz7fHe/n32+n9fMM89ze8/uvnaffe/3x/4ipRQEQeDDONMCgiCMREIpCMyQUAoCMySUgsAMCaUgMENCKQjMqCjmyw0NDaqxsTEmlfjp6HDfm5rMOXR2uu/z5xc2PM55JjGNOJcryXno5uWXX+5VSk0N+l9RoWxsbER7e7seKwPU1bnvJhdh6VL3va2tsOFxzjOJacS5XEnOQzdE9G7Y/6T6KgjMKKqkTDvjGO+CqqpMG+SHsx9ntyhYFcrPfMa0QTg7d5o2yA9nP85uUWBcdgiCnVhVUr4b2rQ2zwMPuO/332/WIwzOfpzdomBVSfnRR+6LI7t3u69SyGQyyGQyeoRy0OEXF5zdomBVSVluZAO4ceNGAIDjOCZ1BE1IKFNAbviy+EPY2tqaqJMQHxJKZgQFMLcEDKqiDg0NxSsmJIZVoZwwwbRBMJlMBr29q0A0rqAqaCEh1U19feyziAxntyhYFcoFC0wbjCaTycBxHNx665ZRw4HRVdYggoLsD2pX13rMmjWrJM9t20oaPVY4u0XBqlByJKxkDBveVsAJnq2trSOqs+fOnYvkJpjBqlD+7nemDUbiD9izzzYDAJYty9+3v2fPnjGnG0f78r773Pcf/ED7pEuGs1sUrAplf79pg5H4A9bVNTP2+bW1tWFp9pKKInn+eb0uOuHsFgWrTh5IO6V06HR1dRVUygrmkVAaotCAtbS0fPy5lJMDpF2ZHiSUhjB19k2cp+IJerCqTTlxomkDtxoZRG1tMg3eqDuDmfE3eSPD2S0KVoXyU58ybRBejVy79qmETYpj82bTBuFwdouCVF8ToJBji0kj58ryxaqS8q23kp1ftu3mOA7a29vznp2zc+cfAgBWrPivRNz8xzILOVRyzz3u+4MPxigVEc5uUbCqpDx1yn0lheM4H7fhxmrLHT06DUePTktC62OyOw3/oZKwUr2j48ItOrnB2S0KVoVSGInjOCMOuQCFnTEkxItV1VchnOyJ8YJ5pKSMiaCOFM7HByWQfLCqpKyuTm5eQSeF59vw6+uPxalTMvPmjR5WzOVlcRLklmasCiXnH2/16l+ZVsjLww+PHsaldA1ySzNSfRUEZlhVUr7xhmmDcHbsWAWAR4nZ0tKCmpqaEdXSr3/dfedYKnF2i4JVoRwYiHf6pbSxjh3jdaOZ3Kop5x0aZ7coWBXKuMluyJx7WQX+SJsyBrh0gJSK7FzMICWlEEruzqWrqwuZzJaQbwu6sCqUkyaZNghn2rSjphUCyZaWTU0bcebMeyxrAU1Npg30YlUor7zStEE4SV0dUizZEM6cmcGKFcGBDOqtTZJyuToki7QphYIYq4R0HIfldaNpxKpQHjrkvuKg1E6RbdvWYNu2NZps9FOI3549e4x0Dt1+u/sqF6yqvp49G9+0S21r9ffXajKJh0L9HMdJ/JzY7u5EZpMYVoUyLuTQwUg4dgalCauqr3EhG6GgEwmlIDDDquprLeNm26xZvBtGUfwymYyWx/CNxeLFsU4+cawK5Zw5pg3CGetpW6aJ4uc4TiKPSyiXp21lkepri
cixOUE3VpWUBw/qn6auu79t2XIrAGD9+ie0TE83pfrFeZhk7Vr3vVye6GxVKHU9SzWOQyADAwneQCgCpfrF2UN9jPftjYrGqlDqQg6BFEdfX59phVQhoSwAOTmgNCSUxSGhDCC3/SMlo5AkVoVyypTCvue/rUdS52/OmXM4kflEhbNfc7NpA71YFcorriju+0mWkF/84t7E5hUFXX5x7Ojuv1/r5IzD6jglx2N+0p7UizQFxoZVKON+4tP+/e6rGJLaiDZvvg2bN9+WyLyiwNlvxQr3VS5YVX09f37s75gqGYeGJhiZb6Ho9POvYx1V2dOnS54EK1iEklMVUapX8SPrOD8sQik/kiBcgFWbEjBbanIqsQV7YVFS+omz1Kwf43EdJkvsefN4PxAjLj8dh0hWrdIkwwR2oYyTfNfami4lP//5543Ofyzi8nMcp+T7xhp+Zq122FVfTZDJZKRdaxhZ/xdgGcq4Sq2ODveVC4cNYtOmO7Fp052mNUJJ0i+TyRS1DSxd6r7KBZbVVw4hEcxh++/PMpSCneQ2I5K+qTMXrA+l6Q4e4QK5JaStJSbLNiUAtLa2JjIfW394gS9sS8ohXTfU8TF1qvZJamPBghju6qURzn7r1pk20AvbUAL62xSXXx48fQ4sWtRuWiEvnP2++U3TBnoxHsp811Dqrlrm3heYU9V1cND9KSorhw2bBGPSb6yTCwYG3Pdq3jcELBjjoYz7Gko/r72W2KyK5rHHNgAAvva1Rw2bBGPaL98OdOVK953hNfKRYNvRIwi2klgoiz1LQxBysWX7SSyUjuNEbsPZ8mMI+eHUBxAnqai+xvFjcLxJlyAADDp6kmTatAufk+xgKoSmpoAz5RnB2e+uu0wb6MXaUHJj4cJXTSvkhbOfhDLFxHCSkDYcpwoAUFPD89ZsnP16e933hgazHrowGsqkO3Cyz6fk2HG0dat7rhjX45Sc/W65xX0vl24Cox09JnrTzp49Y00vnpBOUtH7Cugr3ZTSMhnBEC0tLSxrOjpJTSh1lG7DwzzPKxWKo9xrOqkJpQ4klEIaMNbRY6IKcvHF/YnPs1CuvZbvpVEAb7+77zZtoBdjoTRRBampGUh8noVy1VV8LyIGePutX2/aQC+pqr5GLV2zJ8MPD4/H8PB4zVZ6OHGiFidO1JrWCIWbn39b6OpyX+VC4qEspdqaW7rmXnkSdCVKW1vbxyfD9/ZORW8vz3uCbN++Btu3rzGtEQo3P/+28NWvuq9yIfHqa6nVVv9V6IXc/YzbOa6CMBapqr5mcRynoBK33I9nCeVJKkMJYNRNe4MCWO7Hs2ynXHe6ZXFCelBbUwJZ/pTrb1wWocwl7MeqrT2RsEnhXH8970fhcfa7917TBnopy1CGUV3N77KjLPPn835oLGe/G280baCX1LYpozA0VIGhIZ77od7eevT2jvGoaYNw9ctkMujsBDo7TZvow8gWaqqBfuwY36tgn37afUY4x+sVAb5+juPgG99wP8v1lCVQrg10wQxd5XQ6Dyyrvgrlybnc51GkHAmlIDBDQikIzODZFRkTkyf3mVYIZcmSvaYV8sLZb8mSvbjjjkbTGtqwKpRVVWdMK4Qyd+5h0wp54ew3d+5hLFtm2kIfVlVfBwcrMThYaVojkJ6ey9DTc5lpjVA4+/X0XIYOvjdwLxqrQnn8+CU4fvwS0xqB7Nq1HLt2LTetEQpnv127lmPt2vI5LGJVKIXypZwOiyQSynK9xEYQ4iCRUMoZPEISlMvOX6qvQtlQ6B0puGPVIZG6uo9MK4TS3LzbtEJeOPv53cqhVmZVKC+66KxphVA+8Ylu0wp54ezH2S0KVoXyzJmJphVCOXJkJgC+GxhnP85uUbCqTdnXNwV9fVNMawSye3czdu9uNq0RCme/XLewG6mlBatKSsEO0t6utKqkFOwiraWlhFIoW9JaYkooBYEZVrUpL7nkuGmFUJYv32VaIS+c/Ti7RcGqkrKychCVlYOmNQKZP
v19TJ/+vmmNUDj75XNLY7vSqpLy9OmLTCuE8vbbswHwvZiYs18+tzS2K60K5YkTdaYVQtm7dwkAnhs9wNuPs1sUrKq+CkIakFAKZU/a2pUSSqHsSVu7UkIpWEGaSkurOnrq63tNK4Ry442/Mq2QF85+hbilqbS0KpQTJgybVgiloeGYaYW8cPbj7BYFq0I5MFBlWiGUzs55APg+nJWzH2e3KFgVyv7+yaYVQnnuucUA+G5YnP04u0VBOnoEa0hLZ4+EUrCGtHT2SCgFq0hDaSmhFKwiDfeGjb2jh9MKaGj40LRCKDff/JRphbxw9ivWjXs1NvZQcloBFRV8HwIzeXK/aYW8cPbj7BYFqw6JOE61aYVQDhxYAAC46qqDhk2C4ezH2S0KVrUpT56sxcmTtaY1AnnppWvw0kvXmNYIhbNfFDdOzapcrAqlIGTh3OEjoRSshVN/hx8JpWA1HEtLCaVgNRxLS6t6X6dO/cC0Qijr1m01rZAXzn6c3aJgVSjHjz9vWiGUmprTphXywtmPs1sUrKq+njo1CadOTTKtEci+fZ/Fvn2fNa0RCme/Ut24tSutKim5BhIAOjqaAAALF75q2CQYzn6lunFrV1pVUgpCGJxKy1hDyWlBBSEfjuOgtbXVtAaAmEPJrVogCPkYGhpiUZBI9VUQfHA4/c6qjp5LL+X5KDcA2LDhMdMKeeHsp9vNcRy0tLSgpqYGGzdu1DrtQrAqlOPGKdMKoVRW8r0nLcDbLy43U80vq0J58uTFphVCefFF99KjRYvaDZsEw9kvTjcTJaZVoXScGtMKoRw86F6oy3GjB3j7xe2Wrc7mEldYpaNHECISV/VWQikIJdDS0qK9tzaWUGYymcDiXhDKEd2HUbSHMpPJyEkDgnXoDCYpVfhhAiL6EMC7WuZcOA0AuD5YkrMbwNuPsxsQv98VSqmpQf8oKpQmIKJ2pRTL26hxdgN4+3F2A8z6SUePIDBDQikIzEhDKB82LZAHzm4Abz/OboBBP/ZtSkGwjTSUlIJgFcZCSUSziOg3RHSIiA4S0V95w79PRO8RUYf3Whkyfh0RPUlEr3vTWKzRbb5v/h1E1E9E93j/+wsi6vScfxQy/re9/x8goseJ6CINTo8Q0QdEdMA37B+95d9PRE8RUZ03fAIRPUpEr3nr5r6Qac4moheI6E0i2kJElRrdtvjW3ztE1OENrySiTZ7bq0S0NGSagcsWwS1sO3vAm3YHET1DRDO84URE/0xEb3n/v3qM6e/wL7cWlFJGXgCmA7ja+3wxgDcAfBrA9wFsLGD8RwH8qfe5EkBdTJ7jARwFcAWA3wfwLICJ3v8uDfj+5QAOA6jy/t4K4C4NHksAXA3ggG/YlwBUeJ9/COCH3ufbAPzC+1wN4B0AjQHT3ArgK97nnwK4W5dbzv9/DOB73uc/B7Apu/4AvAxgXMA4gcumcTur9X3nLwH81Pu8EsBOAATgOgAv5Jn2zQB+HrbcUV/GSkqlVI9S6hXv80kAh+Bu0GNCRLVwN4R/9cYfVEr1xaTaDOBtpdS7AO4G8A9KqbPefMPu7lwBoIqIKuCG4v9KlVBK7QVwPGfYM0qp7MWEvwUwM/svADXe/KsADAIY8RBHIiIANwB40hv0KIAv63LLmc86AI97gz4NYLc33gcA+gCMOh6YZ9mKdQvczpRS/vVRA3edAcBNAP5dufwWQB0RTQ9YrkkA/hrA30XxygeLNiURNQJYCOAFb9C3vKrDI0Q0JWCUOQA+BLCJiPYR0c+IKK7rsr6CCxvUPABf8Kp8e4jo2twvK6XeA5ABcARAD4ATSqlnYnLz88dw9/CAGzTHm/8RABmlVG5o6gH0+Tb8bhS4UyySLwB4Xyn1pvf3qwBuIqIKIpoN4HMAZo0xDf+yRSZ3OyOivyeiLgAbAHzP+9rlALp8o4Wtlwfg1gAGSvXKxXgovT3ONgD3eHuvhwDMBdAEd6P6ccBoFXCrSw8ppRbC3QD/Nga3SgCrATzhm
+8UuNWa7wDY6pUE/nGmwN3bzgYwA26Jdbtut5x5fhfAMIDsfTEWATjnzX82gHuJaE7uaAGTiqMr/o9wYacGAI/A3dDbATwI4Dm47oEELFskArYzKKW+q5Sa5U37W9mvBow+Yr0QUROAK5VSsTxz3mgoiWgC3BX1mFJqOwAopd5XSp1TSp0H8C9wN7BcugF0K6WyJeuTcEOqmxUAXlFKZW/u0w1gu1e1eRHAebjnSPpZBuCwUupDpdQQgO0Aro/BDQBARHcCWAVgg/IaOnDblLuUUkNeFfF/MLqK2Au3apa90H0mNFSzc9wq4La7tmSHKaWGlVLfVko1KaVuAlAH4M2Q8YOWLYrHqO0sh58DWOt97sbIkjtovSwG8DkiegfAfwOYR0RtUf1yMdn7SnDbhIeUUv/kG+6vv68BMKpnSyl1FEAXEc33BjUD+N8YNHP38r+E2w4DEc2D28GUe9LyEQDXEVG1t4zNcNsx2iGi5QD+BsBqpZS/GnUEwA1eT2IN3JL9df+43kb+GwC3eIPuBPCfmhWXAXhdKdXtc67ONjWI6A8ADCulRv12eZatKPJsZ5/0fW01LqyfHQDu8NbddXCbHz3+aSqlHlJKzVBKNQL4PQBvKKWWRnUchc5eo2Je3sIoAPsBdHivlQD+A8Br3vAdAKZ7358B4Ne+8ZvgVoH2ww3LFM1+1QCOAZjsG1YJYDPcHcUrAG4IcWuB+yMf8JZnogafx+FW54fg7s3/BMBbcNs/2fWX7UGcBLfKfRDuzuo7vun8GsAM7/McAC9603kiqmeQmzf83wD8Wc53GwF0wt1RPQv3aons/34G4Brvc+CyadzOtnm/z34AT8Pt/AHc6utPALztbYfX+KbVETD9RmjufZUzegSBGcY7egRBGImEUhCYIaEUBGZIKAWBGRJKQWCGhFIQmCGhFARmSCgFgRn/D4qMmMbbuUjKAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# A more complex example\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "from sklearn import cluster\n", "\n", "\n", "try: # SciPy >= 1.10 ships face in scipy.datasets (calling it needs the pooch package)\n", " from scipy.datasets import face\n", "except ImportError: # older SciPy versions ship it in scipy.misc\n", " from scipy.misc import face\n", "face = face(gray=True)\n", "\n", "n_clusters = 5\n", "np.random.seed(0)\n", "\n", "X = face.reshape((-1, 1)) # We need an (n_sample, n_feature) array\n", "k_means = cluster.KMeans(n_clusters=n_clusters, n_init=4)\n", "k_means.fit(X)\n", "values = k_means.cluster_centers_.squeeze()\n", "labels = k_means.labels_\n", "\n", "# create an array from labels and values\n", "face_compressed = np.choose(labels, values)\n", "face_compressed.shape = face.shape\n", "\n", "vmin = face.min()\n", "vmax = face.max()\n", "\n", "# original face\n", "plt.figure(1, figsize=(3, 2.2))\n", "plt.imshow(face, cmap=plt.cm.gray, vmin=vmin, vmax=256)\n", "\n", "# compressed face\n", "plt.figure(2, figsize=(3, 2.2))\n", "plt.imshow(face_compressed, cmap=plt.cm.gray, vmin=vmin, vmax=vmax)\n", "\n", "# equal bins face\n", "regular_values = np.linspace(0, 256, n_clusters + 1)\n", "regular_labels = np.searchsorted(regular_values, face) - 1\n", "regular_values = .5 * (regular_values[1:] + regular_values[:-1]) # mean\n", "regular_face = np.choose(regular_labels.ravel(), regular_values, mode=\"clip\")\n", "regular_face.shape = face.shape\n", "plt.figure(3, figsize=(3, 2.2))\n", "plt.imshow(regular_face, cmap=plt.cm.gray, vmin=vmin, vmax=vmax)\n", "\n", "# histogram\n", "plt.figure(4, figsize=(3, 2.2))\n", "plt.clf()\n", "plt.axes([.01, .01, .98, .98])\n", "plt.hist(X, bins=256, color='.5', edgecolor='.5')\n", "plt.yticks(())\n", "plt.xticks(regular_values)\n", "values = np.sort(values)\n", "for center_1, center_2 in zip(values[:-1], values[1:]):\n", " 
plt.axvline(.5 * (center_1 + center_2), color='b')\n", 
"\n", "for center_1, center_2 in zip(regular_values[:-1], regular_values[1:]):\n", " plt.axvline(.5 * (center_1 + center_2), color='b', linestyle='--')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "## 3.2 Array Shape Manipulation\n", "\n", "### Flattening" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([[1, 2, 3], [4, 5, 6]])\n", "# ravel returns a 1-D array containing the elements of the input.\n", "# A copy is made only if needed; see help(np.ravel) to learn more.\n", "a.ravel()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([[1, 4],\n", " [2, 5],\n", " [3, 6]])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Transposing swaps the axes: shape (2, 3) becomes (3, 2)\n", "a.T" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "### Reshaping" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reshape(-1) also flattens: -1 tells NumPy to infer the length\n", "a.reshape(-1)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {},
"output_type": "execute_result" } ], "source": [ "b = a.ravel()\n", "b" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = b.reshape((2, 3))\n", "b" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which is the same as ... (-1 lets NumPy infer the number of columns)\n", "a.reshape(2, -1)\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[99, 2, 3],\n", " [ 4, 5, 6]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note: ndarray.reshape may return a view (cf. help(np.reshape)) or a copy.\n", "# Here b is a view of a, so modifying b also modifies a\n", "b[0, 0] = 99\n", "a" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0.],\n", " [0., 0.],\n", " [0., 0.]])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reshape can also return a copy: a.T is not C-contiguous, so flattening it\n", "# requires copying the data, and changing b leaves a untouched\n", "a = np.zeros((3, 2))\n", "b = a.T.reshape(3 * 2)\n", "b[0] = 9\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Memory layout of a numpy array\n", "\n", "- [A good explanation of how multi-dimensional arrays are laid out in memory](https://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays/)\n", "- `x.__array_interface__['data']` returns a tuple `(address of the first element, read-only flag)`. 
The slices `x[0]`, `x[0,:]` and `x[1,:]` are views into the same buffer (`x[1,:]` starts 16 bytes, i.e. two `float64` values, after `x[0,:]`), whereas `x[0,0]` returns a scalar copy stored elsewhere." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.random.rand(2, 2)\n", "x.data" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(140391821883968, False)" ] }, "execution_count": 37,
"metadata": {}, "output_type": "execute_result" } ], "source": [ "x.__array_interface__['data']" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(140391821883968, False)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0].__array_interface__['data']" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(140391821883968, False)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0,:].__array_interface__['data']" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(140391821883984, False)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1,:].__array_interface__['data']" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(140391740312704, False)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0,0].__array_interface__['data']" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false }, "source": [ "## 3.3 Sorting Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Function* (returns a sorted copy)\n", "\n", "`np.sort(a, axis=-1, kind=None, order=None)`\n", "\n", "*Method* (sorts the array in place)\n", "\n", "`a.sort(axis=-1, kind=None, order=None)`\n", "\n", "With the default `kind=None` NumPy uses quicksort; `'mergesort'`, `'heapsort'` and `'stable'` are the other options." 
] }, { "cell_type": "code", "execution_count": 42, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 3 4]\n", " [1 3 3]]\n" ] } ], "source": [ "# Sorting along an axis: with axis=1 each row is sorted independently\n", "a = np.array([[1, 4, 3], [3, 1, 3]])\n", "b = np.sort(a, axis=1)\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 3 4]\n", " [1 3 3]]\n" ] } ], "source": [ "# In-place sort: a itself is modified and nothing is returned\n", "a.sort(axis=1)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "data": { "text/plain": [ "array([3, 1, 0, 2])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# argsort returns the indices that would sort a; a[x] (fancy indexing) gives the sorted array\n", "a = np.array([5, 4, 6, 1])\n", "x = np.argsort(a)\n", "x" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "hideCode": false, "hidePrompt": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n", "2\n" ] } ], "source": [ "# Finding the indices of the maximum and minimum\n", "b = np.array([3,5,2,6])\n", "b_max = np.argmax(b)\n", "b_min = np.argmin(b)\n", "print(b_max)\n", "print(b_min)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some Exercises 😅\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. 
Array manipulations\n", "\n", "Create this 2D array (without typing it manually)\n", "\n", "`\n", "[[1, 7, 12],\n", " [2, 8, 13],\n", " [3, 9, 14],\n", " [4, 10, 15],\n", " [5, 11, 16]]\n", "`" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": false }, "source": [ "#### Fun Exercises: Challenge questions\n", "\n", "1. Try in-place and out-of-place sorting\n", "2. Create arrays with different dtypes and sort them.\n", "3. Use `np.all` or `np.array_equal` to compare arrays and see what they return\n", "4. Use `np.random.shuffle` to create a shuffled input to sort\n", "5. Combine `ravel`, `sort` and `reshape` in one expression\n", "6. 
Look at the `axis` keyword for `sort` and rewrite the previous exercise" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19],\n", " [20, 21, 22, 23, 24]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(25).reshape(5, 5)\n", "a" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function sum in module numpy:\n", "\n", "sum(a, axis=None, dtype=None, out=None, keepdims=, initial=, where=)\n", " Sum of array elements over a given axis.\n", " \n", " Parameters\n", " ----------\n", " a : array_like\n", " Elements to sum.\n", " axis : None or int or tuple of ints, optional\n", " Axis or axes along which a sum is performed. The default,\n", " axis=None, will sum all of the elements of the input array. If\n", " axis is negative it counts from the last to the first axis.\n", " \n", " .. versionadded:: 1.7.0\n", " \n", " If axis is a tuple of ints, a sum is performed on all of the axes\n", " specified in the tuple instead of a single axis or all the axes as\n", " before.\n", " dtype : dtype, optional\n", " The type of the returned array and of the accumulator in which the\n", " elements are summed. The dtype of `a` is used by default unless `a`\n", " has an integer dtype of less precision than the default platform\n", " integer. In that case, if `a` is signed then the platform integer\n", " is used while if `a` is unsigned then an unsigned integer of the\n", " same precision as the platform integer is used.\n", " out : ndarray, optional\n", " Alternative output array in which to place the result. 
It must have\n", " the same shape as the expected output, but the type of the output\n", " values will be cast if necessary.\n", " keepdims : bool, optional\n", " If this is set to True, the axes which are reduced are left\n", " in the result as dimensions with size one. With this option,\n", " the result will broadcast correctly against the input array.\n", " \n", " If the default value is passed, then `keepdims` will not be\n", " passed through to the `sum` method of sub-classes of\n", " `ndarray`, however any non-default value will be. If the\n", " sub-class' method does not implement `keepdims` any\n", " exceptions will be raised.\n", " initial : scalar, optional\n", " Starting value for the sum. See `~numpy.ufunc.reduce` for details.\n", " \n", " .. versionadded:: 1.15.0\n", " \n", " where : array_like of bool, optional\n", " Elements to include in the sum. See `~numpy.ufunc.reduce` for details.\n", " \n", " .. versionadded:: 1.17.0\n", " \n", " Returns\n", " -------\n", " sum_along_axis : ndarray\n", " An array with the same shape as `a`, with the specified\n", " axis removed. If `a` is a 0-d array, or if `axis` is None, a scalar\n", " is returned. 
If an output array is specified, a reference to\n", " `out` is returned.\n", " \n", " See Also\n", " --------\n", " ndarray.sum : Equivalent method.\n", " \n", " add.reduce : Equivalent functionality of `add`.\n", " \n", " cumsum : Cumulative sum of array elements.\n", " \n", " trapz : Integration of array values using the composite trapezoidal rule.\n", " \n", " mean, average\n", " \n", " Notes\n", " -----\n", " Arithmetic is modular when using integer types, and no error is\n", " raised on overflow.\n", " \n", " The sum of an empty array is the neutral element 0:\n", " \n", " >>> np.sum([])\n", " 0.0\n", " \n", " For floating point numbers the numerical precision of sum (and\n", " ``np.add.reduce``) is in general limited by directly adding each number\n", " individually to the result causing rounding errors in every step.\n", " However, often numpy will use a numerically better approach (partial\n", " pairwise summation) leading to improved precision in many use-cases.\n", " This improved precision is always provided when no ``axis`` is given.\n", " When ``axis`` is given, it will depend on which axis is summed.\n", " Technically, to provide the best speed possible, the improved precision\n", " is only used when the summation is along the fast axis in memory.\n", " Note that the exact precision may vary depending on other parameters.\n", " In contrast to NumPy, Python's ``math.fsum`` function uses a slower but\n", " more precise approach to summation.\n", " Especially when summing a large number of lower precision floating point\n", " numbers, such as ``float32``, numerical errors can become significant.\n", " In such cases it can be advisable to use `dtype=\"float64\"` to use a higher\n", " precision for the output.\n", " \n", " Examples\n", " --------\n", " >>> np.sum([0.5, 1.5])\n", " 2.0\n", " >>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)\n", " 1\n", " >>> np.sum([[0, 1], [0, 5]])\n", " 6\n", " >>> np.sum([[0, 1], [0, 5]], axis=0)\n", " array([0, 6])\n", " 
>>> np.sum([[0, 1], [0, 5]], axis=1)\n", " array([1, 5])\n", " >>> np.sum([[0, 1], [np.nan, 5]], where=[False, True], axis=1)\n", " array([1., 5.])\n", " \n", " If the accumulator is too small, overflow occurs:\n", " \n", " >>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)\n", " -128\n", " \n", " You can also start the sum with a value other than zero:\n", " \n", " >>> np.sum([10], initial=5)\n", " 15\n", "\n" ] } ], "source": [ "help(np.sum)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function sum in module numpy.matrixlib.defmatrix:\n", "\n", "sum(self, axis=None, dtype=None, out=None)\n", " Returns the sum of the matrix elements, along the given axis.\n", " \n", " Refer to `numpy.sum` for full documentation.\n", " \n", " See Also\n", " --------\n", " numpy.sum\n", " \n", " Notes\n", " -----\n", " This is the same as `ndarray.sum`, except that where an `ndarray` would\n", " be returned, a `matrix` object is returned instead.\n", " \n", " Examples\n", " --------\n", " >>> x = np.matrix([[1, 2], [4, 3]])\n", " >>> x.sum()\n", " 10\n", " >>> x.sum(axis=1)\n", " matrix([[3],\n", " [7]])\n", " >>> x.sum(axis=1, dtype='float')\n", " matrix([[3.],\n", " [7.]])\n", " >>> out = np.zeros((2, 1), dtype='float')\n", " >>> x.sum(axis=1, dtype='float', out=np.asmatrix(out))\n", " matrix([[3.],\n", " [7.]])\n", "\n" ] } ], "source": [ "help(np.matrix.sum)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.5" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum([1.0, 1.5])" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum([1.0, 0.4, 0.5, 0.6], dtype=np.int32)" ] }, { "cell_type": "code", "execution_count": 51, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum([[0, 2], [0, 6]])" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 8])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum([[0, 2], [0, 6]], axis=0)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 6])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum([[0, 2], [0, 6]], axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "hide_code_all_hidden": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 }