{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Numpy\n", "[Numpy](http://numpy.scipy.org/) is the fundamental library for scientific computing in Python. It contains list-like objects that work like arrays, matrices, and data tables. This is how scientists typically expect data to behave. Numpy also provides linear algebra, Fourier transforms, random number generation, and tools for integrating C/C++ and Fortran code.\n", "\n", "[Matplotlib](http://matplotlib.org/) is the reigning library for 2D (with budding support for 3D) scientific plotting in Python. It produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Between Numpy and Matplotlib, much of MATLAB's functionality can be replaced with Python.\n", "\n", "If you primarily want to work with tables of data, [Pandas](pandas), which depends on Numpy, is probably the module that you want to start with." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why Numpy?\n", "\n", "Python was not originally designed for scientific computing, and it contains many high-level abstractions that enable its enormously flexible object-oriented interface. In Python, storing most integers requires more than just 4-8 bytes. It also requires at least a couple of pointers per integer. Performing a calculation on two numbers requires one or two bytecode operations, each of which can take dozens of CPU instructions for each pass through the Python eval loop. 
And when it comes to looping over and indexing into Python lists, the situation is even more dire.\n", "\n", "### A basic example\n", "\n", "`Z = A + B * C`\n", "\n", "In pure Python:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Create 3 lists of a million ints\n", "# (in Python 3, range() is lazy, so wrap it in list() to get real lists)\n", "A = list(range(1000000))\n", "B = list(range(1000000))\n", "C = list(range(1000000))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.36 s ± 500 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "# Time doing the operation with a for loop\n", "Z = []\n", "for idx in range(len(A)):\n", "    Z.append(A[idx] + B[idx] * C[idx])\n", "\n", "# print(Z) # DON'T DO THIS! It will print all 1000000 list items" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Create 3 Numpy arrays with a million ints\n", "\n", "import numpy as np\n", "\n", "A = np.arange(1000000)\n", "B = np.arange(1000000)\n", "C = np.arange(1000000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using Numpy:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18.9 ms ± 475 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%%timeit\n", "# Time the operation with Numpy\n", "Z = A + B * C" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0 2 6 ... 999995000006 999997000002\n", " 999999000000]\n" ] } ], "source": [ "# Print the result\n", "Z = A + B * C\n", "print(Z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to just *looking* simpler, the Numpy version is significantly faster. The for loop disappears completely and is replaced by vectorized array operations. 
When printing very large arrays, the output is also truncated by default, and for machine integers much less memory is used." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Memory Usage\n", "3 lists of 1000000 Python ints: ~96 MB\n", "\n", "3 Numpy arrays of 1000000 64-bit ints: ~24 MB (3 x 1000000 x 8 bytes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More human-friendly interfaces to numerical libraries\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numpy and Matplotlib" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "\n", "ax = plt.subplot(111)\n", "x = np.arange(0.0, 5.0, 0.01)\n", "y = np.cos(x * np.pi)\n", "plt.ylim(-1.5, 1.5)\n", "lines, = plt.plot(x, y, lw=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numpy Array Basics\n", "#### Creating a Numpy array" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create an array from a list of ints and show the array\n", "import numpy\n", "\n", "vals = [1, 2, 3]\n", "arr = numpy.array(vals)\n", "arr" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3]\n" ] } ], "source": [ "# Print the array--notice any difference?\n", "print(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike Python lists, NumPy arrays are homogeneous: all values must have exactly the same type. This allows values to be packed together as shown here, which saves memory and is much faster to process.\n", "\n", "\n", "\n", "If we give NumPy initial values of different types, it finds the most general type and stores all the values in the array using that type. For example, if we construct an array from an integer and a float, the array's values are both floats: " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1. 
, 2.3])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create an array from a heterogeneous list\n", "arr = numpy.array([1, 2.3])\n", "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want a specific type, we can pass `array` an optional argument called `dtype` (for \"data type\"). For example, we can tell NumPy to create an array of 32-bit floats even though all the initial values are integers: " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1., 2., 3., 4.], dtype=float32)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create an array of floats from a list of ints\n", "numpy.array([1, 2, 3, 4], dtype=numpy.float32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy provides many basic numerical data types, each of which is identified by a name like `float32`.\n", "Passing the Python built-ins `int`, `float`, or `complex` as the `dtype` selects NumPy's default type of each kind; for integers this depends on the platform and is usually 32- or 64-bit. (The old aliases `numpy.int`, `numpy.float`, and `numpy.complex` were deprecated in NumPy 1.20 and removed in 1.24.)\n", "\n", "Note: Assigning a new `dtype` to an existing array reinterprets its raw bytes instead of converting its values, which is rarely useful. Use `arr.astype()` to convert." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4]\n", "[1. 2. 3. 4.]\n", "[1.+0.j 2.+0.j 3.+0.j 4.+0.j]\n" ] } ], "source": [ "# Create arrays of ints, floats, and complex from lists of ints\n", "print(numpy.array([1, 2, 3, 4], dtype=int))\n", "print(numpy.array([1, 2, 3, 4], dtype=float))\n", "print(numpy.array([1, 2, 3, 4], dtype=complex))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4]\n", "[1.e-45 3.e-45 4.e-45 6.e-45]\n", "[1. 2. 3. 
4.]\n" ] } ], "source": [ "# Create an array of 32-bit ints\n", "arr = numpy.array([1, 2, 3, 4], dtype=numpy.int32)\n", "print(arr)\n", "\n", "# Try reassigning the dtype to float (this reinterprets the raw bytes)\n", "arr.dtype = numpy.float32\n", "print(arr)\n", "\n", "# Restore the correct dtype and use .astype() instead\n", "arr.dtype = numpy.int32\n", "print(arr.astype(numpy.float32))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many other ways to create arrays besides calling the basic `array()` constructor. For example, the `zeros()` function takes a tuple specifying array dimensions as an argument and returns an array of zeros of that size: " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 0.],\n", " [0., 0., 0.]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a 2x3 array of zeros\n", "z = numpy.zeros((2, 3))\n", "z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array's dtype defaults to `float` unless something else is specified with the `dtype=` argument. This is typical of most Numpy functions that create or return arrays." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0],\n", " [0, 0, 0]], dtype=int32)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a 2x3 array of integer zeros\n", "z = numpy.zeros((2, 3), dtype=numpy.int32)\n", "z" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1. 1.]\n", " [1. 1. 1.]]\n", "\n", "[[1. 0.]\n", " [0. 
1.]]\n" ] } ], "source": [ "# The ones and identity functions work much the same way:\n", "print(numpy.ones((2, 3)))\n", "print()\n", "print(numpy.identity(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also possible to create NumPy arrays without filling them with data using the empty function. This function does not initialize the values, so the array contains whatever bits were lying around in memory when it was called:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0.],\n", " [0., 1.]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a 2x2 empty array\n", "arr = numpy.empty((2, 2))\n", "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This might not seem particularly useful, but if a program is going to overwrite an array immediately after creating it, perhaps by filling it with the result of some computation, there's no point taking the time to fill it with zeroes or ones." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[10 11 12 13 14 15 16 17 18 19]\n" ] } ], "source": [ "# Another frequently useful array creation function is arange:\n", "\n", "arr = np.arange(10, 20)\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[10. 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9 11. 11.1 11.2 11.3\n", " 11.4 11.5 11.6 11.7 11.8 11.9 12. 12.1 12.2 12.3 12.4 12.5 12.6 12.7\n", " 12.8 12.9 13. 13.1 13.2 13.3 13.4 13.5 13.6 13.7 13.8 13.9 14. 14.1\n", " 14.2 14.3 14.4 14.5 14.6 14.7 14.8 14.9 15. 15.1 15.2 15.3 15.4 15.5\n", " 15.6 15.7 15.8 15.9 16. 16.1 16.2 16.3 16.4 16.5 16.6 16.7 16.8 16.9\n", " 17. 17.1 17.2 17.3 17.4 17.5 17.6 17.7 17.8 17.9 18. 18.1 18.2 18.3\n", " 18.4 18.5 18.6 18.7 18.8 18.9 19. 
19.1 19.2 19.3 19.4 19.5 19.6 19.7\n", " 19.8 19.9]\n" ] } ], "source": [ "arr = np.arange(10, 20, 0.1)\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 5.26315789 10.52631579 15.78947368 21.05263158\n", " 26.31578947 31.57894737 36.84210526 42.10526316 47.36842105\n", " 52.63157895 57.89473684 63.15789474 68.42105263 73.68421053\n", " 78.94736842 84.21052632 89.47368421 94.73684211 100. ]\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEICAYAAACzliQjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAR4klEQVR4nO3dfYxddZ3H8fcXasHism1hINjSDiaND2tUyETxIcal/iFoLH9oojtZG0PSbOKu+JAI2j+Mf5DoxviUNW4moFs2DatbiTTGuCEFY/YPuw5CeLDslkVbCpUOqTysjQHS7/5xzoRhuHemc++5c+89v/crae7cM+fe+5szZz7z6znnfiYyE0lSGc4a9gAkSavH0Jekghj6klQQQ1+SCmLoS1JBDH1JKsiyoR8R34+IExHx4IJlGyPizog4XN9uqJdHRHwnIh6JiPsj4opBDl6StDKx3HX6EfFe4P+AWzPzzfWyfwROZuZXI+JGYENm3hAR1wD/AFwDvAP4dma+Y7lBXHjhhTk5OdnfVyJJbXXyJBw5AqdPv7TsrLN48PTp5/6cef5KnmrNcitk5i8jYnLR4h3A++qP9wC/AG6ol9+a1W+SX0XE+oi4JDOPL/Uak5OTzM7OrmTcklSOycmXBz7A6dO8Fc5Z6VP1ekz/4vkgr28vqpdvAh5bsN6xetkrRMSuiJiNiNm5ubkehyFJBTh6tOPiV8HalT5V0ydyo8OyjsePMnMmM6cyc2piYqLhYUhSi2zZ0nHxC/D8Sp+q19B/MiIuAahvT9TLjwGXLlhvM/BEj68hSQK46SZYt+7ly9at4wl4fKVP1Wvo7wd21h/vBO5YsPwT9VU8VwLPLHc8X5K0jOlpmJmBrVshorqdmeEpOLnSpzqTq3duozppeyHwJPBl4CfAj4AtwFHgo5l5MiIC+CfgA8Ap4JOZuewZ2qmpqfREriStTETck5lTK3nMmVy98/Eun9reYd0EPrWSAUiSVo/vyJWkghj6klQQQ1+SCmLoS9Ig7d1bvaP2rLOq2717hzqcZU/kSpJ6tHcv7NoFp05V948cqe5DdRnmEDjTl6RB2b37pcCfd+pUtXxIDH1JGpQunTldl68CQ1+SBqVLZ07X5avA0JekQenSmcNNNw1nPBj6kjQ4XTpzhnUSF7x6R5IGa3p6qCG/mDN9SSqIoS9JBTH0Jakghr4kdTNiFQpN8ESuJHUyghUKTXCmL0mdjGCFQhMMfUnqZAQrFJpg6EtSJyNYodAEQ1+SOhnBCoUmGPqS1MkIVig0wat3JKmbEatQaIIzfUkqiKEvSQUx9CWpIIa+pHZqYYVCEzyRK6l9Wlqh0ARn+pLap6UVCk0w9CW1T0srFJpg6Etqn5ZWKDTB0JfUPi2tUGiCoS+pfVpaodAEr96R1E4trFBogjN9SSpIX6EfEZ+NiIci4sGIuC0izo2I
yyLiYEQcjogfRsTapgYrSepPz6EfEZuATwNTmflm4GzgY8DXgG9m5jbgj8B1TQxUktS/fg/vrAFeHRFrgHXAceAqYF/9+T3AtX2+hqSSWJ8wUD2HfmY+DnwdOEoV9s8A9wBPZ+aL9WrHgE2dHh8RuyJiNiJm5+bmeh2GpDaZr084cgQyX6pPMPgb08/hnQ3ADuAy4LXAecDVHVbNTo/PzJnMnMrMqYmJiV6HIalNrE8YuH4O77wf+F1mzmXmC8DtwLuA9fXhHoDNwBN9jlFSKaxPGLh+Qv8ocGVErIuIALYDvwXuBj5Sr7MTuKO/IUoqhvUJA9fPMf2DVCdsfwM8UD/XDHAD8LmIeAS4ALilgXFKKoH1CQPX1ztyM/PLwJcXLX4UeHs/zyupUPPvoN29uzqks2VLFfi+s7Yx1jBIGi3WJwyUNQySVBBDX5IKYuhLUkEMfUkqiKEvqTn25ow8r96R1Iz53pz5GoX53hzwapwR4kxfUjPszRkLhr6kZtibMxYMfUnNsDdnLBj6kpphb85YMPQlNWN6GmZmYOtWiKhuZ2Y8iTtivHpHUnPszRl5zvQlqSCGviQVxNCXpIIY+pIqVigUwRO5kqxQKIgzfUlWKBTE0JdkhUJBDH1JVigUxNCXZIVCQQx9SVYoFMSrdyRVrFAogjN9SSqIoS9JBTH0Jakghr7UBlYo6Ax5Ilcad1YoaAWc6UvjzgoFrYChL407KxS0Aoa+NO6sUNAKGPrSuLNCQSvQV+hHxPqI2BcRD0fEoYh4Z0RsjIg7I+JwfbuhqcFK6sAKBa1AvzP9bwM/z8w3AG8FDgE3AgcycxtwoL4vaZCmp+H3v4fTp6tbA19d9Bz6EXE+8F7gFoDMfD4znwZ2AHvq1fYA1/Y7SElSM/qZ6b8OmAN+EBH3RsTNEXEecHFmHgeoby9qYJySpAb0E/prgCuA72Xm5cCfWMGhnIjYFRGzETE7NzfXxzAkSWeqn9A/BhzLzIP1/X1UvwSejIhLAOrbE50enJkzmTmVmVMTExN9DEMaY9YnaJX1HPqZ+QfgsYh4fb1oO/BbYD+ws162E7ijrxFKbTVfn3DkCGS+VJ9g8GuAIjN7f3DE24CbgbXAo8AnqX6R/AjYAhwFPpqZJ5d6nqmpqZydne15HNJYmpysgn6xrVurK3CkZUTEPZk5tZLH9FW4lpn3AZ1ecHs/zysVwfoEDYHvyJWGxfoEDYGhLw2L9QkaAkNfGhbrEzQE/hEVaZimpw15rSpn+pJUEENfkgpi6EtSQQx9qVdWKGgMeSJX6sV8hcL8HySfr1AAT8xqpDnTl3qxe/dLgT/v1KlquTTCDH2pF1YoaEwZ+lIvrFDQmDL0pV5YoaAxZehLvbBCQWPKq3ekXlmhoDHkTF+SCmLoS1JBDH1JKoihL0kFMfRVJntzVCiv3lF57M1RwZzpqzz25qhghr7KY2+OCmboqzz25qhghr7KY2+OCmboqzz25qhgXr2jMtmbo0I505ekghj6klQQQ1+SCmLoa/xYoSD1zBO5Gi9WKEh9caav8WKFgtQXQ1/jxQoFqS99h35EnB0R90bET+v7l0XEwYg4HBE/jIi1/Q9TqlmhIPWliZn+9cChBfe/BnwzM7cBfwSua+A1pIoVClJf+gr9iNgMfBC4ub4fwFXAvnqVPcC1/byG9DJWKEh96ffqnW8BXwD+or5/AfB0Zr5Y3z8GbOr0wIjYBewC2OJ/zbUSVihIPet5ph8RHwJOZOY9Cxd3WDU7PT4zZzJzKjOnJiYmeh2GJGkF+pnpvxv4cERcA5wLnE81818fEWvq2f5m4In+hylJakLPM/3M/GJmbs7MSeBjwF2ZOQ3cDXykXm0ncEffo5QkNWIQ1+nfAHwuIh6hOsZ/ywBeQ+PI+gRp6BqpYcjMXwC/qD9+FHh7E8+rFrE+QRoJviNXq8P6BGkkGPpaHdYnSCPB0NfqsD5BGgmGvlaH9QnSSDD0tTqsT5BGgn9ERavH+gRp
6JzpS1JBDH1JKoihL0kFMfR1ZqxQkFrBE7lanhUKUms409fyrFCQWsPQ1/KsUJBaw9DX8qxQkFrD0NfyrFCQWsPQ1/KsUJBaw6t3dGasUJBawZm+JBXE0Jekghj6klQQQ1+SCmLol8DeHEk1r95pO3tzJC3gTL/t7M2RtICh33b25khawNBvO3tzJC1g6LedvTmSFjD0287eHEkLePVOCezNkVRzpi9JBTH0Jakghr4kFcTQH2XWJ0hqmCdyR5X1CZIGoOeZfkRcGhF3R8ShiHgoIq6vl2+MiDsj4nB9u6G54RbE+gRJA9DP4Z0Xgc9n5huBK4FPRcSbgBuBA5m5DThQ39dKWZ8gaQB6Dv3MPJ6Zv6k/fg44BGwCdgB76tX2ANf2O8giWZ8gaQAaOZEbEZPA5cBB4OLMPA7VLwbgoi6P2RURsxExOzc318Qw2sX6BEkD0HfoR8RrgB8Dn8nMZ8/0cZk5k5lTmTk1MTHR7zDax/oESQPQ19U7EfEqqsDfm5m314ufjIhLMvN4RFwCnOh3kMWyPkFSw/q5eieAW4BDmfmNBZ/aD+ysP94J3NH78CRJTepnpv9u4G+BByLivnrZl4CvAj+KiOuAo8BH+xuiJKkpPYd+Zv4nEF0+vb3X55UkDY41DINihYKkEWQNwyBYoSBpRDnTHwQrFCSNKEN/EKxQkDSiDP1BsEJB0ogy9AfBCgVJI8rQHwQrFCSNKK/eGRQrFCSNIGf6klQQQ1+SCmLoS1JBDP1OrFCQ1FKeyF3MCgVJLeZMfzErFCS1mKG/mBUKklrM0F/MCgVJLWboL2aFgqQWM/QXs0JBUot59U4nVihIailn+pJUEENfkgpi6EtSQdoX+lYoSFJX7TqRa4WCJC2pXTN9KxQkaUntCn0rFCRpSe0KfSsUJGlJ7Qp9KxQkaUntCn0rFCRpSe26egesUJCkJbRrpi9JWpKhL0kFMfQlqSADCf2I+EBE/HdEPBIRNw7iNSRJK9d46EfE2cB3gauBNwEfj4g3LftAO3MkaeAGcfXO24FHMvNRgIj4N2AH8Nuujzh50s4cSVoFgzi8swl4bMH9Y/Wy7h5/3M4cSVoFgwj96LAsX7FSxK6ImI2IWZ5/vvMz2ZkjSY0aROgfAy5dcH8z8MTilTJzJjOnMnOKtWs7P5OdOZLUqMh8xSS8vyeMWAP8D7AdeBz4NfA3mflQt8ecG/HsX8F5seCXUMLpo3DkKTjZ6ADHx4XAU8MexAhwO7gN5rkdKgu3w9bMnFjJgxs/kZuZL0bE3wP/AZwNfH+pwAf4c+b58x9HxGxmTjU9rnHjdqi4HdwG89wOlX63w0C6dzLzZ8DPBvHckqTe+Y5cSSrIKIb+zLAHMCLcDhW3g9tgntuh0td2aPxEriRpdI3iTF+SNCCGviQVZKRCv8R2zoi4NCLujohDEfFQRFxfL98YEXdGxOH6dsOwx7oaIuLsiLg3In5a378sIg7W2+GHEdHlnXztERHrI2JfRDxc7xfvLG1/iIjP1j8PD0bEbRFxbgn7QkR8PyJORMSDC5Z1/N5H5Tt1Xt4fEVecyWuMTOj33M45/l4EPp+ZbwSuBD5Vf903AgcycxtwoL5fguuBQwvufw34Zr0d/ghcN5RRra5vAz/PzDcAb6XaHsXsDxGxCfg0MJWZb6Z6v8/HKGNf+BfgA4uWdfveXw1sq//tAr53Ji8wMqHPgnbOzHwemG/nbLXMPJ6Zv6k/fo7qB3wT1de+p15tD3DtcEa4eiJiM/BB4Ob6fgBXAfvqVVq/HSLifOC9wC0Amfl8Zj5NefvDGuDV9Tv81wHHKWBfyMxf8soWgm7f+x3ArVn5FbA+Ii5Z7jVGKfRX3s7ZMhExCVwOHAQuzszjUP1iAC4a3shWzbeALwCn6/sXAE9n5ov1/RL2idcBc8AP6sNcN0fEeRS0P2Tm48DXgaNUYf8McA/l7Qvzun3ve8rMUQr9M2rnbKuIeA3wY+AzmfnssMez2iLiQ8CJ
zLxn4eIOq7Z9n1gDXAF8LzMvB/5Eiw/ldFIfs94BXAa8FjiP6lDGYm3fF5bT08/HKIX+GbVztlFEvIoq8Pdm5u314ifn/6tW354Y1vhWybuBD0fE76kO7V1FNfNfX/8XH8rYJ44BxzLzYH1/H9UvgZL2h/cDv8vMucx8AbgdeBfl7Qvzun3ve8rMUQr9XwPb6jP0a6lO3Owf8pgGrj5ufQtwKDO/seBT+4Gd9cc7gTtWe2yrKTO/mJmbM3OS6nt/V2ZOA3cDH6lXK2E7/AF4LCJeXy/aTvVX50raH44CV0bEuvrnY34bFLUvLNDte78f+ER9Fc+VwDPzh4GWlJkj8w+4hqqW+X+B3cMezyp9ze+h+i/Z/cB99b9rqI5nHwAO17cbhz3WVdwm7wN+Wn/8OuC/gEeAfwfOGfb4VuHrfxswW+8TPwE2lLY/AF8BHgYeBP4VOKeEfQG4jeo8xgtUM/nrun3vqQ7vfLfOyweornZa9jWsYZCkgozS4R1J0oAZ+pJUEENfkgpi6EtSQQx9SSqIoS8BEfHputFy77DHIg2Sl2xKQEQ8DFydmb8b9likQXKmr+JFxD9TvfFnf0R8PiJ+UveT/yoi3lKvs7HTcmncGPoqXmb+HVVnyV8Dk8C9mfkW4EvArfVqX+myXBorhr70cu+hets/mXkXcEFE/OUSy6WxYuhLL9etrrbEmme1kKEvvdwvgWmAiHgf8FRWf9+g23JprHj1jgTUPf5TVH+16wdUf8DjFLArM++PiI2dlg9puFLPDH1JKoiHdySpIIa+JBXE0Jekghj6klQQQ1+SCmLoS1JBDH1JKsj/A1j2xjJu7B8FAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# linspace can be used to create an n-sized array of evenly spaced samples in some range:\n", "\n", "arr = np.linspace(0, 100, 20)\n", "print(arr)\n", "\n", "plt.axis([-1, 101, -1, 101])\n", "plt.xlabel('foo')\n", "lines, = plt.plot(arr, arr, 'ro')" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 1.12883789 1.27427499 1.43844989 1.62377674 1.83298071\n", " 2.06913808 2.33572147 2.6366509 2.97635144 3.35981829 3.79269019\n", " 4.2813324 4.83293024 5.45559478 6.15848211 6.95192796 7.8475997\n", " 8.8586679 10. ]\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAO30lEQVR4nO3db4hdd53H8c8n/2im3ca2GaUmzdwKxa4061Yuu9WAiOODqq3pE6EyLUUW5smutrJQ6g5L8cHAPpAlebAsDLUacOgisaytiKtERfbBhr1JC2kbF6V2ptHY3Bqc6kZM3H73wdyJmZu5M5N7zj3n/M55vyDMzMmdOd9L0k9/+Z3f7/tzRAgAkJ4tZRcAABgOAQ4AiSLAASBRBDgAJIoAB4BEbSvyZrt3745Wq1XkLQEgeSdOnHgzIsb7rxca4K1WS51Op8hbAkDybC+sdZ0pFABIFAEOAIkiwAEgUQQ4ACSKAAeARBHgAJAoAhwAEkWAA0CiCHAASBQBDgCJIsABIFEEOAAkigAHgEQR4ACQKAIcABK1YYDbftr2OdsvXXHtZtvft/3T3sebRlsmAJRj/tS8Woda2vKlLWodamn+1HzZJV22mRH41yTd23ftCUnHIuIOScd6XwNArcyfmtf089NaWFpQKLSwtKDp56crE+IbBnhE/FjS+b7LByUd6X1+RNIDOdcFAKWbOTajC5curLp24dIFzRybKami1YadA39XRJyVpN7Hdw56oe1p2x3bnW63O+TtAKB4i0uL13S9aCN/iBkRcxHRjoj2+PhVZ3ICQGXt27Xvmq4XbdgAf8P2rZLU+3guv5IAoBpmJ2c1tn1s1bWx7WOanZwtqaLVhg3w5yQ90vv8EUnfyqccAKiOqf1Tmrt/ThO7JmRZE7smNHf/nKb2T5VdmiTJEbH+C+xnJH1E0m5Jb0h6UtK/S/qGpH2SFiV9OiL6H3Repd1uR6fTyVgyADSL7RMR0e6/vm2jb4yIzwz4rcnMVQEAhsZOTABIFAEOAIkiwAEgUQQ4ACSKAAeARBHgAJAoAhwAEkWAA0CiCHAASBQBDgCJIsABIFEEOAAkigAHgEQR4ACQKAIcQOXMn5pX61BLW760Ra1DrcqcAl81G/YDB4AizZ+a1/Tz05dPg19YWtD089OSVJmTcKqCETiASpk5NnM5vFdcuHRBM8dmSqqoughwAJWyuLR4TdebjAAHUCn7du27putNRoADqJTZyVmNb
R9bdW1s+5hmJ2dLqqi6CHAAlTK1f0pz989pYteELGti14Tm7p/jAeYaHBGF3azdbken0ynsfgBQB7ZPRES7/zojcABIFAEOAIkiwAEgUQQ4ACSKAAeARBHgAJAoAhwAEkWAA0CiCHAASFSmALf9Bdsv237J9jO2r8urMADA+oYOcNt7JH1eUjsi7pK0VdKDeRUGAFhf1imUbZJ22t4maUzSL7OXBADYjKEDPCJ+IenLkhYlnZW0FBHf63+d7WnbHdudbrc7fKUACsfZlNWWZQrlJkkHJd0u6d2Srrf9UP/rImIuItoR0R4fHx++UgCFWjmbcmFpQaG4fDYlIV4dWaZQPibp5xHRjYhLkp6V9KF8ygJQNs6mrL4sAb4o6R7bY7YtaVLS6XzKAlA2zqasvixz4MclHZV0UtKp3s+ay6kuACXjbMrqy7QKJSKejIg7I+KuiHg4Iv6QV2EAysXZlNXHTkwAa+JsyurjTEwAqDjOxASAmiHAASBRBDgAJIoAB4BEEeAAkCgCHAASRYADQKIIcABIFAEOAIkiwIGEcMACrrSt7AIAbM7KAQsrPbpXDliQRH+ShmIEDiSCAxbQjwAHEsEBC+hHgAOJ4IAF9CPAgURwwAL6EeBAIjhgAf040AEAKo4DHQCgZghwAEgUAQ4AiSLAASBRBDgAJIoAB4BEEeAAkCgCHAASRYADOaJfN4pEP3AgJ/TrRtEYgQM5oV83ipYpwG2/w/ZR2z+xfdr2B/MqDEgN/bpRtKwj8MOSvhsRd0p6v6TT2UsC0kS/bhRt6AC3faOkD0v6iiRFxMWI+E1ehQGpoV83ipZlBP4eSV1JX7X9gu2nbF/f/yLb07Y7tjvdbjfD7YBqo183ijZ0P3DbbUn/JelARBy3fVjSWxHxj4O+h37gAHDtRtEP/IykMxFxvPf1UUkfyPDzAADXYOgAj4hfSXrd9nt7lyYlvZJLVQCADWXdyPM5SfO2d0h6VdJns5cEANiMTAEeES9KumpeBgAweuzEBIBEEeAAkCgCHI1F50Ckjm6EaCQ6B6IOGIGjkegciDogwNFIdA5EHRDgaCQ6B6IOCHA0Ep0DUQcEOBqJzoGog6G7EQ6DboQAcO1G0Y0QAFAiAhwAEkWAA0CiCHAASBQBjqTQvwT4E3qhIBn0LwFWYwSOZNC/BFiNAEcy6F8CrEaAIxn0LwFWI8CRDPqXAKsR4EgG/UuA1eiFAgAVRy8UAKgZAhwAEkWAA0CiCHAUhm3wQL7YSo9CsA0eyB8jcBSCbfBA/ghwFIJt8ED+CHAUgm3wQP4IcBSCbfBA/jIHuO2ttl+w/e08CkI9sQ0eyF8eq1AelXRa0o05/CzU2NT+KQIbyFGmEbjtvZI+KempfMoBAGxW1imUQ5Iel/T2oBfYnrbdsd3pdrsZbwcAWDF0gNu+T9K5iDix3usiYi4i2hHRHh8fH/Z2qAh2UwLVkWUO/ICkT9n+hKTrJN1o++sR8VA+paFq2E0JVMvQI/CI+GJE7I2IlqQHJf2A8K43dlMC1cI6cGwauymBasklwCPiRxFxXx4/C9XFbkqgWhiBY9PYTQlUCwGOTWM3JVAtHGoMABXHoca4Cmu6gbRxIk9DsaYbSB8j8IZiTTeQPgK8oVjTDaSPAG8o1nQD6SPAG4o13UD6CPCGYk03kD7WgQNAxbEOvAFY1w00C+vAa4J13UDzMAKvCdZ1A81DgNcE67qB5iHAa4J13UDzEOA1wbpuoHkI8JpgXTfQPKwDT8D8qXnNHJvR4tKi9u3ap9nJWYIZaJBB68BZRlhxLA8EMAhTKBXH8kAAgxDgFcfyQACDEOAVx/JAAIMQ4BXH8kAAgxDgJduoARXLAwEMwjLCEvWvMJGWR9cENIAr0U62glhhAiALArxErDABkAUBXiJWmADIggAfsfUeUrLCBEAWQwe47dts/9D2adsv2340z8LqYOUh5cLSgkJxeRv8SoizwgRAFkOvQrF9q6RbI+Kk7T+TdELSA
xHxyqDvadoqlNahlhaWFq66PrFrQq899lrxBQFIUu6rUCLibESc7H3+W0mnJe0ZvsT64SElgFHKZQ7cdkvS3ZKOr/F707Y7tjvdbjeP2yWDh5QARilzgNu+QdI3JT0WEW/1/35EzEVEOyLa4+PjWW9XSYMeVPKQEsAoZeoHbnu7lsN7PiKezaektGymXzeHMQAYhSwPMS3piKTzEfHYZr6njg8xeVAJYNRGsZX+gKSHJX3U9ou9X5/I8POSxINKAGXJsgrlPyPCEfEXEfGXvV/fybO4Khk0z82DSgBlYSfmJqy3IYcHlQDKQoBvwnpdA9lNCaAsnEq/hvlT86tWjqz1kFL60zz31P4pAhtA4QjwPmstC7Ss0NWrdZjnBlAmplD6rDVdEgpZXnWNeW4AZSPA+wxa/hcK5rkBVErjp1D657tv3nmzfv37X1/1OjbmAKiaRgf4WvPd27ds146tO3Tx/y5efh3TJQCqqFEB3j/a/t3F3101333p7Uu6ZectumHHDfQvAVBpjQnwtUbbg5z//Xm9+fibRZUGAENpzEPMtVaXDMLyQAApqHWAX9m/ZL0R95WY7waQitoGeH//kkFu2XkLywMBJKm2c+CbmTIZ2z6mwx8/TGADSFJtR+Dr9eNmtA2gDmo7Ah/UhIoNOQDqorYjcPp0A6i72gY4fboB1N3QhxoPo46HGgPAqI3iUGMAQIkIcABIFAEOAIkiwAEgUQQ4ACSKAAeARBHgAJAoAhwAEkWAA0CiCHAASBQBDgCJIsABIFGZAtz2vbb/x/bPbD+RV1EAgI0NHeC2t0r6F0kfl/Q+SZ+x/b68CgMArC/LCPyvJP0sIl6NiIuS/k3SwXzKAgBsJMuRanskvX7F12ck/XX/i2xPS5ruffkH2y9luGeKdkt6s+wiCta099y09yvxnos2sdbFLAHuNa5ddTpERMxJmpMk2521mpLXGe+5/pr2fiXec1VkmUI5I+m2K77eK+mX2coBAGxWlgD/b0l32L7d9g5JD0p6Lp+yAAAbGXoKJSL+aPvvJP2HpK2Sno6Ilzf4trlh75cw3nP9Ne39SrznSij0UGMAQH7YiQkAiSLAASBRhQR407bc277N9g9tn7b9su1Hy66pKLa32n7B9rfLrqUItt9h+6jtn/T+vD9Ydk2jZvsLvb/XL9l+xvZ1ZdeUN9tP2z535b4V2zfb/r7tn/Y+3lRmjVIBAd7QLfd/lPT3EfHnku6R9LcNeM8rHpV0uuwiCnRY0ncj4k5J71fN37vtPZI+L6kdEXdpeQHDg+VWNRJfk3Rv37UnJB2LiDskHet9XaoiRuCN23IfEWcj4mTv899q+T/qPeVWNXq290r6pKSnyq6lCLZvlPRhSV+RpIi4GBG/KbeqQmyTtNP2NkljquH+j4j4saTzfZcPSjrS+/yIpAcKLWoNRQT4Wlvuax9mK2y3JN0t6Xi5lRTikKTHJb1ddiEFeY+krqSv9qaNnrJ9fdlFjVJE/ELSlyUtSjoraSkivlduVYV5V0SclZYHaZLeWXI9hQT4prbc15HtGyR9U9JjEfFW2fWMku37JJ2LiBNl11KgbZI+IOlfI+JuSf+rCvyzepR6874HJd0u6d2Srrf9ULlVNVcRAd7ILfe2t2s5vOcj4tmy6ynAAUmfsv2alqfJPmr76+WWNHJnJJ2JiJV/XR3VcqDX2cck/TwiuhFxSdKzkj5Uck1FecP2rZLU+3iu5HoKCfDGbbm3bS3Pi56OiH8uu54iRMQXI2JvRLS0/Gf8g4io9cgsIn4l6XXb7+1dmpT0SoklFWFR0j22x3p/zydV8we3V3hO0iO9zx+R9K0Sa5GUrRvhpgy55T51ByQ9LOmU7Rd71/4hIr5TYk0Yjc9Jmu8NTl6V9NmS6xmpiDhu+6ikk1pebfWCKrjFPCvbz0j6iKTdts9IelLSP0n6hu2/0fL/yD5dXoXL2EoPAIliJyYAJIoAB4BEEeAAkCgCHAASRYADQKIIcABIF
AEOAIn6f0MjMvDzfWQhAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Likewise, logspace \n", "arr = np.logspace(0, 1, 20, base=10)\n", "print(arr)\n", "plt.axis([0, 11, 0, 11])\n", "lines, = plt.plot(arr, arr, 'go')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating a 2D array" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a 3x3 array\n", "\n", "import numpy as np\n", "\n", "example_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n", "example_array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with everything else in Python, assigning an array to a variable does not copy its data. Instead, it creates an alias for the original data. For example, let's create an array of ones and assign it to a variable `first`, then assign the value of `first` to `second`: " ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1.]\n", " [1. 1.]]\n" ] } ], "source": [ "# Variable assignment does not *copy* data in array; it just creates a new pointer to that array\n", "first = numpy.ones((2, 2))\n", "second = first\n", "print(first)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[9. 1.]\n", " [1. 
1.]]\n" ] } ], "source": [ "# Index assignment updates elements in an array\n", "# To index N-D arrays use a , to separate the indices into each dimension\n", "\n", "second[0, 0] = 9\n", "print(first)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we really want a copy of the array so that we can make changes without affecting the original data, we can use the copy method:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1.]\n", " [1. 1.]]\n", "\n", "first:\n", "[[1. 1.]\n", " [1. 1.]]\n", "\n", "second:\n", "[[9. 1.]\n", " [1. 1.]]\n" ] } ], "source": [ "# If we really want a copy we can use the .copy() method\n", "first = numpy.ones((2, 2))\n", "print(first)\n", "print()\n", "\n", "second = first.copy()\n", "second[0, 0] = 9\n", "print('first:')\n", "print(first)\n", "print()\n", "print('second:')\n", "print(second)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays have properties as well as methods. We have already met dtype, which is the array's data type. Another is shape, which is a tuple of the array's size along each dimension: " ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1.]\n", " [1. 
1.]]\n", "(2, 2)\n" ] } ], "source": [ "# Show the shape\n", "\n", "print(first)\n", "print(first.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to modify an array's shape, but be careful that the total number of elements in the array is still the same:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1.],\n", " [1.],\n", " [1.],\n", " [1.]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Modify the shape with a valid shape\n", "\n", "first.shape = (4, 1)\n", "first" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "cannot reshape array of size 4 into shape (3,2)", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Modify the shape with an invalid shape\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mfirst\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0mfirst\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: cannot reshape array of size 4 into shape (3,2)" ] } ], "source": [ "# Modify the shape with an invalid shape\n", "\n", "first.shape = (3, 2)\n", "first" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that there are no parentheses after `shape`: it is a piece of 
data, not a method call. Also note that the tuple in `shape` is exactly what we pass into functions like `zeros` to create new arrays, which makes it easy to reproduce the shape of existing data:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.],\n", " [0.],\n", " [0.],\n", " [0.]])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You can use .shape to define new arrays with that shape\n", "blank = np.zeros(first.shape)\n", "blank" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other data members include `size`, which is the total number of elements in the array, and `nbytes`, which is the total number of bytes of memory used by the array's data. As the code below shows, `size` is simply the product of the array's lengths along its dimensions, and `nbytes` is the product of `size` and the size in bytes of each element (84 elements of 8 bytes each give 672 bytes):" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "size: 84\n", "nbytes: 672\n" ] } ], "source": [ "# Arrays also have .size and .nbytes attributes\n", "block = numpy.zeros((4, 7, 3))\n", "print('size:', block.size)\n", "print('nbytes:', block.nbytes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are also special methods for reshaping the array in common ways. We can rearrange the data in an array using the `transpose` method, which reverses the order of the array's axes (for a 2D array, rows become columns). This does not actually move data around in memory. Instead, it creates an alias that appears to have the values stored differently. 
We also call this a *view* of the array:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 4]\n", " [2 5]\n", " [3 6]]\n", "\n", "The original array is unchanged:\n", "[[1 2 3]\n", " [4 5 6]]\n" ] } ], "source": [ "# .transpose() creates a *view* of the same data but transposed\n", "\n", "arr = numpy.array([[1, 2, 3],\n", " [4, 5, 6]])\n", "\n", "print(arr.transpose())\n", "print()\n", "print('The original array is unchanged:')\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1 4]\n", " [ 2 42]\n", " [ 3 6]]\n", "\n", "[[ 1 2 3]\n", " [ 4 42 6]]\n" ] } ], "source": [ "# However, modifying the transposed array will modify the original array\n", "# (unless we explicitly make a copy of it first)\n", "trans = arr.transpose()\n", "trans[1, 1] = 42\n", "print(trans)\n", "print()\n", "print(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a common enough operation (especially on matrices) that `arr.T` can be used as shorthand for `arr.transpose()`. Note that `T` is an attribute, not a method, so no parentheses follow it:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1 4]\n", " [ 2 42]\n", " [ 3 6]]\n" ] } ], "source": [ "# We can also use arr.T\n", "print(arr.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `ravel` method does something similar: it creates a one-dimensional view of the original data (when the memory layout allows it; otherwise it returns a copy). As you'd expect, the result's shape has a single value, which is the number of elements we started with."
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1 2 3 4 42 6]\n", "\n", "The original array is still unchanged:\n", "[[ 1 2 3]\n", " [ 4 42 6]]\n" ] } ], "source": [ "# .ravel() creates a 1-dimensional view of the original data\n", "print(arr.ravel())\n", "print()\n", "print('The original array is still unchanged:')\n", "print(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But even though the array returned by `ravel()` has a different shape, it's worth emphasizing again that it is a *view* of the original array, and that modifying its contents will modify the original array too:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1 2 3]\n", " [ 9 42 6]]\n" ] } ], "source": [ "# Remember, the result of .ravel() is just a view; updating the array returned by\n", "# .ravel() updates the original array data too\n", "arr.ravel()[3] = 9\n", "print(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What order do raveled values appear in? Let's start by thinking about a 2×4 array `A`. It looks two-dimensional, but the computer's memory is 1-dimensional: each location is identified by a single integer address. Any program that works with multi-dimensional data must therefore decide how to lay out those values. \n", "\n", "\n", "\n", "One possibility is *row-major order*, which concatenates the rows. This is what C uses, and since NumPy is written largely in C, it uses the same convention by default: \n", "\n", "\n", "\n", "In contrast, *column-major order* concatenates the columns. FORTRAN does this, and MATLAB follows along.\n", "\n", "\n", "\n", "There's no real difference in performance or usability, but the differences cause headaches when data has to be moved from one programming language to another. 
For example, if your Python code wants to call an eigenvalue function written in FORTRAN, you will probably have to rearrange the data, just as you have to be careful about 0-based versus 1-based indexing. Note that you cannot use the array's `transpose` method to do this, since, as explained earlier, it doesn't actually move data around.\n", "\n", "If your software is specifically tuned to operate on an array one row at a time or one column at a time, it may also be desirable to have the data arranged in memory so that it's accessed linearly. But this is the kind of performance tuning you won't need until and unless you *know* you need it. For the most part stick with row-major order (the default in Numpy) and don't worry about it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we really want to change the physical size of the data, we have to use `array.resize`. This works in place, i.e., it modifies the array, rather than returning a new alias: " ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7 8]\n", "\n", "[[0 1]\n", " [2 3]]\n" ] } ], "source": [ "# Resize a larger array to a smaller array\n", "\n", "block = numpy.arange(9)\n", "print(block)\n", "print()\n", "\n", "block.resize(2, 2)\n", "print(block)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the example above shows, when we resize a 9-element array to be 2×2, we keep the first four values in raveling order, *not* the values that would sit in the first two rows and columns of a 3×3 layout. (And note that once again, the new dimensions are passed directly, rather than in a tuple.)\n", "\n", "If we enlarge the array by resizing, the new locations are assigned zero. 
Which locations are \"new\" is determined by the raveling order of the array; as the example below shows, the existing values are packed into the first part of memory, *not* into the upper left corner of the logical matrix: " ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1.]\n", " [1. 1.]]\n", "\n", "[[1. 1. 1.]\n", " [1. 0. 0.]\n", " [0. 0. 0.]]\n" ] } ], "source": [ "# Resize a smaller array to a larger array\n", "small = numpy.ones((2, 2))\n", "print(small)\n", "print()\n", "\n", "large = small.copy()\n", "large.resize(3, 3)\n", "print(large)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is, however, possible to fill the upper left corner by first allocating a new array of zeros and using slice assignment:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1. 0.]\n", " [1. 1. 0.]\n", " [0. 0. 0.]]\n" ] } ], "source": [ "# Fill a corner of an array from a smaller array\n", "large = numpy.zeros((3, 3))\n", "large[:2, :2] = small\n", "print(large)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing Numpy Arrays\n", "\n", "Now we'll take a closer look at some of the ways we can index arrays. It may seem like a small topic at first, but clever indexing allows us to avoid writing loops, which reduces the size of our code and makes it more efficient.\n", "\n", "Arrays are subscripted by integers, just like lists and other sequences, so they can be sliced like other sequences as well. For example, if `block` is the array shown below, then `block[0:3, 0:2]` selects its first three rows and the first two columns: \n", "\n", "\n", "\n", "The comma syntax (`[X, Y]`) for indexing multi-dimensional arrays was lobbied for and eventually added to the Python language by the scientific community. 
Although it is possible to write chained subscripts such as `block[2][1]` instead of `block[2, 1]`, this is comparatively inefficient, especially when doing many array indexing operations. (Exercise question: Why?) Note also that chaining slices is not equivalent: `block[0:3][0:2]` applies both slices to the rows, so it selects the first two rows rather than a 3×2 block." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with other sliceable types, it's possible to assign to slices (as we saw a bit earlier). For example, we can assign zero to columns 1 and 2 in row 0 of `block` in a single statement: " ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 10 0 0 40]\n", " [110 120 130 140]\n", " [210 220 230 240]]\n" ] } ], "source": [ "# Assign zero to a slice of the first row\n", "block = numpy.array([[10, 20, 30, 40],\n", " [110, 120, 130, 140],\n", " [210, 220, 230, 240]])\n", "block[0, 1:3] = 0\n", "print(block)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, as with `transpose` and `ravel`, which we saw earlier, slicing creates an alias rather than copying data: " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1],\n", " [2, 3],\n", " [4, 5]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a 2D array\n", "original = numpy.arange(6).reshape((3, 2))\n", "original" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1],\n", " [2, 3]])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Slice a corner of the array\n", "slc = original[0:2, 0:2]\n", "slc" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0],\n", " [0, 0],\n", " [4, 5]])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set the values of the slice to zero--the original has changed\n", "slc[:, 
:] = 0\n", "original" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice in the example above how we used `slc[:, :]` to refer to all of the array's elements at once. All of Python's other slicing shortcuts work as well, so that expressions like `original[-2:, 1:]` behave sensibly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slicing on both sides of an assignment is a way to shift data along the axes. If `vector` is a one-dimensional array, then `vector[1:4]` selects locations 1, 2, and 3, while `vector[0:3]` selects locations 0, 1, and 2. Assigning the former to the latter therefore overwrites the lower three values with the upper three, leaving the uppermost value untouched:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([20, 30, 40, 40])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Assign a slice of a vector to another slice of the same vector\n", "vector = numpy.array([10, 20, 30, 40])\n", "vector[0:3] = vector[1:4]\n", "vector" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compare this with loop-based code that shifts values up or down, and you'll see why most programmers prefer the vectorized programming model. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do even more sophisticated things by using a list or an array as a subscript. 
For example, if `subscript` is a list containing 3, 1, and 2, then `vector[subscript]` creates a new array whose elements are selected from vector in the obvious way: \n", "\n", "" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 10, 20, 30])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create the vector [0, 10, 20, 30] using arange() and *\n", "vector = numpy.arange(4) * 10\n", "vector" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([30, 10, 20])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Index with a list of indices\n", "subscript = [3, 1, 2]\n", "vector[subscript]" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 10, 20, 30])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# It should be emphasized that such arbitrary subscripting returns a *copy* of the original array:\n", "sub = vector[subscript]\n", "sub[:] = 0\n", "vector" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([42, 10, 20, 42])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# It is however possible to perform assignment into an array using an arbitrary subscript,\n", "# without copying\n", "subscript = [0, 3]\n", "vector[subscript] = 42\n", "vector" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Boolean indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use Boolean masking on the left side of assignment as well, though we have to be careful about its meaning. 
If we use a mask directly, elements are taken in order from the source on the right and assigned to elements corresponding to True values in the mask: " ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 2])" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Mask an array [0 1 2 3] with the mask [T F T F]\n", "a = numpy.array([0, 1, 2, 3])\n", "mask = [True, False, True, False]\n", "a[mask]" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([101, 1, 103, 3])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Assign [101 102 103 104] using the mask\n", "a[mask] = numpy.array([101, 102, 103, 104])[mask]\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Operators like `<` and `==` work the way we would expect with arrays, but there is one trick. Python does not allow objects to re-define the meaning of `and`, `or`, and `not`, since they are keywords. The expression `(vector <= 20) and (vector >= 20)` therefore produces an error message instead of selecting elements with exactly the value 20: " ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ True, True, True, False])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Give the items <= 20 of [0 10 20 30]\n", "vector = numpy.array([0, 10, 20, 30])\n", "vector <= 20" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all()", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Gives the items <= 20 and > 0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;34m(\u001b[0m\u001b[0mvector\u001b[0m \u001b[0;34m<=\u001b[0m \u001b[0;36m20\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mvector\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" ] } ], "source": [ "# Gives the items <= 20 and > 0\n", "(vector <= 20) and (vector > 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way around this is to use functions like `logical_and` and `logical_or`, which combine Boolean arrays element by element, like their namesakes: " ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, True, True, False])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do the same expression with logical_and\n", "numpy.logical_and(vector <= 20, vector > 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another is to use the bitwise and/or operators `&` and `|`. The parentheses around each comparison are required, because these operators bind more tightly than `<=` and `>`:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, True, True, False])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do the same expression with &\n", "(vector <= 20) & (vector > 0)" ] }, { "cell_type": 
"code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 7])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Since comparison operators return an array of bools, a comparison expression\n", "# can also be used to index an array\n", "\n", "array1 = np.array([1, 1, 1, 2, 2, 2, 1])\n", "array2 = np.array([1, 2, 3, 4, 5, 6, 7])\n", "array2[array1 == 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many other slick ways to index and get values from Numpy arrays, such as `numpy.where()`, `numpy.select()`, and `numpy.choose()`. Some of them can get very complex and difficult to understand, but have their uses--you can read more about them and other indexing tricks in the [Numpy documentation](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html), as well as its more advanced [indexing routines](https://docs.scipy.org/doc/numpy/reference/routines.indexing.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Math\n", "### Arrays\n", "Math on arrays is vectorized and behaves more or less like \"most\" scientists would expect:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 3, 3, 5, 5, 5, 3])" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "array1 = np.array([1, 1, 1, 2, 2, 2, 1])\n", "array2 = np.array([1, 2, 3, 4, 5, 6, 7])\n", "\n", "array1 * 2 + 1" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 2, 3, 8, 10, 12, 7])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Multiply array1 and array2\n", "array1 * array2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But that doesn't mean they behave exactly like the matrices that mathematicians use. 
For example, let's create an array, and then multiply it by itself: " ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 4],\n", " [ 9, 16]])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a 2x2 array and multiply it with itself\n", "arr = numpy.array([[1, 2], [3, 4]])\n", "arr * arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy does the operation elementwise, instead of doing \"real\" matrix multiplication. On the bright side, elementwise operation means that array addition works as you would expect: " ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2, 4],\n", " [6, 8]])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Now add it with itself\n", "arr + arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And since there's only one sensible way to interpret an expression like \"array plus one\", NumPy does the sensible thing there too. " ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2, 3],\n", " [4, 5]])" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add 1 to all elements--note that this does not *modify* arr\n", "arr + 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like other array-based libraries or languages, NumPy provides many useful tools for common arithmetic operations. 
For example, we can add up the values in our array with a single function call: " ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sum the array\n", "numpy.sum(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also calculate the partial sum along each axis by passing an extra argument into `sum`: " ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4, 6])" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sum along the row-axis\n", "numpy.sum(arr, axis=0)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 7])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sum along the column-axis\n", "numpy.sum(arr, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrices\n", "There also exists a special matrix class that behaves more naturally like a matrix. 
For example, the `*` operator on two matrix objects performs matrix multiplication:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[14],\n", " [32]])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make a 2x3 and a 3x1 matrix/column vector and multiply the former by the latter\n", "matrix1 = np.matrix([[1, 2, 3], [4, 5, 6]])\n", "matrix2 = np.matrix([1, 2, 3]).transpose()\n", "matrix1 * matrix2" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[14, 32],\n", " [32, 77]])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Multiply a matrix by its transpose\n", "matrix1 * matrix1.transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Various basic linear algebra operations are also possible on matrices, such as inverses, eigenvalues, and determinants; many of these are available through the [`np.linalg`](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html) module:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "A = np.matrix([\n", " [2, 0, 0],\n", " [1, 1, 1],\n", " [0, 0, 2]\n", "])" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[ 0.5, 0. , 0. ],\n", " [-0.5, 1. , -0.5],\n", " [ 0. , 0. , 0.5]])" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.inv(A)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.0" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.det(A)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 2. 2.]\n", "[[0. 0.70710678 0. 
]\n", " [1. 0.70710678 0.70710678]\n", " [0. 0. 0.70710678]]\n" ] } ], "source": [ "eigenvals, eigenvects = np.linalg.eig(A)\n", "print(eigenvals)\n", "print(eigenvects)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other functions are applied element-wise to the matrix, just as they are on plain arrays:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[7.3890561 , 1. , 1. ],\n", " [2.71828183, 2.71828183, 2.71828183],\n", " [1. , 1. , 7.3890561 ]])" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing and Exporting Data\n", "The numpy function `genfromtxt()` is a powerful way to import text data.\n", "It can use different delimiters, skip header rows, control the type of imported data, give columns of data names, and handle missing data, among other useful features. See the [documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html) for a full list of features, or run `help(np.genfromtxt)` from the Python shell (after importing the module of course).\n", "\n", "The related [`np.loadtxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) is similar to `genfromtxt` but a bit simpler, and does not have options for handling missing data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic Import and Export\n", "#### Import\n", "Basic imports using Numpy will treat all data as floats.\n", "If we're doing a basic import we'll typically want to skip the header row (since it's generally not composed of numbers)."
] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a, b, c, d\r\n", "1.0, 2.0, 4.0\r\n", "5.0, 6.0, 7.0\r\n", "1e2, 2e3, 4e5\r\n", "1e-2, 2e-3, 4.345e-5\r\n" ] } ], "source": [ "!cat data/examp_data.txt" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1.000e+00, 2.000e+00, 4.000e+00],\n", " [5.000e+00, 6.000e+00, 7.000e+00],\n", " [1.000e+02, 2.000e+03, 4.000e+05],\n", " [1.000e-02, 2.000e-03, 4.345e-05]])" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use genfromtxt() to create an array from the file\n", "data = np.genfromtxt('./data/examp_data.txt', delimiter=',', skip_header=1)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Export" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "# Use savetxt() to dump the output to a new file with a different delimiter\n", "np.savetxt('./data/examp_output.txt', data, delimiter=', ')" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.000000000000000000e+00, 2.000000000000000000e+00, 4.000000000000000000e+00\r\n", "5.000000000000000000e+00, 6.000000000000000000e+00, 7.000000000000000000e+00\r\n", "1.000000000000000000e+02, 2.000000000000000000e+03, 4.000000000000000000e+05\r\n", "1.000000000000000021e-02, 2.000000000000000042e-03, 4.344999999999999904e-05\r\n" ] } ], "source": [ "!cat data/examp_output.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing Data Tables\n", "Lots of scientific data comes in the form of tables, with one row per observation, and one column per thing observed.\n", "Often the different columns have different types (including text).\n", "The best way to work with this type of data is in a Structured Array."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import\n", "To do this we let Numpy automatically detect the data types in each column using the optional argument ``dtype=None``.\n", "We can also use an existing header row as the names for the columns using the optional argument ``names=True``." ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "site,species,mass\r\n", "1,DS,125\r\n", "1,DM,70\r\n", "2,DM,55\r\n", "1,CB,40\r\n", "2,DS,110\r\n", "1,CB,45\r\n" ] } ], "source": [ "!cat data/examp_data_species_mass.txt" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([(1, 'DS', 125), (1, 'DM', 70), (2, 'DM', 55), (1, 'CB', 40),\n", " (2, 'DS', 110), (1, 'CB', 45)],\n", " dtype=[('site', '" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from scipy import special\n", "from matplotlib import pyplot as plt\n", "\n", "plt.xlim(0, 10)\n", "plt.ylim(-1, 1)\n", "x = np.linspace(0, 10, 1000)\n", "\n", "for v in range(6):\n", " y = special.jv(v, x)\n", " plt.plot(x, y)\n", " \n", " # Add labels\n", " maximum = np.argmax(y)\n", " xpos = x[maximum] + 0.25 # trial and error\n", " ypos = y[maximum] + 0.05\n", " plt.text(xpos, ypos, f'$ J_{v}(x) $ ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numerical integration\n", "\n", "Functions can be integrated numerically over a given interval using [`scipy.integrate.quad`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.quad.html#scipy.integrate.quad), among other more specialized functions. 
For example:\n", "\n", "$$\n", "\\int_0^{2\\pi} J_1(x)dx\n", "$$" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "val: 0.7797230914600654\n", "error upper bound: 2.2568868939342564e-14\n" ] } ], "source": [ "from scipy.integrate import quad\n", "val, err = quad(lambda x: special.jv(1, x), 0, 2 * np.pi)\n", "print('val:', val)\n", "print('error upper bound:', err)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solving ODEs\n", "\n", "Solve $ y(x) $ for\n", "\n", "$$\n", "x^2\\frac{d^2y}{dx^2} + x\\frac{dy}{dx} + (x^2 - 1)y = 0\n", "$$\n", "\n", "with initial values $ y(0) = 0, y'(0) = 0 $, using [`scipy.integrate.odeint`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.odeint.html). `odeint()` solves systems of first-order ODEs, so first we must convert the equation to one by defining $ z = y' $ so that:\n", "\n", "$$\n", "\\frac{dz}{dx} = \\frac{y}{x^2} - \\frac{z}{x} - y\n", "$$\n", "\n", "We can then solve for $ y $ and $ z $ simultaneously by packing them into a vector ``[y, z]``:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2.22044605e-16 2.22044605e-16]\n", " [ 5.04406506e-02 4.98088155e-01]\n", " [ 1.00495655e-01 4.92369336e-01]\n", " ...\n", " [ 9.33283707e-02 -2.41613277e-01]\n", " [ 6.86195858e-02 -2.47203933e-01]\n", " [ 4.34727731e-02 -2.50283081e-01]]\n" ] } ], "source": [ "from scipy.integrate import odeint\n", "\n", "def func(yz, x):\n", " y, z = yz\n", " return [z, y/x**2 - z/x - y] # [dy/dx, dz/dx]\n", "\n", "# Solve over 100 points between 0 and 10\n", "# Use a small epsilon for the initial value to avoid divide by zero\n", "eps = np.finfo(float).eps\n", "x = np.linspace(eps, 10, 100)\n", "sol = odeint(func, [eps, eps], x)\n", "print(np.array2string(sol, threshold=10))" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot the solution against the known answer:\n", "_, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))\n", "\n", "ax1.plot(x, special.jv(1, x), c='g')\n", "ax1.set_xlim(0, 10)\n", "ax1.set_title(r'$ \\mathtt{special.jv(1, x)} $')\n", "\n", "ax2.plot(x, sol[:,0], c='g', label='$ y(x) $')\n", "ax2.plot(x, sol[:,1], c='b', label=\"$ z(x) = y'(x) $\")\n", "ax2.set_xlim(0, 10)\n", "ax2.set_title(r'$ \\mathtt{odeint} $ solution')\n", "ax2.legend()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 1 }