{ "cells": [ { "cell_type": "markdown", "id": "d73d685e-8834-44ce-a75f-9ccbcefba1cf", "metadata": {}, "source": [ "# Numpy Arrays\n", "\n", "## Goals\n", "\n", "* For beginners, get a sense of how an array can be used.\n", "* For more experienced practitioners, fill in a deeper understanding of how arrays work and perhaps see one or two useful new things." ] }, { "cell_type": "code", "execution_count": 1, "id": "d7f6f7e1-089e-42aa-b5ae-0e0e0ddc9c85", "metadata": { "tags": [] }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "id": "ad0ad1ba-04d4-44ee-9ff2-c42438bc5c8e", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(15).reshape(3, 5)\n", "a" ] }, { "cell_type": "markdown", "id": "5c58464b-f8fe-4e5a-a19c-ccddec2fb5ff", "metadata": {}, "source": [ "## Items and slices" ] }, { "cell_type": "code", "execution_count": 46, "id": "a1f97b6d-94ca-4722-9925-8092edcf6432", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[1, 1]" ] }, { "cell_type": "code", "execution_count": 29, "id": "8f18a4c0-9e9f-4f9d-b298-15cfade5477d", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[0]" ] }, { "cell_type": "code", "execution_count": 30, "id": "a774c866-6192-45c8-b830-32f81f823a23", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 5, 10])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[:, 0]" ] }, { "cell_type": "code", "execution_count": 47, "id": "a8c5368e-57f8-4dd6-b02f-1977a1f1a971", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([[0, 1],\n", " [5, 6]])" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[0:2, 0:2]" ] }, { "cell_type": "markdown", "id": "904bbc00-6a99-4fa6-ad3d-d1e8c88725b2", "metadata": {}, "source": [ "What does this do?|" ] }, { "cell_type": "code", "execution_count": 49, "id": "8f63d3c3-5ecb-4f26-87a7-2e70237383a6", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([], shape=(0, 5), dtype=int64)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[10:1000]" ] }, { "cell_type": "markdown", "id": "7ce57f12-7963-4c2e-96e1-bdfecf624559", "metadata": {}, "source": [ "## Arrays with different dimensions can be combined via \"broadcasting\"" ] }, { "cell_type": "code", "execution_count": 34, "id": "b0a504f6-1da4-4b35-a38c-2695bc4d5686", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 100, 200, 300, 400],\n", " [ 500, 600, 700, 800, 900],\n", " [1000, 1100, 1200, 1300, 1400]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a * 100" ] }, { "cell_type": "code", "execution_count": 35, "id": "66550b98-57ae-43ca-ac57-7f1dd260e83e", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 0, 0, 0, 0],\n", " [ 5, 5, 5, 5, 5],\n", " [10, 10, 10, 10, 10]])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a - a[0]" ] }, { "cell_type": "code", "execution_count": 55, "id": "432aabd8-e00e-47a0-9c27-5944b0b08a92", "metadata": { "tags": [] }, "outputs": [ { "ename": "ValueError", "evalue": "operands could not be broadcast together with shapes (3,5) (3,) ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[55], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43ma\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m-\u001b[39;49m\u001b[43m \u001b[49m\u001b[43ma\u001b[49m\u001b[43m[\u001b[49m\u001b[43m:\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m \u001b[38;5;66;03m# nope!\u001b[39;00m\n", "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (3,5) (3,) " ] } ], "source": [ "a - a[:, 0] # nope!" ] }, { "cell_type": "markdown", "id": "a3284862-fceb-4e13-b070-ad79469210f6", "metadata": { "tags": [] }, "source": [ "Quoting https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html\n", "\n", "> Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.\n", "\n", "> Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.\n", "\n", "> Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.\n" ] }, { "cell_type": "code", "execution_count": 61, "id": "1820c966-3b91-493c-8678-a30c689ab963", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3, 5)\n", "(3,)\n" ] } ], "source": [ "print(a.shape)\n", "print(a[:, 0].shape)" ] }, { "cell_type": "code", "execution_count": 62, "id": "e14c3e09-25bf-433d-8652-3cda7f8e8244", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 5, 10])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[:, 0]" ] }, { "cell_type": "code", "execution_count": 58, "id": "9329ff6a-4def-4f39-a0b5-49b153d7ea96", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([[ 0],\n", " [ 5],\n", " [10]])" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[:, 0, np.newaxis]" ] }, { "cell_type": "code", "execution_count": 63, "id": "f13ab434-b175-4c44-b42f-ed7ec764763e", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3, 5)\n", "(3, 1)\n" ] } ], "source": [ "print(a.shape)\n", "print(a[:, 0, np.newaxis].shape)" ] }, { "cell_type": "code", "execution_count": 56, "id": "8400fcbb-5f7f-4200-b982-d9adf2982a9f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4]])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a - a[:, 0, np.newaxis]" ] }, { "cell_type": "markdown", "id": "d008856f-5848-4ae0-9a87-a590ea033818", "metadata": {}, "source": [ "Slices can be created on their own and reused." ] }, { "cell_type": "code", "execution_count": 79, "id": "f4c71fc7-52f6-40cc-b130-fe3ef59fe308", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b = array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])\n", "every2 = slice(None, None, 2)\n", "every10 = slice(None, None, 10)\n", "b[every2] = array([ 0, 2, 4, 6, 8, 10, 12, 14])\n", "b[every10] = array([ 0, 10])\n" ] } ], "source": [ "every2 = np.s_[::2]\n", "every10 = np.s_[::10]\n", "b = np.arange(15)\n", "print(f\"{b = }\")\n", "print(f\"{every2 = }\")\n", "print(f\"{every10 = }\")\n", "print(f\"{b[every2] = }\")\n", "print(f\"{b[every10] = }\")" ] }, { "cell_type": "markdown", "id": "01dc4801-b160-4978-9c97-d95da729e50d", "metadata": {}, "source": [ "Great reference on slices in Python in general and multi-dimensional slicing in particular: https://quansight-labs.github.io/ndindex/slices.html" ] }, { "cell_type": "markdown", "id": "0be401df-b534-4431-8252-6a748eb44912", "metadata": {}, "source": [ "## Anatomy of an Array" ] }, { "cell_type": "code", "execution_count": 4, "id": "4440f9e6-6615-4fd1-a7c0-e05d55e79ed5", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(3, 5)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 6, "id": "9602c680-1557-4299-aeb3-550a24d8eb8a", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.ndim" ] }, { "cell_type": "code", "execution_count": 7, "id": "add65ea4-b5a3-4304-bc55-ff495cffe25b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.size" ] }, { "cell_type": "code", "execution_count": 8, "id": "abb4b64c-3038-47c8-af77-869ca22c7cc3", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.ndim" ] }, { "cell_type": "code", "execution_count": 9, "id": "f72533b0-49c2-48d8-ae3e-cd0a114737f4", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "120" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.nbytes" ] }, { "cell_type": "code", "execution_count": 10, "id": "e52af5d9-08ea-434d-a58c-8b9a7f4f9ce7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.dtype" ] }, { "cell_type": "code", "execution_count": 15, "id": "82ce2731-56df-494e-832a-a363ee6250ff", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.tolist()" ] }, { "cell_type": "markdown", "id": "c69dba7a-00a6-40bc-9fb0-c89f0c8ef79f", "metadata": {}, "source": [ "## Peeking under the hood, just for a moment\n", "\n", "A block of memory with rules to \"striding\" through it and interpreting it" ] }, { "cell_type": "code", "execution_count": null, "id": "5de92b22-3c0d-4f73-bc72-1ebbbf9c95c3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.data" ] }, { "cell_type": "code", "execution_count": 50, "id": "7d5a1e50-00af-4c9f-baa9-3006a926c13e", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.dtype.itemsize" ] }, { "cell_type": "code", "execution_count": 51, "id": "cbc97b8a-2355-4b5a-9f3e-e43ff81583b8", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(3, 5)" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 22, "id": "e58db0bb-b79a-4b13-be85-57e60d29bb9e", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(40, 8)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.strides" ] }, { "cell_type": "code", "execution_count": 33, "id": "ab31f56f-b245-4863-9a64-128d686126dd", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4]\n", "[ 0 5 10]\n" ] } ], "source": [ "print(a[0])\n", "print(a[:, 0])" ] }, { "cell_type": "code", "execution_count": 23, "id": "1f282944-ba5a-4a5a-8d8a-8803c8b7e351", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.data" ] }, { "cell_type": "code", "execution_count": 28, "id": "5a5f5b09-81b7-4b72-b4c0-c702fcc32e03", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'00000000000000000100000000000000020000000000000003000000000000000400000000000000050000000000000006000000000000000700000000000000080000000000000009000000000000000a000000000000000b000000000000000c000000000000000d000000000000000e00000000000000'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.data.hex()" ] }, { "cell_type": "markdown", "id": "f4ebcb0a-33f4-46f5-b533-bc7909807876", "metadata": {}, "source": [ "## Limitations and Coping Strategies\n", "\n", "* No way to label to dimensions, have to keep track of which is which\n", " * Pass around an object, like a dict, as a key.\n", " * Consider using xarray.\n", " * Resist the temptation to subclass! If you want to go down that general path, look at [Writing custom array containers](https://numpy.org/doc/stable/user/basics.dispatch.html).\n", "* No way to include coordinates (\"tick labels\")\n", " * Pass around a simple object, like a dict, containing multiple numpy arrays.\n", " * Consider using xarray.\n", "* No built-in support for units\n", " * Use a library like pynt.\n", " * Numpy has added support for custom data types...\n", " * https://numpy.org/neps/nep-0042-new-dtypes.html\n", " * https://github.com/numpy/numpy-user-dtypes\n", " * ...which can be used to implement units!\n", " * https://github.com/seberg/unitdtype\n", " * Numpy's unit support is not \"mainstream\" yet, but it is growing." ] }, { "cell_type": "code", "execution_count": null, "id": "980eaa70-9775-4038-a8c3-978d6048d6e0", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 5 }