{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Math and array tools\n", "\n", "Arrays are the basis of science [[citation needed]](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed). This tutorial walks you through some tools to make your life working with arrays a little more pleasant." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Click [here](https://mybinder.org/v2/gh/sciris/sciris/HEAD?labpath=docs%2Ftutorials%2Ftut_arrays.ipynb) to open an interactive version of this notebook.\n", "\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden", "tags": [] }, "outputs": [], "source": [ "# Hide this cell from output, just include so we have reproducible results\n", "import numpy as np\n", "np.random.seed(4) # 4 looks nice" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Array indexing\n", "\n", "Let's create some data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "data = np.random.rand(100)\n", "\n", "print(f'{data = }')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we want to do something super simple, like find the indices of the values above 0.9? In NumPy, it's not super straightforward:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inds = (data>0.9).nonzero()[0]\n", "\n", "print(f'{inds = }')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Sciris, there's a function for doing exactly this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sciris as sc\n", "\n", "inds = sc.findinds(data>0.9)\n", "\n", "print(f'{inds = }')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Likewise, what if we want to find the value closest to, say, 0.5? In NumPy, that would be" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "target = 0.5\n", "nearest = np.argmin(abs(data-target))\n", "\n", "print(f'{nearest = }, {data[nearest] = }')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which is not _too_ long, but it's a little harder to remember than the Sciris equivalent:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nearest = sc.findnearest(data, target)\n", "\n", "print(f'{nearest = }, {data[nearest] = }')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Sciris functions also work on anything \"data like\": for example," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "target = 50\n", "data = [81, 78, 66, 25, 6, 8, 53, 96, 64, 23]\n", "\n", "# With NumPy\n", "ind = np.argmin(abs(np.array(data)-target))\n", "\n", "# With Sciris\n", "ind = sc.findnearest(data, 50)\n", "\n", "print(f'{ind=}, {data[ind]=}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These have been simple examples, but you can see how Sciris functions can do the same things with less typing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interlude on creating arrays\n", "\n", "Speaking of which, here's a pretty fast way to create an array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sc.cat(1,2,3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`sc.cat()` will take anything array-like and turn it into an actual array. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a 2x2 matrix\n", "data = np.random.rand(2,2)\n", "\n", "# Add a row with NumPy\n", "data = np.concatenate([data, np.atleast_2d(np.array([1,2]))])\n", "\n", "# Add a row with Sciris\n", "data = sc.cat(data, [1,2])\n", "\n", "print(f'{data = }')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes, the NumPy command really does end with `]))])`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing values\n", "\n", "Now that we know some tools for indexing arrays, let's look at ways to actually change them.\n", "\n", "We all know that missing data is one of humanity's greatest scourges. Luckily, it can be swiftly eradicated with Sciris: either removed entirely or replaced:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d0 = [1, 2, np.nan, 4, np.nan, 6, np.nan, np.nan, np.nan, 10]\n", "\n", "d1 = sc.rmnans(d0) # Remove nans\n", "d2 = sc.fillnans(d0, 0) # Replace NaNs with 0s\n", "d3 = sc.fillnans(d0, 'linear') # Replace NaNs with linearly interpolated values\n", "\n", "print(f'{d0 = }')\n", "print(f'{d1 = }')\n", "print(f'{d2 = }')\n", "print(f'{d3 = }') # This is more impressive than ChatGPT, imo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data smoothing\n", "\n", "What if we have some seriously lumpy data we want to smooth out? We have a few options for doing that:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Make data\n", "n = 50\n", "x = np.arange(n)\n", "data = 20*np.random.randn(n)**2\n", "data = sc.randround(data) # Stochastically round to the nearest integer -- e.g. 0.7 is rounded up 70% of the time\n", "\n", "# Simple smoothing\n", "smooth = sc.smooth(data, 7)\n", "\n", "# Use a rolling average\n", "roll = sc.rolling(data, 7)\n", "\n", "# Plot results\n", "import pylab as pl\n", "sc.options(jupyter=True)\n", "pl.scatter(x, data, c='k', label='Data')\n", "pl.plot(x, smooth, label='Smoothed')\n", "pl.plot(x, roll, label='Rolling average')\n", "pl.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also smooth 2D data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the data\n", "raw = pl.rand(20,20)\n", "\n", "# Smooth it\n", "smooth = sc.gauss2d(raw, scale=2)\n", "\n", "# Plot\n", "fig = pl.figure(figsize=(8,4))\n", "\n", "ax1 = sc.ax3d(121)\n", "sc.bar3d(raw, ax=ax1)\n", "pl.title('Raw')\n", "\n", "ax2 = sc.ax3d(122)\n", "sc.bar3d(smooth, ax=ax2)\n", "pl.title('Smoothed');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding a line of best fit\n", "\n", "It's also easy to do a very simple linear regression in Sciris:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate the data\n", "n = 100\n", "x = np.arange(n)\n", "y = x*np.random.rand() + 0.2*np.random.randn(n)*x\n", "\n", "# Calcualte the line of best fit\n", "m,b = sc.linregress(x, y)\n", "\n", "# Plot\n", "pl.style.use('sciris.simple')\n", "pl.scatter(x, y, c='k', alpha=0.2, label='Data')\n", "pl.plot(x, m*x+b, c='forestgreen', label=f'Line of best fit: {m:0.2f}*x + {b:0.2f}')\n", "pl.legend();" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }