{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Scatter" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib ipympl\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import mpl_interactions.ipyplot as iplt\n", "import ipywidgets as widgets\n", "import pandas as pd\n", "from matplotlib.colors import to_rgba_array, TABLEAU_COLORS, XKCD_COLORS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic example" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gif": "scatter1.png" }, "outputs": [], "source": [ "N = 50\n", "x = np.random.rand(N)\n", "\n", "\n", "def f_y(x, tau):\n", "    # noisy sin^2 curve; tau is controlled by the slider\n", "    return np.sin(x * tau) ** 2 + np.random.randn(N) * 0.01\n", "\n", "\n", "fig, ax = plt.subplots()\n", "controls = iplt.scatter(x, f_y, tau=(1, 2 * np.pi, 100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using functions and broadcasting\n", "You can also use multiple functions. If there are fewer `x` inputs than `y` inputs, then the `x` input will be broadcast to fit the `y` inputs. Similarly, `y` inputs can be broadcast to fit `x`. You can also choose colors and sizes for each set of points." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gif": "scatter2.png" }, "outputs": [], "source": [ "N = 50\n", "x = np.random.rand(N)\n", "\n", "\n", "def f_y1(x, tau):\n", "    return np.sin(x * tau) ** 2 + np.random.randn(N) * 0.01\n", "\n", "\n", "def f_y2(x, tau):\n", "    return np.cos(x * tau) ** 2 + np.random.randn(N) * 0.1\n", "\n", "\n", "fig, ax = plt.subplots()\n", "controls = iplt.scatter(x, f_y1, tau=(1, 2 * np.pi, 100), c=\"blue\", s=5)\n", "# reuse the existing controls so both scatters share the tau slider\n", "_ = iplt.scatter(x, f_y2, controls=controls, c=\"red\", s=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions for both x and y\n", "\n", "The function for `y` should accept `x` and then any parameters that you will be varying. The function for `x` should accept only the parameters." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "gif": "scatter3.png" }, "outputs": [], "source": [ "N = 50\n", "\n", "\n", "def f_x(mean):\n", "    return np.random.rand(N) + mean\n", "\n", "\n", "def f_y(x, mean):\n", "    return np.random.rand(N) - mean\n", "\n", "\n", "fig, ax = plt.subplots()\n", "controls = iplt.scatter(f_x, f_y, mean=(0, 1, 100), s=None, c=np.random.randn(N))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using functions for other attributes\n", "\n", "You can also use functions to dynamically update other scatter attributes such as the `size`, `color`, `edgecolor`, and `alpha`.\n", "\n", "The function for `alpha` needs to accept the parameters but not the xy positions, as it affects every point. The functions for `size`, `color`, and `edgecolor` should all accept `x` and `y`, followed by the parameters.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gif": "scatter4.png" }, "outputs": [], "source": [ "N = 50\n", "mean = 0\n", "x = np.random.rand(N) + mean - 0.5\n", "\n", "\n", "def f(x, mean):\n", "    return np.random.rand(N) + mean - 0.5\n", "\n", "\n", "def c_func(x, y, mean):\n", "    # color each point by its x position\n", "    return x\n", "\n", "\n", "def s_func(x, y, mean):\n", "    # points near x = 0 get larger markers\n", "    return np.abs(40 / (x + 0.001))\n", "\n", "\n", "def ec_func(x, y, mean):\n", "    # pick one edge color at random on every update\n", "    if np.random.rand() > 0.5:\n", "        return \"black\"\n", "    else:\n", "        return \"red\"\n", "\n", "\n", "fig, ax = plt.subplots()\n", "sliders = iplt.scatter(\n", "    x,\n", "    f,\n", "    mean=(0, 1, 100),\n", "    c=c_func,\n", "    s=s_func,\n", "    edgecolors=ec_func,\n", "    alpha=0.5,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modifying the colors of individual points" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gif": "scatter5.png" }, "outputs": [], "source": [ "N = 500\n", "x = np.random.rand(N) - 0.5\n", "y = np.random.rand(N) - 0.5\n", "\n", "\n", "def f(mean):\n", "    x = (np.random.rand(N) - 0.5) + mean\n", "    y = 10 * (np.random.rand(N) - 0.5) + mean\n", "    return x, y\n", "\n", "\n", "def threshold(x, y, mean):\n", 
" colors = np.zeros((len(x), 4))\n", " colors[:, -1] = 1\n", " deltas = np.abs(y - mean)\n", " idx = deltas < 0.01\n", " deltas /= deltas.max()\n", " colors[~idx, -1] = np.clip(0.8 - deltas[~idx], 0, 1)\n", " return colors\n", "\n", "\n", "fig, ax = plt.subplots()\n", "sliders = iplt.scatter(x, y, mean=(0, 1, 100), alpha=None, c=threshold)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Putting it together - Wealth of Nations\n", "Using interactive_scatter we can recreate the interactive [wealth of nations](https://observablehq.com/@mbostock/the-wealth-health-of-nations) plot using Matplotlib!\n", "\n", "\n", "The data preprocessing was taken from an [example notebook](https://github.com/bqplot/bqplot/blob/55152feb645b523faccb97ea4083ca505f26f6a2/examples/Applications/Wealth%20Of%20Nations/Bubble%20Chart.ipynb) from the [bqplot](https://github.com/bqplot/bqplot) library. If you are working in jupyter notebooks then you should definitely check out bqplot!\n", "\n", "\n", "### Data preprocessing" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# this cell was taken wholesale from the bqplot example \n", "# bqplot is under the apache license, see their license file here:\n", "# https://github.com/bqplot/bqplot/blob/55152feb645b523faccb97ea4083ca505f26f6a2/LICENSE\n", "data = pd.read_json('nations.json')\n", "def clean_data(data):\n", " for column in ['income', 'lifeExpectancy', 'population']:\n", " data = data.drop(data[data[column].apply(len) <= 4].index)\n", " return data\n", "\n", "def extrap_interp(data):\n", " data = np.array(data)\n", " x_range = np.arange(1800, 2009, 1.)\n", " y_range = np.interp(x_range, data[:, 0], data[:, 1])\n", " return y_range\n", "\n", "def extrap_data(data):\n", " for column in ['income', 'lifeExpectancy', 'population']:\n", " data[column] = data[column].apply(extrap_interp)\n", " return data\n", "data = clean_data(data)\n", "data = extrap_data(data)\n", "income_min, 
income_max = np.min(data['income'].apply(np.min)), np.max(data['income'].apply(np.max))\n", "life_exp_min, life_exp_max = np.min(data['lifeExpectancy'].apply(np.min)), np.max(data['lifeExpectancy'].apply(np.max))\n", "pop_min, pop_max = np.min(data['population'].apply(np.min)), np.max(data['population'].apply(np.max))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define functions to provide the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def x(year):\n", "    return data[\"income\"].apply(lambda x: x[year - 1800])\n", "\n", "\n", "def y(x, year):\n", "    return data[\"lifeExpectancy\"].apply(lambda x: x[year - 1800])\n", "\n", "\n", "def s(x, y, year):\n", "    pop = data[\"population\"].apply(lambda x: x[year - 1800])\n", "    return 6000 * pop.values / pop_max\n", "\n", "\n", "regions = data[\"region\"].unique().tolist()\n", "c = data[\"region\"].apply(lambda x: list(TABLEAU_COLORS)[regions.index(x)]).values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Marvel at data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gif": "scatter6.png" }, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(10, 4.8))\n", "controls = iplt.scatter(\n", "    x,\n", "    y,\n", "    s=s,\n", "    year=np.arange(1800, 2009),\n", "    c=c,\n", "    edgecolors=\"k\",\n", "    slider_formats=\"{:d}\",\n", "    play_buttons=True,\n", "    play_button_pos=\"left\",\n", ")\n", "fs = 15\n", "ax.set_xscale(\"log\")\n", "ax.set_ylim([0, 100])\n", "ax.set_xlim([200, income_max * 1.05])\n", "ax.set_xlabel(\"Income\", fontsize=fs)\n", "_ = ax.set_ylabel(\"Life Expectancy\", fontsize=fs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": 
"text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }