{ "cells": [ { "cell_type": "markdown", "id": "63d2cb88", "metadata": {}, "source": [ "# Descriptive Statistics\n", "\n", "NumPy offers many statistical functions, but obtaining several statistics from\n", "the same array requires a separate pass over the data for each one. This example\n", "shows how to use the `DescriptiveStatistics` class to obtain several statistics\n", "with a single calculation. The calculation algorithm is also incremental and\n", "numerically stable, following the reference below.\n", "\n", "---\n", "**Reference**\n", "\n", "Pébay, P., Terriberry, T.B., Kolla, H. et al.\n", "Numerically stable, scalable formulas for parallel and online\n", " computation of higher-order multivariate central moments\n", " with arbitrary weights.\n", "Comput Stat 31, 1305–1325,\n", "2016,\n", "https://doi.org/10.1007/s00180-015-0637-z\n", "\n", "---\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b6356254", "metadata": {}, "outputs": [], "source": [ "import dask.array\n", "import numpy\n", "import pyinterp" ] }, { "cell_type": "markdown", "id": "d67101e4", "metadata": {}, "source": [ "Create a random array." ] }, { "cell_type": "code", "execution_count": null, "id": "52a782e4", "metadata": {}, "outputs": [], "source": [ "values = numpy.random.random_sample((2, 4, 6, 8))" ] }, { "cell_type": "markdown", "id": "520cf30f", "metadata": {}, "source": [ "Create a `DescriptiveStatistics` object." ] }, { "cell_type": "code", "execution_count": null, "id": "07af71d5", "metadata": {}, "outputs": [], "source": [ "ds = pyinterp.DescriptiveStatistics(values)" ] }, { "cell_type": "markdown", "id": "76596226", "metadata": {}, "source": [ "The constructor calculates the\n", "statistics for the provided data. 
The calculated values are\n", "stored in the instance and can be accessed using the following methods:\n", "- `mean`\n", "- `var`\n", "- `std`\n", "- `skewness`\n", "- `kurtosis`\n", "- `min`\n", "- `max`\n", "- `sum`\n", "- `sum_of_weights`\n", "- `count`" ] }, { "cell_type": "code", "execution_count": null, "id": "2b7d0937", "metadata": {}, "outputs": [], "source": [ "ds.count()" ] }, { "cell_type": "code", "execution_count": null, "id": "9e0cf65f", "metadata": {}, "outputs": [], "source": [ "ds.mean()" ] }, { "cell_type": "markdown", "id": "c6de8f16", "metadata": {}, "source": [ "The `array` method returns a structured NumPy array containing all the\n", "calculated statistics." ] }, { "cell_type": "code", "execution_count": null, "id": "814f6571", "metadata": {}, "outputs": [], "source": [ "ds.array()" ] }, { "cell_type": "markdown", "id": "58dd3642", "metadata": {}, "source": [ "As in NumPy, statistics can be computed along one or more axes." ] }, { "cell_type": "code", "execution_count": null, "id": "572c7f83", "metadata": {}, "outputs": [], "source": [ "ds = pyinterp.DescriptiveStatistics(values, axis=(1, 2))\n", "ds.mean()" ] }, { "cell_type": "markdown", "id": "7d51b332", "metadata": {}, "source": [ "The class can also process a Dask array. In this case, the call to the\n", "constructor triggers the calculation." ] }, { "cell_type": "code", "execution_count": null, "id": "80bbc72d", "metadata": {}, "outputs": [], "source": [ "ds = pyinterp.DescriptiveStatistics(\n", "    dask.array.from_array(values, chunks=(2, 2, 2, 2)),\n", "    axis=(1, 2))\n", "ds.mean()" ] }, { "cell_type": "markdown", "id": "2936e6b2", "metadata": {}, "source": [ "Finally, weighted statistics can be calculated by passing a `weights` array."
] }, { "cell_type": "code", "execution_count": null, "id": "67da8b94", "metadata": {}, "outputs": [], "source": [ "weights = numpy.random.random_sample((2, 4, 6, 8))\n", "ds = pyinterp.DescriptiveStatistics(values, weights=weights, axis=(1, 2))\n", "ds.mean()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.11" } }, "nbformat": 4, "nbformat_minor": 5 }