{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## xskillscore-tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the [xskillscore](https://github.com/raybellwaves/xskillscore) tutorial.\n", "\n", "This was created for a talk at the [Data Science Study Group: South Florida](https://www.meetup.com/Data-Science-Study-Group-South-Florida/) on April 1 st 2020. The associated slides with the talk can be found [here](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/xskillscore-tutorial.pdf).\n", "\n", "The repository for this tutorial is hosted on GitHub here: [xskillscore-tutorial](https://github.com/raybellwaves/xskillscore-tutorial)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Motivation for xskillscore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`xskillscore` provides a one-stop shop for metrics used in verification of forecasts.\n", "\n", "It is an extension of [`xarray`](http://xarray.pydata.org/en/stable/) which is a library that handles labelled n-dimensional arrays. Find out more information about `xarray` [here](http://xarray.pydata.org/en/stable/why-xarray.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## History of xskillscore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`xskillscore` was developed by Ray Bell while at the University of Miami during the [SubX project](https://journals.ametsoc.org/doi/full/10.1175/BAMS-D-18-0270.1) in 2018.\n", "\n", "In 2019, Aaron Spring, Andrew Huang and Riley Brady greatly improved `xskillscore`. Aaron, Andrew and Riley provided upstream fixes and enhancement of `xskillscore` as it used extensively in [climpred](https://climpred.readthedocs.io/en/stable/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## xskillscore overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The verification metrics in `xskillscore` are split into two types: **deterministic** and **probabilistic**.\n", "\n", "**Deterministic** metrics consist of correlation metrics (e.g. [pearson r](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)) and distance metrics (e.g. [root-mean-square error](https://en.wikipedia.org/wiki/Root-mean-square_deviation)). These metrics adapt the implementation in [`scikit-learn`](https://scikit-learn.org/stable/) and [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html).\n", "\n", "**Probabilistic** metrics can be calculated when the forecast consists of multiple forecasts for the same target. Examples, include [Continuous Ranked Probability Score](https://climpred.readthedocs.io/en/stable/metrics.html#continuous-ranked-probability-score-crps) and [Brier Score](https://journals.ametsoc.org/doi/abs/10.1175/1520-0493%281950%29078%3C0001%3AVOFEIT%3E2.0.CO%3B2).\n", "\n", "`xskillscore` works on `xarray` objects which requires data to be castable to an `ndarray`. It works with [`numpy.array`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html), [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and [`dask.array`](https://docs.dask.org/en/latest/array.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see the metrics available in `xskillscore` by running `dir(xs)`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['XSkillScoreAccessor',\n", " '__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__path__',\n", " '__spec__',\n", " 'brier_score',\n", " 'core',\n", " 'crps_ensemble',\n", " 'crps_gaussian',\n", " 'crps_quadrature',\n", " 'effective_sample_size',\n", " 'mae',\n", " 'mape',\n", " 'median_absolute_error',\n", " 'mse',\n", " 'pearson_r',\n", " 'pearson_r_eff_p_value',\n", " 'pearson_r_p_value',\n", " 'r2',\n", " 'rmse',\n", " 'smape',\n", " 'spearman_r',\n", " 'spearman_r_eff_p_value',\n", " 'spearman_r_p_value',\n", " 'threshold_brier_score']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xskillscore as xs\n", "dir(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [01_Deterministic.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/01_Determinisitic.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook I show how `xskillscore` can be dropped in a typical data science task where the data is a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).\n", "\n", "I use the metric root-mean-squared error (RMSE) to verify forecasts of items sold.\n", "\n", "I also show how you can applies weights to the verification and handle missing values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [02_Probabilistic.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/02_Probabilistic.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows how to use probabilistic metrics in a typical data science task where the data is a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).\n", "\n", "The metric Continuous Ranked Probability Score (CRPS) is used to verify multiple forecasts for the same target." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [03_Big_Data.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/03_Big_Data.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`xarray` can handle big data, therefore `xskillscore` can handle big data.\n", "\n", "In this notebook I verify 12 million forecasts in a couple of seconds using the RMSE metric on a `dask.array`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial was adapted from the [dask-tutorial](https://github.com/dask/dask-tutorial).\n", "\n", "The interactive session is hosted by [Binder](https://mybinder.readthedocs.io/en/latest/) \n", "and runs on [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 4 }