{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## xskillscore-tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Welcome to the [xskillscore](https://github.com/raybellwaves/xskillscore) tutorial.\n",
    "\n",
    "This was created for a talk at the [Data Science Study Group: South Florida](https://www.meetup.com/Data-Science-Study-Group-South-Florida/) on April 1 st 2020. The associated slides with the talk can be found [here](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/xskillscore-tutorial.pdf).\n",
    "\n",
    "The repository for this tutorial is hosted on GitHub here: [xskillscore-tutorial](https://github.com/raybellwaves/xskillscore-tutorial)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Motivation for xskillscore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`xskillscore` provides a one-stop shop for metrics used in verification of forecasts.\n",
    "\n",
    "It is an extension of [`xarray`](http://xarray.pydata.org/en/stable/) which is a library that handles labelled n-dimensional arrays. Find out more information about `xarray` [here](http://xarray.pydata.org/en/stable/why-xarray.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## History of xskillscore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`xskillscore` was developed by Ray Bell while at the University of Miami during the [SubX project](https://journals.ametsoc.org/doi/full/10.1175/BAMS-D-18-0270.1) in 2018.\n",
    "\n",
    "In 2019, Aaron Spring, Andrew Huang and Riley Brady greatly improved `xskillscore`. Aaron, Andrew and Riley provided upstream fixes and enhancement of `xskillscore` as it used extensively in [climpred](https://climpred.readthedocs.io/en/stable/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## xskillscore overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The verification metrics in `xskillscore` are split into two types: **deterministic** and **probabilistic**.\n",
    "\n",
    "**Deterministic** metrics consist of correlation metrics (e.g. [pearson r](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)) and distance metrics (e.g. [root-mean-square error](https://en.wikipedia.org/wiki/Root-mean-square_deviation)). These metrics adapt the implementation in [`scikit-learn`](https://scikit-learn.org/stable/) and [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html).\n",
    "\n",
    "**Probabilistic** metrics can be calculated when the forecast consists of multiple forecasts for the same target. Examples, include [Continuous Ranked Probability Score](https://climpred.readthedocs.io/en/stable/metrics.html#continuous-ranked-probability-score-crps) and [Brier Score](https://journals.ametsoc.org/doi/abs/10.1175/1520-0493%281950%29078%3C0001%3AVOFEIT%3E2.0.CO%3B2).\n",
    "\n",
    "`xskillscore` works on `xarray` objects which requires data to be castable to an `ndarray`. It works with [`numpy.array`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html), [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and [`dask.array`](https://docs.dask.org/en/latest/array.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see the metrics available in `xskillscore` by running `dir(xs)`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['XSkillScoreAccessor',\n",
       " '__builtins__',\n",
       " '__cached__',\n",
       " '__doc__',\n",
       " '__file__',\n",
       " '__loader__',\n",
       " '__name__',\n",
       " '__package__',\n",
       " '__path__',\n",
       " '__spec__',\n",
       " 'brier_score',\n",
       " 'core',\n",
       " 'crps_ensemble',\n",
       " 'crps_gaussian',\n",
       " 'crps_quadrature',\n",
       " 'effective_sample_size',\n",
       " 'mae',\n",
       " 'mape',\n",
       " 'median_absolute_error',\n",
       " 'mse',\n",
       " 'pearson_r',\n",
       " 'pearson_r_eff_p_value',\n",
       " 'pearson_r_p_value',\n",
       " 'r2',\n",
       " 'rmse',\n",
       " 'smape',\n",
       " 'spearman_r',\n",
       " 'spearman_r_eff_p_value',\n",
       " 'spearman_r_p_value',\n",
       " 'threshold_brier_score']"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import xskillscore as xs\n",
    "dir(xs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table of Contents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [01_Deterministic.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/01_Determinisitic.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook I show how `xskillscore` can be dropped in a typical data science task where the data is a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).\n",
    "\n",
    "I use the metric root-mean-squared error (RMSE) to verify forecasts of items sold.\n",
    "\n",
    "I also show how you can applies weights to the verification and handle missing values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [02_Probabilistic.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/02_Probabilistic.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook shows how to use probabilistic metrics in a typical data science task where the data is a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).\n",
    "\n",
    "The metric Continuous Ranked Probability Score (CRPS) is used to verify multiple forecasts for the same target."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [03_Big_Data.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/03_Big_Data.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`xarray` can handle big data, therefore `xskillscore` can handle big data.\n",
    "\n",
    "In this notebook I verify 12 million forecasts in a couple of seconds using the RMSE metric on a `dask.array`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This tutorial was adapted from the [dask-tutorial](https://github.com/dask/dask-tutorial).\n",
    "\n",
    "The interactive session is hosted by [Binder](https://mybinder.readthedocs.io/en/latest/) \n",
    "and runs on [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}