{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Brier score\n",
    "\n",
    "The Brier score is the most commonly used verification metric for evaluating probabilistic forecasts of a binary outcome, such as a \"chance of rainfall\" forecast.\n",
    "\n",
    "Probabilistic forecasts of binary events are expressed as values between 0 and 1, and observations are exactly 0 (event did not occur) or 1 (event occurred).\n",
    "\n",
    "The metric is then calculated the same way as the mean squared error (MSE): the squared difference between each forecast probability and its observation is averaged over all forecast cases. The Brier score is a [strictly proper scoring rule](https://sites.stat.washington.edu/people/raftery/Research/PDF/Gneiting2007jasa.pdf). It is negatively oriented, so lower values are better: a perfect score is 0 and the worst possible score is 1.\n"
   ]
  },
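  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of the definition (the symbols $f_i$, $o_i$ and $n$ are notation introduced here for illustration, not taken from the `scores` documentation), the Brier score over $n$ forecast-observation pairs is the mean squared difference between the forecast probability $f_i$ and the binary outcome $o_i$:\n",
    "\n",
    "$$\\mathrm{BS} = \\frac{1}{n} \\sum_{i=1}^{n} (f_i - o_i)^2, \\qquad f_i \\in [0, 1], \\quad o_i \\in \\{0, 1\\}$$\n",
    "\n",
    "For example, forecasts of 0.9, 0.2 and 0.6 verified against outcomes 1, 0 and 0 give\n",
    "\n",
    "$$\\mathrm{BS} = \\frac{(0.9 - 1)^2 + (0.2 - 0)^2 + (0.6 - 0)^2}{3} = \\frac{0.01 + 0.04 + 0.36}{3} \\approx 0.137$$\n"
   ]
  },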
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scores.probability import brier_score\n",
    "from scipy.stats import beta, binom\n",
    "\n",
    "import numpy as np\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# To learn more about the implementation of the Brier score, uncomment the following\n",
    "# help(brier_score)"
   ]
  },
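  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We generate two synthetic forecasts. By design, `fcst1` is a good forecast, because the observations are sampled using `fcst1` as the event probability, while `fcst2` is a poor forecast, because it is drawn independently of the observations. We measure the difference in skill by calculating and comparing their Brier scores."
   ]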
  },
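  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fcst1: forecast probabilities drawn from a Beta(2, 1) distribution\n",
    "fcst1 = beta.rvs(2, 1, size=1000)\n",
    "# obs: each observation is a Bernoulli draw whose event probability is the matching fcst1 value\n",
    "obs = binom.rvs(1, fcst1)\n",
    "# fcst2: drawn from Beta(0.5, 1) independently of obs, so it carries no information about the outcomes\n",
    "fcst2 = beta.rvs(0.5, 1, size=1000)\n",
    "\n",
    "# Wrap the arrays in DataArrays that share a common time coordinate\n",
    "fcst1 = xr.DataArray(data=fcst1, dims=\"time\", coords={\"time\": np.arange(0, 1000)})\n",
    "fcst2 = xr.DataArray(data=fcst2, dims=\"time\", coords={\"time\": np.arange(0, 1000)})\n",
    "obs = xr.DataArray(data=obs, dims=\"time\", coords={\"time\": np.arange(0, 1000)})"
   ]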
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Brier score for fcst1 = 0.16\n",
      "Brier score for fcst2 = 0.43\n"
     ]
    }
   ],
   "source": [
    "brier_fcst1 = brier_score(fcst1, obs)\n",
    "brier_fcst2 = brier_score(fcst2, obs)\n",
    "\n",
    "print(f\"Brier score for fcst1 = {brier_fcst1.item():.2f}\")\n",
    "print(f\"Brier score for fcst2 = {brier_fcst2.item():.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected, `fcst1` has the lower Brier score. The gap between the two scores quantifies how much better it is than `fcst2`."
   ]
  },
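  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These values are close to what the sampling setup implies. As a rough check (a sketch that assumes each observation is a Bernoulli draw with probability `fcst1` and that `fcst2` is drawn independently of `obs`, as in the code above; the symbols $p$, $q$ and $o$ are notation introduced here), write $p \\sim \\mathrm{Beta}(2, 1)$ for `fcst1`, $q \\sim \\mathrm{Beta}(0.5, 1)$ for `fcst2` and $o$ for the observation, so that $\\mathbb{E}[o] = \\mathbb{E}[p] = 2/3$. Conditional on $p$, the expected squared error is $p(1 - p)^2 + (1 - p)p^2 = p(1 - p)$, which gives\n",
    "\n",
    "$$\\mathbb{E}[(p - o)^2] = \\mathbb{E}[p] - \\mathbb{E}[p^2] = \\tfrac{2}{3} - \\tfrac{1}{2} = \\tfrac{1}{6} \\approx 0.17$$\n",
    "\n",
    "For the independent forecast, using $o^2 = o$,\n",
    "\n",
    "$$\\mathbb{E}[(q - o)^2] = \\mathbb{E}[q^2] - 2\\,\\mathbb{E}[q]\\,\\mathbb{E}[o] + \\mathbb{E}[o] = \\tfrac{1}{5} - 2 \\cdot \\tfrac{1}{3} \\cdot \\tfrac{2}{3} + \\tfrac{2}{3} = \\tfrac{19}{45} \\approx 0.42$$\n",
    "\n",
    "Because the samples are random and no seed is set, a rerun will scatter around these expectations rather than reproduce the printed values exactly.\n"
   ]
  },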
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notes\n",
    "- If you are using the Brier score on large data with Dask, consider setting the `check_args` argument to `False` in `brier_score`.\n",
    "- A calculation of the Brier score components will be added in the future.\n",
    "- You may be interested in working through the Murphy Diagram tutorial, which shows how to break down forecast performance by threshold probability.\n",
    "\n",
    "**Reference**: [Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1), pp.1-3.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=feee6551179612b9691f021b583d8a99b81b9b86)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}