{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Brier score\n", "\n", "The Brier score is the most commonly used verification metric for evaluating probability forecasts of binary outcomes, such as a \"chance of rainfall\" forecast.\n", "\n", "Probabilistic forecasts of binary events are expressed as values between 0 and 1, and observations are exactly 0 (the event did not occur) or 1 (the event occurred).\n", "\n", "The score is calculated in the same way as the mean squared error (MSE) between the forecast probabilities and the binary observations, i.e. $\\frac{1}{N}\\sum_{i=1}^{N}(f_i - o_i)^2$, where $f_i$ is the forecast probability and $o_i$ the observed outcome for case $i$. The Brier score is a [strictly proper scoring rule](https://sites.stat.washington.edu/people/raftery/Research/PDF/Gneiting2007jasa.pdf) and is negatively oriented: lower values are better, a perfect score is 0 and the worst possible score is 1.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from scores.probability import brier_score\n", "from scipy.stats import beta, binom\n", "\n", "import numpy as np\n", "import xarray as xr" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# To learn more about the implementation of the Brier score, uncomment the following line\n", "# help(brier_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We generate synthetic observations and two synthetic forecasts. By design, `fcst1` is a good forecast, while `fcst2` is a poor forecast. We measure the difference in skill by calculating and comparing their Brier scores." 
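, "\n", "\n", "Why is `fcst1` good by design? The observations are sampled as `obs = binom.rvs(1, fcst1)`, so each observation is a Bernoulli draw whose event probability is exactly the matching `fcst1` value; in other words, `fcst1` is calibrated by construction. `fcst2` is drawn independently of `obs`, so it carries no information about the outcome. As a rough sanity check, the expected Brier scores under this sampling scheme can be worked out from the Beta moments $\\mathbb{E}[X] = \\frac{a}{a+b}$ and $\\mathbb{E}[X^2] = \\frac{a(a+1)}{(a+b)(a+b+1)}$ for $X \\sim \\mathrm{Beta}(a, b)$. Write $f \\sim \\mathrm{Beta}(2, 1)$ for `fcst1`, $o \\mid f \\sim \\mathrm{Bernoulli}(f)$ for `obs` (so $\\mathbb{E}[o] = \\mathbb{E}[f] = \\tfrac{2}{3}$ and $\\mathbb{E}[fo] = \\mathbb{E}[f^2]$), and $g \\sim \\mathrm{Beta}(0.5, 1)$ for `fcst2`, independent of $o$. Using $o^2 = o$ for a binary outcome:\n", "\n", "$$\n", "\\begin{aligned}\n", "\\text{fcst1:}\\quad \\mathbb{E}\\big[(f - o)^2\\big] &= \\mathbb{E}[f^2] - 2\\,\\mathbb{E}[f^2] + \\mathbb{E}[f] = \\tfrac{2}{3} - \\tfrac{1}{2} = \\tfrac{1}{6} \\approx 0.167 \\\\\n", "\\text{fcst2:}\\quad \\mathbb{E}\\big[(g - o)^2\\big] &= \\mathbb{E}[g^2] - 2\\,\\mathbb{E}[g]\\,\\mathbb{E}[o] + \\mathbb{E}[o] = \\tfrac{1}{5} - \\tfrac{4}{9} + \\tfrac{2}{3} = \\tfrac{19}{45} \\approx 0.422\n", "\\end{aligned}\n", "$$\n", "\n", "Because only 1000 samples are drawn and no random seed is set, the scores computed below will scatter around these values from run to run."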
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# fcst1: 1000 forecast probabilities drawn from a Beta(2, 1) distribution\n", "fcst1 = beta.rvs(2, 1, size=1000)\n", "# obs: binary outcomes drawn as Bernoulli trials whose event probability equals fcst1, so fcst1 is calibrated\n", "obs = binom.rvs(1, fcst1)\n", "# fcst2: 1000 probabilities drawn from Beta(0.5, 1), independently of obs\n", "fcst2 = beta.rvs(0.5, 1, size=1000)\n", "\n", "# Wrap each array in an xarray DataArray along a shared \"time\" dimension\n", "fcst1 = xr.DataArray(data=fcst1, dims=\"time\", coords={\"time\": np.arange(0, 1000)})\n", "fcst2 = xr.DataArray(data=fcst2, dims=\"time\", coords={\"time\": np.arange(0, 1000)})\n", "obs = xr.DataArray(data=obs, dims=\"time\", coords={\"time\": np.arange(0, 1000)})" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Brier score for fcst1 = 0.16\n", "Brier score for fcst2 = 0.43\n" ] } ], "source": [ "brier_fcst1 = brier_score(fcst1, obs)\n", "brier_fcst2 = brier_score(fcst2, obs)\n", "\n", "print(f\"Brier score for fcst1 = {brier_fcst1.item():.2f}\")\n", "print(f\"Brier score for fcst2 = {brier_fcst2.item():.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, `fcst1` has the lower Brier score; the difference between the two scores quantifies how much better it is than `fcst2`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notes\n", "- If you are using the Brier score on large data with Dask, consider setting the `check_args` argument of `brier_score` to `False` to skip the input validity checks, which can be slow on large arrays.\n", "- A calculation of the Brier score components is planned for a future release.\n", "- You may also be interested in working through the Murphy Diagram tutorial, which allows you to break down the performance measured by the Brier score by threshold probability.\n", "\n", "**Reference**: [Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. 
Monthly Weather Review, 78(1), pp. 1-3.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=feee6551179612b9691f021b583d8a99b81b9b86)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 4 }