{ "cells": [ { "cell_type": "markdown", "id": "ec2bdd49-2e83-4dd9-96ce-1b9a73f1ad45", "metadata": {}, "source": [ "# Introduction to weighting scores\n", "\n", "A common task or goal in verification is to understand how the accuracy of a model varies, or account for it.\n", "\n", "One of the most common factors to take into account is that most geospatial data arrays don't represent equal amounts of physical area at each coordinate, particularly for one of the most common representations, the \"LLXY\" representation whereby each part of the array represents some even subdivision of latitude and longitude. Without going into the details on why this happens, what's important to know is that those apparently equal subdivisions of latitude do not represent an equal area. Lines of longitude are physically closer together nearer the poles, and are further apart where latitude equals zero. This is taken into account by reducing the values towards the poles by 'weighting' the results. In this case, all the weights are less than or equal to 1.\n", "\n", "Another common weighting is to account for the effect of accuracy on people, and so the results may be weighted by population density. In this case, the weightings could be greater than one, and increase the values in some places, depending on the expression of density. Any normalisation is the responsibility of the user when creating the weightings array. One approach could be to divide the weightings array by its maximum weight.\n", "\n", "Weighting in this context means multiplying. Internally to the scores package, the process is as follows:\n", "\n", "1. Calculate the underlying error metric for the score (e.g. absolute error)\n", "2. Multiply those errors by a factor supplied to the algorithm (for example, latitude weighting or population density)\n", "3. Perform dimensionality reduction (e.g. 
calculate the mean) of the weighted value\n", "\n", "It is important to realise that this factor can greatly distort the intuitive meaning of the scores. Latitude weighting apply a maximum weighting of 1 at the equator (so no change), and reduce the errors significantly towards the poles, as the area represented by each region also reduces significatly (going to zero in the extreme). Latitude weighting by cosine (the method implemented in this package) is inherently normalised between zero and one.\n", "\n", "Population density, by contrast, may not be normalised naturally. It could be expressed as a number of people per kilometer. In this case, perhaps it's appropriate to weight by both population density AND latitude/area. Perhaps it would also be useful to mask out oceans and lakes, since those areas don't impact the population in the same way.\n", "\n", "Sometimes, it's useful to calculate a few different perspectives at once. A more complex example might be to compare the latitude-weighted score to a population-weighted one, meaning both things need to be collected.\n", "\n", "This notebook will go through some examples from the simple to the complex, showing both the importance and significance of considering weighting when calculating verification scores.\n", "\n", "**Note:** In this tutorial we use the forecast and analysis grids that are downloaded or derived in `First_Data_Fetching.ipynb`. Please run through this tutorial first to fetch data." 
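,
"\n",
"As a standalone illustration of the three numbered steps above, the following minimal sketch applies cosine-latitude weighting to an invented toy error field using plain `numpy`. It is not the package's internal implementation, just the same idea in miniature, with all values invented for illustration:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Step 1: a toy 3 x 4 grid of absolute errors (values invented for illustration)\n",
"lats = np.array([-60.0, 0.0, 60.0])  # latitudes in degrees\n",
"abs_error = np.ones((3, 4))\n",
"\n",
"# Step 2: multiply by cos(latitude) weights, which lie in [0, 1]\n",
"weights = np.cos(np.deg2rad(lats))[:, np.newaxis]\n",
"weighted = abs_error * weights\n",
"\n",
"# Step 3: reduce - here a weighted mean over the whole grid\n",
"weighted_mean = weighted.sum() / (weights.sum() * abs_error.shape[1])\n",
"print(weighted_mean)  # approximately 1.0 for a uniform error field\n",
"```"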
] }, { "cell_type": "code", "execution_count": 1, "id": "bebeb555-b58a-40dc-87ab-d5572faf9fb5", "metadata": {}, "outputs": [], "source": [ "import io\n", "import pandas\n", "import scores\n", "import xarray\n", "import zipfile\n", "\n", "# Note - while not imported here, xarray depends on rasterio and rioxarray being installed to load the geotiffs\n", "# for exploring population density in the latter part of the notebook" ] }, { "cell_type": "code", "execution_count": 2, "id": "4d4b2938-73a4-43e7-93ea-accb49fbf413", "metadata": {}, "outputs": [], "source": [ "# Here we consider the errors at 4 days' lead time into the prediction, at a specific hour, compared to the analysis for that time step.\n", "forecast = xarray.open_dataset('forecast_grid.nc')\n", "analysis = xarray.open_dataset('analysis_grid.nc')\n", "time_step_of_interest = forecast.temp_scrn[24*4-1]" ] }, { "cell_type": "code", "execution_count": 3, "id": "60207ad5-1da3-42ea-b197-308e5999b281", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The maximum weighting in the array is 0.9999994770895914. This has an insignificant floating point rounding error.\n" ] } ], "source": [ "# The standard latitude weight array has a magnitude of around 1 at the equator, and reduces to zero approaching the poles\n", "weights = scores.functions.create_latitude_weights(analysis.lat)\n", "print(f\"The maximum weighting in the array is {weights.max().values}. This has an insignificant floating point rounding error.\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "0a664c02-163f-4dbe-bdeb-2069ccd9a4eb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray 'lat' (lat: 1536)>\n",
"array([0.00102266, 0.00306796, 0.00511325, ..., 0.00511325, 0.00306796,\n",
" 0.00102266])\n",
"Coordinates:\n",
" * lat (lat) float64 89.94 89.82 89.71 89.59 ... -89.71 -89.82 -89.94\n",
"Attributes:\n",
" long_name: latitudes\n",
" type: uniform\n",
" units: degrees_north\n",
" valid_min: -90.0\n",
" valid_max: 90.0\n",
" axis: Y