{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "# Sea ice forecasting using IceNet" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Context\n", "### Purpose\n", "Demonstrate IceNet, a deep learning sea ice forecasting system trained using climate simulations and observational data.\n", "\n", "### Modelling approach\n", "**IceNet** is a probabilistic, deep learning sea ice forecasting system. The model, an ensemble of U-Net networks, learns how sea ice changes from climate simulations and observational data to forecast up to 6 months of monthly-averaged sea ice concentration maps at 25 km resolution. IceNet advances the range of accurate sea ice forecasts, outperforming a state-of-the-art dynamical model in seasonal forecasts of summer sea ice, particularly for extreme sea ice events. IceNet was implemented in Python 3.7 using TensorFlow v2.2.0. Further details can be found in the Nature Communications paper *Seasonal Arctic sea ice forecasting with probabilistic deep learning* {cite:p}`Andersson2021`.\n", "\n", "### Highlights\n", "* Clone and access IceNet's codebase to produce seasonal Arctic sea ice forecasts using 3 of the 25 pre-trained IceNet models [downloaded from the Polar Data Centre](https://doi.org/10.5285/71820e7d-c628-4e32-969f-464b7efb187c).\n", "* Forecast a single year, 2020, using IceNet's preprocessed environmental input data downloaded from a Zenodo repository.\n", "* Visualise IceNet’s seasonal ice edge predictions at 4- to 1-month lead times.\n", "* Compare IceNet predictions against ECMWF SEAS5 physics-based sea ice concentration forecasts and a linear trend statistical benchmark using interactive plots.\n", "\n", ":::{note}\n", "The notebook contributors acknowledge the IceNet developers for providing fully reproducible, public code available at [https://github.com/tom-andersson/icenet-paper](https://github.com/tom-andersson/icenet-paper). 
Some snippets from IceNet's source code were adapted for this notebook.\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clone the IceNet GitHub repo" ] }, { "cell_type": "code", "metadata": { "tags": [ "hide-input" ] }, "source": "!git clone -q https://github.com/tom-andersson/icenet-paper.git notebook", "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install dependencies (only for the Pangeo JupyterHub)" ] }, { "cell_type": "code", "metadata": {}, "source": [ "import os\n", "\n", "if os.environ['CONDA_DEFAULT_ENV'] == 'notebook':\n", " !pip -q install tensorflow\n", " !pip -q install scitools-iris" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load libraries" ] }, { "cell_type": "code", "metadata": { "tags": [ "hide-input" ] }, "source": [ "# system\n", "import sys\n", "sys.path.insert(0, os.path.join(os.getcwd(), 'notebook', 'icenet'))\n", "\n", "# data\n", "import json\n", "import pandas as pd\n", "import numpy as np\n", "import xarray as xr\n", "\n", "# custom functions from the icenet repo\n", "from utils import IceNetDataLoader, create_results_dataset_index, arr_to_ice_edge_arr\n", "\n", "# modelling\n", "from tensorflow.keras.models import load_model\n", "\n", "# plotting\n", "import matplotlib.pyplot as plt\n", "from matplotlib.figure import Figure\n", "from matplotlib.backends.backend_agg import FigureCanvas\n", "from matplotlib.offsetbox import AnchoredText\n", "\n", "import holoviews as hv\n", "\n", "import hvplot.pandas\n", "import hvplot.xarray\n", "\n", "from bokeh.models.formatters import DatetimeTickFormatter\n", "\n", "import panel as pn\n", "pn.extension()\n", "\n", "# utils\n", "import urllib.request\n", "import re\n", "from tqdm.notebook import tqdm\n", "import calendar\n", "from pprint import pprint\n", "import warnings\n", "warnings.filterwarnings(action='ignore')\n", "\n", "pd.options.display.max_columns = 10\n", "hv.extension('bokeh', width=100)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set project structure\n", "\n", "Let's follow the project structure of the IceNet paper, as indicated in its source code [config.py](https://github.com/tom-andersson/icenet-paper/blob/main/icenet/config.py) file. This structure allows convenient use of IceNet's custom data loader." ] }, { "cell_type": "code", "metadata": {}, "source": [ "# data folder\n", "data_folder = './data'\n", "notebook_folder = './notebook'\n", "\n", "config = {\n", " 'obs_data_folder': os.path.join(data_folder, 'obs'),\n", " 'mask_data_folder': os.path.join(data_folder, 'masks'),\n", " 'forecast_data_folder': os.path.join(data_folder, 'forecasts'),\n", " 'network_dataset_folder': os.path.join(data_folder, 'network_datasets'),\n", " 'dataloader_config_folder': os.path.join(data_folder, 'dataloader_configs'),\n", " 'network_h5_files_folder': os.path.join(data_folder, 'networks'),\n", " 'forecast_results_folder': os.path.join(data_folder, 'results'),\n", "}\n", "\n", "# Generate the folder structure with a list comprehension\n", "[os.makedirs(val) for key, val in config.items() if not os.path.exists(val)]" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download input data and models\n", "\n", "IceNet consists of 25 ensemble members, i.e. models. 
For this demonstrator, we only download three of them to reduce computational cost (note that this will reduce performance compared with the full ensemble). We also fetch analysis-ready, i.e. preprocessed, climate observations, ground truth sea ice concentration (SIC) and an IceNet project configuration file from a Zenodo repository. Finally, we call a script from the IceNet paper repo to generate the masks required for computing metrics and for visualisation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download pretrained IceNet models\n", "\n", "Let's download 3 of the 25 ensemble members [retrieved from the Polar Data Centre](https://doi.org/10.5285/71820e7d-c628-4e32-969f-464b7efb187c). The models are numbered from 36 to 60. For this example we use networks 36, 42 and 53. It is worth mentioning that other pre-computed results from the Nature Communications paper can also be downloaded, including the output results table, uncertainty estimates and netCDF forecasts of the 25 ensemble members, among others." ] }, { "cell_type": "code", "metadata": {}, "source": [ "url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'\n", "\n", "target_networks = [36, 42, 53]\n", "\n", "for network in target_networks:\n", " urllib.request.urlretrieve(url + f'network_tempscaled_{network}.h5?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL25ldXJhbF9uZXR3b3JrX21vZGVsL25ldHdvcmtfdGVtcHNjYWxlZF8zNi5oNQ%3D%3D',\n", " os.path.join(config['network_h5_files_folder'],f'network_tempscaled_{network}.h5'))" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download ERA5 data (climate observations)\n", "\n", "Let's download analysis-ready, i.e. preprocessed, ERA5 observations from a Zenodo repository.\n", "\n", ":::{note}\n", "The analysis-ready data were generated by running the script `python3 icenet/preproc_icenet_data.py` in step **3.2) Preprocess the raw data** of the [icenet-paper repository](https://github.com/tom-andersson/icenet-paper). The script normalises the raw NetCDF data, downloaded using the bash file `./download_era5_data_in_parallel.sh` (see step **2) Download data**), and saves it as monthly `NumPy` files.\n", ":::" ] }, { "cell_type": "code", "metadata": {}, "source": [ "filename = 'dataset1.zip'\n", "url = f'https://zenodo.org/record/5516869/files/{filename}?download=1'\n", "\n", "if not os.path.isfile(config['network_dataset_folder'] + '/dataset1.zip') or os.path.getsize(config['network_dataset_folder'] + '/dataset1.zip') == 0:\n", " urllib.request.urlretrieve(url, config['network_dataset_folder'] + '/dataset1.zip')\n", " !unzip -qq ./data/network_datasets/dataset1.zip -d ./data/network_datasets" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download ground truth SIC\n", "\n", "We additionally download analysis-ready ground truth SIC data from a Zenodo repository.\n", "\n", ":::{note}\n", "The analysis-ready ground truth SIC data were generated by running the script `python3 icenet/download_sic_data.py` in step **2) Download data** of the [icenet-paper repository](https://github.com/tom-andersson/icenet-paper). 
The script downloads and concatenates [OSI-SAF SIC data](https://osisaf-hl.met.no/v2p1-sea-ice-index), OSI-450 (1979-2015) and OSI-430-b (2016-onwards), and saves them as monthly averages in a `netCDF` file.\n", ":::" ] }, { "cell_type": "code", "metadata": {}, "source": [ "filename = 'siconca_EASE.nc'\n", "url = f'https://zenodo.org/record/5516869/files/{filename}?download=1'\n", "\n", "if not os.path.isfile(config['obs_data_folder'] + '/' + filename) or os.path.getsize(config['obs_data_folder'] + '/' + filename) == 0:\n", " urllib.request.urlretrieve(url, config['obs_data_folder'] + '/' + filename)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download mask\n", "\n", "The script `icenet/gen_masks.py` generates masks for land, the polar holes, OSI-SAF monthly maximum ice extent (the *active\ngrid cell region*), and the Arctic regions & coastline. Figures of the\nmasks are saved in the **./figures** folder." ] }, { "cell_type": "code", "metadata": {}, "source": [ "!python notebook/icenet/gen_masks.py" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data loader\n", "\n", "The following lines show how to download and read a given IceNet configuration `JSON` file into a custom loader, **IceNetDataLoader**. The loader conveniently dictates which variables are input to the networks, which climate simulations are used for pre-training, and how far ahead to forecast." ] }, { "cell_type": "code", "metadata": {}, "source": [ "dataloader_ID = '2021_09_03_1300_icenet_demo.json'\n", "url = f'https://zenodo.org/record/5516869/files/{dataloader_ID}?download=1'\n", "\n", "if not os.path.isfile(config['dataloader_config_folder'] + '/' + dataloader_ID) or os.path.getsize(config['dataloader_config_folder'] + '/' + dataloader_ID) == 0:\n", " urllib.request.urlretrieve(url, config['dataloader_config_folder'] + '/' + dataloader_ID)\n", "\n", "with open(config['dataloader_config_folder'] + '/' + dataloader_ID, 'r') as readfile:\n", " dataloader_config = json.load(readfile)\n", "\n", "pprint(dataloader_config['input_data'])" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `input_data` element of IceNet's `JSON` file lists the input variables and their settings. We use the same input data as in the Nature Communications paper, which consist of SIC, 11 climate variables, statistical SIC forecasts, and metadata (see [Supplementary Table 2](https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-021-25257-4/MediaObjects/41467_2021_25257_MOESM1_ESM.pdf)). These layers are stacked in an identical manner to the RGB channels of a traditional image, amounting to 50 channels in total." ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Load dataloader\n", "dataloader_config_fpath = os.path.join(config['dataloader_config_folder'], dataloader_ID)\n", "\n", "# Data loader\n", "print(\"\\nSetting up the data loader with config file: {}\\n\\n\".format(dataloader_ID))\n", "dataloader = IceNetDataLoader(dataloader_config_fpath)\n", "print('\\n\\nDone.\\n')" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load networks\n", "\n", "Let's also load the IceNet ensemble members using the `load_model` function from the Keras API with the TensorFlow backend."
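, "\n\nBefore doing so, it can be reassuring to check what the data loader will feed these networks. The sketch below is illustrative only (it assumes the data loader cell above has been run; the exact array shapes depend on the configuration file, but the last axis of `X` should correspond to the 50 input channels described above):\n\n```python\n# Sketch: generate one input/output sample from the custom data loader\nX, y, sample_weights = dataloader.data_generation([pd.Timestamp('2020-01-01')])\nprint(X.shape, y.shape, sample_weights.shape)\n```"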
] }, { "cell_type": "code", "metadata": {}, "source": [ "network_regex = re.compile('^network_tempscaled_([0-9]*).h5$')\n", "\n", "network_fpaths = [os.path.join(config['network_h5_files_folder'], f) for f in\n", " sorted(os.listdir(config['network_h5_files_folder'])) if network_regex.match(f)]\n", "\n", "ensemble_seeds = [network_regex.match(f)[1] for f in\n", " sorted(os.listdir(config['network_h5_files_folder'])) if network_regex.match(f)]\n", "\n", "networks = []\n", "for network_fpath in network_fpaths:\n", " print('Loading model from {}... '.format(network_fpath), end='', flush=True)\n", " networks.append(load_model(network_fpath, compile=False))\n", " print('Done.')" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modelling\n", "\n", "### Forecast settings\n", "Now let's set the target model and forecast dates, start `forecast_start` (Jan 2020) and end `forecast_end` (Dec 2020). We also extract the number of forecast months from the IceNet's custom dataloader." ] }, { "cell_type": "code", "metadata": {}, "source": [ "model = 'IceNet'\n", "\n", "forecast_start = pd.Timestamp('2020-01-01')\n", "forecast_end = pd.Timestamp('2020-12-01')\n", "\n", "n_forecast_months = dataloader.config['n_forecast_months']\n", "print('\\n# of forecast months: {}\\n'.format(n_forecast_months))" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up forecast folder" ] }, { "cell_type": "code", "metadata": {}, "source": [ "forecast_folder = os.path.join(config['forecast_data_folder'], 'icenet', dataloader_ID, model)\n", "\n", "if not os.path.exists(forecast_folder):\n", " os.makedirs(forecast_folder)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load ground truth SIC" ] }, { "cell_type": "code", "metadata": {}, "source": [ "print('Loading ground truth SIC... ', end='', flush=True)\n", "true_sic_fpath = os.path.join(config['obs_data_folder'], 'siconca_EASE.nc')\n", "true_sic_da = xr.open_dataarray(true_sic_fpath)\n", "print('Done.')" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up forecast DataArray dictionary\n", "\n", "Now we are setting up an empty `xarray DataArray` object that we will use to store IceNet's forecasts. `DataArrays` let you conveniently handle, query and visualise spatio-temporal data, such as the forecast predictions generated by the IceNet system." 
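, "\n\nAs a tiny, self-contained illustration of why labelled arrays are convenient (dummy coordinates below, not the real forecast grid), a `DataArray` can be sliced by coordinate values rather than positional indices:\n\n```python\nimport numpy as np\nimport pandas as pd\nimport xarray as xr\n\n# Toy 2-month, 3x3 grid example\ndemo = xr.DataArray(\n    np.zeros((2, 3, 3)),\n    coords={'time': pd.date_range('2020-01-01', periods=2, freq='MS'),\n            'yc': [0, 1, 2], 'xc': [0, 1, 2]},\n    dims=('time', 'yc', 'xc'))\nprint(demo.sel(time=pd.Timestamp('2020-02-01')).shape)  # (3, 3)\n```"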
] }, { "cell_type": "code", "metadata": {}, "source": [ "# define list of lead times\n", "leadtimes = np.arange(1, n_forecast_months+1)\n", "\n", "# add ensemble to the list of models\n", "ensemble_seeds_and_mean = ensemble_seeds.copy()\n", "ensemble_seeds_and_mean.append('ensemble')\n", "\n", "all_target_dates = pd.date_range(\n", " start=forecast_start,\n", " end=forecast_end,\n", " freq='MS'\n", ")\n", "\n", "all_start_dates = pd.date_range(\n", " start=forecast_start - pd.DateOffset(months=n_forecast_months-1),\n", " end=forecast_end,\n", " freq='MS'\n", ")\n", "\n", "shape = (len(all_target_dates),\n", " *dataloader.config['raw_data_shape'],\n", " n_forecast_months)\n", "\n", "coords = {\n", " 'time': all_target_dates, # To be sliced to target dates\n", " 'yc': true_sic_da.coords['yc'],\n", " 'xc': true_sic_da.coords['xc'],\n", " 'lon': true_sic_da.isel(time=0).coords['lon'],\n", " 'lat': true_sic_da.isel(time=0).coords['lat'],\n", " 'leadtime': leadtimes,\n", " 'seed': ensemble_seeds_and_mean,\n", " 'ice_class': ['no_ice', 'marginal_ice', 'full_ice']\n", "}\n", "\n", "# Probabilistic SIC class forecasts\n", "dims = ('seed', 'time', 'yc', 'xc', 'leadtime', 'ice_class')\n", "shape = (len(ensemble_seeds_and_mean), *shape, 3)\n", "\n", "model_forecast = xr.DataArray(\n", " data=np.zeros(shape, dtype=np.float32),\n", " coords=coords,\n", " dims=dims\n", ")" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Build up forecasts\n", "\n", "In this step, we generate IceNet's forecast for the target period and write it into the empty `DataArray` object. IceNet’s outputs are forecasts of three sea ice concentration (SIC) classes: open-water (SIC ≤ 15%), marginal ice (15% < SIC < 80%) and full ice (SIC ≥ 80%) for the following 6 months, in the form of discrete probability distributions at each grid cell." ] }, { "cell_type": "code", "metadata": {}, "source": [ "for start_date in tqdm(all_start_dates):\n", "\n", " # Target forecast dates for the forecast beginning at this `start_date`\n", " target_dates = pd.date_range(\n", " start=start_date,\n", " end=start_date + pd.DateOffset(months=n_forecast_months-1),\n", " freq='MS'\n", " )\n", "\n", " X, y, sample_weights = dataloader.data_generation([start_date])\n", " mask = sample_weights > 0\n", " pred = np.array([network.predict(X)[0] for network in networks])\n", " pred *= mask # mask outside active grid cell region to zero\n", " # concat ensemble mean to the set of network predictions\n", " ensemble_mean_pred = pred.mean(axis=0, keepdims=True)\n", " pred = np.concatenate([pred, ensemble_mean_pred], axis=0)\n", "\n", " for i, (target_date, leadtime) in enumerate(zip(target_dates, leadtimes)):\n", " if target_date in all_target_dates:\n", " model_forecast.\\\n", " loc[:, target_date, :, :, leadtime] = pred[..., i]\n", " \n", "print('Saving forecast NetCDF for {}... '.format(model), end='', flush=True)\n", "\n", "forecast_fpath = os.path.join(forecast_folder, f'{model.lower()}_forecasts.nc')\n", "model_forecast.to_netcdf(forecast_fpath) # export the forecast DataArray as a NetCDF file\n", "\n", "print('Done.')" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Settings\n", "\n", "The IceNet codebase allows computing operations in memory or with `dask`. 
Computation with `dask` is optimal for predicting longer target periods (see further info in [icenet/analyse_heldout_predictions.py](https://github.com/tom-andersson/icenet-paper/blob/27ca44694eaa3cb5f02fd824c618c46a6701a301/icenet/analyse_heldout_predictions.py#L23)). The following lines show how to compute in memory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "metadata": {}, "source": [ "metric_compute_list = ['Binary accuracy', 'SIE error']\n", "\n", "forecast_fpath = os.path.join(forecast_folder, f'{model.lower()}_forecasts.nc')\n", "\n", "chunks = {'seed': 1}\n", "icenet_forecast_da = xr.open_dataarray(forecast_fpath, chunks=chunks)\n", "icenet_seeds = icenet_forecast_da.seed.values" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Monthly masks (active grid cell regions to compute metrics over)" ] }, { "cell_type": "code", "metadata": {}, "source": [ "mask_fpath_format = os.path.join(config['mask_data_folder'], 'active_grid_cell_mask_{}.npy')\n", "\n", "month_mask_da = xr.DataArray(np.array(\n", " [np.load(mask_fpath_format.format('{:02d}'.format(month))) for\n", " month in np.arange(1, 12+1)],\n", "))" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download previous results" ] }, { "cell_type": "code", "metadata": {}, "source": [ "url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'\n", "fn = '2021_07_01_183913_forecast_results.csv'\n", "fn_suffix = '?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL3Jlc3VsdHMvZm9yZWNhc3RfcmVzdWx0cy8yMDIxXzA3XzAxXzE4MzkxM19mb3JlY2FzdF9yZXN1bHRzLmNzdg%3D%3D'\n", "\n", "if not os.path.isfile(os.path.join(config['forecast_results_folder'],fn)):\n", " urllib.request.urlretrieve(url + fn + fn_suffix, os.path.join(config['forecast_results_folder'],fn))" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialise results dataframe\n", "\n", "Now we write the new forecast results over an old results file generated for IceNet's Nature Communications paper. The old results file contains the performance of all 25 ensemble members, the ECMWF SEAS5 physics-based sea ice probability forecast and the linear trend benchmark. For the purposes of this demonstrator, we remove IceNet's existing records and replace them with the performance of the 3 assessed ensemble members."
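, "\n\nThe replacement step boils down to a string filter on the `Model` column before the new rows are appended, sketched here with a toy frame (the values are illustrative only, not real results):\n\n```python\nimport pandas as pd\n\n# Toy example of dropping previous IceNet rows before appending new ones\ntoy = pd.DataFrame({'Model': ['IceNet', 'SEAS5', 'Linear trend'],\n                    'Binary accuracy': [95.0, 93.0, 92.0]})\ntoy = toy[~toy['Model'].str.startswith('IceNet')]\nprint(toy)\n```"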
] }, { "cell_type": "code", "metadata": {}, "source": [ "now = pd.Timestamp.now()\n", "new_results_df_fname = now.strftime('%Y_%m_%d_%H%M%S_forecast_results.csv')\n", "new_results_df_fpath = os.path.join(config['forecast_results_folder'], new_results_df_fname)\n", "\n", "print('New results will be saved to {}\\n\\n'.format(new_results_df_fpath))\n", "\n", "results_df_fnames = sorted([f for f in os.listdir(config['forecast_results_folder']) if re.compile('.*.csv').match(f)])\n", "if len(results_df_fnames) >= 1:\n", " old_results_df_fname = results_df_fnames[-1]\n", " old_results_df_fpath = os.path.join(config['forecast_results_folder'], old_results_df_fname)\n", " print('\\n\\nLoading previous results dataset from {}'.format(old_results_df_fpath))\n", "\n", "# Load previous results, do not interpret 'NA' as NaN\n", "results_df = pd.read_csv(old_results_df_fpath, keep_default_na=False, comment='#')\n", "\n", "# Remove existing IceNet results\n", "results_df = results_df[~results_df['Model'].str.startswith('IceNet')]\n", "\n", "# Drop spurious index column if present\n", "results_df = results_df.drop('Unnamed: 0', axis=1, errors='ignore')\n", "results_df['Forecast date'] = [pd.Timestamp(date) for date in results_df['Forecast date']]\n", "\n", "results_df = results_df.set_index(['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])\n", "\n", "# Add new models to the dataframe\n", "multi_index = create_results_dataset_index([model], leadtimes, all_target_dates, model, icenet_seeds)\n", "results_df = results_df.append(pd.DataFrame(index=multi_index)).sort_index()" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute IceNet sea ice probability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We obtain the sea ice probability, P(SIC>15%), for each ensemble member and the ensemble mean by summing IceNet’s marginal ice (15% < SIC < 80%) and full ice (SIC ≥ 80%) probabilities." ] }, { "cell_type": "code", "metadata": {}, "source": [ "icenet_sip_da = icenet_forecast_da.sel(ice_class=['marginal_ice', 'full_ice']).sum('ice_class')" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ground truth SIC\n", "\n", "Let's also load the ground truth SIC, which was generated from OSI-SAF data and already downloaded above." ] }, { "cell_type": "code", "metadata": {}, "source": [ "true_sic_fpath = os.path.join(config['obs_data_folder'], 'siconca_EASE.nc')\n", "true_sic_da = xr.open_dataarray(true_sic_fpath, chunks={})\n", "true_sic_da = true_sic_da.load()\n", "true_sic_da = true_sic_da.sel(time=all_target_dates)\n", "\n", "if 'Binary accuracy' in metric_compute_list:\n", " binary_true_da = true_sic_da > 0.15" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Monthwise masks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown in the next section, the monthly masks, stacked into a `DataArray` object, allow us to compute metrics only over the active grid cell region."
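, "\n\nIn miniature, the effect of weighting by a binary mask is that masked-out cells contribute nothing to the reductions used in the metric computations (dummy 2x2 grid below, not the real mask):\n\n```python\nimport xarray as xr\n\n# Dummy mask-weighted mean: the bottom-right cell is ignored\nvals = xr.DataArray([[1.0, 0.0], [1.0, 1.0]], dims=('yc', 'xc'))\nmask = xr.DataArray([[1.0, 1.0], [1.0, 0.0]], dims=('yc', 'xc'))  # 0 = outside active region\nprint(float(vals.weighted(mask).mean(dim=['yc', 'xc'])))  # 2/3\n```"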
] }, { "cell_type": "code", "metadata": {}, "source": [ "months = [pd.Timestamp(date).month - 1 for date in all_target_dates]\n", "mask_da = xr.DataArray(\n", " [month_mask_da[month] for month in months],\n", " dims=('time', 'yc', 'xc'),\n", " coords={\n", " 'time': true_sic_da.time.values,\n", " 'yc': true_sic_da.yc.values,\n", " 'xc': true_sic_da.xc.values,\n", " }\n", ")" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute performance metrics\n", "\n", "To analyse the forecast performance, IceNet's researchers compute two metrics, `Binary accuracy` and `Sea Ice Extent (SIE) error`. The former is computed over the active grid cell region for a given calendar month and can be seen as a normalised version of the integrated ice edge error (IIEE) (see the Methods section of IceNet's [*Nature Communications*](https://www.nature.com/articles/s41467-021-25257-4) paper for further details). The latter, SIE error, is the difference between the overpredicted area and the underpredicted area. The two metrics are complementary, with binary accuracy being more robust for assessing IceNet’s relative seasonal forecast skill for September." ] }, { "cell_type": "code", "metadata": {}, "source": [ "print('Analysing forecasts: \\n\\n')\n", "\n", "print('Computing metrics:')\n", "print(metric_compute_list)\n", "\n", "binary_forecast_da = icenet_sip_da > 0.5\n", "\n", "compute_ds = xr.Dataset()\n", "for metric in metric_compute_list:\n", "\n", " if metric == 'Binary accuracy':\n", " binary_correct_da = (binary_forecast_da == binary_true_da).astype(np.float32)\n", " binary_correct_weighted_da = binary_correct_da.weighted(mask_da)\n", "\n", " # Mean percentage of correct classifications over the active\n", " # grid cell area\n", " ds_binacc = (binary_correct_weighted_da.mean(dim=['yc', 'xc']) * 100)\n", " compute_ds[metric] = ds_binacc\n", "\n", " elif metric == 'SIE error':\n", " binary_forecast_weighted_da = binary_forecast_da.astype(int).weighted(mask_da)\n", " binary_true_weighted_da = binary_true_da.astype(int).weighted(mask_da)\n", "\n", " ds_sie_error = (\n", " binary_forecast_weighted_da.sum(['xc', 'yc']) -\n", " binary_true_weighted_da.sum(['xc', 'yc'])\n", " ) * 25**2\n", "\n", " compute_ds[metric] = ds_sie_error\n", "\n", "print('Writing to results dataset...')\n", "for compute_da in iter(compute_ds.data_vars.values()):\n", " metric = compute_da.name\n", "\n", " compute_df_index = results_df.loc[\n", " pd.IndexSlice[model, :, leadtimes, all_target_dates], metric].\\\n", " droplevel(0).index\n", "\n", " # Ensure indexes are aligned for assigning to results_df\n", " compute_df = compute_da.to_dataframe().reset_index().\\\n", " set_index(['seed', 'leadtime', 'time']).\\\n", " reindex(index=compute_df_index)\n", "\n", " results_df.loc[pd.IndexSlice[model, :, leadtimes, all_target_dates], metric] = \\\n", " compute_df.values\n", "\n", "print('\\nCheckpointing results dataset... ', end='', flush=True)\n", "results_df.to_csv(new_results_df_fpath)\n", "print('Done.')" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis\n", "\n", "In this section, we explore the forecast results and provide some interpretation. Note that we use a small sample of the data, so the results are for demonstration purposes only."
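, "\n\nOne reading aid for the numbers that follow: the SIE error computed above is a (mask-weighted) grid-cell count difference converted to an area on the 25 km grid, which is why the figures below report it in millions of square kilometres. For example (illustrative numbers only):\n\n```python\n# Converting a cell-count difference into an area\ncell_diff = 1600                    # e.g. 1600 more predicted than observed ice cells\nsie_error_km2 = cell_diff * 25**2   # each grid cell covers 25 km x 25 km\nprint(sie_error_km2 / 1e6, 'million km^2')  # 1.0\n```"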
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot settings" ] }, { "cell_type": "code", "metadata": {}, "source": [ "settings_lineplots = dict(padding=0.1, height=400, width=700, fontsize={'title': '120%','labels': '120%', 'ticks': '100%'})" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preprocess results dataset" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Reset index to preprocess results dataset\n", "results_df = results_df.reset_index()\n", "\n", "results_df['Forecast date'] = pd.to_datetime(results_df['Forecast date'])\n", "\n", "month_names = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',\n", " 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec'])\n", "forecast_month_names = month_names[results_df['Forecast date'].dt.month.values - 1]\n", "results_df['Calendar month'] = forecast_month_names\n", "\n", "results_df = results_df.set_index(['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])\n", "\n", "# subset target period\n", "results_df = results_df.loc(axis=0)[pd.IndexSlice[:, :, :, slice(forecast_start, forecast_end)]]\n", "\n", "results_df = results_df.sort_index()" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect the results `pandas` `DataFrame`, which reports the monthly performance of each ensemble member for the target period." ] }, { "cell_type": "code", "metadata": {}, "source": [ "results_df.head()" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ice edge\n", "\n", "The following figure shows how to interactively plot the way **IceNet** updates its forecasts with new initial conditions as the lead time decreases, with the predicted ice edge approaching the true ice edge. The observed ice edge (in black) is defined as the sea ice concentration (SIC)=15% contour. IceNet’s predicted ice edge (in green) is determined from its sea ice probability forecast as the P(SIC>15%)=0.5 contour.\n", "\n", "The dashboard (sliders + figure) is generated with the `panel` library, [an open-source Python library that lets you create custom interactive web apps and dashboards](https://panel.holoviz.org/index.html). In the settings below, we define two sliders which allow us to interact with two variables, the month and the lead time. " ] }, { "cell_type": "code", "metadata": {}, "source": [ "# set target year \n", "year = 2020\n", "\n", "# set sliders\n", "month_name = [f'{calendar.month_name[m]} {year}' for m in list(range(1, 13))]\n", "\n", "month_slider = pn.widgets.DiscreteSlider(name=\"Month\", options=month_name, value='September 2020', width=200)\n", "\n", "lead_slider = pn.widgets.IntSlider(name=\"Lead time (months)\", start=1, end=4, step=1, value=4, direction='rtl', width=200)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "::::{important}\n", "The interactive figure below essentially reproduces [Figure 2](https://www.nature.com/articles/s41467-021-25257-4/figures/2) of the IceNet paper; however, it covers a larger geographical extent, i.e. the March active grid cell region, when the ice edge extent is largest. Also, we visualise each month of the target period of this demonstrator (January to December 2020). Some script snippets were extracted from the IceNet script `python3 icenet/plot_paper_figures.py` (see [line 182](https://github.com/tom-andersson/icenet-paper/blob/main/icenet/plot_paper_figures.py)). 
Note that we define alpha and colour values for the coastline and land mask layers. These settings allow the layers to be overlaid correctly so that IceNet's predictions can be distinguished from the SIC ground truth.\n", "::::" ] }, { "cell_type": "code", "metadata": { "tags": [ "hide-input" ] }, "source": [ "## set boundaries\n", "mask = np.load(os.path.join(config['mask_data_folder'],\n", " 'active_grid_cell_mask_{}.npy'.format('03')))\n", "\n", "min_0 = np.min(np.argwhere(mask)[:, 0])\n", "max_0 = np.max(np.argwhere(mask)[:, 0])\n", "mid_0 = np.mean((min_0, max_0)).astype(int)\n", "min_1 = np.min(np.argwhere(mask)[:, 1])\n", "max_1 = np.max(np.argwhere(mask)[:, 1])\n", "mid_1 = np.mean((min_1, max_1)).astype(int)\n", "max_diff = np.max([mid_0-min_0, mid_1-min_1])\n", "max_diff *= .85 # Zoom in\n", "max_diff = int(max_diff)\n", "top = mid_0 - max_diff + 10\n", "bot = mid_0 + max_diff + 10\n", "left = mid_1 - max_diff\n", "right = mid_1 + max_diff\n", "\n", "## land and region masks\n", "land_mask = np.load(os.path.join(config['mask_data_folder'], 'land_mask.npy'))\n", "region_mask = np.load(os.path.join(config['mask_data_folder'], 'region_mask.npy'))\n", "\n", "## define coastline and land layers\n", "arr = region_mask == 13\n", "coastline_rgba_arr = np.zeros((*arr.shape, 4))\n", "coastline_rgba_arr[:, :, 3] = arr # alpha channel\n", "coastline_rgba_arr[:, :, :3] = .5 # gray coastline\n", "land_mask_rgba_arr = np.zeros((*arr.shape, 4))\n", "land_mask_rgba_arr[:, :, 3] = land_mask # alpha channel\n", "land_mask_rgba_arr[:, :, :3] = .5 # gray land\n", "\n", "## line colours\n", "pred_ice_edge_rgb = 'green'\n", "true_ice_edge_rgb = 'black'\n", "\n", "## define plot function\n", "@pn.depends(month_slider.param.value, lead_slider.param.value)\n", "def plot_forecast(month, leadtime):\n", " tdate = pd.Timestamp(year,month_name.index(month)+1,1)\n", "\n", " fig0 = Figure(figsize=(8, 8))\n", " ax0 = fig0.subplots()\n", " FigureCanvas(fig0) # not needed for mpl >= 3.1\n", "\n", " ax0.imshow(coastline_rgba_arr[top:bot, left:right, :], zorder=20)\n", " ax0.imshow(land_mask_rgba_arr[top:bot, left:right, :], zorder=1)\n", "\n", " icenet_sip = icenet_sip_da.sel(time=tdate, leadtime=leadtime, seed='ensemble').data\n", " ax0.contour(\n", " icenet_sip[top:bot, left:right],\n", " levels=[0.5],\n", " colors=[pred_ice_edge_rgb],\n", " zorder=1,\n", " linewidths=1.5,\n", " )\n", "\n", " groundtruth_sic = true_sic_da.sel(time=tdate)\n", " gt_img = (groundtruth_sic>0.15).data\n", "\n", " ax0.contour(\n", " gt_img[top:bot, left:right],\n", " levels=[0.5],\n", " colors=[true_ice_edge_rgb],\n", " zorder=1,\n", " linewidths=1.5\n", " )\n", " ax0.tick_params(which='both', bottom=False, left=False, labelbottom=False, labelleft=False)\n", "\n", " proxy = [plt.Line2D([0], [1], color=true_ice_edge_rgb),\n", " plt.Line2D([0], [1], color=pred_ice_edge_rgb)]\n", "\n", " ax0.legend(proxy, ['Observed', 'Predicted'],\n", " loc='upper left', fontsize=11)\n", "\n", " ax0.set_title(f'Date = {month} & Lead time = {leadtime} months')\n", "\n", " acc = results_df.loc['IceNet', 'ensemble', leadtime, tdate]['Binary accuracy']\n", " sie_err = results_df.loc['IceNet', 'ensemble', leadtime, tdate]['SIE error']\n", "\n", " Afont = {\n", " 'backgroundcolor': 'lightgray',\n", " 'color': 'black',\n", " 'weight': 'normal',\n", " 'size': 11,\n", " }\n", "\n", " t = AnchoredText('Binary acc: {:.1f}% \\nSIE error: {:+.3f} mil km$^2$'.format(acc,sie_err/1e6), prop=Afont, loc='lower right', pad=0.5, borderpad=0.4, frameon=False)\n", " t = ax0.add_artist(t)\n", " t.zorder = 
21\n", "\n", " return pn.pane.Matplotlib(fig0, tight=True, dpi=150)\n", "\n", "plot_ie = pn.Row(\n", " plot_forecast,\n", " pn.Column(pn.Spacer(height=5), month_slider, pn.Spacer(height=15), lead_slider, background='#f0f0f0', sizing_mode=\"scale_both\"),\n", " width_policy='fit', height_policy='max', \n", ")\n", "\n", "plot_ie.embed()" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model performance comparison\n", "\n", "The figure below shows the mean binary accuracy versus lead time over the 12 forecast dates for IceNet, SEAS5 and the linear trend benchmark. We observe that IceNet outperforms the SEAS5 and linear trend models at lead times of 2 months and beyond." ] }, { "cell_type": "code", "metadata": { "tags": [ "hide-input" ] }, "source": [ "results_mean = results_df['Binary accuracy'].groupby(['Model','Ensemble member','Leadtime']).mean().reset_index()\n", "results_mean = results_mean[results_mean['Ensemble member'].isin(['NA','ensemble'])]\n", "\n", "plot_ba = results_mean.hvplot(x='Leadtime', y='Binary accuracy', by='Model',\n", " label='Lead times comparison',\n", " ylabel='Binary accuracy',\n", " xlabel='Lead time (months)',\n", " color=['#1f77b4', 'gray', '#d62728'])\n", "plot_ba.opts(legend_position='top_right', **settings_lineplots)\n", "plot_ba" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Monthly performance comparison\n", "\n", "In contrast to the previous plot, the following figure compares the performance of the three models from January to December 2020 for each lead time. We confirm IceNet’s ability to forecast summer ice (August, September and October) at lead times of two months and beyond, outperforming both SEAS5 and the linear trend.\n", "\n", "We also observe that SEAS5 outperforms IceNet at a 1-month lead time for most months, except October. According to the Nature Communications paper, this is likely because IceNet only receives monthly averages as input, smearing the weather phenomena and initial conditions that dominate predictability on such short timescales."
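, "\n\nIf you prefer a static table to the interactive plot below, the same comparison can be pulled straight out of `results_df` (a sketch, assuming the results dataframe built above; shown here for a 2-month lead time):\n\n```python\n# Static view of the monthly comparison at a 2-month lead time\nlead2 = results_df.xs(2, level='Leadtime').reset_index()\nlead2 = lead2[lead2['Ensemble member'].isin(['NA', 'ensemble'])]\nprint(lead2.pivot(index='Forecast date', columns='Model', values='Binary accuracy'))\n```"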
] }, { "cell_type": "code", "metadata": { "tags": [ "hide-input" ] }, "source": [ "lead_slider = pn.widgets.IntSlider(name=\"Lead time (months)\", start=1, end=4, step=1, width=150)\n", "\n", "results_plot = results_df.reset_index()\n", "\n", "formatter = DatetimeTickFormatter(months='%b')\n", "\n", "@pn.depends(lead_slider.param.value)\n", "def plot_month(leadtime):\n", "\n", " results_lt = results_plot[results_plot.Leadtime==leadtime]\n", " plot_ba_month = results_lt.hvplot(x='Forecast date',\n", " y='Binary accuracy',\n", " by='Model',\n", " label='Monthly comparison',\n", " ylabel='Binary accuracy',\n", " xlabel='Forecast month',\n", " color=['#1f77b4', 'gray', '#d62728'],\n", " xformatter=formatter)\n", "\n", " return plot_ba_month.opts(legend_position='bottom_left', **settings_lineplots)\n", "\n", "plot_month = pn.Row(\n", " plot_month,\n", " pn.Column(pn.Spacer(height=5), lead_slider, background='#f0f0f0'),\n", " width_policy='max', height_policy='max'\n", ")\n", "\n", "plot_month.embed()" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "This notebook has demonstrated the use of:\n", "\n", "* A custom dataloader, `IceNetDataLoader`, to conveniently dictate which variables are input to the networks, which climate simulations are used for pre-training, and how far ahead to forecast.\n", "* `pandas` to append, filter, and manipulate new forecast results.\n", "* `matplotlib` framed into a `panel` dashboard to visualise IceNet forecasts over the modelled period and four lead times.\n", "* `hvplot` to plot time series data comparing the performance of IceNet predictions against the ECMWF SEAS5 physics-based sea ice concentration forecast and a linear trend statistical benchmark.\n", " \n", "IceNet's Nature Communications paper [*Seasonal Arctic sea ice forecasting with probabilistic deep learning*](https://www.nature.com/articles/s41467-021-25257-4) provides\nfurther information on other key aspects, e.g. variable importance and model calibration, which for the sake of simplicity are not covered in this demonstrator." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Citing this Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": "Please see [CITATION.cff](https://github.com/eds-book/ac327c3a-5264-40a2-8c6e-1e8d7c4b37ef/blob/main/CITATION.cff) for the full citation information. The citation file can be exported to APA or BibTeX formats (learn more [here](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files))." }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional information\n", "\n", "**Review**: This notebook has been reviewed by one or more members of the Environmental Data Science book community. The open review is available [here](https://github.com/alan-turing-institute/environmental-ds-book/pull/6).\n", "\n", "**Codebase**: `IceNet` 1.0.0 with commit [9d69ad7](https://github.com/tom-andersson/icenet-paper/compare/v1.0.0...main)\n", "\n", "**License**: The code in this notebook is licensed under the MIT License. The Environmental Data Science book is licensed under the Creative Commons Attribution 4.0 license. 
See further details [here](https://github.com/alan-turing-institute/environmental-ds-book/blob/main/LICENSE).\n", "\n", "**Contact**: If you have any suggestions or want to report an issue with this notebook, feel free to [create an issue](https://github.com/alan-turing-institute/environmental-ds-book/issues/new/choose) or send a direct message to [environmental.ds.book@gmail.com](mailto:environmental.ds.book@gmail.com)." ] }, { "cell_type": "code", "metadata": { "tags": [ "remove-input" ] }, "source": [ "from datetime import date\n", "\n", "print('Notebook repository version: v2025.6.0')\n", "print(f'Last tested: {date.today()}')" ], "outputs": [], "execution_count": null } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }