{ "cells": [ { "cell_type": "markdown", "id": "c95c46d0-3d33-49f2-8ba7-13caec2d7d67", "metadata": {}, "source": [ "# Sorting pre-requisits for ibicus: downloading and preprocessing data" ] }, { "cell_type": "markdown", "id": "7fedb2d8-e9f2-45f3-96b8-17022c08dc47", "metadata": {}, "source": [ "This notebook shows how to download and preprocess climate model data for bias correction and further use. To apply a bias adjustment method, three datasets are needed: 1) observation or reanalysis data; 2) historical climate model data over the same reference period that observations are available for; and 3) climate model data for a future, or more generally, application, period that is to be bias corrected. \n", "\n", "Here we will download and preprocess CMIP6 data from the Climate Data Store (CDS) as climate model input and two reanalysis datasets: 1) ERA5 from the CDS and 2) NCEP/DOE Reanalysis II from the PSL datastore (NOAA).\n", "\n", "There are many ways to access climate data on different temporal or spatial resolutions. This notebook is meant to illustrate one possible way to download data at daily resolution which is currently the primary temporal resolution supported in ibicus, although some can be applied at monthly resolution. " ] }, { "cell_type": "code", "execution_count": 1, "id": "b9520592-e401-4d71-a6a3-548be5f093fc", "metadata": {}, "outputs": [], "source": [ "import os\n", "import urllib\n", "\n", "# Scientific computing\n", "import iris\n", "import xarray\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "da7b72f3-3501-48f8-adf5-7b2f05a4cfdd", "metadata": { "tags": [] }, "source": [ "## 1. Download data" ] }, { "cell_type": "markdown", "id": "75cbb2e5-c01f-489f-aed9-0e499bb8de14", "metadata": {}, "source": [ "Let's create a data-directory where our inputs will go, if it does not yet exist:" ] }, { "cell_type": "code", "execution_count": 2, "id": "0acdff88-ff49-4229-ac48-55d32f70a4de", "metadata": {}, "outputs": [], "source": [ "DATADIR = \"data_download_and_preprocessing\"\n", "\n", "if not os.path.exists(DATADIR):\n", " os.mkdir(DATADIR)" ] }, { "cell_type": "markdown", "id": "bb41aae4-bac2-4ed9-b29d-ec40aa8ffa88", "metadata": {}, "source": [ "### 1.1. Download climate model data" ] }, { "cell_type": "markdown", "id": "90af16d4-fd41-4a04-8615-5e6dea8ade6c", "metadata": {}, "source": [ "To request climate data from the Climate Data Store (CDS) we will use the CDS API. Run the following cell if you have not yet istalled it:" ] }, { "cell_type": "code", "execution_count": 3, "id": "0ef08598-52a1-4d7d-bd3e-703064e527c4", "metadata": {}, "outputs": [], "source": [ "#!pip install cdsapi\n", "import cdsapi\n", "\n", "# We disable urllib3 (used by cdsapi) warning\n", "import urllib3\n", "urllib3.disable_warnings()\n" ] }, { "cell_type": "markdown", "id": "729ef51b-67c0-4af6-aae3-d5b008329dee", "metadata": {}, "source": [ "We make use of the option to manually set the CDS API credentials. First, you have to set two variables: URL and KEY which build together your CDS API key. The string of characters that make up your KEY include your personal User ID and CDS API key. To obtain these, first register or login to the CDS (http://cds.climate.copernicus.eu), then visit https://cds.climate.copernicus.eu/api-how-to and copy the string of characters listed after \"key:\". Replace the ######### below with your key." 
] }, { "cell_type": "code", "execution_count": 4, "id": "56ae4058-3761-48bd-a7dc-4a44eabbaa5c", "metadata": {}, "outputs": [], "source": [ "URL = 'https://cds.climate.copernicus.eu/api/v2'\n", "KEY = '########################################' # enter your key instead" ] }, { "cell_type": "markdown", "id": "a6de2538-68a0-48a8-9592-56306a4cfac9", "metadata": {}, "source": [ "Let's choose a model and variable of interest, and fix some meta-paramters. If we are interested in multiple variable we can just iterate the code below:" ] }, { "cell_type": "code", "execution_count": 5, "id": "11ef38ef-3f3b-48bf-acd3-afcd3a31f48e", "metadata": {}, "outputs": [], "source": [ "# choose model\n", "model = 'mpi_esm1_2_lr'\n", "\n", "# choose variables to extract (not all variables available at daily resolution for all cmip6 models at the moment)\n", "variable = 'precipitation'\n", "\n", "# choose area to extract\n", "area = [80, 3, 20, 30]\n", "\n", "# choose a historical period to extract\n", "period_hist = '1979-01-01/2005-12-31' \n", "\n", "# choose a future period to extract:\n", "period_fut = '2050-01-01/2070-12-31'\n", "\n", "# choose a filename for the historical cm data\n", "fname_cm_hist = f\"cmip6_daily_1979-2015_ipsl_historical_{variable}.zip\"\n", "\n", "# choose a filename for the future cm data\n", "fname_cm_future = f\"cmip6_daily_2050-2070_ipsl_ssp5_8_5_{variable}.zip\"\n", "\n" ] }, { "cell_type": "markdown", "id": "9887dc47-e259-4fcc-a359-01c9c8e71299", "metadata": {}, "source": [ "Both datasets will be in `DATADIR` under `fname_cm_hist` and `fname_cm_future`." ] }, { "cell_type": "markdown", "id": "375a109a-e1f2-475d-9c52-84fb3243b528", "metadata": {}, "source": [ "#### 1.1.1. Download historical climate model data" ] }, { "cell_type": "markdown", "id": "8dac41c9-d734-455c-a352-58e8e1c3b6c1", "metadata": {}, "source": [ "Executing the following cell will retrieve historical climate model data:" ] }, { "cell_type": "code", "execution_count": 6, "id": "c3ab1271-9f60-4824-95c2-32d76c0c1f9c", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-08-22 20:15:46,493 INFO Welcome to the CDS\n", "2022-08-22 20:15:46,494 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cmip6\n", "2022-08-22 20:15:47,110 INFO Request is completed\n", "2022-08-22 20:15:47,112 INFO Downloading https://download-0010-clone.copernicus-climate.eu/cache-compute-0010/cache/data4/adaptor.esgf_wps.retrieve-1661187996.0821903-27851-13-11a6e1e3-b6cb-4e5f-952a-b9c75475a54f.zip to data_download_and_preprocessing/cmip6_daily_1979-2015_ipsl_historical_precipitation.zip (11.6M)\n", "2022-08-22 20:15:59,062 INFO Download rate 994.1K/s \n" ] }, { "data": { "text/plain": [ "Result(content_length=12162765,content_type=application/zip,location=https://download-0010-clone.copernicus-climate.eu/cache-compute-0010/cache/data4/adaptor.esgf_wps.retrieve-1661187996.0821903-27851-13-11a6e1e3-b6cb-4e5f-952a-b9c75475a54f.zip)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# download historical climate model data\n", "\n", "c = cdsapi.Client(url=URL, key=KEY)\n", "\n", "c.retrieve(\n", " 'projections-cmip6',\n", " {\n", " 'temporal_resolution': 'daily',\n", " 'experiment': 'historical',\n", " 'level': 'single_levels',\n", " 'variable': variable,\n", " 'model': model,\n", " 'date': period_hist,\n", " 'area': area,\n", " 'format': 'zip',\n", " },\n", " f'{DATADIR}/{fname_cm_hist}')" ] }, { "cell_type": "markdown", "id": 
"397898f9-76d6-4124-8070-114082c987cd", "metadata": {}, "source": [ "After unzipping the folder..." ] }, { "cell_type": "code", "execution_count": 7, "id": "ae3e69ac-3eff-4a0d-ae15-f0bce4d75811", "metadata": {}, "outputs": [], "source": [ "import zipfile\n", "\n", "with zipfile.ZipFile(f'{DATADIR}/{fname_cm_hist}', 'r') as zip_ref:\n", " zip_ref.extractall(DATADIR)" ] }, { "cell_type": "markdown", "id": "36c9a9d8-e2ba-479e-a885-a5746ee12078", "metadata": {}, "source": [ "...the file is now in `DATADIR/pr_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_19790101-20141231_*.nc`" ] }, { "cell_type": "markdown", "id": "a9598cbb-980c-4e2c-b580-e14320435d4d", "metadata": {}, "source": [ "#### 1.1.2. Download future climate model data" ] }, { "cell_type": "markdown", "id": "8aecd2cf-9380-4437-b4f3-12205efa5b3a", "metadata": {}, "source": [ "Now we go through the same steps to download climate data in the future or application period:" ] }, { "cell_type": "code", "execution_count": 8, "id": "914d9a6a-8ae1-4649-a235-bc9a1cdb9845", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-08-22 20:16:00,107 INFO Welcome to the CDS\n", "2022-08-22 20:16:00,110 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cmip6\n", "2022-08-22 20:16:00,307 INFO Request is completed\n", "2022-08-22 20:16:00,309 INFO Downloading https://download-0004-clone.copernicus-climate.eu/cache-compute-0004/cache/data6/adaptor.esgf_wps.retrieve-1661033873.2348728-23827-20-937b4c7b-1630-4e08-9600-6d3d1912cc8b.zip to data_download_and_preprocessing/cmip6_daily_2050-2070_ipsl_ssp5_8_5_precipitation.zip (8.2M)\n", "2022-08-22 20:16:07,124 INFO Download rate 1.2M/s \n" ] }, { "data": { "text/plain": [ "Result(content_length=8643246,content_type=application/zip,location=https://download-0004-clone.copernicus-climate.eu/cache-compute-0004/cache/data6/adaptor.esgf_wps.retrieve-1661033873.2348728-23827-20-937b4c7b-1630-4e08-9600-6d3d1912cc8b.zip)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# download future climate model data\n", "\n", "c = cdsapi.Client(url=URL, key=KEY)\n", "\n", "c.retrieve(\n", " 'projections-cmip6',\n", " {\n", " 'temporal_resolution': 'daily',\n", " 'experiment': 'ssp5_8_5',\n", " 'level': 'single_levels',\n", " 'variable': variable,\n", " 'model': model,\n", " 'date': period_fut,\n", " 'area': area,\n", " 'format': 'zip',\n", " },\n", " f'{DATADIR}/{fname_cm_future}')" ] }, { "cell_type": "markdown", "id": "ceb17dcd-83ec-4398-aaa8-deda7352680e", "metadata": {}, "source": [ "Again, we need to unzip the folder:" ] }, { "cell_type": "code", "execution_count": 9, "id": "9fe03dff-feb4-479e-8193-b5fdd5007e8f", "metadata": {}, "outputs": [], "source": [ "import zipfile\n", "\n", "with zipfile.ZipFile(f'{DATADIR}/{fname_cm_future}', 'r') as zip_ref:\n", " zip_ref.extractall(DATADIR)" ] }, { "cell_type": "markdown", "id": "d3f5f16b-7b7e-485c-929d-9bc4ea1d639b", "metadata": {}, "source": [ "The file is now in `DATADIR/pr_day_MPI-ESM1-2-LR_ssp585_r1i1p1f1_gn_20500101-20701231_*.nc`" ] }, { "cell_type": "markdown", "id": "13c8add2-e122-4f8c-9e37-009ca80aa946", "metadata": {}, "source": [ "### 1.2. Download observations" ] }, { "cell_type": "markdown", "id": "29542667-2566-45af-bd2c-96ce4d9e740b", "metadata": {}, "source": [ "Now we will download observations. We will first download ERA5 data from the CDS and afterwards the NCEP/DOE II Reanalysis from the PSL." 
] }, { "cell_type": "markdown", "id": "8c841313-f1d9-41f8-908d-710cc89db253", "metadata": {}, "source": [ "#### 1.2.1. Download ERA5" ] }, { "cell_type": "markdown", "id": "7e9bfb15-7c17-4a48-b90b-2e2a01bb3114", "metadata": {}, "source": [ "We will download ERA5 on daily temporal resolution (if the climate model were on other temporal resolutions we would also need a different one for ERA5). The script is inspired by [this discussion](https://confluence.ecmwf.int/pages/viewpage.action?pageId=228867588) and uses the [\n", "Daily statistics calculated from ERA5 data\n", "](https://cds.climate.copernicus.eu/cdsapp#!/software/app-c3s-daily-era5-statistics?tab=overview) application. The output of this application is a separate netCDF file for chosen daily statistic for each month for each year. We concatenate those files then manually. First we need to make some selections (make sure the data chosen here is consistent with the cm data pulled above):" ] }, { "cell_type": "code", "execution_count": 10, "id": "536eb6b2-13ac-464d-81c7-edbee4edcad5", "metadata": {}, "outputs": [], "source": [ "# choose years to request (this should overlap with the `period_hist` chosen for the cm data)\n", "# this is chosen shorter for demonstration purposes\n", "years = list(map(str, range(1979, 1981)))\n", "\n", "# choose months to request\n", "months = list(map(str, range(10, 12)))\n", "\n", "# choose a variable (must be a valid ERA5 CDS API name)\n", "variable = \"total_precipitation\"\n", "\n", "# choose a required statistic (valid names given in the application description above)\n", "statistic = \"daily_mean\"\n", "\n", "# choose an area (should be the same as above)\n", "area = {\"lat\": [30, 80], \"lon\": [3, 20]}\n", "\n", "# choose a filename\n", "fname_era5 = f\"era5_{variable}_{statistic}_1979_1981.nc\"" ] }, { "cell_type": "markdown", "id": "f505c0ad-a44d-4c6b-bfaa-9f97a429fae5", "metadata": {}, "source": [ "And now we can request the data:" ] }, { "cell_type": "code", "execution_count": 11, "id": "e7e985d4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "----- Requesting year: 1979 -----\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-08-22 20:16:07,851 INFO Welcome to the CDS\n", "2022-08-22 20:16:07,853 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/tasks/services/tool/toolbox/orchestrator/workflow/clientid-18d6bb5a46ad46178f7d69d70b376b61\n", "2022-08-22 20:16:08,008 INFO Request is completed\n", "2022-08-22 20:16:08,676 INFO Welcome to the CDS\n", "2022-08-22 20:16:08,677 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/tasks/services/tool/toolbox/orchestrator/workflow/clientid-75eb56b162124e4aa0e2c14e282263c2\n", "2022-08-22 20:16:09,858 INFO Welcome to the CDS\n", "2022-08-22 20:16:09,859 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/tasks/services/tool/toolbox/orchestrator/workflow/clientid-af6e4ff44c9842d4b24313e3c054e09f\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "----- Requesting year: 1980 -----\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-08-22 20:16:10,877 INFO Welcome to the CDS\n", "2022-08-22 20:16:10,878 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/tasks/services/tool/toolbox/orchestrator/workflow/clientid-7aeb9221b2e446e49aa968e789eecf83\n" ] } ], "source": [ "c = cdsapi.Client(url=URL, key=KEY, timeout=300)\n", "\n", "# Loop over years and months\n", "filenames_for_cleanup= []\n", "for yr in years:\n", " 
print(f\"----- Requesting year: {yr} -----\")\n", " for mn in months:\n", " result = c.service(\n", " \"tool.toolbox.orchestrator.workflow\",\n", " params={\n", " \"realm\": \"user-apps\",\n", " \"project\": \"app-c3s-daily-era5-statistics\",\n", " \"version\": \"master\",\n", " \"kwargs\": {\n", " \"dataset\": \"reanalysis-era5-single-levels\",\n", " \"product_type\": \"reanalysis\",\n", " \"variable\": variable,\n", " \"statistic\": statistic,\n", " \"year\": yr,\n", " \"month\": mn,\n", " \"time_zone\": \"UTC+00:0\",\n", " \"frequency\": \"1-hourly\",\n", " \"grid\": \"1.0/1.0\",\n", " \"area\": area,\n", " },\n", " \"workflow_name\": \"application\"\n", " }) \n", "\n", " \n", " filename = f\"{DATADIR}/era5_{variable}_{statistic}_{yr}_{mn}.nc\"\n", " url = result[0]['location']\n", "\n", " # Download nc file\n", " urllib.request.urlretrieve(url, filename)\n", " # Append filename to list of filenames to cleanup\n", " filenames_for_cleanup.append(filename)\n", "\n", "# Combine monthly data\n", "combined_data = xarray.open_mfdataset(f\"{DATADIR}/era5_{variable}_{statistic}_*.nc\", combine = 'nested', concat_dim=\"time\")\n", "combined_data.to_netcdf(f\"{DATADIR}/{fname_era5}\")\n", "\n", "# Cleanup\n", "for filename in filenames_for_cleanup:\n", " os.remove(filename)" ] }, { "cell_type": "markdown", "id": "e83ba25e-7370-4e44-a329-dbbd808001be", "metadata": {}, "source": [ "#### 1.2.2. Download NCEP/DOE II" ] }, { "cell_type": "markdown", "id": "3a4cefbf-c5ce-4f85-865f-eb17f3765447", "metadata": {}, "source": [ "We now download the [NCEP/DOE II data](https://psl.noaa.gov/data/gridded/data.ncep.reanalysis2.html). [Here is an overview](https://psl.noaa.gov/data/gridded/data.ncep.reanalysis2.html) of the possible variables and we take the data from [the datastore here](https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis2/Dailies/gaussian_grid/). " ] }, { "cell_type": "code", "execution_count": 12, "id": "f63f4b45-65db-4071-893f-36631511e3b7", "metadata": {}, "outputs": [], "source": [ "# Variable name. Needs to be one of the NCEP-names in https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis2/Dailies/gaussian_grid/. 
\n", "variable = \"prate.sfc.gauss\"\n", "\n", "# choose years to request (this should overlap with the `period_hist` chosen for the cm data)*\n", "years = map(str, range(1979, 2005))\n", "\n", "# choose an area (should be the same as above)\n", "area = {\"lat\": [30, 80], \"lon\": [3, 20]}\n", "\n", "# choose a filename\n", "fname_ncep_doe = f\"ncep_doe_{variable}_1979_2005.nc\"" ] }, { "cell_type": "markdown", "id": "a9417f6d-1870-486f-adf1-affffd3d82ff", "metadata": {}, "source": [ "Now we can download the data:" ] }, { "cell_type": "code", "execution_count": 13, "id": "3d5daf08-f9b8-467d-b578-a5f6ea427ab3", "metadata": {}, "outputs": [], "source": [ "# Download data year by year\n", "filenames_for_cleanup = []\n", "for year in years:\n", " url = f\"https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis2/Dailies/gaussian_grid/{variable}.{year}.nc\"\n", " filename = f\"{DATADIR}/{variable}_{year}.nc\"\n", " # Download nc file\n", " urllib.request.urlretrieve(url, filename)\n", " # Append filename to list of filenames to cleanup\n", " filenames_for_cleanup.append(filename)\n", "\n", "# Combine data for variable\n", "combined_data = xarray.open_mfdataset(f\"{DATADIR}/{variable}_*.nc\", combine = 'nested', concat_dim=\"time\")\n", "# Select area\n", "combined_data = combined_data.sel(lon=slice(min(area[\"lon\"]), max(area[\"lon\"])),lat=slice(max(area[\"lat\"]), min(area[\"lat\"])))\n", "# Write to file \n", "combined_data.to_netcdf(f\"{DATADIR}/{fname_ncep_doe}\")\n", "# Cleanup\n", "for filename in filenames_for_cleanup:\n", " os.remove(filename)\n" ] }, { "cell_type": "markdown", "id": "b4c95ab3-a0c1-4cc5-bb7c-3f1239afc400", "metadata": {}, "source": [ "It is also possible (and probably easier) to download the data via ftp through the same links, or via the visual interface accessible via the graph-icon next to a variable in the [NCEP/DOE 2 overview page](https://psl.noaa.gov/data/gridded/data.ncep.reanalysis2.html). The latter also provides an option to select a range of dates and access merged data for that range that can directly be used for the further preprocessing steps." ] }, { "cell_type": "markdown", "id": "e20de0c5-c35f-40f4-b5c5-296fdd34d69e", "metadata": {}, "source": [ "## 2. Convert and regrid data" ] }, { "cell_type": "markdown", "id": "dd4c0bd2-117d-42c0-a27c-449d287b749b", "metadata": {}, "source": [ "Now that we have downloaded the data we need to make sure that observations and climate model data are:\n", "\n", "- on the same temporal resolution: this is covered because we downloaded the data on daily resolution.\n", "- on the same spatial resolution: we need to regrid the data.\n", "- in the same units: we may need to convert units.\n", "\n", "Furthermore we might want to extract additional information and need to get the numpy arrays corresponding to the data. In the numpy arrays we need to make sure that they have the form `[t,x,y]`.\n" ] }, { "cell_type": "markdown", "id": "b3f178f8-29e9-48d0-8411-f23cdb1808e3", "metadata": {}, "source": [ "### 2.1. Regrid data " ] }, { "cell_type": "markdown", "id": "8dce633b-ec93-411a-8fc9-092754eb124f", "metadata": {}, "source": [ "Now that we have data on the same temporal resolution for both the climate model and observations we need to make sure they are also on the same spatial one and regrid the datasets. The climate model data is on a coarser grid, therefore we will regrid the observational data onto this resolution. 
, { "cell_type": "markdown", "id": "b4c95ab3-a0c1-4cc5-bb7c-3f1239afc400", "metadata": {}, "source": [ "It is also possible (and probably easier) to download the data via ftp through the same links, or via the visual interface accessible through the graph icon next to each variable on the [NCEP/DOE II overview page](https://psl.noaa.gov/data/gridded/data.ncep.reanalysis2.html). The latter also provides an option to select a range of dates and to access merged data for that range, which can be used directly in the further preprocessing steps." ] }, { "cell_type": "markdown", "id": "e20de0c5-c35f-40f4-b5c5-296fdd34d69e", "metadata": {}, "source": [ "## 2. Convert and regrid data" ] }, { "cell_type": "markdown", "id": "dd4c0bd2-117d-42c0-a27c-449d287b749b", "metadata": {}, "source": [ "Now that we have downloaded the data, we need to make sure that observations and climate model data are:\n", "\n", "- at the same temporal resolution: this is covered because we downloaded the data at daily resolution.\n", "- on the same spatial grid: we need to regrid the data.\n", "- in the same units: we may need to convert units.\n", "\n", "Furthermore, we might want to extract additional information (such as dates) and we need to get the numpy arrays corresponding to the data, making sure they have the form `[t, x, y]`.\n" ] }, { "cell_type": "markdown", "id": "b3f178f8-29e9-48d0-8411-f23cdb1808e3", "metadata": {}, "source": [ "### 2.1. Regrid data" ] }, { "cell_type": "markdown", "id": "8dce633b-ec93-411a-8fc9-092754eb124f", "metadata": {}, "source": [ "Now that the climate model and observational data are at the same temporal resolution, we need to bring them onto the same spatial grid by regridding. The climate model data is on a coarser grid, so we will regrid the observational data onto that grid. However, there are also other approaches, in which the [climate model data is regridded onto the resolution of the observations](https://esd.copernicus.org/articles/9/313/2018/).\n", "\n", "We will use iris for the regridding, although there are also xarray-based solutions. Different variables might require different regridding schemes: [a list of the ones available in iris is given here](https://scitools-iris.readthedocs.io/en/latest/userguide/interpolation_and_regridding.html?highlight=interpolate#regridding). For precipitation we choose a nearest-neighbour regridder (`iris.analysis.Nearest`); other regridders, such as linear ones, *might* introduce negative values." ] }, { "cell_type": "code", "execution_count": 14, "id": "6b04a48a-28ab-4818-b2d5-7bc49b3f1a2f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/pyproj/__init__.py:89: UserWarning: pyproj unable to set database path.\n", " _pyproj_global_context_initialize()\n", "/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/iris/fileformats/cf.py:859: UserWarning: Missing CF-netCDF measure variable 'areacella', referenced by netCDF variable 'pr'\n", " warnings.warn(\n", "/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/iris/fileformats/cf.py:1154: UserWarning: Ignoring variable 'lat_bnds' referenced by variable 'lat': Dimensions ('time', 'lat', 'bnds') do not span ('lat',)\n", " warnings.warn(msg)\n", "/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/iris/fileformats/cf.py:1154: UserWarning: Ignoring variable 'lon_bnds' referenced by variable 'lon': Dimensions ('time', 'lon', 'bnds') do not span ('lon',)\n", " warnings.warn(msg)\n", "/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/iris/fileformats/cf.py:859: UserWarning: Missing CF-netCDF measure variable 'areacella', referenced by netCDF variable 'pr'\n", " warnings.warn(\n", "/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/iris/fileformats/cf.py:1154: UserWarning: Ignoring variable 'lat_bnds' referenced by variable 'lat': Dimensions ('time', 'lat', 'bnds') do not span ('lat',)\n", " warnings.warn(msg)\n", "/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/iris/fileformats/cf.py:1154: UserWarning: Ignoring variable 'lon_bnds' referenced by variable 'lon': Dimensions ('time', 'lon', 'bnds') do not span ('lon',)\n", " warnings.warn(msg)\n" ] } ], "source": [ "cm_hist = iris.load_cube(f\"{DATADIR}/pr_day_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_19790101-20051231_v20190710.nc\", \"precipitation_flux\")\n", "cm_future = iris.load_cube(f\"{DATADIR}/pr_day_MPI-ESM1-2-LR_ssp585_r1i1p1f1_gn_20500101-20701231_v20190710.nc\", \"precipitation_flux\")" ] }, { "cell_type": "markdown", "id": "4f9d787b-ca5a-4b99-9e12-b42ebf4f0515", "metadata": {}, "source": [ "First let's take care of the ERA5 reanalysis:" ] }, { "cell_type": "code", "execution_count": 15, "id": "043d4317-c93b-4405-92f7-ef0b98e9ce7b", "metadata": {}, "outputs": [], "source": [ "obs_era5 = iris.load_cube(f\"{DATADIR}/{fname_era5}\")\n", "obs_era5 = obs_era5.regrid(cm_hist, iris.analysis.Nearest())" ] }
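, { "cell_type": "markdown", "id": "3c4d5e6f-7a8b-49c0-9def-2a3b4c5d6e7f", "metadata": {}, "source": [ "After regridding, the observations share the climate model grid. A minimal sketch of a sanity check:\n", "\n", "```python\n", "# The spatial dimensions of the regridded observations should match the model grid\n", "assert obs_era5.shape[1:] == cm_hist.shape[1:]\n", "```" ] }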
"/home/jakobwes/anaconda3/envs/ibias4/lib/python3.9/site-packages/iris/fileformats/_nc_load_rules/helpers.py:645: UserWarning: Ignoring netCDF variable 'prate' invalid units 'Kg/m^2/s'\n", " warnings.warn(msg)\n" ] } ], "source": [ "obs_ncep_doe = iris.load_cube(f\"{DATADIR}/{fname_ncep_doe}\")\n", "obs_ncep_doe = obs_ncep_doe.regrid(cm_hist, iris.analysis.Nearest())" ] }, { "cell_type": "markdown", "id": "e304b9e1-b541-4046-985e-b15a1d35ba16", "metadata": {}, "source": [ "### 2.1. Extract additional information" ] }, { "cell_type": "markdown", "id": "db439b44-6c42-49a7-a6db-7be618f73895", "metadata": {}, "source": [ "The data objects are now all at the same temporal and spatial resolution. Because some debiasers need the dates as input, it is useful to extract them here:" ] }, { "cell_type": "code", "execution_count": 17, "id": "9fb98c01-3788-404e-9dbf-d0ed8bdc6a82", "metadata": {}, "outputs": [], "source": [ "def get_dates(x):\n", " time_dimension = x.coords()[0]\n", " dates = time_dimension.units.num2date(time_dimension.points)\n", " return dates\n", "\n", "get_dates = np.vectorize(get_dates)\n", "\n", "dates_cm_hist = get_dates(cm_hist)\n", "dates_cm_future = get_dates(cm_future)\n", "\n", "dates_obs_era5 = get_dates(obs_era5)\n", "dates_obs_ncep_doe = get_dates(obs_ncep_doe)" ] }, { "cell_type": "markdown", "id": "cf713960-d6c3-477d-b27b-1d70c2ae1152", "metadata": {}, "source": [ "### 2.3. Get numpy arrays" ] }, { "cell_type": "markdown", "id": "4786319e-d30a-41ab-9722-cf95f21f695a", "metadata": {}, "source": [ "In order to start working with ibicus, we need to get the numpy arrays associated with the data from the iris cubes:" ] }, { "cell_type": "code", "execution_count": 18, "id": "ace3159f-1254-431e-9c91-a82be4d61a77", "metadata": {}, "outputs": [], "source": [ "cm_hist = cm_hist.data\n", "cm_future = cm_future.data\n", "\n", "obs_era5 = obs_era5.data\n", "obs_ncep_doe = obs_ncep_doe.data" ] }, { "cell_type": "markdown", "id": "6dc14be7-320a-4b5d-a8ab-9f21acca627b", "metadata": {}, "source": [ "We look at the shapes to make sure they are all in the form `[t, x, y]`:" ] }, { "cell_type": "code", "execution_count": 19, "id": "3383220e-cc77-4c57-8bac-00a628b156a9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape cm_hist: (9862, 32, 15)\n", "Shape cm_future: (7670, 32, 15)\n", "Shape obs_era5: (122, 32, 15)\n", "Shape obs_ncep_doe: (9497, 32, 15)\n" ] } ], "source": [ "print(f\"Shape cm_hist: {cm_hist.shape}\")\n", "print(f\"Shape cm_future: {cm_future.shape}\")\n", "\n", "print(f\"Shape obs_era5: {obs_era5.shape}\")\n", "print(f\"Shape obs_ncep_doe: {obs_ncep_doe.shape}\")" ] }, { "cell_type": "markdown", "id": "e60cc6fb-ac5b-4a05-ba78-ed103ec65e9e", "metadata": {}, "source": [ "### 2.4. Convert units" ] }, { "cell_type": "markdown", "id": "4768d42a-c804-46ff-a094-fc8dbc57c3ce", "metadata": {}, "source": [ "From the [ERA5 documentation](https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation) we can see that the precipitation is measured in m, whilst in `cm_hist` and `cm_future` it is measured as flux (m / s^-1). 
, { "cell_type": "markdown", "id": "e60cc6fb-ac5b-4a05-ba78-ed103ec65e9e", "metadata": {}, "source": [ "### 2.4. Convert units" ] }, { "cell_type": "markdown", "id": "4768d42a-c804-46ff-a094-fc8dbc57c3ce", "metadata": {}, "source": [ "From the [ERA5 documentation](https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation) we can see that the precipitation is measured in m, whilst in `cm_hist` and `cm_future` it is measured as a flux (m/s). To convert, we need to divide the ERA5 values by 86400 (the number of seconds per day):" ] }, { "cell_type": "code", "execution_count": 20, "id": "5446c1e4-00ca-4d48-b442-f038b4b34733", "metadata": {}, "outputs": [], "source": [ "obs_era5 = obs_era5 / 86400" ] }, { "cell_type": "markdown", "id": "313f2d58-4e8a-48b0-8c75-d3de388e9b68", "metadata": {}, "source": [ "The values in the NCEP/DOE II reanalysis (`prate`) are already a precipitation flux, like the climate model data, so no conversion is needed." ] }, { "cell_type": "markdown", "id": "f809d9c9-9423-42f8-b954-52b7e5d84f19", "metadata": {}, "source": [ "## 3. Apply debiaser" ] }, { "cell_type": "markdown", "id": "7cd0d8c4-8e13-4944-8cc3-eb4ea5c75250", "metadata": {}, "source": [ "After these preparations we can finally apply a bias adjustment method. For a detailed introduction to the actual application of bias correction using ibicus, we refer you to the other notebooks.\n", "\n", "For illustrative purposes, we give one example here, using a simple quantile mapping methodology that we apply to both the ERA5 and NCEP/DOE II data." ] }, { "cell_type": "code", "execution_count": 21, "id": "e6ccab25-5249-425b-95f0-b4b6b7170580", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:----- Running debiasing for variable: Daily mean precipitation -----\n", "100%|████████████████████████████████████████████| 4/4 [00:00<00:00, 269.56it/s]\n", "INFO:root:----- Running debiasing for variable: Daily mean precipitation -----\n", "100%|████████████████████████████████████████████| 4/4 [00:00<00:00, 271.81it/s]\n" ] } ], "source": [ "from ibicus.debias import QuantileMapping\n", "\n", "debiaser = QuantileMapping.from_variable(\"pr\")\n", "\n", "debiased_cm_future_era5 = debiaser.apply(obs_era5, cm_hist, cm_future)\n", "debiased_cm_future_ncep_doe = debiaser.apply(obs_ncep_doe, cm_hist, cm_future)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 5 }