{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Processing WGMS mass-balance data for OGGM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we use the most recent lookup table provided by the WGMS to prepare the reference mass-balance data for the OGGM model.\n",
    "\n",
    "For this to work, you'll need the latest lookup table and the latest WGMS FoG data (available [here](http://wgms.ch/data_databaseversions/)), as well as the latest RGI version (available [here](http://www.glims.org/RGI/))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import geopandas as gpd\n",
    "import os\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read the WGMS files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the newest data, then set path_to_download_data, year and month accordingly.\n",
    "# Running the entire notebook then processes the new WGMS MB data for OGGM.\n",
    "year = '2021'\n",
    "month = '05'\n",
    "path_to_download_data = '/home/lilianschuster/Downloads/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idir = os.path.join(path_to_download_data, f'DOI-WGMS-FoG-{year}-{month}')\n",
    "df_links = pd.read_csv(os.path.join(idir, f'WGMS-FoG-{year}-{month}-AA-GLACIER-ID-LUT.csv'), encoding='iso8859_15')\n",
    "df_mb_all = pd.read_csv(os.path.join(idir, f'WGMS-FoG-{year}-{month}-EE-MASS-BALANCE.csv'), encoding='iso8859_15')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'Total number of links: {}'.format(len(df_links))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links = df_links.dropna(subset=['RGI_ID'])  # keep the ones with a valid RGI ID\n",
    "'Total number of RGI links: {}'.format(len(df_links))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select WGMS IDs with at least N years of mass-balance data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_mb = df_mb_all[df_mb_all.LOWER_BOUND.isin([9999])].copy()  # remove the profiles\n",
    "gp_id = df_mb.groupby('WGMS_ID')\n",
    "ids_5 = []\n",
    "ids_1 = []\n",
    "for wgmsid, group in gp_id:\n",
    "    if np.sum(np.isfinite(group.ANNUAL_BALANCE.values)) >= 5:\n",
    "        ids_5.append(wgmsid)\n",
    "    if np.sum(np.isfinite(group.ANNUAL_BALANCE.values)) >= 1:\n",
    "        ids_1.append(wgmsid)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Number of glaciers with at least 1 MB year: {}'.format(len(ids_1)))\n",
    "print('Number of glaciers with at least 5 MB years: {}'.format(len(ids_5)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Number of glaciers in the lookup table with at least 5 years of valid MB data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'Number of matches in the WGMS lookup-table: {}'.format(len(df_links.loc[df_links.WGMS_ID.isin(ids_5)]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# keep those\n",
    "df_links_sel = df_links.loc[df_links.WGMS_ID.isin(ids_5)].copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add some simple stats\n",
    "df_links_sel['RGI_REG'] = [rid.split('-')[1].split('.')[0] for rid in df_links_sel.RGI_ID]\n",
    "df_links_sel['N_MB_YRS'] = [len(df_mb.loc[df_mb.WGMS_ID == wid]) for wid in df_links_sel.WGMS_ID]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Duplicates?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel.loc[df_links_sel.duplicated('RGI_ID', keep=False)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Careser is an Italian glacier that has now disintegrated into smaller parts. Here is a screenshot from the WGMS exploration tool:\n",
    "\n",
    "<img src=\"https://www.dropbox.com/s/a0eoq6rrhimrolu/wgms_1.jpg?dl=1\" width=\"80%\">\n",
    "\n",
    "We keep the oldest MB series and discard the newer ones which are for the smaller glaciers (not represented in RGI)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We keep CARESER, as it is the longest series (from before the glacier split)\n",
    "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([3346, 3345])]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two Norwegian glaciers are part of an ice cap:\n",
    "\n",
    "<img src=\"https://www.dropbox.com/s/q6nh7qef4mrf1hz/wgms_2.jpg?dl=1\" width=\"80%\">\n",
    "\n",
    "The two mass-balance time series are very close to each other, unsurprisingly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_mb.loc[df_mb.WGMS_ID.isin([3339])].set_index('YEAR').ANNUAL_BALANCE.plot()\n",
    "df_mb.loc[df_mb.WGMS_ID.isin([3343])].set_index('YEAR').ANNUAL_BALANCE.plot();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since there is no reason for picking one series over the other, we have to remove both from the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The two Norwegian glaciers are part of an ice cap: remove them both\n",
    "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([3339, 3343])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel.loc[df_links_sel.duplicated('RGI_ID', keep=False)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous WGMS refmb dataset, there were also two duplicate glaciers in Iceland (WGMS IDs 3089 and 3110). Glacier 3110 was apparently removed in the newest refmb dataset, so there is nothing to remove there."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Antarctic glaciers link to a huge non-divided ice cap. We simply ignore them: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([10404, 10403])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel.loc[df_links_sel.duplicated('RGI_ID', keep=False)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Remove suspicious links "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See the [PDF document from Betka](https://www.dropbox.com/s/ufh07zq0tfnf805/betka_incorrect_links.pdf?dl=0), plus the old Urumqi N1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel.loc[df_links_sel.WGMS_ID.isin([3972, 1318, 10401, 1354, 853])]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We remove these as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([3972, 1318, 10401, 1354, 853])]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Remove glaciers we can't handle "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "WARD H. I. RISE has really bad DEMs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([53])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'Final number of matches in the WGMS lookup-table: {}'.format(len(df_links_sel))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write out the mass-balance data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#odir = '/home/mowglie/Documents/git/oggm-sample-data/wgms'\n",
    "odir = '/home/lilianschuster/oggm-sample-data/wgms'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Annual MB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from oggm import utils"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "utils.mkdir(odir + '/mbdata', reset=True)\n",
    "for rid, wid in zip(df_links_sel.RGI_ID, df_links_sel.WGMS_ID):\n",
    "    df_mb_sel = df_mb.loc[df_mb.WGMS_ID == wid].copy()\n",
    "    df_mb_sel = df_mb_sel[['YEAR', 'WGMS_ID', 'POLITICAL_UNIT', 'NAME', 'AREA', 'WINTER_BALANCE', \n",
    "                           'SUMMER_BALANCE',  'ANNUAL_BALANCE', 'REMARKS']].set_index('YEAR')\n",
    "    df_mb_sel['RGI_ID'] = rid\n",
    "    df_mb_sel.to_csv(os.path.join(odir, 'mbdata', 'mbdata_WGMS-{:05d}.csv'.format(wid)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Profiles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "utils.mkdir(odir + '/mb_profiles', reset=True)\n",
    "for rid, wid in zip(df_links_sel.RGI_ID, df_links_sel.WGMS_ID):\n",
    "    df_mb_sel = df_mb_all.loc[df_mb_all.WGMS_ID == wid].copy()\n",
    "    df_mb_sel = df_mb_sel.loc[df_mb_sel.LOWER_BOUND != 9999]\n",
    "    df_mb_sel = df_mb_sel.loc[df_mb_sel.UPPER_BOUND != 9999]\n",
    "    if len(df_mb_sel) == 0:\n",
    "        df_links_sel.loc[df_links_sel.RGI_ID == rid, 'HAS_PROFILE'] = False\n",
    "        continue\n",
    "    lb = set()\n",
    "    for yr in df_mb_sel.YEAR.unique():\n",
    "        df_mb_sel_yr = df_mb_sel.loc[df_mb_sel.YEAR == yr]\n",
    "        mids = df_mb_sel_yr.LOWER_BOUND.values*1.\n",
    "        mids += df_mb_sel_yr.UPPER_BOUND.values[:len(mids)]\n",
    "        mids *= 0.5\n",
    "        lb.update(int(m) for m in mids)\n",
    "    prof = pd.DataFrame(columns=sorted(list(lb)), index=sorted(df_mb_sel.YEAR.unique()))\n",
    "    for yr in df_mb_sel.YEAR.unique():\n",
    "        df_mb_sel_yr = df_mb_sel.loc[df_mb_sel.YEAR == yr]\n",
    "        mids = df_mb_sel_yr.LOWER_BOUND.values*1.\n",
    "        mids += df_mb_sel_yr.UPPER_BOUND.values[:len(mids)]\n",
    "        mids *= 0.5\n",
    "        prof.loc[yr, mids.astype(int)] = df_mb_sel_yr.ANNUAL_BALANCE.values\n",
    "    prof.to_csv(os.path.join(odir, 'mb_profiles', 'profile_WGMS-{:05d}.csv'.format(wid)))\n",
    "    df_links_sel.loc[df_links_sel.RGI_ID == rid, 'HAS_PROFILE'] = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Links: add RGI6 to 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidePrompt": true
   },
   "source": [
    "We use our previous list of links:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ref_df = pd.read_csv(odir + '/rgi_wgms_links_20200414_manual_addition.csv')\n",
    "# This file is modified below and then saved as rgi_wgms_links_20200415.csv (the name also used in OGGM),\n",
    "# so it should be fine to load it first, apply the changes, and then save it under that name.\n",
    "len(ref_df), len(df_links_sel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel_bck = df_links_sel.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for did, rid in df_links_sel[['RGI_ID']].iterrows():\n",
    "    if 'RGI50' in rid.RGI_ID:\n",
    "        df_links_sel.loc[did, 'RGI40_ID'] = ''\n",
    "        df_links_sel.loc[did, 'RGI50_ID'] = rid.RGI_ID\n",
    "        df_links_sel.loc[did, 'RGI60_ID'] = ''\n",
    "    elif 'RGI60' in rid.RGI_ID:\n",
    "        df_links_sel.loc[did, 'RGI40_ID'] = ''\n",
    "        df_links_sel.loc[did, 'RGI50_ID'] = ''\n",
    "        df_links_sel.loc[did, 'RGI60_ID'] = rid.RGI_ID\n",
    "    elif 'RGI40' in rid.RGI_ID:\n",
    "        df_links_sel.loc[did, 'RGI40_ID'] = rid.RGI_ID\n",
    "        df_links_sel.loc[did, 'RGI50_ID'] = ''\n",
    "        df_links_sel.loc[did, 'RGI60_ID'] = ''\n",
    "    else:\n",
    "        raise RuntimeError('Unexpected RGI version in ID: {}'.format(rid.RGI_ID))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Try to convert the RGI4 and RGI5 links to RGI6"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i, r in df_links_sel.iterrows():\n",
    "    rid4 = r.RGI40_ID\n",
    "    rid5 = r.RGI50_ID\n",
    "    rid6 = r.RGI60_ID\n",
    "    if rid6 != '':\n",
    "        # check whether the RGI5 ID can be filled in as well\n",
    "        if rid5 == '':\n",
    "            ref = ref_df.loc[ref_df.RGI60_ID == rid6]\n",
    "            if len(ref) == 1:\n",
    "                df_links_sel.loc[i, 'RGI50_ID'] = ref.RGI50_ID.iloc[0]\n",
    "        continue\n",
    "    if rid4 != '':\n",
    "        ref = ref_df.loc[ref_df.RGI40_ID == rid4]\n",
    "    if rid5 != '':\n",
    "        ref = ref_df.loc[ref_df.RGI50_ID == rid5]\n",
    "    if len(ref) == 0:\n",
    "        # Decide what to do here\n",
    "        if 'RGI40' in rid4:\n",
    "            # URUMQI N1 - it is now split. Not expected here, so fail loudly\n",
    "            raise RuntimeError('No RGI6 link found for {}'.format(rid4))\n",
    "        else:\n",
    "            # I checked them all: simply take it\n",
    "            rid6 = rid5.replace('RGI50', 'RGI60')\n",
    "            # Check\n",
    "#             sh5 = utils.get_rgi_glacier_entities([rid5], version='50')\n",
    "#             sh6 = utils.get_rgi_glacier_entities([rid5.replace('RGI50', 'RGI60')], version='60')\n",
    "#             f, (ax1, ax2) = plt.subplots(1, 2)\n",
    "#             sh5.plot(ax=ax1)\n",
    "#             sh6.plot(ax=ax2)\n",
    "    else:\n",
    "        rid6 = ref.RGI60_ID.iloc[0]\n",
    "    df_links_sel.loc[i, 'RGI60_ID'] = rid6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Last check:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel.loc[df_links_sel.duplicated('RGI60_ID', keep=False)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Some stats "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the RGI\n",
    "#df_rgi = pd.read_hdf(utils.file_downloader('https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_allglaciers_stats.h5'))\n",
    "df_rgi = pd.read_hdf(utils.file_downloader('https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add lons and lats and other attrs to the WGMS ones\n",
    "smdf = df_rgi.loc[df_links_sel.RGI60_ID]\n",
    "df_links_sel['CenLon'] = smdf.CenLon.values\n",
    "df_links_sel['CenLat'] = smdf.CenLat.values\n",
    "df_links_sel['GlacierType'] = smdf.GlacierType.values\n",
    "df_links_sel['TerminusType'] = smdf.TerminusType.values\n",
    "df_links_sel['IsTidewater'] = smdf.IsTidewater.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add region names\n",
    "sr = gpd.read_file(utils.get_rgi_dir(version='62') + '/00_rgi62_regions/00_rgi62_O1Regions.shp')\n",
    "sr['RGI_CODE'] = ['{:02d}'.format(int(s)) for s in sr['RGI_CODE']]\n",
    "sr = sr.drop_duplicates(subset='RGI_CODE')\n",
    "sr = sr.set_index('RGI_CODE')\n",
    "sr['FULL_NAME'] = [s + ': ' + n for s, n in sr.FULL_NAME.items()]\n",
    "df_links_sel['RGI_REG_NAME'] = sr.loc[df_links_sel.RGI_REG].FULL_NAME.values\n",
    "df_rgi['RGI_REG_NAME'] = sr.loc[df_rgi.O1Region].FULL_NAME.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel = df_links_sel[['CenLon', 'CenLat',\n",
    "                             'POLITICAL_UNIT', 'NAME', 'WGMS_ID', 'PSFG_ID', 'WGI_ID', 'GLIMS_ID',\n",
    "                             'RGI40_ID', 'RGI50_ID', 'RGI60_ID', 'RGI_REG', 'RGI_REG_NAME', \n",
    "                             'GlacierType', 'TerminusType', \n",
    "                             'IsTidewater', 'N_MB_YRS', 'HAS_PROFILE', 'REMARKS']]\n",
    "df_links_sel.to_csv(os.path.join(odir, 'rgi_wgms_links_20200415.csv'), index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Some plots "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "# sns.set_context('talk')\n",
    "sns.set_style('whitegrid')\n",
    "pdir = odir+'/plots'\n",
    "utils.mkdir(pdir)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel['N_MB_YRS'].plot(kind='hist', color='C3', bins=np.arange(21)*5);\n",
    "plt.xlim(5, 100);\n",
    "plt.ylabel('Number of glaciers')\n",
    "plt.xlabel('Length of the timeseries (years)');\n",
    "plt.tight_layout();\n",
    "plt.savefig(os.path.join(pdir, 'nglacier-hist.png'), dpi=150)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cartopy\n",
    "import cartopy.crs as ccrs\n",
    "\n",
    "f = plt.figure(figsize=(12, 7))\n",
    "ax = plt.axes(projection=ccrs.Robinson())\n",
    "# show the whole globe\n",
    "ax.set_extent([-180, 180, -90, 90], crs=ccrs.PlateCarree())\n",
    "ax.stock_img()\n",
    "ax.add_feature(cartopy.feature.COASTLINE);\n",
    "s = df_links_sel.loc[df_links_sel.N_MB_YRS < 10]\n",
    "print(len(s))\n",
    "ax.scatter(s.CenLon, s.CenLat, label='< 10 MB years', s=50,\n",
    "           edgecolor='k', facecolor='C0', transform=ccrs.PlateCarree(), zorder=99)\n",
    "s = df_links_sel.loc[(df_links_sel.N_MB_YRS >= 10) & (df_links_sel.N_MB_YRS < 30)]\n",
    "print(len(s))\n",
    "# raw strings avoid the invalid '\\g' escape sequence in the labels\n",
    "ax.scatter(s.CenLon, s.CenLat, label=r'$\\geq$ 10 and < 30 MB years', s=50,\n",
    "           edgecolor='k', facecolor='C2', transform=ccrs.PlateCarree(), zorder=99)\n",
    "s = df_links_sel.loc[df_links_sel.N_MB_YRS >= 30]\n",
    "print(len(s))\n",
    "ax.scatter(s.CenLon, s.CenLat, label=r'$\\geq$ 30 MB years', s=50,\n",
    "           edgecolor='k', facecolor='C3', transform=ccrs.PlateCarree(), zorder=99)\n",
    "plt.title('WGMS glaciers with at least 5 years of mass-balance data')\n",
    "plt.legend(loc=4, frameon=True)\n",
    "plt.tight_layout();\n",
    "plt.savefig(os.path.join(pdir, 'glacier-map.png'), dpi=150)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_links_sel.TerminusType.value_counts().to_frame()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ax = sns.countplot(x='RGI_REG', hue=\"TerminusType\", data=df_links_sel);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "md = pd.concat([df_rgi.GlacierType.value_counts().to_frame(name='RGI V6').T, \n",
    "                df_links_sel.GlacierType.value_counts().to_frame(name='WGMS').T]\n",
    "          ).T\n",
    "md"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "md = pd.concat([df_rgi.TerminusType.value_counts().to_frame(name='RGI V6').T, \n",
    "                df_links_sel.TerminusType.value_counts().to_frame(name='WGMS').T],\n",
    "                sort=False).T\n",
    "md"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "area_per_reg = df_rgi[['Area', 'RGI_REG_NAME']].groupby('RGI_REG_NAME').sum()\n",
    "area_per_reg['N_WGMS'] = df_links_sel.RGI_REG_NAME.value_counts()\n",
    "area_per_reg = area_per_reg.reset_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.barplot(x=\"Area\", y=\"RGI_REG_NAME\", data=area_per_reg);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "area_per_reg['N_WGMS_PER_UNIT'] = area_per_reg.N_WGMS / area_per_reg.Area * 1000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(9, 6))\n",
    "sns.barplot(x=\"N_WGMS\", y=\"RGI_REG_NAME\", data=area_per_reg);  # , palette=sns.husl_palette(19, s=.7, l=.5)\n",
    "plt.ylabel('')\n",
    "plt.xlabel('')\n",
    "plt.title('Number of WGMS glaciers per RGI region');\n",
    "plt.tight_layout();\n",
    "plt.savefig(os.path.join(pdir, 'barplot-ng.png'), dpi=150)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(9, 6))\n",
    "sns.barplot(x=\"N_WGMS_PER_UNIT\", y=\"RGI_REG_NAME\", data=area_per_reg);\n",
    "plt.ylabel('')\n",
    "plt.xlabel('')\n",
    "plt.title('Number of WGMS glaciers per 1,000 km$^2$ of ice');\n",
    "plt.tight_layout();\n",
    "plt.savefig(os.path.join(pdir, 'barplot-perice.png'), dpi=150)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nmb_yrs = df_links_sel[[\"RGI_REG\", 'N_MB_YRS']].groupby(\"RGI_REG\").sum()\n",
    "i = []\n",
    "for k, d in nmb_yrs.iterrows():\n",
    "    i.extend([k] * int(d.values[0]))\n",
    "df = pd.DataFrame()\n",
    "df[\"RGI_REG\"] = i\n",
    "ax = sns.countplot(x=\"RGI_REG\", data=df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "nbTranslate": {
   "displayLangs": [
    "*"
   ],
   "hotkey": "alt-t",
   "langInMainMenu": true,
   "sourceLang": "en",
   "targetLang": "fr",
   "useGoogleTranslate": true
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}