{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Processing WGMS mass-balance data for OGGM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we use the most recent lookup table provided by the WGMS to prepare the reference mass-balance data for the OGGM model.\n", "\n", "For this to work you'll need the latest lookup table and the latest WGMS FoG data (available [here](http://wgms.ch/data_databaseversions/)), and the latest RGI version (available [here](http://www.glims.org/RGI/))." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import geopandas as gpd\n", "import os\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read the WGMS files" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# just download the newest data, change the path_to_download_data and the year and month accordingly. 
If you run the entire notebook, the new WGMS MB data will be processed for OGGM\n", "year = '2021'\n", "month = '05'\n", "path_to_download_data = '/home/lilianschuster/Downloads/'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "idir = f'{path_to_download_data}DOI-WGMS-FoG-{year}-{month}'\n", "df_links = pd.read_csv(os.path.join(idir, f'WGMS-FoG-{year}-{month}-AA-GLACIER-ID-LUT.csv'), encoding='iso8859_15')\n", "df_mb_all = pd.read_csv(os.path.join(idir, f'WGMS-FoG-{year}-{month}-EE-MASS-BALANCE.csv'), encoding='iso8859_15')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'Total number of links: {}'.format(len(df_links))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links = df_links.dropna(subset=['RGI_ID']) # keep the ones with a valid RGI ID\n", "'Total number of RGI links: {}'.format(len(df_links))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Select WGMS IDs with at least N years of mass-balance data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# keep only the rows flagged with LOWER_BOUND == 9999 (glacier-wide values), i.e. remove the profiles\n", "df_mb = df_mb_all[df_mb_all.LOWER_BOUND.isin([9999])].copy()\n", "gp_id = df_mb.groupby('WGMS_ID')\n", "ids_5 = []\n", "ids_1 = []\n", "for wgmsid, group in gp_id:\n", "    # number of years with a valid (finite) annual balance\n", "    n_valid = np.sum(np.isfinite(group.ANNUAL_BALANCE.values))\n", "    if n_valid >= 5:\n", "        ids_5.append(wgmsid)\n", "    if n_valid >= 1:\n", "        ids_1.append(wgmsid)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Number of glaciers with at least 1 MB year: {}'.format(len(ids_1)))\n", "print('Number of glaciers with at least 5 MB years: {}'.format(len(ids_5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Number of glaciers in the lookup table with at least 5 years of valid MB data" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "'Number of matches in the WGMS lookup-table: {}'.format(len(df_links.loc[df_links.WGMS_ID.isin(ids_5)]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# keep only the links with at least 5 years of MB data\n", "df_links_sel = df_links.loc[df_links.WGMS_ID.isin(ids_5)].copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# add some simple stats: the RGI region (the two digits after the '-' in the RGI ID)\n", "# and the number of mass-balance rows available for each glacier\n", "df_links_sel['RGI_REG'] = [rid.split('-')[1].split('.')[0] for rid in df_links_sel.RGI_ID]\n", "df_links_sel['N_MB_YRS'] = [len(df_mb.loc[df_mb.WGMS_ID == wid]) for wid in df_links_sel.WGMS_ID]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Duplicates?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes, some RGI IDs are linked to more than one WGMS ID:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel.loc[df_links_sel.duplicated('RGI_ID', keep=False)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Careser is an Italian glacier that has now disintegrated into smaller parts. Here is a screenshot from the WGMS exploration tool:\n", "\n", "\n", "\n", "We keep the oldest MB series and discard the newer ones, which belong to the smaller glaciers (not represented in RGI)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We keep CARESER, as it is the longest series (measured before the glacier split up)\n", "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([3346, 3345])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two Norwegian glaciers are part of an ice cap:\n", "\n", "\n", "\n", "The two mass-balance time series are very close to each other, unsurprisingly:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_mb.loc[df_mb.WGMS_ID.isin([3339])].set_index('YEAR').ANNUAL_BALANCE.plot()\n", "df_mb.loc[df_mb.WGMS_ID.isin([3343])].set_index('YEAR').ANNUAL_BALANCE.plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since there is no reason for picking one series over the other, we have to remove both from the list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The two Norwegian glaciers are both part of the same ice cap: remove them both\n", "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([3339, 3343])]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel.loc[df_links_sel.duplicated('RGI_ID', keep=False)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous WGMS refmb dataset, there were also two duplicate glaciers in Iceland (WGMS IDs 3089 and 3110). Glacier 3110 has apparently been removed from the newest refmb dataset, so there is nothing to remove there." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Antarctic glaciers link to a huge undivided ice cap. 
We simply ignore them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([10404, 10403])]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel.loc[df_links_sel.duplicated('RGI_ID', keep=False)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Remove suspicious links" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the [PDF document from Betka](https://www.dropbox.com/s/ufh07zq0tfnf805/betka_incorrect_links.pdf?dl=0), plus the old Urumqi N1 link:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel.loc[df_links_sel.WGMS_ID.isin([3972, 1318, 10401, 1354, 853])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We remove these as well:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([3972, 1318, 10401, 1354, 853])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Remove glaciers we can't handle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "WARD H. I. 
RISE has really bad DEMs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel = df_links_sel.loc[~ df_links_sel.WGMS_ID.isin([53])]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'Final number of matches in the WGMS lookup-table: {}'.format(len(df_links_sel))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write out the mass-balance data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# output directory: adapt this path to your local copy of oggm-sample-data\n", "#odir = '/home/mowglie/Documents/git/oggm-sample-data/wgms'\n", "odir = '/home/lilianschuster/oggm-sample-data/wgms'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Annual MB" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from oggm import utils" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "utils.mkdir(odir + '/mbdata', reset=True)\n", "for rid, wid in zip(df_links_sel.RGI_ID, df_links_sel.WGMS_ID):\n", "    df_mb_sel = df_mb.loc[df_mb.WGMS_ID == wid].copy()\n", "    df_mb_sel = df_mb_sel[['YEAR', 'WGMS_ID', 'POLITICAL_UNIT', 'NAME', 'AREA', 'WINTER_BALANCE',\n", "                           'SUMMER_BALANCE', 'ANNUAL_BALANCE', 'REMARKS']].set_index('YEAR')\n", "    df_mb_sel['RGI_ID'] = rid\n", "    df_mb_sel.to_csv(os.path.join(odir, 'mbdata', 'mbdata_WGMS-{:05d}.csv'.format(wid)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Profiles" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "utils.mkdir(odir + '/mb_profiles', reset=True)\n", "for rid, wid in zip(df_links_sel.RGI_ID, df_links_sel.WGMS_ID):\n", "    df_mb_sel = df_mb_all.loc[df_mb_all.WGMS_ID == wid].copy()\n", "    # keep only the elevation-band rows (the rows flagged with 9999 are not profiles)\n", "    df_mb_sel = df_mb_sel.loc[df_mb_sel.LOWER_BOUND != 9999]\n", "    df_mb_sel = df_mb_sel.loc[df_mb_sel.UPPER_BOUND != 9999]\n", "    if len(df_mb_sel) == 0:\n", "        df_links_sel.loc[df_links_sel.RGI_ID == rid, 'HAS_PROFILE'] = False\n", "        continue\n", "    # collect the mid-point elevations of all bands, over all years\n", "    lb = set()\n", "    for yr in df_mb_sel.YEAR.unique():\n", "        df_mb_sel_yr = df_mb_sel.loc[df_mb_sel.YEAR == yr]\n", "        mids = df_mb_sel_yr.LOWER_BOUND.values*1.\n", "        mids += df_mb_sel_yr.UPPER_BOUND.values[:len(mids)]\n", "        mids *= 0.5\n", "        lb.update(int(m) for m in mids)\n", "    # one row per year, one column per band mid-point\n", "    prof = pd.DataFrame(columns=sorted(list(lb)), index=sorted(df_mb_sel.YEAR.unique()))\n", "    for yr in df_mb_sel.YEAR.unique():\n", "        df_mb_sel_yr = df_mb_sel.loc[df_mb_sel.YEAR == yr]\n", "        mids = df_mb_sel_yr.LOWER_BOUND.values*1.\n", "        mids += df_mb_sel_yr.UPPER_BOUND.values[:len(mids)]\n", "        mids *= 0.5\n", "        prof.loc[yr, mids.astype(int)] = df_mb_sel_yr.ANNUAL_BALANCE.values\n", "    prof.to_csv(os.path.join(odir, 'mb_profiles', 'profile_WGMS-{:05d}.csv'.format(wid)))\n", "    df_links_sel.loc[df_links_sel.RGI_ID == rid, 'HAS_PROFILE'] = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Links: add RGI6 to 5" ] }, { "cell_type": "markdown", "metadata": { "hidePrompt": true }, "source": [ "We use our previous list of links:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ref_df = pd.read_csv(odir + '/rgi_wgms_links_20200414_manual_addition.csv')\n", "# This file is updated below and then saved as rgi_wgms_links_20200415.csv (the name also used in OGGM), so it is fine to load it first, apply the changes and then save it under that name\n", "len(ref_df), len(df_links_sel)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel_bck = df_links_sel.copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# store each RGI ID in the column matching its RGI version\n", "for did, rid in df_links_sel[['RGI_ID']].iterrows():\n", "    if 'RGI50' in rid.RGI_ID:\n", "        df_links_sel.loc[did, 'RGI40_ID'] = ''\n", "        df_links_sel.loc[did, 'RGI50_ID'] = rid.RGI_ID\n", "        df_links_sel.loc[did, 'RGI60_ID'] = ''\n", "    elif 'RGI60' in rid.RGI_ID:\n", "        df_links_sel.loc[did, 'RGI40_ID'] = ''\n", "        df_links_sel.loc[did, 'RGI50_ID'] = ''\n", "        df_links_sel.loc[did, 'RGI60_ID'] = rid.RGI_ID\n", "    elif 'RGI40' in rid.RGI_ID:\n", "        df_links_sel.loc[did, 'RGI40_ID'] = rid.RGI_ID\n", "        df_links_sel.loc[did, 'RGI50_ID'] = ''\n", "        df_links_sel.loc[did, 'RGI60_ID'] = ''\n", "    else:\n", "        raise RuntimeError('Unexpected RGI ID: {}'.format(rid.RGI_ID))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Try to convert the RGI4 and RGI5 links to RGI6" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i, r in df_links_sel.iterrows():\n", "    rid4 = r.RGI40_ID\n", "    rid5 = r.RGI50_ID\n", "    rid6 = r.RGI60_ID\n", "    if rid6 != '':\n", "        # RGI6 is already known: also fill in the RGI5 ID if the previous links have it\n", "        if rid5 == '':\n", "            ref = ref_df.loc[ref_df.RGI60_ID == rid6]\n", "            if len(ref) == 1:\n", "                df_links_sel.loc[i, 'RGI50_ID'] = ref.RGI50_ID.iloc[0]\n", "        continue\n", "    if rid4 != '':\n", "        ref = ref_df.loc[ref_df.RGI40_ID == rid4]\n", "    if rid5 != '':\n", "        ref = ref_df.loc[ref_df.RGI50_ID == rid5]\n", "    if len(ref) == 0:\n", "        # Decide what to do here\n", "        if 'RGI40' in rid4:\n", "            # URUMQI N1 - it is now split, just ignore\n", "            raise RuntimeError()\n", "        else:\n", "            # I checked them all: simply take it\n", "            rid6 = rid5.replace('RGI50', 'RGI60')\n", "            # Check\n", "#             sh5 = utils.get_rgi_glacier_entities([rid5], version='50')\n", "#             sh6 = utils.get_rgi_glacier_entities([rid5.replace('RGI50', 'RGI60')], version='60')\n", "#             f, (ax1, ax2) = plt.subplots(1, 2)\n", "#             sh5.plot(ax=ax1)\n", "#             sh6.plot(ax=ax2)\n", "    else:\n", "        rid6 = ref.RGI60_ID.iloc[0]\n", "    df_links_sel.loc[i, 'RGI60_ID'] = rid6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last check:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel.loc[df_links_sel.duplicated('RGI60_ID', keep=False)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some stats" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the RGI\n", "#df_rgi = pd.read_hdf(utils.file_downloader('https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_allglaciers_stats.h5'))\n", "df_rgi = pd.read_hdf(utils.file_downloader('https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# add lons and lats and other attrs to the WGMS ones\n", "smdf = df_rgi.loc[df_links_sel.RGI60_ID]\n", "df_links_sel['CenLon'] = smdf.CenLon.values\n", "df_links_sel['CenLat'] = smdf.CenLat.values\n", "df_links_sel['GlacierType'] = smdf.GlacierType.values\n", "df_links_sel['TerminusType'] = smdf.TerminusType.values\n", "df_links_sel['IsTidewater'] = smdf.IsTidewater.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# add region names\n", "sr = gpd.read_file(utils.get_rgi_dir(version='62') + '/00_rgi62_regions/00_rgi62_O1Regions.shp')\n", "sr['RGI_CODE'] = ['{:02d}'.format(int(s)) for s in sr['RGI_CODE']]\n", "sr = sr.drop_duplicates(subset='RGI_CODE')\n", "sr = sr.set_index('RGI_CODE')\n", "sr['FULL_NAME'] = [s + ': ' + n for s, n in sr.FULL_NAME.items()]\n", "df_links_sel['RGI_REG_NAME'] = sr.loc[df_links_sel.RGI_REG].FULL_NAME.values\n", "df_rgi['RGI_REG_NAME'] = sr.loc[df_rgi.O1Region].FULL_NAME.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel = df_links_sel[['CenLon', 'CenLat',\n", "                             'POLITICAL_UNIT', 'NAME', 'WGMS_ID', 'PSFG_ID', 'WGI_ID', 'GLIMS_ID',\n", "                             'RGI40_ID', 'RGI50_ID', 'RGI60_ID', 'RGI_REG', 'RGI_REG_NAME',\n", "                             'GlacierType', 'TerminusType',\n", "                             'IsTidewater', 'N_MB_YRS', 'HAS_PROFILE', 'REMARKS']]\n", "df_links_sel.to_csv(os.path.join(odir, 'rgi_wgms_links_20200415.csv'), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some plots" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", "# sns.set_context('talk')\n", "sns.set_style('whitegrid')\n", "pdir = odir+'/plots'\n", "utils.mkdir(pdir)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_links_sel['N_MB_YRS'].plot(kind='hist', color='C3', bins=np.arange(21)*5);\n", "plt.xlim(5, 100);\n", "plt.ylabel('Number of glaciers')\n", "plt.xlabel('Length of the timeseries (years)');\n", "plt.tight_layout();\n", "plt.savefig(os.path.join(pdir, 'nglacier-hist.png'), dpi=150)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cartopy\n", "import cartopy.crs as ccrs\n", "\n", "f = plt.figure(figsize=(12, 7))\n", "ax = plt.axes(projection=ccrs.Robinson())\n", "# show the whole globe\n", "ax.set_extent([-180, 180, -90, 90], crs=ccrs.PlateCarree())\n", "ax.stock_img()\n", "ax.add_feature(cartopy.feature.COASTLINE);\n", "s = df_links_sel.loc[df_links_sel.N_MB_YRS < 10]\n", "print(len(s))\n", "ax.scatter(s.CenLon, s.CenLat, label='< 10 MB years', s=50,\n", "           edgecolor='k', facecolor='C0', transform=ccrs.PlateCarree(), zorder=99)\n", "s = df_links_sel.loc[(df_links_sel.N_MB_YRS >= 10) & (df_links_sel.N_MB_YRS < 30)]\n", "print(len(s))\n", "# raw strings, so that '\\g' is not read as an (invalid) escape sequence\n", "ax.scatter(s.CenLon, s.CenLat, label=r'$\\geq$ 10 and < 30 MB years', s=50,\n", "           edgecolor='k', facecolor='C2', transform=ccrs.PlateCarree(), zorder=99)\n", "s = df_links_sel.loc[df_links_sel.N_MB_YRS >= 30]\n", "print(len(s))\n", "ax.scatter(s.CenLon, s.CenLat, label=r'$\\geq$ 30 MB years', s=50,\n", "           edgecolor='k', facecolor='C3', transform=ccrs.PlateCarree(), zorder=99)\n", "plt.title('WGMS glaciers with at least 5 years of mass-balance data')\n", "plt.legend(loc=4, frameon=True)\n", "plt.tight_layout();\n", "plt.savefig(os.path.join(pdir, 'glacier-map.png'), dpi=150)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": 
[], "source": [ "df_links_sel.TerminusType.value_counts().to_frame()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax = sns.countplot(x='RGI_REG', hue=\"TerminusType\", data=df_links_sel);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md = pd.concat([df_rgi.GlacierType.value_counts().to_frame(name='RGI V6').T, \n", " df_links_sel.GlacierType.value_counts().to_frame(name='WGMS').T]\n", " ).T\n", "md" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md = pd.concat([df_rgi.TerminusType.value_counts().to_frame(name='RGI V6').T, \n", " df_links_sel.TerminusType.value_counts().to_frame(name='WGMS').T],\n", " sort=False).T\n", "md" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "area_per_reg = df_rgi[['Area', 'RGI_REG_NAME']].groupby('RGI_REG_NAME').sum()\n", "area_per_reg['N_WGMS'] = df_links_sel.RGI_REG_NAME.value_counts()\n", "area_per_reg = area_per_reg.reset_index()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sns.barplot(x=\"Area\", y=\"RGI_REG_NAME\", data=area_per_reg);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "area_per_reg['N_WGMS_PER_UNIT'] = area_per_reg.N_WGMS / area_per_reg.Area * 1000" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(9, 6))\n", "sns.barplot(x=\"N_WGMS\", y=\"RGI_REG_NAME\", data=area_per_reg); # , palette=sns.husl_palette(19, s=.7, l=.5)\n", "plt.ylabel('')\n", "plt.xlabel('')\n", "plt.title('Number of WGMS glaciers per RGI region');\n", "plt.tight_layout();\n", "plt.savefig(os.path.join(pdir, 'barplot-ng.png'), dpi=150)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(9, 6))\n", 
"sns.barplot(x=\"N_WGMS_PER_UNIT\", y=\"RGI_REG_NAME\", data=area_per_reg);\n", "plt.ylabel('')\n", "plt.xlabel('')\n", "plt.title('Number of WGMS glaciers per 1,000 km$^2$ of ice');\n", "plt.tight_layout();\n", "plt.savefig(os.path.join(pdir, 'barplot-perice.png'), dpi=150)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nmb_yrs = df_links_sel[[\"RGI_REG\", 'N_MB_YRS']].groupby(\"RGI_REG\").sum()\n", "i = []\n", "for k, d in nmb_yrs.iterrows():\n", " i.extend([k] * d.values[0])\n", "df = pd.DataFrame()\n", "df[\"RGI_REG\"] = i\n", "ax = sns.countplot(x=\"RGI_REG\", data=df)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "nbTranslate": { "displayLangs": [ "*" ], "hotkey": "alt-t", "langInMainMenu": true, "sourceLang": "en", "targetLang": "fr", "useGoogleTranslate": true }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }