{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Aggregation helper function in the Country Converter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The country converter (coco) includes a helper function which provides country classification concordances.\n",
    "The concordances are provided in dict, vector and matrix format. The helper function can be used independent from the country classification included in coco (see 'Small example' below), but can also make use of the different classifications provided by coco ('WIOD example' below). The aggregation function can also be used in Matlab to build concordance matrices (see 'Use in Matlab' at the end)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Small example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example is based on a small country set *co*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "co = [\"c1\", \"c2\", \"c3\", \"c4\", \"c5\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume, these countries can be classified in two ways (e.g. one could be geographic classification, the other one a political one):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "geographic_reg = {\"c1\": \"geo1\", \"c2\": \"geo1\", \"c3\": \"geo2\", \"c4\": \"geo3\", \"c5\": \"geo3\"}\n",
    "political_reg = {\"c1\": \"pol1\", \"c3\": \"pol1\", \"c5\": \"pol2\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single classification aggregation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we want to aggregated the countries *co* to political regions, assigning all non classified countries to a Rest of the World region RoW:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import country_converter as coco"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " **vector representation**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "concordance_vec = coco.agg_conc(\n",
    "    original_countries=co,\n",
    "    aggregates=political_reg,\n",
    "    missing_countries=\"RoW\",\n",
    "    as_dataframe=\"sparse\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This results in a concordance which maps countries *co* to the regions given in *political_reg*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>aggregated</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>c1</td>\n",
       "      <td>pol1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>c2</td>\n",
       "      <td>RoW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c3</td>\n",
       "      <td>pol1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>c4</td>\n",
       "      <td>RoW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>c5</td>\n",
       "      <td>pol2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  original aggregated\n",
       "0       c1       pol1\n",
       "1       c2        RoW\n",
       "2       c3       pol1\n",
       "3       c4        RoW\n",
       "4       c5       pol2"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concordance_vec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two other output formats are available:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " **full matrix representation**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>aggregated</th>\n",
       "      <th>RoW</th>\n",
       "      <th>pol1</th>\n",
       "      <th>pol2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>original</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>c1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c4</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c5</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "aggregated  RoW  pol1  pol2\n",
       "original                   \n",
       "c1          0.0   1.0   0.0\n",
       "c2          1.0   0.0   0.0\n",
       "c3          0.0   1.0   0.0\n",
       "c4          1.0   0.0   0.0\n",
       "c5          0.0   0.0   1.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concordance_matrix = coco.agg_conc(\n",
    "    original_countries=co,\n",
    "    aggregates=political_reg,\n",
    "    missing_countries=\"RoW\",\n",
    "    as_dataframe=\"full\",\n",
    ")\n",
    "concordance_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numerical values are available in the *pandas.DataFrame* attribute *values*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 1., 0.],\n",
       "       [1., 0., 0.],\n",
       "       [0., 1., 0.],\n",
       "       [1., 0., 0.],\n",
       "       [0., 0., 1.]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concordance_matrix.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " **dictionary representation**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "OrderedDict([('c1', 'pol1'),\n",
       "             ('c2', 'RoW'),\n",
       "             ('c3', 'pol1'),\n",
       "             ('c4', 'RoW'),\n",
       "             ('c5', 'pol2')])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concordance_dict = coco.agg_conc(\n",
    "    original_countries=co,\n",
    "    aggregates=political_reg,\n",
    "    missing_countries=\"RoW\",\n",
    "    as_dataframe=False,\n",
    ")\n",
    "concordance_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiple classification aggregations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The *aggregates* parameter of *coco.agg_conc* accepts several classification for the building of the concordance. This allows for a sequential aggregation.\n",
    "\n",
    "Based on the two classifications above *geographic_reg* and *politic_reg* we can build a concordance which first aggregates into political regions and subsequently all countries not included in a political region into geographic regions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>aggregated</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>c1</td>\n",
       "      <td>pol1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>c2</td>\n",
       "      <td>geo1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c3</td>\n",
       "      <td>pol1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>c4</td>\n",
       "      <td>geo3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>c5</td>\n",
       "      <td>pol2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  original aggregated\n",
       "0       c1       pol1\n",
       "1       c2       geo1\n",
       "2       c3       pol1\n",
       "3       c4       geo3\n",
       "4       c5       pol2"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conc_pol_geo = coco.agg_conc(\n",
    "    original_countries=co,\n",
    "    aggregates=[political_reg, geographic_reg],\n",
    "    missing_countries=\"RoW\",\n",
    "    as_dataframe=\"sparse\",\n",
    ")\n",
    "conc_pol_geo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The order of the classifications is important here, as subsequent classifications are only applied to countries not already classified:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>aggregated</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>c1</td>\n",
       "      <td>geo1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>c2</td>\n",
       "      <td>geo1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c3</td>\n",
       "      <td>geo2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>c4</td>\n",
       "      <td>geo3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>c5</td>\n",
       "      <td>geo3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  original aggregated\n",
       "0       c1       geo1\n",
       "1       c2       geo1\n",
       "2       c3       geo2\n",
       "3       c4       geo3\n",
       "4       c5       geo3"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conc_geo_pol = coco.agg_conc(\n",
    "    original_countries=co,\n",
    "    aggregates=[geographic_reg, political_reg],\n",
    "    missing_countries=\"RoW\",\n",
    "    as_dataframe=\"sparse\",\n",
    ")\n",
    "conc_geo_pol"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can be used to classify into different political regions (see example WIOD below) and two exclude certain countries from the classification as for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>aggregated</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>c1</td>\n",
       "      <td>pol1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>c2</td>\n",
       "      <td>c2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c3</td>\n",
       "      <td>pol1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>c4</td>\n",
       "      <td>geo3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>c5</td>\n",
       "      <td>pol2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  original aggregated\n",
       "0       c1       pol1\n",
       "1       c2         c2\n",
       "2       c3       pol1\n",
       "3       c4       geo3\n",
       "4       c5       pol2"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conc_c2_pol_geo = coco.agg_conc(\n",
    "    original_countries=co,\n",
    "    aggregates=[{\"c2\": \"c2\"}, political_reg, geographic_reg],\n",
    "    missing_countries=\"RoW\",\n",
    "    as_dataframe=\"sparse\",\n",
    ")\n",
    "conc_c2_pol_geo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## WIOD Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides passing custom dictionaries with countries and classification, *coco.agg_conc* also accepts strings corresponding to any classification available in *coco*.\n",
    "For example, to aggregated the [WIOD](http://www.wiod.org/home) countries into EU, OECD and 'OTHER' use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "conc_wiod_3reg = coco.agg_conc(\n",
    "    original_countries=\"WIOD\",\n",
    "    aggregates=[\"EU28\", \"OECD\"],\n",
    "    missing_countries=\"OTHER\",\n",
    "    as_dataframe=\"sparse\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>aggregated</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>RoW</td>\n",
       "      <td>nan_&amp;_EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AUS</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AUT</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BEL</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>BRA</td>\n",
       "      <td>OTHER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>BGR</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>CAN</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>CHN</td>\n",
       "      <td>OTHER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>CYP</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>CZE</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  original  aggregated\n",
       "0      RoW  nan_&_EU28\n",
       "1      AUS        OECD\n",
       "2      AUT        EU28\n",
       "3      BEL        EU28\n",
       "4      BRA       OTHER\n",
       "5      BGR        EU28\n",
       "6      CAN        OECD\n",
       "7      CHN       OTHER\n",
       "8      CYP        EU28\n",
       "9      CZE        EU28"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conc_wiod_3reg.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "WIOD already includes a ROW region, which consists of EU and non-EU countries. This leads the to the compound name for the region shown above. To set such many to many matches to None and pass them through until a unique matching is found, set *merge_multiple_strings* to None:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>aggregated</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>RoW</td>\n",
       "      <td>OTHER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AUS</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AUT</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BEL</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>BRA</td>\n",
       "      <td>OTHER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>BGR</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>CAN</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>CHN</td>\n",
       "      <td>OTHER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>CYP</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>CZE</td>\n",
       "      <td>EU28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  original aggregated\n",
       "0      RoW      OTHER\n",
       "1      AUS       OECD\n",
       "2      AUT       EU28\n",
       "3      BEL       EU28\n",
       "4      BRA      OTHER\n",
       "5      BGR       EU28\n",
       "6      CAN       OECD\n",
       "7      CHN      OTHER\n",
       "8      CYP       EU28\n",
       "9      CZE       EU28"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conc_wiod_3reg = coco.agg_conc(\n",
    "    original_countries=\"WIOD\",\n",
    "    aggregates=[\"EU28\", \"OECD\"],\n",
    "    missing_countries=\"OTHER\",\n",
    "    merge_multiple_string=None,\n",
    "    as_dataframe=\"sparse\",\n",
    ")\n",
    "conc_wiod_3reg.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Internally, the dicts required for the aggregation are build be calling *CountryConverter.get_correspondence_dict(original_class, target_class)*. If a fine grained control over the aggregation is necessary, call this low level function directly and modify the resulting dict."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As in the small example given above, single countries can be separated by passing a custom dict:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>aggregated</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>RoW</td>\n",
       "      <td>Non_OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AUS</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AUT</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BEL</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>BRA</td>\n",
       "      <td>Non_OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>BGR</td>\n",
       "      <td>Non_OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>CAN</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>CHN</td>\n",
       "      <td>Non_OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>CYP</td>\n",
       "      <td>Non_OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>CZE</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>DNK</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>EST</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>FIN</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>FRA</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>DEU</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>GRC</td>\n",
       "      <td>OECD</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   original aggregated\n",
       "0       RoW   Non_OECD\n",
       "1       AUS       OECD\n",
       "2       AUT       OECD\n",
       "3       BEL       OECD\n",
       "4       BRA   Non_OECD\n",
       "5       BGR   Non_OECD\n",
       "6       CAN       OECD\n",
       "7       CHN   Non_OECD\n",
       "8       CYP   Non_OECD\n",
       "9       CZE       OECD\n",
       "10      DNK       OECD\n",
       "11      EST       OECD\n",
       "12      FIN       OECD\n",
       "13      FRA       OECD\n",
       "14      DEU       OECD\n",
       "15      GRC       OECD"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conc_wiod_DE_Aus_OECD = coco.agg_conc(\n",
    "    original_countries=\"WIOD\",\n",
    "    aggregates=[{\"Deu\": \"Deu\", \"Aus\": \"Aus\"}, \"OECD\"],\n",
    "    missing_countries=\"Non_OECD\",\n",
    "    merge_multiple_string=None,\n",
    "    as_dataframe=\"sparse\",\n",
    ")\n",
    "conc_wiod_DE_Aus_OECD.head(16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use in Matlab"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python function can be called directly in Matlab. However, Matlab follows its trademark and makes simple things complicated. To achieve the above concordance *conc_wiod_3reg* in Matlab use:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "conc_wiod_3reg_dataframe = py.country_converter.agg_conc(...\n",
    "    pyargs('original_countries','WIOD', ...\n",
    "           'aggregates',py.list({'EU', 'OECD'}), ...\n",
    "           'missing_countries', 'OTHER', ...\n",
    "           'merge_multiple_string', py.None, ...\n",
    "           'as_dataframe', 'full'));\n",
    "```\n",
    "\n",
    "This returns the data as pandas DataFrame. Since Matlab does not provide any function for converting pandas DataFrames or Numpy arrays to double, two additional steps are required to get to a valid Matlab array:\n",
    "\n",
    "\n",
    "```\n",
    "temp = cellfun(@cell,cell(conc_wiod_3reg_dataframe.values.tolist),'un',0);\n",
    "conc_matrix = cell2mat(vertcat(temp {:}));\n",
    "```\n",
    "\n",
    "Since Matlab only works with numerical values, all names provided in the original DataFrame are lost.\n",
    "\n",
    "Additional dicts for the *aggregates* list can be build in Matlab with for example:\n",
    "\n",
    "```\n",
    "spec_country_dict = py.dict(pyargs('Aus', 'Aus', 'Deu', 'Deu'))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further details"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more details about *agg_conc* see the docstring:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function agg_conc in module country_converter.country_converter:\n",
      "\n",
      "agg_conc(original_countries, aggregates, missing_countries='missing', merge_multiple_string='_&_', log_missing_countries=None, log_merge_multiple_strings=None, coco=None, as_dataframe='sparse', original_countries_class=None)\n",
      "    Builds an aggregation concordance dict, vec or matrix\n",
      "    \n",
      "    Parameters\n",
      "    ----------\n",
      "    \n",
      "    original_countries: list or str\n",
      "        List of countries to aggregated, also accepts and valid column name of\n",
      "        CountryConverter.data (e.g. name_short, WIOD, ...)\n",
      "    \n",
      "    aggregates: list of dict or str\n",
      "        List of aggregation information. This can either be dict mapping the\n",
      "        names of 'original_countries' to aggregates, or a valid column name of\n",
      "        CountryConverter.data Aggregation happens in order given in this\n",
      "        parameter.  Thus, country assigned to an aggregate are not re-assigned\n",
      "        by the following aggregation information.\n",
      "    \n",
      "    missing_countries: str, boolean, None\n",
      "        Entry to fill in for countries in 'original_countries' which do not\n",
      "        appear in 'aggregates'.  str: Use the given name for all missing\n",
      "        countries True: Use the name in original_countries for missing\n",
      "        countries False: Skip these countries None: Use None for these\n",
      "        countries\n",
      "    \n",
      "    merge_multiple_string: str or None, optional\n",
      "        If multiple correspondence entries are given in one of the aggregates\n",
      "        join them with the given string (default: '_&_'.  To skip these entries,\n",
      "        pass None.\n",
      "    \n",
      "    log_missing_countries: function, optional\n",
      "        This function is called with country is country is in\n",
      "        'original_countries' but missing in all 'aggregates'.\n",
      "        For example, pass\n",
      "        lambda x: logging.error('Country {} missing'.format(x))\n",
      "        to log errors for such countries.  Default: do nothing\n",
      "    \n",
      "    log_merge_multiple_strings: function, optional\n",
      "        Function to call for logging multiple strings, see\n",
      "        log_missing_countries Default: do nothing\n",
      "    \n",
      "    coco: instance of CountryConverter, optional\n",
      "        CountryConverter instance used for the conversion.  Pass a custom one\n",
      "        if additional data is needed in addition to the custom country\n",
      "        converter file.  If None (default), the bare CountryConverter is used\n",
      "    \n",
      "    as_dataframe: boolean or st, optional\n",
      "        If False, output as OrderedDict.  If True or str, output as pandas\n",
      "        dataframe.  If str and 'full', output as a full matrix, otherwise only\n",
      "        two columns with the original and aggregated names are returned.\n",
      "    \n",
      "    original_countries_class: str, optional\n",
      "        Valid column name of CountryConverter.data.  This parameter is needed\n",
      "        if a list of countries is passed to 'original_countries' and strings\n",
      "        corresponding to data in CountryConverter.data are used subsequently.\n",
      "        Can be omitted otherwise.\n",
      "    \n",
      "    Returns\n",
      "    -------\n",
      "    \n",
      "    OrderedDict or DataFrame (defined by 'as_dataframe')\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(coco.agg_conc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}