{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Country Converter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The country converter (coco) is a Python package to convert country names into different classifications and between different naming versions. Internally it uses regular expressions to match country names.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The package is available as PyPI, use "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pip install country_converter  -upgrade"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "from the command line or use your preferred python package installer.\n",
    "The source code is available on github: https://github.com/IndEcol/country_converter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conversion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The country converter provides one main class which is used for the conversion:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import country_converter as coco"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "converter = coco.CountryConverter()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given a list of countries is a certain classification:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "iso3_codes = [\"USA\", \"VUT\", \"TKL\", \"AUT\", \"AFG\", \"ALB\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can be converted to any classification provided by:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['United States of America',\n",
       " 'Republic of Vanuatu',\n",
       " 'Tokelau',\n",
       " 'Republic of Austria',\n",
       " 'Islamic Republic of Afghanistan',\n",
       " 'Republic of Albania']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.convert(names=iso3_codes, src=\"ISO3\", to=\"name_official\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['America', 'Oceania', 'Oceania', 'Europe', 'Asia', 'Europe']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.convert(names=iso3_codes, src=\"ISO3\", to=\"continent\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parameter \"src\" specifies the input-, \"to\" the output format. Possible values for both parameter can be found by:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['APEC',\n",
       " 'BASIC',\n",
       " 'BRIC',\n",
       " 'CIS',\n",
       " 'Cecilia2050',\n",
       " 'DACcode',\n",
       " 'EEA',\n",
       " 'EU',\n",
       " 'EU12',\n",
       " 'EU15',\n",
       " 'EU25',\n",
       " 'EU27',\n",
       " 'EU27_2007',\n",
       " 'EU28',\n",
       " 'EURO',\n",
       " 'EXIO1',\n",
       " 'EXIO1_3L',\n",
       " 'EXIO2',\n",
       " 'EXIO2_3L',\n",
       " 'EXIO3',\n",
       " 'EXIO3_3L',\n",
       " 'Eora',\n",
       " 'FAOcode',\n",
       " 'G20',\n",
       " 'G7',\n",
       " 'GBDcode',\n",
       " 'GWcode',\n",
       " 'IEA',\n",
       " 'IMAGE',\n",
       " 'ISO2',\n",
       " 'ISO3',\n",
       " 'ISOnumeric',\n",
       " 'MESSAGE',\n",
       " 'OECD',\n",
       " 'REMIND',\n",
       " 'Schengen',\n",
       " 'UN',\n",
       " 'UNcode',\n",
       " 'UNmember',\n",
       " 'UNregion',\n",
       " 'WIOD',\n",
       " 'ccTLD',\n",
       " 'continent',\n",
       " 'name_official',\n",
       " 'name_short',\n",
       " 'obsolete',\n",
       " 'regex']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.valid_class"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Internally, these names are the column header of the underlying pandas dataframe (see below)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The convert function can also be accessed without initiating the CountryConverter. This can be useful for one time usage. For multiple matches, initiating the CountryConverter avoids that the file providing the matching data gets read in for each conversion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['US', 'VU', 'TK', 'AT', 'AF', 'AL']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.convert(names=iso3_codes, src=\"ISO3\", to=\"ISO2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some of the classifications can be accessed by some shortcuts. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name_short</th>\n",
       "      <th>EU27</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Austria</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Belgium</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>Croatia</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>Cyprus</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>Czech Republic</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60</th>\n",
       "      <td>Denmark</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70</th>\n",
       "      <td>Estonia</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>Finland</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>France</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>Germany</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>Greece</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>Hungary</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>107</th>\n",
       "      <td>Ireland</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>110</th>\n",
       "      <td>Italy</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>Latvia</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>128</th>\n",
       "      <td>Lithuania</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129</th>\n",
       "      <td>Luxembourg</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>137</th>\n",
       "      <td>Malta</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>156</th>\n",
       "      <td>Netherlands</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>177</th>\n",
       "      <td>Poland</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>Portugal</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>182</th>\n",
       "      <td>Romania</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196</th>\n",
       "      <td>Slovakia</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>197</th>\n",
       "      <td>Slovenia</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>204</th>\n",
       "      <td>Spain</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>215</th>\n",
       "      <td>Sweden</td>\n",
       "      <td>EU27</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         name_short  EU27\n",
       "14          Austria  EU27\n",
       "21          Belgium  EU27\n",
       "35         Bulgaria  EU27\n",
       "55          Croatia  EU27\n",
       "58           Cyprus  EU27\n",
       "59   Czech Republic  EU27\n",
       "60          Denmark  EU27\n",
       "70          Estonia  EU27\n",
       "76          Finland  EU27\n",
       "77           France  EU27\n",
       "84          Germany  EU27\n",
       "87           Greece  EU27\n",
       "101         Hungary  EU27\n",
       "107         Ireland  EU27\n",
       "110           Italy  EU27\n",
       "122          Latvia  EU27\n",
       "128       Lithuania  EU27\n",
       "129      Luxembourg  EU27\n",
       "137           Malta  EU27\n",
       "156     Netherlands  EU27\n",
       "177          Poland  EU27\n",
       "178        Portugal  EU27\n",
       "182         Romania  EU27\n",
       "196        Slovakia  EU27\n",
       "197        Slovenia  EU27\n",
       "204           Spain  EU27\n",
       "215          Sweden  EU27"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.EU27"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ISO2</th>\n",
       "      <th>OECD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>AU</td>\n",
       "      <td>1971.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>AT</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>BE</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>CA</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>CL</td>\n",
       "      <td>2010.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>CO</td>\n",
       "      <td>2020.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>CR</td>\n",
       "      <td>2021.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>CZ</td>\n",
       "      <td>1995.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60</th>\n",
       "      <td>DK</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70</th>\n",
       "      <td>EE</td>\n",
       "      <td>2010.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>FI</td>\n",
       "      <td>1969.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>FR</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>DE</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>GR</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>HU</td>\n",
       "      <td>1996.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>IS</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>107</th>\n",
       "      <td>IE</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109</th>\n",
       "      <td>IL</td>\n",
       "      <td>2010.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>110</th>\n",
       "      <td>IT</td>\n",
       "      <td>1962.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112</th>\n",
       "      <td>JP</td>\n",
       "      <td>1964.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>LV</td>\n",
       "      <td>2016.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>128</th>\n",
       "      <td>LT</td>\n",
       "      <td>2018.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129</th>\n",
       "      <td>LU</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>143</th>\n",
       "      <td>MX</td>\n",
       "      <td>1994.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>156</th>\n",
       "      <td>NL</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>158</th>\n",
       "      <td>NZ</td>\n",
       "      <td>1973.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>166</th>\n",
       "      <td>NO</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>177</th>\n",
       "      <td>PL</td>\n",
       "      <td>1996.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>PT</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196</th>\n",
       "      <td>SK</td>\n",
       "      <td>2000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>197</th>\n",
       "      <td>SI</td>\n",
       "      <td>2010.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>202</th>\n",
       "      <td>KR</td>\n",
       "      <td>1996.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>204</th>\n",
       "      <td>ES</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>215</th>\n",
       "      <td>SE</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>216</th>\n",
       "      <td>CH</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>228</th>\n",
       "      <td>TR</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>235</th>\n",
       "      <td>GB</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>236</th>\n",
       "      <td>US</td>\n",
       "      <td>1961.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    ISO2    OECD\n",
       "13    AU  1971.0\n",
       "14    AT  1961.0\n",
       "21    BE  1961.0\n",
       "41    CA  1961.0\n",
       "45    CL  2010.0\n",
       "49    CO  2020.0\n",
       "53    CR  2021.0\n",
       "59    CZ  1995.0\n",
       "60    DK  1961.0\n",
       "70    EE  2010.0\n",
       "76    FI  1969.0\n",
       "77    FR  1961.0\n",
       "84    DE  1961.0\n",
       "87    GR  1961.0\n",
       "101   HU  1996.0\n",
       "102   IS  1961.0\n",
       "107   IE  1961.0\n",
       "109   IL  2010.0\n",
       "110   IT  1962.0\n",
       "112   JP  1964.0\n",
       "122   LV  2016.0\n",
       "128   LT  2018.0\n",
       "129   LU  1961.0\n",
       "143   MX  1994.0\n",
       "156   NL  1961.0\n",
       "158   NZ  1973.0\n",
       "166   NO  1961.0\n",
       "177   PL  1996.0\n",
       "178   PT  1961.0\n",
       "196   SK  2000.0\n",
       "197   SI  2010.0\n",
       "202   KR  1996.0\n",
       "204   ES  1961.0\n",
       "215   SE  1961.0\n",
       "216   CH  1961.0\n",
       "228   TR  1961.0\n",
       "235   GB  1961.0\n",
       "236   US  1961.0"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.OECDas(\"ISO2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Handling missing data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The return value for non-found entries is be default set to 'not found':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ABC not found in ISO3\n",
      "XXX not found in ISO3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['not found', 'AUT', 'not found']"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "iso3_codes_missing = [\"ABC\", \"AUT\", \"XXX\"]\n",
    "converter.convert(iso3_codes_missing, src=\"ISO3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "but can also be rest to something else:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ABC not found in ISO3\n",
      "XXX not found in ISO3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['missing', 'AUT', 'missing']"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.convert(iso3_codes_missing, src=\"ISO3\", not_found=\"missing\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternativly, the non-found entries can be passed through by passing None to not_found:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ABC not found in ISO3\n",
      "XXX not found in ISO3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['ABC', 'AUT', 'XXX']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.convert(iso3_codes_missing, src=\"ISO3\", not_found=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To extend the underlying dataset, an additional dataframe (or file) can be passed. Note, that all entries below (name_short, name_official, regex, ISO2 and ISO3) must be specified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "add_data = pd.DataFrame.from_dict(\n",
    "    {\n",
    "        \"name_short\": [\"xxx country\", \"abc country\"],\n",
    "        \"name_official\": [\"The XXX country\", \"The ABC country\"],\n",
    "        \"regex\": [\"xxx country\", \"abc country\"],\n",
    "        \"ISO2\": [\"xx\", \"ab\"],\n",
    "        \"ISO3\": [\"xxx\", \"abc\"],\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name_short</th>\n",
       "      <th>name_official</th>\n",
       "      <th>regex</th>\n",
       "      <th>ISO2</th>\n",
       "      <th>ISO3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>xxx country</td>\n",
       "      <td>The XXX country</td>\n",
       "      <td>xxx country</td>\n",
       "      <td>xx</td>\n",
       "      <td>xxx</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>abc country</td>\n",
       "      <td>The ABC country</td>\n",
       "      <td>abc country</td>\n",
       "      <td>ab</td>\n",
       "      <td>abc</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name_short    name_official        regex ISO2 ISO3\n",
       "0  xxx country  The XXX country  xxx country   xx  xxx\n",
       "1  abc country  The ABC country  abc country   ab  abc"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "add_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['abc country', 'Austria', 'xxx country']"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "extended_converter = coco.CountryConverter(additional_data=add_data)\n",
    "extended_converter.convert(iso3_codes_missing, src=\"ISO3\", to=\"name_short\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively to a ad hoc dataframe, additional datafiles can be passed. These must have the same format as basic data set. \n",
    "An example can be found here: \n",
    "https://github.com/IndEcol/country_converter/tree/master/tests/custom_data_example.txt\n",
    "\n",
    "The custom data example contains the ISO3 code mapping for Romania before 2002 and switches the regex matching for congo between DR Congo and Congo Republic.\n",
    "\n",
    "To use is pass the path to the additional country file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# extended_converter = coco.CountryConverter(additional_data=path/to/datafile)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The passed data (file or dataframe) must at least contain the headers 'name_official', 'name_short' and 'regex'. Of course, if the additional data shall be used to a conversion to any other field, these must also be included. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additionally passed data always overwrites the existing one.\n",
    "This can be used to adjust coco for datasets with wrong country names. \n",
    "For example, assuming a dataset erroneous switched the ISO2 codes for India (IN) and Indonesia (ID) (therefore assuming 'ID' for India and 'IN' for Indonesia), one can accomedate for that by: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Duplicated values in column name_short of merged data - keep last one\n",
      "Duplicated values in column regex of merged data - keep last one\n"
     ]
    }
   ],
   "source": [
    "switched_converter = coco.CountryConverter(\n",
    "    additional_data=pd.DataFrame.from_dict(\n",
    "        {\n",
    "            \"name_short\": [\"India\", \"Indonesia\"],\n",
    "            \"name_official\": [\"India\", \"Indonesia\"],\n",
    "            \"regex\": [\"india\", \"indonesia\"],\n",
    "            \"ISO2\": [\"ID\", \"IN\"],\n",
    "            \"ISO3\": [\"IDN\", \"IND\"],\n",
    "        }\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'India'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.convert(\"IN\", src=\"ISO2\", to=\"name_short\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'India'"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "switched_converter.convert(\"ID\", src=\"ISO2\", to=\"name_short\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regular expression matching"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The input parameter \"src\" can be set to \"regex\" to use regular expression matching for a given country list. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "some_names = [\n",
    "    \"United Rep. of Tanzania\",\n",
    "    \"Cape Verde\",\n",
    "    \"Burma\",\n",
    "    \"Iran (Islamic Republic of)\",\n",
    "    \"Korea, Republic of\",\n",
    "    \"Dem. People's Rep. of Korea\",\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Tanzania', 'Cabo Verde', 'Myanmar', 'Iran', 'South Korea', 'North Korea']"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coco.convert(names=some_names, src=\"regex\", to=\"name_short\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The regular expressions can also be used to match any list of countries to any other. For example: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'norway': 'Norway is a Kingdom too',\n",
       " 'united_states': 'USA',\n",
       " 'china': 'Peoples Republic of China',\n",
       " 'taiwan': 'Republic of China'}"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "match_these = [\"norway\", \"united_states\", \"china\", \"taiwan\"]\n",
    "master_list = [\n",
    "    \"USA\",\n",
    "    \"The Swedish Kingdom\",\n",
    "    \"Norway is a Kingdom too\",\n",
    "    \"Peoples Republic of China\",\n",
    "    \"Republic of China\",\n",
    "]\n",
    "\n",
    "coco.match(match_these, master_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the regular expression matches several times, all results are given as list and a warning is generated:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Multiple matches for name taiwan in list_b\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'norway': 'Norway is a Kingdom too',\n",
       " 'united_states': 'USA',\n",
       " 'china': 'Peoples Republic of China',\n",
       " 'taiwan': ['Taiwan, province of china', 'Republic of China']}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "match_these = [\"norway\", \"united_states\", \"china\", \"taiwan\"]\n",
    "master_list = [\n",
    "    \"USA\",\n",
    "    \"The Swedish Kingdom\",\n",
    "    \"Norway is a Kingdom too\",\n",
    "    \"Peoples Republic of China\",\n",
    "    \"Taiwan, province of china\",\n",
    "    \"Republic of China\",\n",
    "]\n",
    "\n",
    "coco.match(match_these, master_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parameter \"enforce_sublist\" can be set to ensure consistent output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Multiple matches for name taiwan in list_b\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'norway': ['Norway is a Kingdom too'],\n",
       " 'united_states': ['USA'],\n",
       " 'china': ['Peoples Republic of China'],\n",
       " 'taiwan': ['Taiwan, province of china', 'Republic of China']}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coco.match(match_these, master_list, enforce_sublist=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You get a warning if one of the names couldn't be found:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Could not identify some other country in list_a\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'norway': 'Norway is a Kingdom too',\n",
       " 'united_states': 'USA',\n",
       " 'china': 'Peoples Republic of China',\n",
       " 'taiwan': 'Republic of China',\n",
       " 'some other country': 'not_found'}"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "match_these = [\"norway\", \"united_states\", \"china\", \"taiwan\", \"some other country\"]\n",
    "master_list = [\n",
    "    \"USA\",\n",
    "    \"The Swedish Kingdom\",\n",
    "    \"Norway is a Kingdom too\",\n",
    "    \"Peoples Republic of China\",\n",
    "    \"Republic of China\",\n",
    "]\n",
    "coco.match(match_these, master_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the value for non found countries can be specified: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Could not identify some other country in list_a\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'norway': 'Norway is a Kingdom too',\n",
       " 'united_states': 'USA',\n",
       " 'china': 'Peoples Republic of China',\n",
       " 'taiwan': 'Republic of China',\n",
       " 'some other country': 'its not there'}"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coco.match(match_these, master_list, not_found=\"its not there\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can also be used to pass the not found country to the new classification:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Could not identify some other country in list_a\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'norway': 'Norway is a Kingdom too',\n",
       " 'united_states': 'USA',\n",
       " 'china': 'Peoples Republic of China',\n",
       " 'taiwan': 'Republic of China',\n",
       " 'some other country': 'some other country'}"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coco.match(match_these, master_list, not_found=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Internals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Within the new instance, the raw data for the conversion is saved within a pandas dataframe. \n",
    "This dataframe can be accessed directly with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>APEC</th>\n",
       "      <th>BASIC</th>\n",
       "      <th>BRIC</th>\n",
       "      <th>CIS</th>\n",
       "      <th>Cecilia2050</th>\n",
       "      <th>DACcode</th>\n",
       "      <th>EEA</th>\n",
       "      <th>EU</th>\n",
       "      <th>EU12</th>\n",
       "      <th>EU15</th>\n",
       "      <th>...</th>\n",
       "      <th>UNcode</th>\n",
       "      <th>UNmember</th>\n",
       "      <th>UNregion</th>\n",
       "      <th>WIOD</th>\n",
       "      <th>ccTLD</th>\n",
       "      <th>continent</th>\n",
       "      <th>name_official</th>\n",
       "      <th>name_short</th>\n",
       "      <th>obsolete</th>\n",
       "      <th>regex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>RoW</td>\n",
       "      <td>625.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1946.0</td>\n",
       "      <td>Southern Asia</td>\n",
       "      <td>RoW</td>\n",
       "      <td>af</td>\n",
       "      <td>Asia</td>\n",
       "      <td>Islamic Republic of Afghanistan</td>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>NaN</td>\n",
       "      <td>afghan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>RoW</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>248.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Northern Europe</td>\n",
       "      <td>RoW</td>\n",
       "      <td>ax</td>\n",
       "      <td>Europe</td>\n",
       "      <td>Åland Islands</td>\n",
       "      <td>Aland Islands</td>\n",
       "      <td>NaN</td>\n",
       "      <td>\\b(a|å)land</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>RoW</td>\n",
       "      <td>71.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>8.0</td>\n",
       "      <td>1955.0</td>\n",
       "      <td>Southern Europe</td>\n",
       "      <td>RoW</td>\n",
       "      <td>al</td>\n",
       "      <td>Europe</td>\n",
       "      <td>Republic of Albania</td>\n",
       "      <td>Albania</td>\n",
       "      <td>NaN</td>\n",
       "      <td>albania</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>RoW</td>\n",
       "      <td>130.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1962.0</td>\n",
       "      <td>Northern Africa</td>\n",
       "      <td>RoW</td>\n",
       "      <td>dz</td>\n",
       "      <td>Africa</td>\n",
       "      <td>People's Democratic Republic of Algeria</td>\n",
       "      <td>Algeria</td>\n",
       "      <td>NaN</td>\n",
       "      <td>algeria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>RoW</td>\n",
       "      <td>880.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>16.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Polynesia</td>\n",
       "      <td>RoW</td>\n",
       "      <td>as</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>American Samoa</td>\n",
       "      <td>American Samoa</td>\n",
       "      <td>NaN</td>\n",
       "      <td>^(?=.*americ).*samoa</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 47 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  APEC BASIC BRIC  CIS Cecilia2050  DACcode  EEA   EU EU12 EU15  ... UNcode  \\\n",
       "0  NaN   NaN  NaN  NaN         RoW    625.0  NaN  NaN  NaN  NaN  ...    4.0   \n",
       "1  NaN   NaN  NaN  NaN         RoW      NaN  NaN  NaN  NaN  NaN  ...  248.0   \n",
       "2  NaN   NaN  NaN  NaN         RoW     71.0  NaN  NaN  NaN  NaN  ...    8.0   \n",
       "3  NaN   NaN  NaN  NaN         RoW    130.0  NaN  NaN  NaN  NaN  ...   12.0   \n",
       "4  NaN   NaN  NaN  NaN         RoW    880.0  NaN  NaN  NaN  NaN  ...   16.0   \n",
       "\n",
       "  UNmember         UNregion WIOD  ccTLD continent  \\\n",
       "0   1946.0    Southern Asia  RoW     af      Asia   \n",
       "1      NaN  Northern Europe  RoW     ax    Europe   \n",
       "2   1955.0  Southern Europe  RoW     al    Europe   \n",
       "3   1962.0  Northern Africa  RoW     dz    Africa   \n",
       "4      NaN        Polynesia  RoW     as   Oceania   \n",
       "\n",
       "                             name_official      name_short obsolete  \\\n",
       "0          Islamic Republic of Afghanistan     Afghanistan      NaN   \n",
       "1                            Åland Islands   Aland Islands      NaN   \n",
       "2                      Republic of Albania         Albania      NaN   \n",
       "3  People's Democratic Republic of Algeria         Algeria      NaN   \n",
       "4                           American Samoa  American Samoa      NaN   \n",
       "\n",
       "                  regex  \n",
       "0                afghan  \n",
       "1           \\b(a|å)land  \n",
       "2               albania  \n",
       "3               algeria  \n",
       "4  ^(?=.*americ).*samoa  \n",
       "\n",
       "[5 rows x 47 columns]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "converter.data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This dataframe can be extended in both directions. The only requirement is to provide unique values for name_short, name_official and regex.\n",
    "\n",
    "Internally, the data is saved in country_data.txt as tab-separated values (utf-8 encoded)."
   ]
  },
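  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The uniqueness requirement above can be checked before extending the data. The following is a minimal sketch using only the standard library, not the package's own validation: `SAMPLE_TSV` and `find_duplicates` are hypothetical names, and the sample holds only three of the columns, with one row duplicated on purpose.\n",
    "\n",
    "```python\n",
    "import csv\n",
    "import io\n",
    "from collections import Counter\n",
    "\n",
    "# Hypothetical extract in the tab-separated layout; the real file has many more columns.\n",
    "SAMPLE_TSV = (\n",
    "    'name_short\\tname_official\\tregex\\n'\n",
    "    'Afghanistan\\tIslamic Republic of Afghanistan\\tafghan\\n'\n",
    "    'Albania\\tRepublic of Albania\\talbania\\n'\n",
    "    'Albania\\tDuplicate row for illustration\\talbania\\n'\n",
    ")\n",
    "\n",
    "\n",
    "def find_duplicates(tsv_text, columns=('name_short', 'name_official', 'regex')):\n",
    "    '''Return {column: [values occurring more than once]} for the given columns.'''\n",
    "    rows = list(csv.DictReader(io.StringIO(tsv_text), dialect='excel-tab'))\n",
    "    duplicates = {}\n",
    "    for column in columns:\n",
    "        counts = Counter(row[column] for row in rows)\n",
    "        repeated = sorted(value for value, n in counts.items() if n > 1)\n",
    "        if repeated:\n",
    "            duplicates[column] = repeated\n",
    "    return duplicates\n",
    "\n",
    "\n",
    "print(find_duplicates(SAMPLE_TSV))\n",
    "```\n",
    "\n",
    "For the sample this reports the repeated `name_short` and `regex` values; an empty dictionary means all three columns are unique."
   ]
  },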
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, all pandas indexing and matching methods can be used. For example, to get new OECD members since 1995 present in a list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "59     Czech Republic\n",
       "70            Estonia\n",
       "101           Hungary\n",
       "122            Latvia\n",
       "128         Lithuania\n",
       "Name: name_short, dtype: object"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "some_countries = [\n",
    "    \"Australia\",\n",
    "    \"Belgium\",\n",
    "    \"Brazil\",\n",
    "    \"Bulgaria\",\n",
    "    \"Cyprus\",\n",
    "    \"Czech Republic\",\n",
    "    \"Denmark\",\n",
    "    \"Estonia\",\n",
    "    \"Finland\",\n",
    "    \"France\",\n",
    "    \"Germany\",\n",
    "    \"Greece\",\n",
    "    \"Hungary\",\n",
    "    \"India\",\n",
    "    \"Indonesia\",\n",
    "    \"Ireland\",\n",
    "    \"Italy\",\n",
    "    \"Japan\",\n",
    "    \"Latvia\",\n",
    "    \"Lithuania\",\n",
    "    \"Luxembourg\",\n",
    "    \"Malta\",\n",
    "    \"Romania\",\n",
    "    \"Russia\",\n",
    "    \"Turkey\",\n",
    "    \"United Kingdom\",\n",
    "    \"United States\",\n",
    "]\n",
    "converter.data[\n",
    "    (converter.data.OECD >= 1995) & converter.data.name_short.isin(some_countries)\n",
    "].name_short"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Further information can be found here: http://pandas.pydata.org/pandas-docs/stable/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Testing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All regular expressions of the country converter are tested for a unique match to name_short and name_official. \n",
    "Test sets for alternative names found in various databases are also available. \n",
    "\n",
    "The test sets are stored in the ``tests/`` subdirectory. To tests require pytest.\n",
    "I recommend to rerun the test if a regular expression is changed. \n",
    "\n",
    "To specify a new test set just add a tab-separated file with headers \"name\\_short\" and \"name\\_test\" and provide name (corresponding to the short name in the main classification file) and the alternative name which should be tested (one pair per row in the file). If the file name starts with \"test\\_regex\\_ \" it will be automatically recognised by the test functions.\n",
    "\n",
    "Please see the file CONTRIBUTING.rst for further information."
   ]
  },
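  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The test set format described above can be produced with a few lines of standard-library code. This is a sketch under assumptions, not part of the package: `write_test_set`, the file name `test_regex_example.txt` (including its extension) and the alternative names in `PAIRS` are hypothetical, and the file is written to a temporary directory rather than to ``tests/``.\n",
    "\n",
    "```python\n",
    "import csv\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "# Hypothetical pairs: (short name in the main classification, alternative name to test)\n",
    "PAIRS = [\n",
    "    ('Czech Republic', 'Czechia'),\n",
    "    ('United States', 'United States of America'),\n",
    "]\n",
    "\n",
    "\n",
    "def write_test_set(directory, pairs, file_name='test_regex_example.txt'):\n",
    "    '''Write a tab-separated test set with the two required headers.'''\n",
    "    path = Path(directory) / file_name\n",
    "    with open(path, 'w', newline='', encoding='utf-8') as handle:\n",
    "        writer = csv.writer(handle, dialect='excel-tab')\n",
    "        writer.writerow(['name_short', 'name_test'])\n",
    "        writer.writerows(pairs)  # one pair per row\n",
    "    return path\n",
    "\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    written = write_test_set(tmp, PAIRS)\n",
    "    content = written.read_text(encoding='utf-8')\n",
    "    print(content)\n",
    "```\n",
    "\n",
    "To use such a file with the real test suite, it would be placed in the ``tests/`` subdirectory with a name starting with `test_regex_`; check CONTRIBUTING.rst for the naming details."
   ]
  },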
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Konstantin Stadler"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}