{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Sovereign Debt Crises\n", "\n", "> **Author**\n", "- Raphaël Grach (#49059165)\n", "\n", "**Objective**\n", "\n", "Use an artificial neural network (ANN) to build a model that predicts sovereign debt crises using joint data from the International Monetary Fund (IMF) and the World Bank. The methodology closely follows [Fioramanti (2008)](#nn-fioramanti2008) and builds on the work of [Manasse, Roubini and Schimmelpfennig (2003)](#nn-MRS2003) (MRS 2003).\n", "\n", "### Outline\n", "\n", "- [Background: probit and logit](#Background:-probit-and-logit)\n", "- [Early Warning Systems (EWS)](#Early-Warning-Systems-%28EWS%29)\n", "- [Data](#Data)\n", " - [World Bank data](#World-Bank-data)\n", " - [Exploring the data](#Exploring-the-data)\n", "- [The Model](#The-Model)\n", " - [In-sample estimation](#In-sample-estimation-model-%28IS%29-%28Train:-1980-2004-/-Test:-2002-2004%29)\n", " - [Out-of-sample estimation 1](#Out-of-sample-estimation-model-1-%28OS1%29-%28Train:-1981-2001-/-Test:-2002-2004%29)\n", " - [Out-of-sample estimation 2](#Out-of-sample-estimation-model-2-%28OS2%29-%28Train:-1991-2001-/-Test:-2002-2004%29)\n", "- [Performance](#Performance)\n", " - [Mean squared errors (MSE)](#Mean-squared-errors-%28MSE%29)\n", " - [Hit rate](#Hit-rate)\n", "- [Conclusion](#Conclusion)\n", "- [References](#References)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Background: probit and logit\n", "\n", "Previous episodes of sovereign debt crises have led international organisations to develop mechanisms aimed at predicting their occurrence. The traditional parametric identification strategy used in the literature is the random effect probit (REP) estimator. 
Probit and logit estimation methods (more information can be found [here](http://www.columbia.edu/~so33/SusDev/Lecture_9.pdf)) estimate the effect of independent variables on the *probability* of a certain event happening. They do so by transforming a discontinuous dependent variable $Y$ (in general a dummy) into a smoother function.\n", "\n", "The process is usually conducted by first transforming the original $Y$ (that takes values $\\{0,1\\}$) into a **probability** (that is distributed over the interval $[0,1]$) and then into a desired function.\n", "\n", "- The **probit** method uses the inverse of the standard normal CDF: $\\Phi^{-1}(p)$, where $p$ is the probability of event $Y$ happening. It is defined on $(-\\infty,+\\infty)$\n", "- The **logit** method uses the log-transform of the odds ratio $\\frac{p}{(1-p)}$ (itself defined on $[0,+\\infty)$): $\\ln(\\frac{p}{(1-p)})$, which is defined on $(-\\infty,+\\infty)$\n", "\n", "The \"random effects\" hypothesis is an alternative to the \"fixed-effects\" hypothesis that is usually used in econometrics. The random effects assumption stipulates that the unobserved heterogeneity is continuous and **uncorrelated with the independent variables**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Early Warning Systems (EWS)\n", "\n", "In the past two decades, researchers have tried to develop models able to recognize signals announcing a potential crisis, the so-called Early Warning Systems (EWS). These models sought to predict the likelihood of a financial crisis using both parametric and non-parametric estimation methods. A more extensive discussion of methodologies is proposed in [Ciarlone and Trebeschi (2005)](#nn-ciarlone2005). \n", "\n", "A number of emerging countries have experienced what is referred to as a \"sovereign debt crisis\", meaning that sovereign debt reaches a level that the country alone cannot sustain. 
Countries facing such crises required the help of international organizations such as the IMF to stabilise their situation. The objective of this project will be to identify early signs of a sovereign debt crisis and use them as a predictor.\n", "\n", "This study will use an artificial neural network (ANN) and compare its performance with the REP estimators.\n", "\n", "**Disclaimer**: Since some of the referenced papers had access to greater resources than this research, this study has room for improvement. Fioramanti's ANN *beats* the REP in terms of predictive power, but there is no guarantee that this one will." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "The data is composed of different macro-economic indicators for a variety of developing countries as outlined in [Fioramanti (2008)](#nn-fioramanti2008). The definition of a sovereign debt crisis will be the one in [MRS (2003)](#nn-MRS2003):\n", "\n", "> \"A country is defined to be in a debt crisis if it is classified as being in default by Standard & Poor’s or if it receives a large nonconcessional IMF loan defined as access in excess of 100% of quota.\" (p.8)\n", "\n", "The panel data is collected from the World Bank and the IMF databases. Let's first have a look at the episodes of sovereign debt crisis for each of these countries." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# copy and paste this to import all the packages\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib.colors as mplc\n", "import matplotlib.patches as patches\n", "from matplotlib import cm\n", "%matplotlib inline\n", "import math\n", "#!pip install qeds\n", "import qeds # QuantEcon collab\n", "qeds.themes.mpl_style();\n", "import seaborn as sns\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn import preprocessing, pipeline, neural_network, metrics # Neural network package\n", "import statsmodels.formula.api as smf # for doing statistical regression\n", "import statsmodels.api as sm " ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "crisis_episodes = pd.read_excel(\"Crisis_episodes.xlsx\",index_col=\"Countries\")\n", "countries_0 = ['China', 'Colombia', 'Cyprus', 'Egypt', 'Israel', 'Morocco', 'Oman'] # Countries with 0 years in crisis" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colors = cm.coolwarm(crisis_episodes.drop(countries_0\n", " ).sort_values(by=\"Number of years in crisis\")[\"Number of years in crisis\"]/22,\n", " bytes = False)\n", "\n", "fig = crisis_episodes.drop(countries_0\n", " ).sort_values(by=\"Number of years in crisis\"\n", " ).plot(y = \"Number of years in crisis\", \n", " kind = \"barh\", \n", " figsize = (10,15),\n", " color=colors,\n", " legend = False)\n", "fig.set_title('Number of years in debt crisis by country (1980-2004)')\n", "fig.set_xlabel('Number of years in debt crisis')\n", "fig.set_facecolor('w')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With some countries like Argentina spending **22 years** in a debt crisis, a model that could help better anticipate those episodes can definitely benefit both the country and international organisations.\n", "\n", "### World Bank data\n", "\n", "[Table of indicators of the World Bank.](http://wdi.worldbank.org/tables) A handy feature of the World Bank data is that it uses different sources, notably the IMF to compute its own indicators. That is, data originally coming from the IMF can be accessed via the World Bank API." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: world_bank_data in /anaconda3/lib/python3.7/site-packages (0.1.3)\n", "Requirement already satisfied: cachetools in /anaconda3/lib/python3.7/site-packages (from world_bank_data) (4.1.0)\n", "Requirement already satisfied: pandas in /anaconda3/lib/python3.7/site-packages (from world_bank_data) (0.23.4)\n", "Requirement already satisfied: requests in /anaconda3/lib/python3.7/site-packages (from world_bank_data) (2.21.0)\n", "Requirement already satisfied: python-dateutil>=2.5.0 in /anaconda3/lib/python3.7/site-packages (from pandas->world_bank_data) (2.7.5)\n", "Requirement already satisfied: pytz>=2011k in /anaconda3/lib/python3.7/site-packages (from pandas->world_bank_data) (2018.7)\n", "Requirement already satisfied: numpy>=1.9.0 in /anaconda3/lib/python3.7/site-packages (from pandas->world_bank_data) (1.15.4)\n", "Requirement already satisfied: idna<2.9,>=2.5 in /anaconda3/lib/python3.7/site-packages (from requests->world_bank_data) (2.8)\n", "Requirement already satisfied: certifi>=2017.4.17 in /anaconda3/lib/python3.7/site-packages (from requests->world_bank_data) (2018.11.29)\n", "Requirement already satisfied: urllib3<1.25,>=1.21.1 in /anaconda3/lib/python3.7/site-packages (from requests->world_bank_data) (1.24.1)\n", "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /anaconda3/lib/python3.7/site-packages (from requests->world_bank_data) (3.0.4)\n", "Requirement already satisfied: six>=1.5 in /anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.5.0->pandas->world_bank_data) (1.12.0)\n" ] } ], "source": [ "!pip install world_bank_data\n", "import world_bank_data as wb\n", "\n", "countries = [\"Algeria\", \"Argentina\", \"Bolivia\",\n", "\"Brazil\", \"Chile\", \"China\", \"Colombia\", \"Costa Rica\", \"Cyprus\", \"Czech Republic\", \"Dominican Republic\", 
\"Ecuador\",\n", "\"Egypt, Arab Rep.\", \"El Salvador\", \"Estonia\", \"Guatemala\", \"Hungary\", \"India\", \"Indonesia\", \"Israel\", \"Jamaica\",\n", "\"Jordan\", \"Kazakhstan\", \"Latvia\", \"Lithuania\", \"Malaysia\", \"Mexico\", \"Morocco\", \"Oman\", \"Pakistan\", \"Panama\",\n", "\"Paraguay\", \"Peru\", \"Philippines\", \"Poland\", \"Romania\", \"Russian Federation\", \"Slovak Republic\", \"South Africa\",\n", "\"Thailand\", \"Trinidad and Tobago\", \"Tunisia\", \"Turkey\", \"Ukraine\", \"Uruguay\", \"Venezuela, RB\"]\n", "\n", "# 3-letter ISO code of each country\n", "iso = [\"DZA\",\n", "\"ARG\", \"BOL\", \"BRA\", \"CHL\", \"CHN\", \"COL\", \"CRI\", \"CYP\", \"CZE\", \"DOM\", \"ECU\", \"EGY\", \"SLV\", \"EST\", \"GTM\",\n", "\"HUN\", \"IND\", \"IDN\", \"ISR\", \"JAM\", \"JOR\", \"KAZ\", \"LVA\", \"LTU\", \"MYS\", \"MEX\", \"MAR\", \"OMN\", \"PAK\", \"PAN\",\n", "\"PRY\", \"PER\", \"PHL\", \"POL\", \"ROU\", \"RUS\", \"SVK\", \"ZAF\", \"THA\", \"TTO\", \"TUN\", \"TUR\", \"UKR\", \"URY\", \"VEN\"]\n", "\n", "dates = list(range(1980,2005)) # Dates range from 1980 to 2004\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "#wb.get_topics() # Displays the complete list of World Bank data topics (For consultative use)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "#wb.get_sources() # Displays the full list of sources (For consultative use)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Here is an extensive list of World Bank indicators](https://data.worldbank.org/indicator?tab=all)\n", "\n", "The indicator can be found in the link of the resource when accessed. For example:\n", "\n", "Exports of goods and services: `https://data.worldbank.org/indicator/NE.EXP.GNFS.CD?view=chart`\n", "\n", "The indicator code is represented by the series of **UPPERCASE CHARACTERS**." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "#wb.get_indicators() # Displays the full list of indicators (For consultative use)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# List of indicators we'll use\n", "ind_use = [\n", " #GNI Growth rate\n", " \"NY.GNP.MKTP.KD.ZG\", \n", " #GDP Growth Rate\n", " \"NY.GDP.MKTP.KD.ZG\",\n", " # Broad money growth (% annual)\n", " \"FM.LBL.BMNY.ZG\",\n", " # Inflation (consumer prices)\n", " \"FP.CPI.TOTL.ZG\",\n", " # Real interest rate\n", " \"FR.INR.RINR\",\n", " # Deposit interest rate\n", " \"FR.INR.DPST\",\n", " # Taxes on international trade (% of revenue) (proxy for openness, higher = less open)\n", " \"GC.TAX.INTT.RV.ZS\",\n", " # Net trade in goods and services (BoP, in USD) (Proxy for trade balance)\n", " \"BN.GSR.GNFS.CD\",\n", " # Present value of external debt (% of GNI and % of exports)\n", " \"DT.DOD.PVLX.GN.ZS\", \"DT.DOD.PVLX.EX.ZS\",\n", " # Total external debt (Current USD and % of GNI)\n", " \"DT.DOD.DECT.CD\", \"DT.DOD.DECT.GN.ZS\",\n", " # Short-term debt as % of total reserves\n", " \"DT.DOD.DSTC.IR.ZS\",\n", " # Short-term external debt stocks (Current USD)\n", " \"DT.DOD.DSTC.CD\",\n", " # Exports of goods and services (% of GDP and current USD)\n", " \"NE.EXP.GNFS.ZS\", \"NE.EXP.GNFS.CD\",\n", " # Total reserves (Current USD including gold) (and as % total external debt)\n", " \"FI.RES.TOTL.CD\", \"FI.RES.TOTL.DT.ZS\",\n", " # Total debt service (% of exports)\n", " \"DT.TDS.DECT.EX.ZS\",\n", " # Public debt service (% of GNI and % of exports)\n", " \"DT.TDS.DPPG.GN.ZS\", \"DT.TDS.DPPG.XP.ZS\",\n", " # Current account (% of GDP and in Current USD)\n", " \"BN.CAB.XOKA.GD.ZS\", \"BN.CAB.XOKA.CD\",\n", " # Portfolio equity, net inflows (Current USD)\n", " \"BX.PEF.TOTL.CD.WD\",\n", " # Foreign Direct Investment (net inflow in current USD and % of GDP)\n", " \"BX.KLT.DINV.CD.WD\", \"BX.KLT.DINV.WD.GD.ZS\"]" 
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "master_countries = pd.read_excel(\"master_countries.xls\")\n", "master_countries[\"year\"] = master_countries[\"year\"].astype(str) # Set the correct type for \"year\"" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(master_countries.loc[0][0]) # should be \"str\"" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# Query data through pandas_datareader (This might take some time)\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "import pandas_datareader\n", "from pandas_datareader import wb\n", "wb_data = pandas_datareader.wb.WorldBankReader(symbols = ind_use, countries = iso, start = 1980, end = 2004).read()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# Create the lacking variables\n", "\n", "# Total external debt as % of reserves\n", "wb_data[\"ted_res\"] = (wb_data[\"DT.DOD.DECT.CD\"] / wb_data[\"FI.RES.TOTL.CD\"])*100\n", "# Short-term external debt as % of reserves\n", "wb_data[\"sted_res\"] = (wb_data[\"DT.DOD.DSTC.CD\"] / wb_data[\"FI.RES.TOTL.CD\"])*100\n", "# Short-term external debt as % of exports\n", "wb_data[\"sted_xgs\"] = (wb_data[\"DT.DOD.DSTC.CD\"] / wb_data[\"NE.EXP.GNFS.CD\"])*100\n", "# Short-term external debt as % of total external debt\n", "wb_data[\"sted_ted\"] = (wb_data[\"DT.DOD.DSTC.CD\"] / wb_data[\"DT.DOD.DECT.CD\"])*100\n", "# Current account as % of reserves\n", "wb_data[\"cac_res\"] = (wb_data[\"BN.CAB.XOKA.CD\"] / wb_data[\"FI.RES.TOTL.CD\"])*100\n", "# Current account as % of portfolio equity, net inflows\n", "wb_data[\"cac_pef\"] = (wb_data[\"BN.CAB.XOKA.CD\"] / wb_data[\"BX.PEF.TOTL.CD.WD\"])*100 # This produced 'inf' values\n", "# Current account as % of 
FDI\n", "wb_data[\"cac_fdi\"] = (wb_data[\"BN.CAB.XOKA.CD\"] / wb_data[\"BX.KLT.DINV.CD.WD\"])*100\n", "# Total external debt as % of exports \n", "wb_data[\"ted_xgs\"] = (wb_data[\"DT.DOD.DECT.CD\"] / wb_data[\"NE.EXP.GNFS.CD\"])*100" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " NY.GNP.MKTP.KD.ZG NY.GDP.MKTP.KD.ZG FM.LBL.BMNY.ZG \\\n", "country year \n", "Argentina 2004 1.069367 9.029573 21.427310 \n", " 2003 10.201217 8.837041 29.634853 \n", " 2002 -13.204427 -10.894485 19.708854 \n", " 2001 -4.613779 -4.408840 -19.436218 \n", " 2000 -0.831529 -0.788999 1.531034 \n", "\n", " FP.CPI.TOTL.ZG FR.INR.RINR FR.INR.DPST GC.TAX.INTT.RV.ZS \\\n", "country year \n", "Argentina 2004 NaN -9.789158 2.614968 15.799545 \n", " 2003 NaN 7.828852 10.160701 16.470869 \n", " 2002 NaN 16.179804 39.248446 13.638532 \n", " 2001 NaN 29.120285 16.163597 4.284476 \n", " 2000 NaN 9.945072 8.337878 4.893369 \n", "\n", " BN.GSR.GNFS.CD DT.DOD.PVLX.GN.ZS DT.DOD.PVLX.EX.ZS ... \\\n", "country year ... \n", "Argentina 2004 1.193361e+10 NaN NaN ... \n", " 2003 1.561150e+10 NaN NaN ... \n", " 2002 1.571728e+10 NaN NaN ... \n", " 2001 3.521811e+09 NaN NaN ... \n", " 2000 -1.831883e+09 NaN NaN ... \n", "\n", " BX.KLT.DINV.WD.GD.ZS ted_res sted_res sted_xgs \\\n", "country year \n", "Argentina 2004 2.505018 854.542849 135.042097 67.610786 \n", " 2003 1.294811 1154.477701 158.125464 67.663914 \n", " 2002 2.198958 1413.593093 141.385413 53.484395 \n", " 2001 0.806164 1048.740175 137.431550 64.295618 \n", " 2000 3.665791 596.621456 112.616044 90.717397 \n", "\n", " sted_ted cac_res cac_pef cac_fdi ted_xgs \\\n", "country year \n", "Argentina 2004 15.802847 16.337013 -3728.136970 77.867050 427.839281 \n", " 2003 13.696710 57.496409 12455.899005 492.728858 494.015815 \n", " 2002 10.001847 83.551802 -7565.248533 407.956127 534.745201 \n", " 2001 13.104442 -25.972378 -12145.175563 -174.523751 490.639871 \n", " 2000 18.875627 -35.705216 278.280224 -86.200297 480.605996 \n", "\n", " crisis \n", "country year \n", "Argentina 2004 1.0 \n", " 2003 1.0 \n", " 2002 1.0 \n", " 2001 1.0 \n", " 2000 1.0 \n", "\n", "[5 rows x 35 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Merge the two data sets\n", "master = 
pd.merge(wb_data.reset_index(), master_countries.drop(\"iso\",axis=1), on = [\"country\",\"year\"], how=\"left\")\n", "master = master.set_index([\"country\",\"year\"])\n", "master.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before digging into the neural network model, we can have a look at how the data behaves.\n", "\n", "Two variables known to have a significant effect on the likelihood of a debt crisis are total external debt as % of reserves and GDP growth. Let's see how GDP growth relates to total external debt." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "def plot_country(x = \"NY.GDP.MKTP.KD.ZG\", y = \"DT.DOD.DECT.GN.ZS\", country = \"Argentina\"):\n", " \n", " \"\"\"\n", " When called, returns the scatter plot of the desired indicators for the desired country.\n", " By default, it plots GDP growth against total external debt as % of GNI for Argentina.\n", " --\n", " Arguments:\n", " x: str. Indicator on the x axis, needs to be a World Bank indicator. Default: GDP growth\n", " y: str. Indicator on the y axis, needs to be a World Bank indicator. Default: Total external debt as % of GNI\n", " country: str. Needs to be a country in the list 'countries'. 
Default: 'Argentina'\n", " \"\"\"\n", " fig,ax = plt.subplots()\n", "\n", " df = wb_data.reset_index().groupby(\"country\").get_group(country)\n", " df = df.replace([np.inf,-np.inf], np.nan)\n", " df = df.fillna(value=0) # Replace NaN values by 0 (avoids errors)\n", " df.plot(x, y, ax=ax, kind = \"scatter\")\n", " \n", " # Add linear regression line\n", " lr = LinearRegression()\n", " X = df[x].values.reshape(-1,1) # Note the uppercase X here (for linear fitting)\n", " y = df[y].values.reshape(-1,1)\n", " lr.fit(X,y)\n", " \n", " # If the graph is a flat line, it is likely that the data is not available\n", " \n", " _x = np.linspace(-10, 10,100).reshape(-1,1) # Note the lowercase _x here (for plotting)\n", " y_pred = lr.predict(_x)\n", " ax.plot(_x, y_pred, color=\"red\",linestyle=\"--\")\n", " \n", " \n", " ax.set_title(f\"Plot for {country}\")\n", " \n", " \n", " return fig,ax" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(
,\n", " )" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_country(country=\"Bolivia\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also have a look at which countries experience the highest debts by comparing them all." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "scrolled": false }, "outputs": [], "source": [ "wb1 = wb_data.reset_index()\n", "reduced = wb1[wb1[\"country\"].isin([\"Argentina\",\"Bolivia\",\"Costa Rica\",\"Indonesia\",\"Jamaica\",\"Jordan\"])]\n", "colors = qeds.themes.COLOR_CYCLE\n", "cmap = dict(zip(reduced[\"country\"].unique(), colors))\n", "nom_color = (0.8,0.8,0.8)\n", "wb1 = wb1.replace([np.inf,-np.inf], np.nan)\n", "wb1 = wb1.fillna(value=0) \n", "wb1 = wb1.sort_values(by=\"year\")\n", "wb1.index = wb1[\"year\"]\n", "gwb1 = wb1.groupby([\"country\"])\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig1, ax1 = plt.subplots(figsize = (16,8))\n", "for count,df in gwb1:\n", "    if df[\"DT.DOD.DECT.GN.ZS\"].max() >= 155:\n", "        df.plot(x=\"year\",y=\"DT.DOD.DECT.GN.ZS\", ax=ax1,legend=True, alpha = 1.0, label = count, color = cmap[count])\n", "    else:\n", "        df.plot(x=\"year\",y=\"DT.DOD.DECT.GN.ZS\", ax=ax1, legend=False, alpha = 0.4, color = nom_color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can note that five countries had an external debt above 155% of their GNI: Argentina, Bolivia, Indonesia, Jamaica and Jordan." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Model\n", "### Network construction\n", "We will be using a multi-layer perceptron (MLP) ANN to conduct the prediction. Following [Fioramanti (2008)](#nn-fioramanti2008), the ANN in this study will be a two-layer neural network of the form:\n", "\n", "$$\n", "Y = F \\big( F (X \\omega_1 + b_1)\\omega_2 + b_2 \\big) \\omega_3 + b_3\n", "$$\n", "\n", "Where the activation function $F(\\cdot)$ is the **sigmoid** function (also called logistic function): $F(x) = \\frac{1}{1+e^{-x}}$ which is smooth and differentiable. \n", "\n", "The $\\omega_i$ are a series of *weight matrices*; $\\omega_1$ multiplies the feature matrix $X$. Each layer is shifted by a *bias matrix* $b_i$.\n", "\n", "This neural network has three units in the hidden layer and one unit in the output layer. The author tried additional models with more hidden units but there was no significant performance improvement. 
This rather simple architecture nevertheless has strong approximation power: the [Universal approximation theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem) states that neural nets with a single hidden layer and a finite number of neurons can approximate a wide range of continuous functions on compact subsets of $\\mathbb{R}^n$.\n", "\n", "The network uses three different training sets (identical to the REP referenced in Fioramanti's paper): \n", "\n", "- The first uses the whole sample to train and 2002-2004 to test its prediction (in-sample forecast).\n", "- The second uses the years 1981-2001 to train and the rest to assess the performance (out-of-sample forecast).\n", "- The third uses the same approach, but the training years are 1991-2001.\n", "\n", "Number of observations: 1150\n", "\n", "Number of independent variables: 35\n", "\n", "Count of $crisis = 1$: 421\n", "\n", "Measure of performance: mean-squared errors (MSE)\n", "\n", "The reference paper uses the [Levenberg–Marquardt](http://people.duke.edu/~hpgavin/ce281/lm.pdf) (LM) algorithm to minimise the MSE of its ANN model. Since the `sklearn` package does not provide such an algorithm, the solver will be the [LBFGS](http://aria42.com/blog/2014/12/understanding-lbfgs) (for Limited-Memory Broyden–Fletcher–Goldfarb–Shanno) algorithm, which is deemed appropriate for smaller datasets. The LBFGS algorithm is designed to optimise smooth functions without constraints. We thus set the regularization parameter $\\alpha = 0$." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### In-sample estimation model (IS) (Train: 1980-2004 / Test: 2002-2004)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "def nn_in_sample():\n", " \n", " \"\"\"\n", " When called, this function returns the artificial neural network model\n", " using the in-sample training method.\n", " It takes the whole sample as training (1980-2004) and the years 2002-2004 as testing.\n", " \"\"\"\n", " # Flag missing values with an indicator (=1 if value is missing)\n", " indic = master.drop(\"crisis\",axis=1).isnull().replace(to_replace = [True, False], value = [1, 0])\n", " for col in list(indic.columns):\n", " indic = indic.rename(columns={f\"{col}\":f\"missing_{col}\" })\n", " \n", " _master = pd.merge(master, indic, left_index=True, right_index=True, how=\"left\")\n", " \n", " # Some variables spat out \"inf\" values, let's replace them by NaN\n", " df = _master.replace([np.inf, -np.inf], np.nan).copy()\n", " # Replace NaN values by 0 (drops them from the model)\n", " df = df.fillna(value=0)\n", " \n", " \n", " \n", " # Training data\n", " train_data = df # Takes the whole dataset as training\n", " X_train = train_data.drop(\"crisis\", axis=1)\n", " y_train = train_data[\"crisis\"]\n", " \n", " # Testing data\n", " testing_dates = [\"2002\",\"2003\",\"2004\"] # Tests for dates 2002-2004\n", " test_data = df.loc[pd.IndexSlice[:, testing_dates], :]\n", " X_test = test_data.drop(\"crisis\", axis=1)\n", " y_test = test_data[\"crisis\"]\n", " \n", " #Parameters of the model\n", " nn_scaled_model = pipeline.make_pipeline(\n", " preprocessing.StandardScaler(), # this will do the input scaling\n", " neural_network.MLPRegressor((100, 100),\n", " activation = \"logistic\",\n", " solver = \"lbfgs\",\n", " alpha = 0, # LBFGS is designed to work better without penalty\n", " max_iter = 10_000, # Keep = 100 to save computing time, can get up to 10000\n", " learning_rate = 
\"adaptive\")\n", " )\n", " \n", " return (nn_scaled_model, X_train, y_train, X_test, y_test)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "# Now on to compute the MSE\n", "def mse_in_sample():\n", " \"\"\"\n", " When called, this function returns the train and test MSE of the in-sample training method.\n", " \"\"\"\n", " nn_scaled_model, X_train, y_train, X_test, y_test = nn_in_sample() # Import the values from the in-sample model\n", " nn_scaled_model.fit(X_train,y_train)\n", " \n", " #MSE on test and training\n", " mse_train = metrics.mean_squared_error(y_train, nn_scaled_model.predict(X_train))\n", " mse_test = metrics.mean_squared_error(y_test, nn_scaled_model.predict(X_test))\n", " \n", " return [mse_train , mse_test] # Return as a list to be displayed later" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Out-of-sample estimation model 1 (OS1) (Train: 1981-2001 / Test: 2002-2004)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "def nn_out_sample_1():\n", " \n", " \"\"\"\n", " When called, this function returns the artificial neural network model\n", " using the first out-of-sample training model.\n", " It takes the years 1981-2001 as training and 2002-2004 as testing.\n", " \"\"\"\n", " # Flag missing values with an indicator (=1 if value is missing)\n", " indic = master.drop(\"crisis\",axis=1).isnull().replace(to_replace = [True, False], value = [1, 0])\n", " for col in list(indic.columns):\n", " indic = indic.rename(columns={f\"{col}\":f\"missing_{col}\" })\n", " \n", " _master = pd.merge(master, indic, left_index=True, right_index=True, how=\"left\")\n", " \n", " # Some variables spat out \"inf\" values, let's replace them by NaN\n", " df = _master.replace([np.inf, -np.inf], np.nan).copy()\n", " # Replace NaN values by 0 (drops them from the model)\n", " df = df.fillna(value=0)\n", " \n", " \n", " # Training data\n", " training_dates = 
[\"1981\",\"1982\",\"1983\",\"1984\",\"1985\",\"1986\",\"1987\",\"1988\",\"1989\",\"1990\",\n", " \"1991\",\"1992\",\"1993\",\"1994\",\"1995\",\"1996\",\"1997\",\"1998\",\"1999\",\"2000\",\"2001\"]\n", " train_data = df.loc[pd.IndexSlice[:, training_dates], :] \n", " X_train = train_data.drop(\"crisis\", axis=1)\n", " y_train = train_data[\"crisis\"]\n", " \n", " # Testing data\n", " testing_dates = [\"2002\",\"2003\",\"2004\"] # Tests for dates 2002-2004\n", " test_data = df.loc[pd.IndexSlice[:, testing_dates], :]\n", " X_test = test_data.drop(\"crisis\", axis=1)\n", " y_test = test_data[\"crisis\"]\n", " \n", " #Parameters of the model\n", " nn_scaled_model = pipeline.make_pipeline(\n", " preprocessing.StandardScaler(), # this will do the input scaling\n", " neural_network.MLPRegressor((100, 100),\n", " activation = \"logistic\",\n", " solver = \"lbfgs\",\n", " alpha = 0, # LBFGS is designed to work better without penalty\n", " max_iter = 10_000, # Keep = 100 to save computing time, can get up to 10000\n", " learning_rate = \"adaptive\")\n", " )\n", " \n", " return (nn_scaled_model, X_train, y_train, X_test, y_test)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "# Now on to compute the MSE\n", "def mse_out_sample_1():\n", " \"\"\"\n", " When called, this function returns the train and test MSE of the first out-of-sample training method.\n", " \"\"\"\n", " nn_scaled_model, X_train, y_train, X_test, y_test = nn_out_sample_1() # Import the values\n", " nn_scaled_model.fit(X_train,y_train)\n", " \n", " #MSE on test and training\n", " mse_train = metrics.mean_squared_error(y_train, nn_scaled_model.predict(X_train))\n", " mse_test = metrics.mean_squared_error(y_test, nn_scaled_model.predict(X_test))\n", " \n", " return [mse_train , mse_test] # Return as a list to be displayed later" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Out-of-sample estimation model 2 (OS2) (Train: 1991-2001 / Test: 2002-2004)" 
] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "def nn_out_sample_2():\n", "    \n", "    \"\"\"\n", "    When called, this function returns the artificial neural network model\n", "    using the second out-of-sample training model.\n", "    It takes the years 1991-2001 as training and 2002-2004 as testing.\n", "    \"\"\"\n", "    # Flag missing values with an indicator (=1 if value is missing)\n", "    indic = master.drop(\"crisis\",axis=1).isnull().replace(to_replace = [True, False], value = [1, 0])\n", "    for col in list(indic.columns):\n", "        indic = indic.rename(columns={f\"{col}\":f\"missing_{col}\" })\n", "    \n", "    _master = pd.merge(master, indic, left_index=True, right_index=True, how=\"left\")\n", "    \n", "    # Some variables spat out \"inf\" values, let's replace them by NaN\n", "    df = _master.replace([np.inf, -np.inf], np.nan).copy()\n", "    # Replace NaN values by 0 (the missing_* indicators keep track of which values were missing)\n", "    df = df.fillna(value=0)\n", "    \n", "    \n", "    # Training data\n", "    training_dates = [\"1991\",\"1992\",\"1993\",\"1994\",\"1995\",\"1996\",\"1997\",\"1998\",\"1999\",\"2000\",\"2001\"]\n", "    train_data = df.loc[pd.IndexSlice[:, training_dates], :]\n", "    X_train = train_data.drop(\"crisis\", axis=1)\n", "    y_train = train_data[\"crisis\"]\n", "    \n", "    # Testing data\n", "    testing_dates = [\"2002\",\"2003\",\"2004\"] # Tests for dates 2002-2004\n", "    test_data = df.loc[pd.IndexSlice[:, testing_dates], :]\n", "    X_test = test_data.drop(\"crisis\", axis=1)\n", "    y_test = test_data[\"crisis\"]\n", "    \n", "    # Parameters of the model\n", "    nn_scaled_model = pipeline.make_pipeline(\n", "        preprocessing.StandardScaler(), # this will do the input scaling\n", "        neural_network.MLPRegressor((100, 100),\n", "                                    activation = \"logistic\",\n", "                                    solver = \"lbfgs\",\n", "                                    alpha = 0, # LBFGS is designed to work better without penalty\n", "                                    max_iter = 10_000, # Lower to 100 to save computing time\n", "                                    learning_rate = \"adaptive\")\n", "    )\n", "    
\n", "    return (nn_scaled_model, X_train, y_train, X_test, y_test)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Now on to compute the MSE\n", "def mse_out_sample_2():\n", "    \"\"\"\n", "    When called, this function returns the train and test MSE of the second out-of-sample training method.\n", "    \"\"\"\n", "    nn_scaled_model, X_train, y_train, X_test, y_test = nn_out_sample_2() # Import the values\n", "    nn_scaled_model.fit(X_train, y_train)\n", "    \n", "    # MSE on test and training\n", "    mse_train = metrics.mean_squared_error(y_train, nn_scaled_model.predict(X_train))\n", "    mse_test = metrics.mean_squared_error(y_test, nn_scaled_model.predict(X_test))\n", "    \n", "    return [mse_train, mse_test] # Return as a list to be displayed later" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Performance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean squared errors (MSE)\n", "\n", "Let's now measure the MSE of the three models. The table reports the training and testing MSE for the in-sample (IS), out-of-sample 1 (OS1) and out-of-sample 2 (OS2) models." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>IS</th>\n", "      <th>OS1</th>\n", "      <th>OS2</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>Training MSE</th>\n", "      <td>0.000984</td>\n", "      <td>0.000110</td>\n", "      <td>0.000297</td>\n", "    </tr>\n", "    <tr>\n", "      <th>Testing MSE</th>\n", "      <td>0.000408</td>\n", "      <td>0.168114</td>\n", "      <td>0.173089</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " IS OS1 OS2\n", "Training MSE 0.000984 0.000110 0.000297\n", "Testing MSE 0.000408 0.168114 0.173089" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table_mse = pd.DataFrame(columns = [\"IS\", \"OS1\",\"OS2\"])\n", "table_mse[\"IS\"] = mse_in_sample()\n", "table_mse[\"OS1\"] = mse_out_sample_1()\n", "table_mse[\"OS2\"] = mse_out_sample_2()\n", "table_mse.index = [\"Training MSE\", \"Testing MSE\"]\n", "table_mse" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The exact values change from run to run because the network weights are initialised randomly. However, the ordering of the results remains broadly the same. In terms of prediction (\"testing\"), the IS model performs better than the two others. This follows logically: it is trained on more data, and its test years (2002-2004) are part of its training set. The testing MSE of OS2 is slightly larger than that of OS1, plausibly because it is trained on fewer years (1991-2001 rather than 1981-2001).\n", "\n", "But more important than the MSE is the model's **predictive power**: it needs to predict the outcome correctly when data is fed to it. The proportion of correct predictions is referred to as the \"hit rate\". The hit rate is particularly useful because it allows *comparison* between models. Here, the aim is to compare the ANN's hit rate with that of the REP in [Fioramanti (2008)](#nn-fioramanti2008)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hit rate " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model predictions are continuous over roughly $[0,1]$ (roughly, because some predictions fall slightly above or below the bounds). Since they are not exactly 0 or 1, and behave like a probability without strictly being one, defining what counts as a \"success\" remains arbitrary. \n", "\n", "In this case, a prediction counts as a success if it lies within $\\pm 15\\%$ of the correct outcome. 
For instance:\n", "\n", "- A prediction **at or above** $0.85$ (the `upper_bound`) while $crisis = 1$ counts as a success.\n", "- A prediction **at or below** $0.15$ (the `lower_bound`) while $crisis = 0$ counts as a success.\n", "- Every other configuration counts as a failure.\n", "\n", "These bounds can be changed to examine the predictive power of the model under different conditions." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "def hit_rate(upper_bound = 0.85, lower_bound = 0.15):\n", "    \n", "    \"\"\"\n", "    When called, returns the proportion (in %) of correctly predicted outcomes for the three different ANN models.\n", "    ---\n", "    Arguments:\n", "        upper_bound: float. A prediction at or above this bound is a success when crisis = 1. Default = 0.85\n", "        lower_bound: float. A prediction at or below this bound is a success when crisis = 0. Default = 0.15\n", "    \"\"\"\n", "    \n", "    # Flag missing values with an indicator (=1 if value is missing)\n", "    indic = master.drop(\"crisis\",axis=1).isnull().replace(to_replace = [True, False], value = [1, 0])\n", "    for col in list(indic.columns):\n", "        indic = indic.rename(columns={f\"{col}\":f\"missing_{col}\" })\n", "    \n", "    _master = pd.merge(master, indic, left_index=True, right_index=True, how=\"left\")\n", "    \n", "    # Some variables spat out \"inf\" values, let's replace them by NaN\n", "    df = _master.replace([np.inf, -np.inf], np.nan).copy()\n", "    # Replace NaN values by 0 (the missing_* indicators keep track of which values were missing)\n", "    df = df.fillna(value=0)\n", "    \n", "    # Import models\n", "    nn_scaled_is, X_train_is, y_train_is, X_test_is, y_test_is = nn_in_sample()\n", "    nn_scaled_os1, X_train_os1, y_train_os1, X_test_os1, y_test_os1 = nn_out_sample_1()\n", "    nn_scaled_os2, X_train_os2, y_train_os2, X_test_os2, y_test_os2 = nn_out_sample_2()\n", "    \n", "    # Fit models\n", "    nn_scaled_is.fit(X_train_is, y_train_is)\n", "    nn_scaled_os1.fit(X_train_os1, y_train_os1)\n", "    nn_scaled_os2.fit(X_train_os2, y_train_os2)\n", "    \n", "    # Predict outcome\n", "    y_predict_is = nn_scaled_is.predict(X_test_is)\n", "    y_predict_os1 = nn_scaled_os1.predict(X_test_os1)\n", "    y_predict_os2 = nn_scaled_os2.predict(X_test_os2)\n", "    \n", "    # Copy the test years (2002-2004) so that new columns are not written to a view of df\n", "    predict = df.loc[pd.IndexSlice[:,[\"2002\",\"2003\",\"2004\"]], :].copy()\n", "    \n", "    predict[\"crisis_prediction_is\"] = y_predict_is\n", "    predict[\"crisis_prediction_os1\"] = y_predict_os1\n", "    predict[\"crisis_prediction_os2\"] = y_predict_os2\n", "    \n", "    hit_rates = []\n", "    for name in [\"is\", \"os1\", \"os2\"]:\n", "        prediction = predict[f\"crisis_prediction_{name}\"]\n", "        correct_crisis = (prediction >= upper_bound) & (predict[\"crisis\"] == 1) # Correctly predicted a crisis\n", "        correct_no_crisis = (prediction <= lower_bound) & (predict[\"crisis\"] == 0) # Correctly predicted no crisis\n", "        predict[f\"success_{name}\"] = (correct_crisis | correct_no_crisis).astype(int)\n", "        hit_rates.append(predict[f\"success_{name}\"].mean()*100) # Share of successes among all test rows, in %\n", "    \n", "    return hit_rates # [hit_rate_is, hit_rate_os1, hit_rate_os2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's compare our hit rates with those of the REP. The performance of our neural network increases as we increase the maximum number of iterations. The abbreviations stand for random effect probit (REP), restricted REP (R_REP), in-sample model (IS), out-of-sample model 1 (OS1) and out-of-sample model 2 (OS2)." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>REP</th>\n", "      <th>R_REP</th>\n", "      <th>IS</th>\n", "      <th>OS1</th>\n", "      <th>OS2</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>Prediction success (%)</th>\n", "      <td>87.62</td>\n", "      <td>81.55</td>\n", "      <td>100.0</td>\n", "      <td>75.362319</td>\n", "      <td>71.73913</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " REP R_REP IS OS1 OS2\n", "Prediction success (%) 87.62 81.55 100.0 75.362319 71.73913" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def comparison_rep():\n", "    \n", "    hit_rate_is, hit_rate_os1, hit_rate_os2 = hit_rate() # Import hit rates\n", "    \n", "    comp = pd.DataFrame(columns = [\"REP\", \"R_REP\", \"IS\", \"OS1\", \"OS2\"])\n", "    comp[\"REP\"] = [87.62] # Average computed from Fioramanti (2008)\n", "    comp[\"R_REP\"] = [81.55] # Average computed from Fioramanti (2008)\n", "    comp[\"IS\"] = [hit_rate_is]\n", "    comp[\"OS1\"] = [hit_rate_os1]\n", "    comp[\"OS2\"] = [hit_rate_os2]\n", "    comp.index = [\"Prediction success (%)\"]\n", "    return comp\n", "\n", "comparison_rep()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "With a success threshold of $\\pm 15\\%$, the out-of-sample models generally do not beat the REP or the restricted REP. Varying this threshold might change the results. Note that the hit rate of the IS model approaches 100% (and sometimes reaches it). This result is not that exceptional, however, since the model is asked to predict data it was already trained on. A more meaningful measure of predictive power comes from models OS1 and OS2, where the ANN has to predict data it was not trained on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "<a id=\"nn-ciarlone2005\"></a>\n", "Ciarlone, A., Trebeschi, G., 2005. Designing an early warning system for debt crises. *Emerging Markets Review*, Vol. 6 No. 4, pp. 376–395.\n", "\n", "<a id=\"nn-fioramanti2008\"></a>\n", "Fioramanti, M., 2008. Predicting sovereign debt crises using artificial neural networks: A comparative approach. *Journal of Financial Stability*, Vol. 4, pp. 149–164.\n", "\n", "<a id=\"nn-MRS2003\"></a>\n", "Manasse, P., Roubini, N., Schimmelpfennig, A., 2003. Predicting sovereign debt crises. *IMF Working Paper* No. 221."
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 4 }