{ "cells": [ { "cell_type": "markdown", "id": "930433a2", "metadata": {}, "source": [ "## Theory Versus Machine Learning for Exchange Rate Predictions" ] }, { "cell_type": "markdown", "id": "6d74777f", "metadata": {}, "source": [ "### Annabella Stoll-Dansereau, Silas Kwok, [Redacted]" ] }, { "cell_type": "markdown", "id": "c1123762", "metadata": {}, "source": [ "In this project, we look at long term exchange rates and predict them using the Purchasing Power Parity theory (PPP), the Real Exchange Rate theory (RER), a linear regression, a lasso regression and two neural networks. To compare the different models, we use their predictions to decide which currencies to buy and then compare the average rate of returns. We find that the PPP model performs the best and does give positive returns in some cases, the machine learning models do not perform well and all give negative returns." ] }, { "cell_type": "markdown", "id": "27bb4ae8", "metadata": {}, "source": [ "Importing all the necessary packages." ] }, { "cell_type": "code", "execution_count": 1, "id": "37d43b92", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np \n", "import seaborn as sns\n", "from sklearn import linear_model\n", "from sklearn.model_selection import cross_val_score\n", "from sklearn import tree\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn import neural_network\n", "from sklearn import preprocessing, pipeline\n", "from sklearn import metrics\n", "import matplotlib.pyplot as plt\n", "import altair as alt" ] }, { "cell_type": "markdown", "id": "9287855b", "metadata": {}, "source": [ "First we load and clean the data." ] }, { "cell_type": "code", "execution_count": 2, "id": "73214efc", "metadata": { "scrolled": true }, "outputs": [], "source": [ "def data_comp(variables):\n", "\n", " '''\n", " cleaning OECD data pass in the name of csv files you are using\n", "\n", " '''\n", " for n, var in enumerate(variables):\n", "\n", " #read in file OECD data\n", "\n", " df = pd.read_csv(f\"{var}.csv\")\n", "\n", " #drop redundent columns and transform it into Year, Country, Variable\n", "\n", " df.columns = df.iloc[2]\n", " df.drop([0,1,2], inplace=True)\n", " df.drop(df.columns[1:44], axis=1, inplace=True)\n", " df.set_index('Country Name', inplace = True, drop = True)\n", " df = df.T\n", " df.index = [round(idx) for idx in df.index]\n", " df.index = pd.to_datetime(df.index, format='%Y')\n", " df = df.reset_index()\n", " df = pd.melt(df, id_vars=['index'], value_vars=df.columns[1:], var_name='Country Name', value_name='Value')\n", "\n", " df.set_index(['index', 'Country Name'], inplace=True)\n", " df.index.names = [\"Year\", \"Country Name\"]\n", " df.rename(columns={'Value': var}, inplace=True)\n", "\n", " #making the dataframe and combining over all variables\n", " if n == 0: \n", " full_df = df\n", " else: \n", " full_df = pd.concat([full_df, df], axis =1)\n", " return(full_df)\n", "\n", "df_ML = data_comp(['GDP', 'Govt_debt_gdp', 'GDP growth rate','GNI per capita',\n", " 'Inflation', 'Lending rate', 'Net exports', 'Total_reserve', 'Forex rate'])\n", "\n", "#writting to a csv\n", "#full_df.to_csv('EX_rate_table.csv', index=True)\n", "\n" ] }, { "cell_type": "markdown", "id": "032293f8", "metadata": {}, "source": [ "## PPP and RER" ] }, { "cell_type": "markdown", "id": "830c9edb", "metadata": {}, "source": [ "Here we construct the PPP and RER models. Based on the idea that in the long run we will converge back to an average real exchange rate value. While it doesnt always hold mean reversion seems to be the most consistent (and easiest!) way to predict exchange rates. " ] }, { "cell_type": "code", "execution_count": 3, "id": "469a4a40", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " RER Forex rate PPP\n", "Year Country Name \n", "2000-01-01 Armenia 74.010935 539.525833 0.234525\n", "2001-01-01 Armenia 73.034791 555.078258 0.231919\n", "2002-01-01 Armenia 73.665189 573.353333 0.226287\n", "2003-01-01 Armenia 67.937520 578.762954 0.229938\n", "2004-01-01 Armenia 71.634581 533.450833 0.258205\n", "... ... ... ...\n", "2017-01-01 Zambia 94.221086 9.517500 0.440513\n", "2018-01-01 Zambia 88.528850 10.458333 0.420550\n", "2019-01-01 Zambia 79.795053 12.890000 0.360806\n", "2020-01-01 Zambia 66.438740 18.344093 0.284940\n", "2021-01-01 Zambia 69.764269 20.018487 0.319845\n", "\n", "[1766 rows x 3 columns]\n" ] } ], "source": [ "df_PPP_model= data_comp(['RER', 'Forex rate', 'PPP'])\n", "df_PPP = pd.read_csv(f\"PPP_US.csv\")\n", "\n", "df_PPP_model = df_PPP_model[df_PPP_model.notna().all(axis=1)]\n", "print(df_PPP_model)\n", "\n", "#making the contructed RER which I am defining from 456 notes as RER = (forgien exchnage rate * P_in country)/ P_in US\n", "#here P_in country/P in US == PPP\n", "\n", "df_PPP_model['con_RER'] = (df_PPP_model['Forex rate'] * df_PPP_model['PPP'])\n", "\n", "#Now we have to see how this constructed RER compares with the reported RER and NomER (Forex rate)\n", "#theretically in the long run these would be zero since the NOM*PPP == RER_normed\n", "#Then I divided by 100 to get the percetage if it is negative that implies currency will appriciate\n", "#if is positive it will depreiciate\n", "\n", "#Clearly a lot of issues and gross assumptions\n", "# 1. this only holds for inflation bearing countries which might be why issues arise in non advanced countries\n", "# 2. not clear if mean reverting in long long \n", "# 3. not clear if assumptions on PPP hold in long run\n", "# 4. clealy issues in this not taking into consideration risks i.e. a forgein exchange might appear undervalued but that is because its risky\n", "# 5. very US centric by norming to the US each period as the US curency itself might have issues\n", "\n", "df =df_PPP_model.reset_index()\n", "#df_PPP_model.to_csv('testing1.csv', index=True)\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "5cd96ec2", "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PPP_for_ER
YearCountry Name
1960-01-01Australia0.700946
1961-01-01Australia0.701324
1962-01-01Australia0.687990
1963-01-01Australia0.698321
1964-01-01Australia0.708346
.........
2017-01-01Cameroon232.801392
2018-01-01Cameroon231.682690
2019-01-01Cameroon230.273904
2020-01-01Cameroon229.137013
2021-01-01Cameroon227.378726
\n", "

1765 rows × 1 columns

\n", "
" ], "text/plain": [ " PPP_for_ER\n", "Year Country Name \n", "1960-01-01 Australia 0.700946\n", "1961-01-01 Australia 0.701324\n", "1962-01-01 Australia 0.687990\n", "1963-01-01 Australia 0.698321\n", "1964-01-01 Australia 0.708346\n", "... ...\n", "2017-01-01 Cameroon 232.801392\n", "2018-01-01 Cameroon 231.682690\n", "2019-01-01 Cameroon 230.273904\n", "2020-01-01 Cameroon 229.137013\n", "2021-01-01 Cameroon 227.378726\n", "\n", "[1765 rows x 1 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df= pd.read_csv(f\"PPP_US.csv\")\n", "\n", "country_mapping = {\n", "\n", " 'ARM': 'Armenia', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'BDI': 'Burundi', 'BGR': 'Bulgaria', \n", " 'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BLZ': 'Belize', 'BOL': 'Bolivia', 'BRA': 'Brazil', \n", " 'CAF': 'Central African Republic', 'CAN': 'Canada', 'CHE': 'Switzerland', 'CHL': 'Chile', 'CHN': 'China', \n", " 'CIV': \"Cote d'Ivoire\", 'CMR': 'Cameroon', 'COD': 'Congo, Dem. Rep.', 'COL': 'Colombia', 'CRI': 'Costa Rica', \n", " 'CYP': 'Cyprus', 'CZE': 'Czechia', 'DMA': 'Dominica', 'DNK': 'Denmark', 'DOM': 'Dominican Republic', \n", " 'DZA': 'Algeria', 'FJI': 'Fiji', 'GAB': 'Gabon', 'GBR': 'United Kingdom', 'GEO': 'Georgia', 'GHA': 'Ghana', \n", " 'GMB': 'Gambia, The', 'GNQ': 'Equatorial Guinea', 'GRC': 'Greece', 'GRD': 'Grenada', 'GUY': 'Guyana', \n", " 'HKG': 'Hong Kong SAR, China', 'HRV': 'Croatia', 'HUN': 'Hungary', 'IRN': 'Iran, Islamic Rep.', 'ISL': 'Iceland', \n", " 'ISR': 'Israel', 'JPN': 'Japan', 'KNA': 'St. Kitts and Nevis', 'KOR': 'Korea, Rep.', 'LCA': 'St. Lucia', \n", " 'LSO': 'Lesotho', 'LVA': 'Latvia', 'MAR': 'Morocco', 'MDA': 'Moldova', 'MEX': 'Mexico', 'MKD': 'North Macedonia', \n", " 'MLT': 'Malta', 'MWI': 'Malawi', 'MYS': 'Malaysia', 'NGA': 'Nigeria', 'NIC': 'Nicaragua', 'NOR': 'Norway', \n", " 'NZL': 'New Zealand', 'PAK': 'Pakistan', 'PHL': 'Philippines', 'PNG': 'Papua New Guinea', 'POL': 'Poland', \n", " 'PRY': 'Paraguay', 'ROU': 'Romania', 'RUS': 'Russian Federation', 'SAU': 'Saudi Arabia', 'SGP': 'Singapore', \n", " 'SLB': 'Solomon Islands', 'SLE': 'Sierra Leone', 'SVK': 'Slovak Republic', 'SWE': 'Sweden', 'TGO': 'Togo', \n", " 'TTO': 'Trinidad and Tobago', 'TUN': 'Tunisia', 'UGA': 'Uganda', 'UKR': 'Ukraine', 'URY': 'Uruguay', 'USA': 'United States',\n", " 'VCT': 'St. Vincent and the Grenadines', 'VEN': 'Venezuela, RB', 'WSM': 'Samoa', 'ZAF': 'South Africa',\n", " 'ZMB': 'Zambia'}\n", "\n", "def apply_country_mapping(location):\n", " if location in country_mapping:\n", " return country_mapping[location]\n", " else:\n", " return None\n", "\n", "df['LOCATION'] = df['LOCATION'].apply(apply_country_mapping)\n", "\n", "# Extract only the necessary columns\n", "df = df[['LOCATION', 'TIME', 'Value']]\n", "df['TIME'] = pd.to_datetime(df['TIME'], format='%Y')\n", "\n", "df = df.rename(columns={'Value': 'PPP_for_ER'})\n", "df = df.dropna()\n", "df.set_index(['TIME', 'LOCATION'], inplace=True)\n", "df.index.set_names(['Year', 'Country Name'], inplace=True)\n", "df\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "61aaff23", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n" ] } ], "source": [ "df\n", "df_merged = df.merge(df_PPP_model, left_index=True, right_index=True, how='inner')\n", "\n", "#theretically in the long run these would be zero since the NOM*PPP == RER_normed\n", "#so now we will calculate the difference between the constructed Real exchange rate where RER = PPP*NER\n", "#versus the calculated PPP\n", "\n", "#if the PPP*nom is higher than the PPP based on the exchange rate the currency would be overvalued\n", "#id PPP *nom is lower than PPP the currency would be undervalued\n", "\n", "#a lot of issues because I dont know the methodology of how the real excahnge rate of PPP was calculated for the country which I grabbed\n", "#from OCED but still shows what we want to buy or sell depending on the year\n", "\n", "df_merged['Final_diff'] = df_merged['con_RER'] - df_merged['PPP_for_ER']\n", "#filtering out unstable ones\n", "df_merged = df_merged[df_merged['Final_diff']>-0.1]\n", "df_merged = df_merged[df_merged['Final_diff']<0.1]\n", "\n", "print(df_merged['Final_diff'].dtype)\n", "df_merged.sort_index(inplace=True)\n", "\n", "# Select all rows with years before 2015 and including 2015\n", "df_year_before_2015 = df_merged.loc[(slice(None, '2015-01-01'), slice(None)), :]\n", "\n", "\n", "\n", "\n", "df_merged['diff_in_ave_RER'] = df_merged['RER'].div(df_year_before_2015.groupby('Country Name').mean()['RER'])\n", "\n", "df_years={}\n", "for yr in range(2000, 2022):\n", " name = f'df_{yr}'\n", " dt_year= pd.to_datetime(f'{yr}-01-01')\n", " df_year = df_merged.loc[(dt_year, slice(None))]\n", " df_years[yr] = df_year\n", "\n", "\n", "#ones below 1 short if the 'diff in ave RER' agrees\n" ] }, { "cell_type": "markdown", "id": "1899ed89", "metadata": {}, "source": [ "Below we calculate the returns from the RER model." ] }, { "cell_type": "code", "execution_count": 6, "id": "821afc2e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average Returns for 1 year holding period: 0.01100000000000001\n", "Average Returns for 2 year holding period: 0.17499999999999996\n", "Average Returns for 3 year holding period: -0.7763333333333334\n", "Average Returns for 4 year holding period: -2.1503333333333337\n" ] } ], "source": [ "# Define the holdings() function\n", "\n", "\n", "def holdings(row):\n", " if row['diff_in_ave_RER'] > 1:\n", " loan_amount = 1000 * row['Forex rate']\n", " row['usd_amount'] = loan_amount/row['Forex rate']\n", " row['home_currency_amount'] = -loan_amount\n", " else:\n", " loan_amount = 1000\n", " row['usd_amount'] = -loan_amount\n", " row['home_currency_amount'] = loan_amount * row['Forex rate']\n", " return row\n", "\n", "yr_past = 2015\n", "yr_future = 2019\n", "\n", "# Apply the function to the 'Amount' column in the DataFrame\n", "#df_mean_RER_investment = df_years[yr_past].apply(holdings, axis=1)\n", "#df_mean_RER_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']\n", "##df_mean_RER_investment['new_usd_amount'] = (1/df_mean_RER_investment[f'{yr_future}_ER'])*df_mean_RER_investment['home_currency_amount'] \n", "#df_mean_RER_investment['return_USD'] = df_mean_RER_investment['new_usd_amount'] + df_mean_RER_investment['usd_amount']\n", "returns = {}\n", "\n", "for yr_past in range(2015,2018):\n", " list = []\n", " for yr in range(1,5):\n", " yr_future = yr_past+yr\n", " df_mean_RER_investment = df_years[yr_past].apply(holdings, axis=1)\n", " df_mean_RER_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']\n", " df_mean_RER_investment['new_usd_amount'] = (1/df_mean_RER_investment[f'{yr_future}_ER'])*df_mean_RER_investment['home_currency_amount'] \n", " df_mean_RER_investment['return_USD'] = df_mean_RER_investment['new_usd_amount'] + df_mean_RER_investment['usd_amount']\n", " total_investment = df_mean_RER_investment['return_USD'].count() *1000\n", " gains = df_mean_RER_investment['return_USD'].sum()\n", " return_to = (((gains+total_investment)/total_investment-1)*100).round(3)\n", " list.append(return_to)\n", " returns[yr_past] = list \n", "\n", "for i in range(0,4):\n", "\n", " elements = [v[i] for v in returns.values()]\n", " average = sum(elements) / len(elements)\n", "\n", " print(f\"Average Returns for {i+1} year holding period:\", average)\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "9f240e38", "metadata": {}, "source": [ "As we can see above the average returns for the RER model are either close to 0 or negative. This indicates this rule isnt a good investment strategy. \n", "\n", "Now we calculate the return for the PPP model." ] }, { "cell_type": "code", "execution_count": 7, "id": "eb10cb46", "metadata": {}, "outputs": [], "source": [ "# Define the holdings() function\n", "\n", "\n", "def holdings(row):\n", " if row['Final_diff'] > 0:\n", " loan_amount = 1000 * row['Forex rate']\n", " row['usd_amount'] = loan_amount/row['Forex rate']\n", " row['home_currency_amount'] = -loan_amount\n", " else:\n", " loan_amount = 1000\n", " row['usd_amount'] = -loan_amount\n", " row['home_currency_amount'] = loan_amount * row['Forex rate']\n", " return row\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "0f926f90", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{2009: [2.385, 5.431, 3.62, 2.171], 2010: [1.885, 1.378, 1.67, 2.385], 2011: [2.043, 2.201, 3.096, 7.39], 2012: [0.541, 1.369, 4.757, 4.861], 2013: [-0.107, -0.924, -0.556, -0.7], 2014: [-3.203, -2.152, -2.732, -3.082], 2015: [0.017, -0.53, 0.256, 1.198], 2016: [-1.394, -1.955, -0.692, 0.744], 2017: [-0.227, -0.12, 1.197, 1.476]}\n", "Average Returns for 1 year holding period: 0.21555555555555553\n", "Average Returns for 2 year holding period: 0.5219999999999999\n", "Average Returns for 3 year holding period: 1.1795555555555555\n", "Average Returns for 4 year holding period: 1.8269999999999997\n" ] } ], "source": [ "returns = {}\n", "\n", "for yr_past in range(2009,2018):\n", " list = []\n", " for yr in range(1,5):\n", " yr_future = yr_past+yr\n", " df_mean_PPP_diff_investment = df_years[yr_past].apply(holdings, axis=1)\n", " df_mean_PPP_diff_investment[f'{yr_future}_ER'] = df_years[yr_future]['Forex rate']\n", " df_mean_PPP_diff_investment['new_usd_amount'] = (1/df_mean_PPP_diff_investment[f'{yr_future}_ER'])*df_mean_PPP_diff_investment['home_currency_amount'] \n", " df_mean_PPP_diff_investment['return_USD'] = df_mean_PPP_diff_investment['new_usd_amount'] + df_mean_PPP_diff_investment['usd_amount']\n", " total_investment = df_mean_PPP_diff_investment['return_USD'].count() *1000\n", " gains = df_mean_PPP_diff_investment['return_USD'].sum()\n", " return_to = (((gains+total_investment)/total_investment-1)*100).round(3)\n", " list.append(return_to)\n", " returns[yr_past] = list\n", "print(returns) \n", "\n", "for i in range(0,4):\n", "\n", " elements = [v[i] for v in returns.values()]\n", " average = sum(elements) / len(elements)\n", "\n", " print(f\"Average Returns for {i+1} year holding period:\", average)" ] }, { "cell_type": "markdown", "id": "282e985c", "metadata": {}, "source": [ "As we can see above the returns for the PPP model are better and are all positive. So if we systematically invested based on this PPP rule we could get a positive return (although given the risk a four year return of 1.8% is pretty terrible). " ] }, { "cell_type": "markdown", "id": "64677c86", "metadata": {}, "source": [ "## Data for Machine Learning" ] }, { "cell_type": "code", "execution_count": 9, "id": "21ed15c9", "metadata": {}, "outputs": [], "source": [ "forex_data = pd.read_csv(\"EX_rate_table.csv\")" ] }, { "cell_type": "code", "execution_count": 10, "id": "fcf4db4b", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameGDPGovt_debt_gdpGDP growth rateGNI per capitaInflationLending rateNet exportsTotal_reserveForex rate
02000-01-01Aruba1.873184e+09NaN7.61658920390.04.04402112.0666673.097765e+08235455000.01.79
12001-01-01Aruba1.896648e+09NaN4.19596320510.02.88360412.6000004.304581e+08321360614.51.79
22002-01-01Aruba1.962011e+09NaN-0.95777119290.03.31524711.300000-1.377374e+08374002933.01.79
32003-01-01Aruba2.044134e+09NaN1.12087921120.03.65636510.700000-2.437989e+07336945700.01.79
42004-01-01Aruba2.254749e+09NaN7.28102624010.02.5291299.6000004.535251e+08339224000.01.79
\n", "
" ], "text/plain": [ " Year Country Name GDP Govt_debt_gdp GDP growth rate \\\n", "0 2000-01-01 Aruba 1.873184e+09 NaN 7.616589 \n", "1 2001-01-01 Aruba 1.896648e+09 NaN 4.195963 \n", "2 2002-01-01 Aruba 1.962011e+09 NaN -0.957771 \n", "3 2003-01-01 Aruba 2.044134e+09 NaN 1.120879 \n", "4 2004-01-01 Aruba 2.254749e+09 NaN 7.281026 \n", "\n", " GNI per capita Inflation Lending rate Net exports Total_reserve \\\n", "0 20390.0 4.044021 12.066667 3.097765e+08 235455000.0 \n", "1 20510.0 2.883604 12.600000 4.304581e+08 321360614.5 \n", "2 19290.0 3.315247 11.300000 -1.377374e+08 374002933.0 \n", "3 21120.0 3.656365 10.700000 -2.437989e+07 336945700.0 \n", "4 24010.0 2.529129 9.600000 4.535251e+08 339224000.0 \n", "\n", " Forex rate \n", "0 1.79 \n", "1 1.79 \n", "2 1.79 \n", "3 1.79 \n", "4 1.79 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data.head()" ] }, { "cell_type": "code", "execution_count": 11, "id": "cf4d413f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Year 5852\n", "Country Name 5852\n", "GDP 5609\n", "Govt_debt_gdp 1378\n", "GDP growth rate 5529\n", "GNI per capita 5285\n", "Inflation 4992\n", "Lending rate 2806\n", "Net exports 3884\n", "Total_reserve 3801\n", "Forex rate 3954\n", "dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data.count()" ] }, { "cell_type": "markdown", "id": "d9f20618", "metadata": {}, "source": [ "As we can see above, we lose about a two-thirds of our dataset if we include government debt to gdp ratio. This is why we'll drop this variable." ] }, { "cell_type": "code", "execution_count": 12, "id": "beead410", "metadata": {}, "outputs": [], "source": [ "forex_data_2 = forex_data.drop([\"Govt_debt_gdp\"], axis = 1)" ] }, { "cell_type": "code", "execution_count": 13, "id": "d516e4fa", "metadata": {}, "outputs": [], "source": [ "forex_data_2.dropna(inplace = True)" ] }, { "cell_type": "markdown", "id": "415a9ad4", "metadata": {}, "source": [ "Since in the dataset the Foreign Exchange Rate is in Local Currency Units (LCU) per US\\$, we take the reciprocal to convert it to US\\$ per LCU.\n", "\n", "It is natural for a country with a large GDP to have a high absolute value of net exports or total reserves, therefore we take these variables as percentages of GDP in order to control for the different sizes.\n", "\n", "We divide GDP by a million so that scaling is less of an issue in our model." ] }, { "cell_type": "code", "execution_count": 14, "id": "d9d17ee5", "metadata": {}, "outputs": [], "source": [ "#Converted Net exports and total reserves as percentages of gdp to normalise across countries.\n", "forex_data_2[\"Forex rate\"] = 1/forex_data_2[\"Forex rate\"]\n", "forex_data_2[\"Net Exports to GDP\"] = forex_data_2[\"Net exports\"]/forex_data_2[\"GDP\"] * 100\n", "forex_data_2[\"Total Reserve to GDP\"] = forex_data_2[\"Total_reserve\"]/forex_data_2[\"GDP\"] * 100\n", "forex_data_2[\"GDP per mil\"] = forex_data_2[\"GDP\"]/1_000_000\n", "forex_data_2 = forex_data_2.drop([\"Net exports\", \"Total_reserve\", \"GDP\"], axis = 1)" ] }, { "cell_type": "code", "execution_count": 15, "id": "126c9371", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameGDP growth rateGNI per capitaInflationLending rateForex rateNet Exports to GDPTotal Reserve to GDPGDP per mil
02000-01-01Aruba7.61658920390.04.04402112.0666670.55865916.53742912.5697721873.184358
12001-01-01Aruba4.19596320510.02.88360412.6000000.55865922.69572916.9436081896.648045
22002-01-01Aruba-0.95777119290.03.31524711.3000000.558659-7.02021619.0622221962.011173
32003-01-01Aruba1.12087921120.03.65636510.7000000.558659-1.19267616.4835422044.134078
42004-01-01Aruba7.28102624010.02.5291299.6000000.55865920.11422215.0448702254.748603
\n", "
" ], "text/plain": [ " Year Country Name GDP growth rate GNI per capita Inflation \\\n", "0 2000-01-01 Aruba 7.616589 20390.0 4.044021 \n", "1 2001-01-01 Aruba 4.195963 20510.0 2.883604 \n", "2 2002-01-01 Aruba -0.957771 19290.0 3.315247 \n", "3 2003-01-01 Aruba 1.120879 21120.0 3.656365 \n", "4 2004-01-01 Aruba 7.281026 24010.0 2.529129 \n", "\n", " Lending rate Forex rate Net Exports to GDP Total Reserve to GDP \\\n", "0 12.066667 0.558659 16.537429 12.569772 \n", "1 12.600000 0.558659 22.695729 16.943608 \n", "2 11.300000 0.558659 -7.020216 19.062222 \n", "3 10.700000 0.558659 -1.192676 16.483542 \n", "4 9.600000 0.558659 20.114222 15.044870 \n", "\n", " GDP per mil \n", "0 1873.184358 \n", "1 1896.648045 \n", "2 1962.011173 \n", "3 2044.134078 \n", "4 2254.748603 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data_2.head()" ] }, { "cell_type": "code", "execution_count": 16, "id": "2e8d62b1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['2000-01-01', '2001-01-01', '2002-01-01', '2003-01-01',\n", " '2004-01-01', '2005-01-01', '2006-01-01', '2007-01-01',\n", " '2008-01-01', '2009-01-01', '2010-01-01', '2011-01-01',\n", " '2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01',\n", " '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01',\n", " '2020-01-01', '2021-01-01'], dtype=object)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data_2['Year'].unique()" ] }, { "cell_type": "code", "execution_count": 17, "id": "0a83cf77", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameGDP growth rateGNI per capitaInflationLending rateForex rateNet Exports to GDPTotal Reserve to GDPGDP per mil
56542000-01-01Vietnam6.787316380.0-1.71033710.5500000.000071-0.56139210.96001031172.518403
56552001-01-01Vietnam6.192893400.0-0.4315459.4200000.000068-0.27841311.24229232685.198735
56562002-01-01Vietnam6.320821420.03.8308289.0616670.000065-5.14486211.75290735064.105501
56572003-01-01Vietnam6.899063470.03.2346489.4800000.000064-8.49250715.73650039552.513316
56582004-01-01Vietnam7.536411540.07.7549479.7225000.000064-6.95388315.50031545427.854693
\n", "
" ], "text/plain": [ " Year Country Name GDP growth rate GNI per capita Inflation \\\n", "5654 2000-01-01 Vietnam 6.787316 380.0 -1.710337 \n", "5655 2001-01-01 Vietnam 6.192893 400.0 -0.431545 \n", "5656 2002-01-01 Vietnam 6.320821 420.0 3.830828 \n", "5657 2003-01-01 Vietnam 6.899063 470.0 3.234648 \n", "5658 2004-01-01 Vietnam 7.536411 540.0 7.754947 \n", "\n", " Lending rate Forex rate Net Exports to GDP Total Reserve to GDP \\\n", "5654 10.550000 0.000071 -0.561392 10.960010 \n", "5655 9.420000 0.000068 -0.278413 11.242292 \n", "5656 9.061667 0.000065 -5.144862 11.752907 \n", "5657 9.480000 0.000064 -8.492507 15.736500 \n", "5658 9.722500 0.000064 -6.953883 15.500315 \n", "\n", " GDP per mil \n", "5654 31172.518403 \n", "5655 32685.198735 \n", "5656 35064.105501 \n", "5657 39552.513316 \n", "5658 45427.854693 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data_2[forex_data_2['Country Name'] == \"Vietnam\"].head()" ] }, { "cell_type": "markdown", "id": "be34dd73", "metadata": {}, "source": [ "Below we can the plots for our explanatory variables." ] }, { "cell_type": "code", "execution_count": 18, "id": "3a607a01", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/uddhav/opt/anaconda3/envs/Py_2/lib/python3.8/site-packages/altair/utils/core.py:283: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n", " for col_name, dtype in df.dtypes.iteritems():\n" ] }, { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='Total Reserve to GDP:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Total Reserve to GDP:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'Total Reserve to GDP:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"Total Reserve to GDP for All Countries Over Time\"\n", ")\n", "\n", "layers" ] }, { "cell_type": "code", "execution_count": 19, "id": "c2dbba7d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='Net Exports to GDP:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Net Exports to GDP:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'Net Exports to GDP:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"Net Exports to GDP for All Countries Over Time\"\n", ")\n", "\n", "layers" ] }, { "cell_type": "code", "execution_count": 20, "id": "32ec3ba8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='Lending rate:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Lending rate:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'Lending rate:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"Lending rate for All Countries Over Time\"\n", ")\n", "\n", "layers" ] }, { "cell_type": "code", "execution_count": 21, "id": "2abb91c4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='Inflation:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Inflation:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'Inflation:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"Inflation for All Countries Over Time\"\n", ")\n", "\n", "layers" ] }, { "cell_type": "code", "execution_count": 22, "id": "88ba98ac", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='GNI per capita:Q',\n", " color='Country Name:N',\n", " # tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP growth rate:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'GNI per capita:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"GNI per capita for All Countries Over Time\"\n", ")\n", "\n", "layers" ] }, { "cell_type": "code", "execution_count": 23, "id": "81c4ab13", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='GDP growth rate:Q',\n", " color='Country Name:N',\n", " # tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP growth rate:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'GDP growth rate:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"GDP growth rate for All Countries Over Time\"\n", ")\n", "\n", "layers" ] }, { "cell_type": "code", "execution_count": 24, "id": "b130f603", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='GDP per mil:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('GDP per mil:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'GDP per mil:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"GDP per mil for All Countries Over Time\"\n", ")\n", "\n", "layers\n" ] }, { "cell_type": "code", "execution_count": 25, "id": "a714bc24", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='Forex rate:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"Forex Rates for All Countries Over Time\"\n", ")\n", "\n", "layers\n" ] }, { "cell_type": "code", "execution_count": 26, "id": "8b818616", "metadata": {}, "outputs": [], "source": [ "# select_countries = ['Canada', 'Switzerland', 'China', 'United Kingdom', 'India', 'Japan', 'China', 'Mexico', 'Singapore', 'United States']\n", "select_countries = ['Canada', 'Switzerland', 'United Kingdom', 'China', 'Singapore', 'United States']" ] }, { "cell_type": "code", "execution_count": 27, "id": "c24f62d5", "metadata": {}, "outputs": [], "source": [ "select_forex_data_2 = forex_data_2[forex_data_2['Country Name'].isin(select_countries)]" ] }, { "cell_type": "code", "execution_count": 28, "id": "d299a6d3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover',\n", " fields=['Year'])\n", "\n", "# # The basic line\n", "line = alt.Chart(select_forex_data_2).mark_line().encode(\n", " alt.X('Year:T', title = 'year'),\n", " y='Forex rate:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]\n", ")\n", "\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor\n", "selectors = alt.Chart(select_forex_data_2).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", "\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection\n", "rules = alt.Chart(select_forex_data_2).mark_rule(color='gray').encode(\n", " x='Year:T',\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the five layers into a chart and bind the data\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title = \"Forex Rates for Select Countries Over Time\"\n", ")\n", "\n", "layers" ] }, { "cell_type": "markdown", "id": "c7912004", "metadata": {}, "source": [ "We use 2015 as the cut-off for training and testing dataset." ] }, { "cell_type": "code", "execution_count": 29, "id": "421cd552", "metadata": {}, "outputs": [], "source": [ "#2015 as the cut-off year for testing.\n", "forex_data_2_test = forex_data_2[forex_data_2['Year'].isin(['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01', '2020-01-01', '2021-01-01'])]" ] }, { "cell_type": "code", "execution_count": 30, "id": "b53b7626", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a selection that chooses the nearest point & selects based on x-value\n", "nearest = alt.selection_single(nearest=True, on='mouseover', fields=['Year'])\n", "\n", "# The basic line\n", "line = alt.Chart(forex_data_2_test).mark_line().encode(\n", " alt.X('Year:T', title='Year'),\n", " y='Forex rate:Q',\n", " color='Country Name:N',\n", " tooltip=[alt.Tooltip('Year:T'), alt.Tooltip('Country Name:N'), alt.Tooltip('Forex rate:Q')]\n", ")\n", "\n", "# Transparent selectors across the chart. This is what tells us the x-value of the cursor.\n", "selectors = alt.Chart(forex_data_2_test).mark_point().encode(\n", " x='Year:T',\n", " opacity=alt.value(0)\n", ")\n", "\n", "selectors_near = selectors.encode(\n", " opacity=alt.condition(nearest, alt.value(0.5), alt.value(0.3))\n", ").add_selection(nearest)\n", "\n", "# Draw points on the line, and highlight based on selection.\n", "points = line.mark_circle().encode(\n", " opacity=alt.condition(nearest, alt.value(1), alt.value(0))\n", ").add_selection(nearest)\n", "\n", "# Draw text labels near the points, and highlight based on selection.\n", "text = line.mark_text(align='left', dx=5, dy=-5).encode(\n", " text=alt.condition(nearest, 'Forex rate:Q', alt.value(' '))\n", ")\n", "\n", "# Draw a rule at the location of the selection.\n", "rules = alt.Chart(forex_data_2_test).mark_rule(color='gray').encode(\n", " x='Year:T'\n", ").transform_filter(\n", " nearest\n", ")\n", "\n", "# Add vertical line at \"Year\" == 2015\n", "vline = alt.Chart({'values': [{'Year': '2015-01-01'}]}).mark_rule(color='red').encode(\n", " x='Year:T'\n", ")\n", "\n", "# Put the six layers into a chart and bind the data.\n", "layers = alt.layer(\n", " line, selectors, points, text, rules, vline\n", ").properties(\n", " width=1000, height=800,\n", " title=\"Forex Rates for All Countries TESTING DATA\"\n", ")\n", "\n", "layers\n" ] }, { "cell_type": "code", "execution_count": 31, "id": "d6c377a7", "metadata": {}, "outputs": [], "source": [ "forex_data_3 = forex_data_2.drop(forex_data_2_test.index)" ] }, { "cell_type": "code", "execution_count": 32, "id": "f158b5d3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameGDP growth rateGNI per capitaInflationLending rateForex rateNet Exports to GDPTotal Reserve to GDPGDP per mil
02000-01-01Aruba7.61658920390.04.04402112.0666670.55865916.53742912.5697721873.184358
12001-01-01Aruba4.19596320510.02.88360412.6000000.55865922.69572916.9436081896.648045
22002-01-01Aruba-0.95777119290.03.31524711.3000000.558659-7.02021619.0622221962.011173
32003-01-01Aruba1.12087921120.03.65636510.7000000.558659-1.19267616.4835422044.134078
42004-01-01Aruba7.28102624010.02.5291299.6000000.55865920.11422215.0448702254.748603
\n", "
" ], "text/plain": [ " Year Country Name GDP growth rate GNI per capita Inflation \\\n", "0 2000-01-01 Aruba 7.616589 20390.0 4.044021 \n", "1 2001-01-01 Aruba 4.195963 20510.0 2.883604 \n", "2 2002-01-01 Aruba -0.957771 19290.0 3.315247 \n", "3 2003-01-01 Aruba 1.120879 21120.0 3.656365 \n", "4 2004-01-01 Aruba 7.281026 24010.0 2.529129 \n", "\n", " Lending rate Forex rate Net Exports to GDP Total Reserve to GDP \\\n", "0 12.066667 0.558659 16.537429 12.569772 \n", "1 12.600000 0.558659 22.695729 16.943608 \n", "2 11.300000 0.558659 -7.020216 19.062222 \n", "3 10.700000 0.558659 -1.192676 16.483542 \n", "4 9.600000 0.558659 20.114222 15.044870 \n", "\n", " GDP per mil \n", "0 1873.184358 \n", "1 1896.648045 \n", "2 1962.011173 \n", "3 2044.134078 \n", "4 2254.748603 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data_3.head()" ] }, { "cell_type": "code", "execution_count": 33, "id": "e3a9c03d", "metadata": {}, "outputs": [], "source": [ "forex_data_3_train_y = forex_data_3[[\"Forex rate\"]]" ] }, { "cell_type": "code", "execution_count": 34, "id": "5539680c", "metadata": {}, "outputs": [], "source": [ "forex_data_3_train_X = forex_data_3.drop([\"Year\", \"Country Name\",\"Forex rate\", \"GDP per mil\"], axis = 1)" ] }, { "cell_type": "code", "execution_count": 35, "id": "504cffa1", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameGDP growth rateGNI per capitaInflationLending rateForex rateNet Exports to GDPTotal Reserve to GDPGDP per mil
152015-01-01Aruba3.54363026180.00.4747647.00.55865910.27762427.5317312963.128492
162016-01-01Aruba2.11671626650.0-0.9311966.90.55865911.17838630.9186692983.798883
172017-01-01Aruba5.48237127720.0-1.0282826.30.5586598.64656129.3356733092.178771
182018-01-01Aruba5.25785629030.03.6260416.00.5586598.97513730.6113633202.234637
192019-01-01Aruba0.63502930000.04.2574625.70.5586599.54785029.1572773368.970253
\n", "
" ], "text/plain": [ " Year Country Name GDP growth rate GNI per capita Inflation \\\n", "15 2015-01-01 Aruba 3.543630 26180.0 0.474764 \n", "16 2016-01-01 Aruba 2.116716 26650.0 -0.931196 \n", "17 2017-01-01 Aruba 5.482371 27720.0 -1.028282 \n", "18 2018-01-01 Aruba 5.257856 29030.0 3.626041 \n", "19 2019-01-01 Aruba 0.635029 30000.0 4.257462 \n", "\n", " Lending rate Forex rate Net Exports to GDP Total Reserve to GDP \\\n", "15 7.0 0.558659 10.277624 27.531731 \n", "16 6.9 0.558659 11.178386 30.918669 \n", "17 6.3 0.558659 8.646561 29.335673 \n", "18 6.0 0.558659 8.975137 30.611363 \n", "19 5.7 0.558659 9.547850 29.157277 \n", "\n", " GDP per mil \n", "15 2963.128492 \n", "16 2983.798883 \n", "17 3092.178771 \n", "18 3202.234637 \n", "19 3368.970253 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data_2_test.head()" ] }, { "cell_type": "code", "execution_count": 36, "id": "31214d21", "metadata": {}, "outputs": [], "source": [ "forex_data_3_test_y = forex_data_2_test[[\"Forex rate\"]]" ] }, { "cell_type": "code", "execution_count": 37, "id": "3bfdf15e", "metadata": {}, "outputs": [], "source": [ "forex_data_3_test_X = forex_data_2_test.drop([\"Year\",\"Country Name\",\"Forex rate\", \"GDP per mil\"], axis = 1)" ] }, { "cell_type": "markdown", "id": "8cb083c4", "metadata": {}, "source": [ "Now we fit a linear regression." ] }, { "cell_type": "code", "execution_count": 38, "id": "a1004c04", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LinearRegression()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "LinearRegression()" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Linear regression\n", "lr_mod = linear_model.LinearRegression()\n", "lr_mod.fit(forex_data_3_train_X, forex_data_3_train_y)" ] }, { "cell_type": "code", "execution_count": 39, "id": "9a98a571", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean squred error in the training dataset is 0.47435291908321897\n", "The mean squred error in the testing dataset is 0.3015103066706014\n" ] } ], "source": [ "mse_lr_train = metrics.mean_squared_error(forex_data_3_train_y, lr_mod.predict(forex_data_3_train_X))\n", "mse_lr_test = metrics.mean_squared_error(forex_data_3_test_y, lr_mod.predict(forex_data_3_test_X))\n", "print(f\"The mean squred error in the training dataset is {mse_lr_train}\")\n", "print(f\"The mean squred error in the testing dataset is {mse_lr_test}\")" ] }, { "cell_type": "markdown", "id": "62ce0594", "metadata": {}, "source": [ "The mean squred errors for our linear regression are given above. As we can see the mean squared error is higher for our training dataset than our testing dataset." ] }, { "cell_type": "markdown", "id": "743565fa", "metadata": {}, "source": [ "Now we fit our lasso regression, we use a 5 fold cross validation in order to choose the optimal $\\alpha$ for our lasso regression." ] }, { "cell_type": "code", "execution_count": 40, "id": "fd14046b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LassoCV(cv=5)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "LassoCV(cv=5)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#k=5 cv lasso\n", "lasso_mod = linear_model.LassoCV(cv=5)\n", "lasso_mod.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())" ] }, { "cell_type": "code", "execution_count": 41, "id": "b4972346", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean squred error in the training dataset is 0.49614895650008806\n", "The mean squred error in the testing dataset is 0.23705686900933912\n" ] } ], "source": [ "mse_lasso_train = metrics.mean_squared_error(forex_data_3_train_y, lasso_mod.predict(forex_data_3_train_X))\n", "mse_lasso_test = metrics.mean_squared_error(forex_data_3_test_y, lasso_mod.predict(forex_data_3_test_X))\n", "print(f\"The mean squred error in the training dataset is {mse_lasso_train}\")\n", "print(f\"The mean squred error in the testing dataset is {mse_lasso_test}\")" ] }, { "cell_type": "markdown", "id": "37ab3d67", "metadata": {}, "source": [ "As we can see above the mean squared error for our training dataset is higher than that of our testing dataset." ] }, { "cell_type": "markdown", "id": "248ab402", "metadata": {}, "source": [ "Now we fit our first neural network with 3 hidden layers." ] }, { "cell_type": "code", "execution_count": 42, "id": "5a88de45", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "MLPRegressor(hidden_layer_sizes=(50, 35, 20), max_iter=10000)" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#The first neural network, 3 hidden layers. \n", "nn_mod_1 = neural_network.MLPRegressor((50, 35, 20), max_iter = 10000)\n", "nn_mod_1.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())" ] }, { "cell_type": "code", "execution_count": 43, "id": "6a162c1b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean squred error in the training dataset is 0.6253439030142882\n", "The mean squred error in the testing dataset is 0.5258991107828873\n" ] } ], "source": [ "mse_nn_1_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_1.predict(forex_data_3_train_X))\n", "mse_nn_1_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_1.predict(forex_data_3_test_X))\n", "print(f\"The mean squred error in the training dataset is {mse_nn_1_train}\")\n", "print(f\"The mean squred error in the testing dataset is {mse_nn_1_test}\")" ] }, { "cell_type": "markdown", "id": "ea976372", "metadata": {}, "source": [ "As we can see above the mean squared error is roughly the same for both our training and testing datasets, however this model does suffer from input scaling. This is why below we run a scaled version of our neural network." ] }, { "cell_type": "code", "execution_count": 44, "id": "64ec9e7a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Pipeline(steps=[('standardscaler', StandardScaler()),\n",
       "                ('mlpregressor',\n",
       "                 MLPRegressor(hidden_layer_sizes=(50, 35, 20),\n",
       "                              max_iter=10000))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "Pipeline(steps=[('standardscaler', StandardScaler()),\n", " ('mlpregressor',\n", " MLPRegressor(hidden_layer_sizes=(50, 35, 20),\n", " max_iter=10000))])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#First neural netwrok but scaled. Sacled is better. \n", "nn_mod_1_scaled = pipeline.make_pipeline(\n", " preprocessing.StandardScaler(),\n", " neural_network.MLPRegressor((50, 35, 20), max_iter = 10000))\n", "nn_mod_1_scaled.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())" ] }, { "cell_type": "code", "execution_count": 45, "id": "2f95566e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean squred error in the training dataset is 0.054302709057648856\n", "The mean squred error in the testing dataset is 0.40455307013410235\n" ] } ], "source": [ "mse_nn_1_scaled_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_1_scaled.predict(forex_data_3_train_X))\n", "mse_nn_1_scaled_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_1_scaled.predict(forex_data_3_test_X))\n", "print(f\"The mean squred error in the training dataset is {mse_nn_1_scaled_train}\")\n", "print(f\"The mean squred error in the testing dataset is {mse_nn_1_scaled_test}\")" ] }, { "cell_type": "markdown", "id": "90c52d7a", "metadata": {}, "source": [ "As we can see above the mean squared error for our training data is much lower than that of our testing data. This could mean that our model is overfitting. " ] }, { "cell_type": "markdown", "id": "5aa76948", "metadata": {}, "source": [ "Now we run our second neural network with 4 hidden layers." ] }, { "cell_type": "code", "execution_count": 46, "id": "981c2c93", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Pipeline(steps=[('standardscaler', StandardScaler()),\n",
       "                ('mlpregressor',\n",
       "                 MLPRegressor(hidden_layer_sizes=(50, 35, 20, 10),\n",
       "                              max_iter=10000))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "Pipeline(steps=[('standardscaler', StandardScaler()),\n", " ('mlpregressor',\n", " MLPRegressor(hidden_layer_sizes=(50, 35, 20, 10),\n", " max_iter=10000))])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Second neural network but scaled.\n", "nn_mod_2_scaled = pipeline.make_pipeline(\n", " preprocessing.StandardScaler(),\n", " neural_network.MLPRegressor((50, 35, 20, 10), max_iter = 10000))\n", "nn_mod_2_scaled.fit(forex_data_3_train_X, forex_data_3_train_y.values.ravel())" ] }, { "cell_type": "code", "execution_count": 47, "id": "bac042d8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean squred error in the training dataset is 0.0456499285270549\n", "The mean squred error in the testing dataset is 0.3405035580847174\n" ] } ], "source": [ "mse_nn_2_scaled_train = metrics.mean_squared_error(forex_data_3_train_y, nn_mod_2_scaled.predict(forex_data_3_train_X))\n", "mse_nn_2_scaled_test = metrics.mean_squared_error(forex_data_3_test_y, nn_mod_2_scaled.predict(forex_data_3_test_X))\n", "print(f\"The mean squred error in the training dataset is {mse_nn_2_scaled_train}\")\n", "print(f\"The mean squred error in the testing dataset is {mse_nn_2_scaled_test}\")" ] }, { "cell_type": "markdown", "id": "0163b469", "metadata": {}, "source": [ "As we can see above the mean squared error in our training data is lower than that of our testing data." ] }, { "cell_type": "markdown", "id": "78994a34", "metadata": {}, "source": [ "Now we can extract the predicted values for all our models and add it to our testing dataset." ] }, { "cell_type": "code", "execution_count": 48, "id": "5ed7faab", "metadata": {}, "outputs": [], "source": [ "#Extracting predicted values\n", "lr_predict = lr_mod.predict(forex_data_3_test_X)\n", "lasso_predict = lasso_mod.predict(forex_data_3_test_X)\n", "nn_1_scaled_predict = nn_mod_1_scaled.predict(forex_data_3_test_X)\n", "nn_2_scaled_predict = nn_mod_2_scaled.predict(forex_data_3_test_X)" ] }, { "cell_type": "code", "execution_count": 49, "id": "982163f6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameGDP growth rateGNI per capitaInflationLending rateForex rateNet Exports to GDPTotal Reserve to GDPGDP per mil
152015-01-01Aruba3.54363026180.00.4747647.00.55865910.27762427.5317312963.128492
162016-01-01Aruba2.11671626650.0-0.9311966.90.55865911.17838630.9186692983.798883
172017-01-01Aruba5.48237127720.0-1.0282826.30.5586598.64656129.3356733092.178771
182018-01-01Aruba5.25785629030.03.6260416.00.5586598.97513730.6113633202.234637
192019-01-01Aruba0.63502930000.04.2574625.70.5586599.54785029.1572773368.970253
\n", "
" ], "text/plain": [ " Year Country Name GDP growth rate GNI per capita Inflation \\\n", "15 2015-01-01 Aruba 3.543630 26180.0 0.474764 \n", "16 2016-01-01 Aruba 2.116716 26650.0 -0.931196 \n", "17 2017-01-01 Aruba 5.482371 27720.0 -1.028282 \n", "18 2018-01-01 Aruba 5.257856 29030.0 3.626041 \n", "19 2019-01-01 Aruba 0.635029 30000.0 4.257462 \n", "\n", " Lending rate Forex rate Net Exports to GDP Total Reserve to GDP \\\n", "15 7.0 0.558659 10.277624 27.531731 \n", "16 6.9 0.558659 11.178386 30.918669 \n", "17 6.3 0.558659 8.646561 29.335673 \n", "18 6.0 0.558659 8.975137 30.611363 \n", "19 5.7 0.558659 9.547850 29.157277 \n", "\n", " GDP per mil \n", "15 2963.128492 \n", "16 2983.798883 \n", "17 3092.178771 \n", "18 3202.234637 \n", "19 3368.970253 " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forex_data_2_test.head()" ] }, { "cell_type": "code", "execution_count": 50, "id": "7d40b26e", "metadata": {}, "outputs": [], "source": [ "#Putting the predicted vals in dataset.\n", "df = forex_data_2_test.assign(Linear_Predict = lr_predict)\n", "df = df.assign(Lasso_Predict = lasso_predict)\n", "df = df.assign(NN_1_Predict = nn_1_scaled_predict)\n", "df = df.assign(NN_2_Predict = nn_2_scaled_predict)" ] }, { "cell_type": "code", "execution_count": 51, "id": "be30862a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameGDP growth rateGNI per capitaInflationLending rateForex rateNet Exports to GDPTotal Reserve to GDPGDP per milLinear_PredictLasso_PredictNN_1_PredictNN_2_Predict
152015-01-01Aruba3.54363026180.00.4747647.00.55865910.27762427.5317312963.1284920.6138640.6183691.2986030.235513
162016-01-01Aruba2.11671626650.0-0.9311966.90.55865911.17838630.9186692983.7988830.5956820.6252600.8859490.245483
172017-01-01Aruba5.48237127720.0-1.0282826.30.5586598.64656129.3356733092.1787710.6144440.6409480.6175960.065980
182018-01-01Aruba5.25785629030.03.6260416.00.5586598.97513730.6113633202.2346370.6869400.6601541.1905000.478023
192019-01-01Aruba0.63502930000.04.2574625.70.5586599.54785029.1572773368.9702530.7085180.6743750.5008760.415586
\n", "
" ], "text/plain": [ " Year Country Name GDP growth rate GNI per capita Inflation \\\n", "15 2015-01-01 Aruba 3.543630 26180.0 0.474764 \n", "16 2016-01-01 Aruba 2.116716 26650.0 -0.931196 \n", "17 2017-01-01 Aruba 5.482371 27720.0 -1.028282 \n", "18 2018-01-01 Aruba 5.257856 29030.0 3.626041 \n", "19 2019-01-01 Aruba 0.635029 30000.0 4.257462 \n", "\n", " Lending rate Forex rate Net Exports to GDP Total Reserve to GDP \\\n", "15 7.0 0.558659 10.277624 27.531731 \n", "16 6.9 0.558659 11.178386 30.918669 \n", "17 6.3 0.558659 8.646561 29.335673 \n", "18 6.0 0.558659 8.975137 30.611363 \n", "19 5.7 0.558659 9.547850 29.157277 \n", "\n", " GDP per mil Linear_Predict Lasso_Predict NN_1_Predict NN_2_Predict \n", "15 2963.128492 0.613864 0.618369 1.298603 0.235513 \n", "16 2983.798883 0.595682 0.625260 0.885949 0.245483 \n", "17 3092.178771 0.614444 0.640948 0.617596 0.065980 \n", "18 3202.234637 0.686940 0.660154 1.190500 0.478023 \n", "19 3368.970253 0.708518 0.674375 0.500876 0.415586 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "739a59ec", "metadata": {}, "source": [ "Now we want to check whether each model predicts the exchange rate is going to go up or down. We do this by taking the difference between the future prdeicted exchange rate (2016, 2017, 2018, 2019 and 2020) and the predicted 2015 exchange rate. We do this for different holding years, 1 year, 2 years and so on." ] }, { "cell_type": "code", "execution_count": 52, "id": "e74ff06f", "metadata": {}, "outputs": [], "source": [ "df_2 = df.copy()" ] }, { "cell_type": "code", "execution_count": 53, "id": "c1f9961f", "metadata": { "tags": [] }, "outputs": [], "source": [ "for i in range(1,6):\n", " grouped = df_2.groupby('Country Name')\n", " df_2[f\"lag_linear_{i}\"] = grouped['Linear_Predict'].shift(i)\n", " df_2[f\"lag_lasso_{i}\"] = grouped['Lasso_Predict'].shift(i)\n", " df_2[f\"lag_nn_1_{i}\"] = grouped['NN_1_Predict'].shift(i)\n", " df_2[f\"lag_nn_2_{i}\"] = grouped['NN_2_Predict'].shift(i)" ] }, { "cell_type": "code", "execution_count": 54, "id": "41790837", "metadata": {}, "outputs": [], "source": [ "df_4 = df_2.drop([\"GDP per mil\", \"GDP growth rate\", \"GNI per capita\", \"Inflation\", \"Lending rate\", \"Net Exports to GDP\", \"Total Reserve to GDP\"], axis = 1)" ] }, { "cell_type": "code", "execution_count": 55, "id": "0c6ca9da", "metadata": {}, "outputs": [], "source": [ "a = [\"Linear_Predict\", \"Lasso_Predict\", \"NN_1_Predict\", \"NN_2_Predict\"]\n", "b = [\"lag_linear\", \"lag_lasso\", \"lag_nn_1\", \"lag_nn_2\"]\n", "for (mod, lag) in zip(a,b):\n", " for i in range(1,6):\n", " df_4[f\"diff_{mod}_{i}\"] = df_4[f\"{mod}\"] - df_4[f\"{lag}_{i}\"]" ] }, { "cell_type": "code", "execution_count": 56, "id": "25cc39e4", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameForex rateLinear_PredictLasso_PredictNN_1_PredictNN_2_Predictlag_linear_1lag_lasso_1lag_nn_1_1...diff_NN_1_Predict_1diff_NN_1_Predict_2diff_NN_1_Predict_3diff_NN_1_Predict_4diff_NN_1_Predict_5diff_NN_2_Predict_1diff_NN_2_Predict_2diff_NN_2_Predict_3diff_NN_2_Predict_4diff_NN_2_Predict_5
152015-01-01Aruba0.5586590.6138640.6183691.2986030.235513NaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
162016-01-01Aruba0.5586590.5956820.6252600.8859490.2454830.6138640.6183691.298603...-0.412653NaNNaNNaNNaN0.009970NaNNaNNaNNaN
172017-01-01Aruba0.5586590.6144440.6409480.6175960.0659800.5956820.6252600.885949...-0.268353-0.681006NaNNaNNaN-0.179503-0.169533NaNNaNNaN
182018-01-01Aruba0.5586590.6869400.6601541.1905000.4780230.6144440.6409480.617596...0.5729040.304551-0.108102NaNNaN0.4120430.2325400.242510NaNNaN
192019-01-01Aruba0.5586590.7085180.6743750.5008760.4155860.6869400.6601541.190500...-0.689624-0.116720-0.385073-0.797726NaN-0.0624370.3496060.1701020.180073NaN
\n", "

5 rows × 47 columns

\n", "
" ], "text/plain": [ " Year Country Name Forex rate Linear_Predict Lasso_Predict \\\n", "15 2015-01-01 Aruba 0.558659 0.613864 0.618369 \n", "16 2016-01-01 Aruba 0.558659 0.595682 0.625260 \n", "17 2017-01-01 Aruba 0.558659 0.614444 0.640948 \n", "18 2018-01-01 Aruba 0.558659 0.686940 0.660154 \n", "19 2019-01-01 Aruba 0.558659 0.708518 0.674375 \n", "\n", " NN_1_Predict NN_2_Predict lag_linear_1 lag_lasso_1 lag_nn_1_1 ... \\\n", "15 1.298603 0.235513 NaN NaN NaN ... \n", "16 0.885949 0.245483 0.613864 0.618369 1.298603 ... \n", "17 0.617596 0.065980 0.595682 0.625260 0.885949 ... \n", "18 1.190500 0.478023 0.614444 0.640948 0.617596 ... \n", "19 0.500876 0.415586 0.686940 0.660154 1.190500 ... \n", "\n", " diff_NN_1_Predict_1 diff_NN_1_Predict_2 diff_NN_1_Predict_3 \\\n", "15 NaN NaN NaN \n", "16 -0.412653 NaN NaN \n", "17 -0.268353 -0.681006 NaN \n", "18 0.572904 0.304551 -0.108102 \n", "19 -0.689624 -0.116720 -0.385073 \n", "\n", " diff_NN_1_Predict_4 diff_NN_1_Predict_5 diff_NN_2_Predict_1 \\\n", "15 NaN NaN NaN \n", "16 NaN NaN 0.009970 \n", "17 NaN NaN -0.179503 \n", "18 NaN NaN 0.412043 \n", "19 -0.797726 NaN -0.062437 \n", "\n", " diff_NN_2_Predict_2 diff_NN_2_Predict_3 diff_NN_2_Predict_4 \\\n", "15 NaN NaN NaN \n", "16 NaN NaN NaN \n", "17 -0.169533 NaN NaN \n", "18 0.232540 0.242510 NaN \n", "19 0.349606 0.170102 0.180073 \n", "\n", " diff_NN_2_Predict_5 \n", "15 NaN \n", "16 NaN \n", "17 NaN \n", "18 NaN \n", "19 NaN \n", "\n", "[5 rows x 47 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_4.head()" ] }, { "cell_type": "markdown", "id": "295db848", "metadata": {}, "source": [ "Now we make a different dataframe for each holding period." ] }, { "cell_type": "code", "execution_count": 57, "id": "d5823e7c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']\n" ] } ], "source": [ "c=['Year', 'Country Name', 'Forex rate']\n", "for mod in a:\n", " d = f\"diff_{mod}_1\"\n", " c.append(d)\n", "\n", "print(c)" ] }, { "cell_type": "code", "execution_count": 58, "id": "c123a9ef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']\n" ] } ], "source": [ "e=['Year', 'Country Name', 'Forex rate']\n", "for mod in a:\n", " f = f\"diff_{mod}_2\"\n", " e.append(f)\n", "\n", "print(e)" ] }, { "cell_type": "code", "execution_count": 59, "id": "706cdd17", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']\n" ] } ], "source": [ "g=['Year', 'Country Name', 'Forex rate']\n", "for mod in a:\n", " h = f\"diff_{mod}_3\"\n", " g.append(h)\n", "\n", "print(g)" ] }, { "cell_type": "code", "execution_count": 60, "id": "5fd4cd53", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']\n" ] } ], "source": [ "i=['Year', 'Country Name', 'Forex rate']\n", "for mod in a:\n", " j = f\"diff_{mod}_4\"\n", " i.append(j)\n", "\n", "print(i)" ] }, { "cell_type": "code", "execution_count": 61, "id": "a5c860e9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Year', 'Country Name', 'Forex rate', 'diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']\n" ] } ], "source": [ "k=['Year', 'Country Name', 'Forex rate']\n", "for mod in a:\n", " l = f\"diff_{mod}_5\"\n", " k.append(l)\n", "\n", "print(k)" ] }, { "cell_type": "code", "execution_count": 62, "id": "3cedc457", "metadata": {}, "outputs": [], "source": [ "df_5 = df_4[c] #1 yr\n", "df_6 = df_4[e] #2 yrs\n", "df_7 = df_4[g] #3 yrs\n", "df_8 = df_4[i] #4 yrs\n", "df_9 = df_4[k] #5 yrs" ] }, { "cell_type": "code", "execution_count": 63, "id": "91c53a59", "metadata": {}, "outputs": [], "source": [ "df_5 = df_5[df_5['Year'].isin(['2015-01-01', '2016-01-01'])]\n", "df_6 = df_6[df_6['Year'].isin(['2015-01-01', '2017-01-01'])]\n", "df_7 = df_7[df_7['Year'].isin(['2015-01-01', '2018-01-01'])]\n", "df_8 = df_8[df_8['Year'].isin(['2015-01-01', '2019-01-01'])]\n", "df_9 = df_9[df_9['Year'].isin(['2015-01-01', '2020-01-01'])]" ] }, { "cell_type": "code", "execution_count": 64, "id": "0b40bbd0", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameForex ratediff_Linear_Predict_1diff_Lasso_Predict_1diff_NN_1_Predict_1diff_NN_2_Predict_1
152015-01-01Aruba0.558659NaNNaNNaNNaN
162016-01-01Aruba0.558659-0.0181820.006891-0.4126530.009970
592015-01-01Afghanistan0.016355NaNNaNNaNNaN
602016-01-01Afghanistan0.0147350.049683-0.0005860.0470960.082432
1032015-01-01Angola0.008329NaNNaNNaNNaN
\n", "
" ], "text/plain": [ " Year Country Name Forex rate diff_Linear_Predict_1 \\\n", "15 2015-01-01 Aruba 0.558659 NaN \n", "16 2016-01-01 Aruba 0.558659 -0.018182 \n", "59 2015-01-01 Afghanistan 0.016355 NaN \n", "60 2016-01-01 Afghanistan 0.014735 0.049683 \n", "103 2015-01-01 Angola 0.008329 NaN \n", "\n", " diff_Lasso_Predict_1 diff_NN_1_Predict_1 diff_NN_2_Predict_1 \n", "15 NaN NaN NaN \n", "16 0.006891 -0.412653 0.009970 \n", "59 NaN NaN NaN \n", "60 -0.000586 0.047096 0.082432 \n", "103 NaN NaN NaN " ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_5.head()" ] }, { "cell_type": "markdown", "id": "5573f2e9", "metadata": {}, "source": [ "Now we define a function which tells us whether to buy or short a currency based on whether the model predicts that the exchange rate is going to be higher or lower. " ] }, { "cell_type": "code", "execution_count": 65, "id": "de558bd2", "metadata": {}, "outputs": [], "source": [ "def buy_or_short(x):\n", " if x >= 0:\n", " return \"Buy\"\n", " elif x < 0:\n", " return \"Short\"\n", " else:\n", " return \"NaN\"" ] }, { "cell_type": "code", "execution_count": 66, "id": "574f812e", "metadata": {}, "outputs": [], "source": [ "df_10 = df_5.copy()\n", "df_10[['diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']] = df_10[['diff_Linear_Predict_1', 'diff_Lasso_Predict_1', 'diff_NN_1_Predict_1', 'diff_NN_2_Predict_1']].applymap(buy_or_short)\n", "df_11 = df_6.copy()\n", "df_11[['diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']] = df_11[['diff_Linear_Predict_2', 'diff_Lasso_Predict_2', 'diff_NN_1_Predict_2', 'diff_NN_2_Predict_2']].applymap(buy_or_short)\n", "df_12 = df_7.copy()\n", "df_12[['diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']] = df_12[['diff_Linear_Predict_3', 'diff_Lasso_Predict_3', 'diff_NN_1_Predict_3', 'diff_NN_2_Predict_3']].applymap(buy_or_short)\n", "df_13 = df_8.copy()\n", "df_13[['diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']] = df_13[['diff_Linear_Predict_4', 'diff_Lasso_Predict_4', 'diff_NN_1_Predict_4', 'diff_NN_2_Predict_4']].applymap(buy_or_short)\n", "df_14 = df_9.copy()\n", "df_14[['diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']] = df_14[['diff_Linear_Predict_5', 'diff_Lasso_Predict_5', 'diff_NN_1_Predict_5', 'diff_NN_2_Predict_5']].applymap(buy_or_short)" ] }, { "cell_type": "code", "execution_count": 67, "id": "e3760ab6", "metadata": {}, "outputs": [], "source": [ "group = df_10.groupby('Country Name')\n", "df_10['Forex rate lag'] = group['Forex rate'].shift(1)\n", "df_10['Forex diff'] = df_10['Forex rate'] - df_10['Forex rate lag']" ] }, { "cell_type": "code", "execution_count": 68, "id": "0e31d341", "metadata": {}, "outputs": [], "source": [ "group = df_12.groupby('Country Name')\n", "df_12['Forex rate lag'] = group['Forex rate'].shift(1)\n", "df_12['Forex diff'] = df_12['Forex rate'] - df_12['Forex rate lag']" ] }, { "cell_type": "code", "execution_count": 69, "id": "ad2a74bf", "metadata": {}, "outputs": [], "source": [ "group = df_14.groupby('Country Name')\n", "df_14['Forex rate lag'] = group['Forex rate'].shift(1)\n", "df_14['Forex diff'] = df_14['Forex rate'] - df_14['Forex rate lag']" ] }, { "cell_type": "code", "execution_count": 70, "id": "8f3ee4ae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearCountry NameForex ratediff_Linear_Predict_3diff_Lasso_Predict_3diff_NN_1_Predict_3diff_NN_2_Predict_3Forex rate lagForex diff
152015-01-01Aruba0.558659NaNNaNNaNNaNNaNNaN
182018-01-01Aruba0.558659BuyBuyShortBuy0.5586590.000000
592015-01-01Afghanistan0.016355NaNNaNNaNNaNNaNNaN
1032015-01-01Angola0.008329NaNNaNNaNNaNNaNNaN
1062018-01-01Angola0.003955BuyShortBuyShort0.008329-0.004374
1252015-01-01Albania0.007939NaNNaNNaNNaNNaNNaN
1282018-01-01Albania0.009260BuyBuyBuyShort0.0079390.001321
2352015-01-01Armenia0.002092NaNNaNNaNNaNNaNNaN
2382018-01-01Armenia0.002070BuyBuyBuyBuy0.002092-0.000022
2792015-01-01Antigua and Barbuda0.370370NaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " Year Country Name Forex rate diff_Linear_Predict_3 \\\n", "15 2015-01-01 Aruba 0.558659 NaN \n", "18 2018-01-01 Aruba 0.558659 Buy \n", "59 2015-01-01 Afghanistan 0.016355 NaN \n", "103 2015-01-01 Angola 0.008329 NaN \n", "106 2018-01-01 Angola 0.003955 Buy \n", "125 2015-01-01 Albania 0.007939 NaN \n", "128 2018-01-01 Albania 0.009260 Buy \n", "235 2015-01-01 Armenia 0.002092 NaN \n", "238 2018-01-01 Armenia 0.002070 Buy \n", "279 2015-01-01 Antigua and Barbuda 0.370370 NaN \n", "\n", " diff_Lasso_Predict_3 diff_NN_1_Predict_3 diff_NN_2_Predict_3 \\\n", "15 NaN NaN NaN \n", "18 Buy Short Buy \n", "59 NaN NaN NaN \n", "103 NaN NaN NaN \n", "106 Short Buy Short \n", "125 NaN NaN NaN \n", "128 Buy Buy Short \n", "235 NaN NaN NaN \n", "238 Buy Buy Buy \n", "279 NaN NaN NaN \n", "\n", " Forex rate lag Forex diff \n", "15 NaN NaN \n", "18 0.558659 0.000000 \n", "59 NaN NaN \n", "103 NaN NaN \n", "106 0.008329 -0.004374 \n", "125 NaN NaN \n", "128 0.007939 0.001321 \n", "235 NaN NaN \n", "238 0.002092 -0.000022 \n", "279 NaN NaN " ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_12.head(10)" ] }, { "cell_type": "markdown", "id": "aeb22c77", "metadata": {}, "source": [ "Now we can calculate the average rate of return for each model for a 1 year, 3 years and 5 years holding period." ] }, { "cell_type": "code", "execution_count": 71, "id": "13396ba6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the linear regression model and 1 year holding period is -6.682001712313225\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1716736160.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_linear_1_buy['Return_Rate'] = df_linear_1_buy['Forex diff']/df_linear_1_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for Linear reg and holding the currency for 1 year when model says to buy\n", "\n", "df_linear_1 = df_10[['Year', 'Country Name', 'diff_Linear_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_linear_1_buy = df_linear_1[df_linear_1['diff_Linear_Predict_1'].isin(['Buy'])]\n", "\n", "df_linear_1_buy['Return_Rate'] = df_linear_1_buy['Forex diff']/df_linear_1_buy['Forex rate lag'] * 100\n", "\n", "linear_1_return = df_linear_1_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the linear regression model and 1 year holding period is {linear_1_return}\")" ] }, { "cell_type": "code", "execution_count": 72, "id": "2f05d6be", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the lasso regression model and 1 year holding period is -2.2038092077826814\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/339817703.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_lasso_1_buy['Return_Rate'] = df_lasso_1_buy['Forex diff']/df_lasso_1_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for Lasso reg and holding the currency for 1 year when model says to buy\n", "\n", "df_lasso_1 = df_10[['Year', 'Country Name', 'diff_Lasso_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_lasso_1_buy = df_lasso_1[df_lasso_1['diff_Lasso_Predict_1'].isin(['Buy'])]\n", "\n", "df_lasso_1_buy['Return_Rate'] = df_lasso_1_buy['Forex diff']/df_lasso_1_buy['Forex rate lag'] * 100\n", "\n", "lasso_1_return = df_lasso_1_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the lasso regression model and 1 year holding period is {lasso_1_return}\")" ] }, { "cell_type": "code", "execution_count": 73, "id": "0e406bac", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the first neural network and 1 year holding period is -4.228952725618851\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2819496941.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_nn_1_1_buy['Return_Rate'] = df_nn_1_1_buy['Forex diff']/df_nn_1_1_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for 1st NN and holding the currency for 1 year when model says to buy\n", "\n", "df_nn_1_1 = df_10[['Year', 'Country Name', 'diff_NN_1_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_nn_1_1_buy = df_nn_1_1[df_nn_1_1['diff_NN_1_Predict_1'].isin(['Buy'])]\n", "\n", "df_nn_1_1_buy['Return_Rate'] = df_nn_1_1_buy['Forex diff']/df_nn_1_1_buy['Forex rate lag'] * 100\n", "\n", "nn_1_1_return = df_nn_1_1_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the first neural network and 1 year holding period is {nn_1_1_return}\")" ] }, { "cell_type": "code", "execution_count": 74, "id": "98f69e1c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the second neural network and 1 year holding period is -5.661574799802205\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2864552994.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_nn_2_1_buy['Return_Rate'] = df_nn_2_1_buy['Forex diff']/df_nn_2_1_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for 2nd NN and holding the currency for 1 year when model says to buy\n", "\n", "df_nn_2_1 = df_10[['Year', 'Country Name', 'diff_NN_2_Predict_1', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_nn_2_1_buy = df_nn_2_1[df_nn_2_1['diff_NN_2_Predict_1'].isin(['Buy'])]\n", "\n", "df_nn_2_1_buy['Return_Rate'] = df_nn_2_1_buy['Forex diff']/df_nn_2_1_buy['Forex rate lag'] * 100\n", "\n", "nn_2_1_return = df_nn_2_1_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the second neural network and 1 year holding period is {nn_2_1_return}\")" ] }, { "cell_type": "code", "execution_count": 75, "id": "3de89c30", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the linear regression model and 3 years holding period is -3.979679534791286\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/3889504242.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_linear_3_buy['Return_Rate'] = df_linear_3_buy['Forex diff']/df_linear_3_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for Linear reg and holding the currency for 3 years when model says to buy\n", "\n", "df_linear_3 = df_12[['Year', 'Country Name', 'diff_Linear_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_linear_3_buy = df_linear_3[df_linear_3['diff_Linear_Predict_3'].isin(['Buy'])]\n", "\n", "df_linear_3_buy['Return_Rate'] = df_linear_3_buy['Forex diff']/df_linear_3_buy['Forex rate lag'] * 100\n", "\n", "linear_3_return = df_linear_3_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the linear regression model and 3 years holding period is {linear_3_return}\")" ] }, { "cell_type": "code", "execution_count": 76, "id": "9ae7e1ef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the lasso regression model and 3 years holding period is -1.8921836297212749\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1695144407.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_lasso_3_buy['Return_Rate'] = df_lasso_3_buy['Forex diff']/df_lasso_3_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for Lasso reg and holding the currency for 3 years when model says to buy\n", "\n", "df_lasso_3 = df_12[['Year', 'Country Name', 'diff_Lasso_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_lasso_3_buy = df_lasso_3[df_lasso_3['diff_Lasso_Predict_3'].isin(['Buy'])]\n", "\n", "df_lasso_3_buy['Return_Rate'] = df_lasso_3_buy['Forex diff']/df_lasso_3_buy['Forex rate lag'] * 100\n", "\n", "lasso_3_return = df_lasso_3_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the lasso regression model and 3 years holding period is {lasso_3_return}\")" ] }, { "cell_type": "code", "execution_count": 77, "id": "ad2a654a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the first neural network and 3 years holding period is -5.250983001440971\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2278624619.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_nn_1_3_buy['Return_Rate'] = df_nn_1_3_buy['Forex diff']/df_nn_1_3_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for 1st NN and holding the currency for 3 years when model says to buy\n", "\n", "df_nn_1_3 = df_12[['Year', 'Country Name', 'diff_NN_1_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_nn_1_3_buy = df_nn_1_3[df_nn_1_3['diff_NN_1_Predict_3'].isin(['Buy'])]\n", "\n", "df_nn_1_3_buy['Return_Rate'] = df_nn_1_3_buy['Forex diff']/df_nn_1_3_buy['Forex rate lag'] * 100\n", "\n", "nn_1_3_return = df_nn_1_3_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the first neural network and 3 years holding period is {nn_1_3_return}\")" ] }, { "cell_type": "code", "execution_count": 78, "id": "f497e890", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the second neural networl and 3 years holding period is -5.548735188527842\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/2694513607.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_nn_2_3_buy['Return_Rate'] = df_nn_2_3_buy['Forex diff']/df_nn_2_3_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for 2nd NN and holding the currency for 3 years when model says to buy\n", "\n", "df_nn_2_3 = df_12[['Year', 'Country Name', 'diff_NN_2_Predict_3', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_nn_2_3_buy = df_nn_2_3[df_nn_2_3['diff_NN_2_Predict_3'].isin(['Buy'])]\n", "\n", "df_nn_2_3_buy['Return_Rate'] = df_nn_2_3_buy['Forex diff']/df_nn_2_3_buy['Forex rate lag'] * 100\n", "\n", "nn_2_3_return = df_nn_2_3_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the second neural networl and 3 years holding period is {nn_2_3_return}\")" ] }, { "cell_type": "code", "execution_count": 79, "id": "d55e5af0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the linear regression model and 5 years holding period is -17.218054184651734\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1940798184.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_linear_5_buy['Return_Rate'] = df_linear_5_buy['Forex diff']/df_linear_5_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for Linear reg and holding the currency for 5 years when model says to buy\n", "\n", "df_linear_5 = df_14[['Year', 'Country Name', 'diff_Linear_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_linear_5_buy = df_linear_5[df_linear_5['diff_Linear_Predict_5'].isin(['Buy'])]\n", "\n", "df_linear_5_buy['Return_Rate'] = df_linear_5_buy['Forex diff']/df_linear_5_buy['Forex rate lag'] * 100\n", "\n", "linear_5_return = df_linear_5_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the linear regression model and 5 years holding period is {linear_5_return}\")" ] }, { "cell_type": "code", "execution_count": 80, "id": "319104e6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the lasso regression model and 5 years holding period is -5.925652587401351\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/885340572.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_lasso_5_buy['Return_Rate'] = df_lasso_5_buy['Forex diff']/df_lasso_5_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for Lasso reg and holding the currency for 5 years when model says to buy\n", "\n", "df_lasso_5 = df_14[['Year', 'Country Name', 'diff_Lasso_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_lasso_5_buy = df_lasso_5[df_lasso_5['diff_Lasso_Predict_5'].isin(['Buy'])]\n", "\n", "df_lasso_5_buy['Return_Rate'] = df_lasso_5_buy['Forex diff']/df_lasso_5_buy['Forex rate lag'] * 100\n", "\n", "lasso_5_return = df_lasso_5_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the lasso regression model and 5 years holding period is {lasso_5_return}\")" ] }, { "cell_type": "code", "execution_count": 81, "id": "b50085f0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the first neural network and 5 years holding period is -9.570827108863499\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/555516999.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_nn_1_5_buy['Return_Rate'] = df_nn_1_5_buy['Forex diff']/df_nn_1_5_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for 1st NN and holding the currency for 5 years when model says to buy\n", "\n", "df_nn_1_5 = df_14[['Year', 'Country Name', 'diff_NN_1_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_nn_1_5_buy = df_nn_1_5[df_nn_1_5['diff_NN_1_Predict_5'].isin(['Buy'])]\n", "\n", "df_nn_1_5_buy['Return_Rate'] = df_nn_1_5_buy['Forex diff']/df_nn_1_5_buy['Forex rate lag'] * 100\n", "\n", "nn_1_5_return = df_nn_1_5_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the first neural network and 5 years holding period is {nn_1_5_return}\")" ] }, { "cell_type": "code", "execution_count": 82, "id": "f9edea76", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The average return for the second neural network and 5 years holding period is -11.086691753890205\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/5b/jc4253hj4rl3y0jlxy2y3trm0000gn/T/ipykernel_11083/1121885410.py:7: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_nn_2_5_buy['Return_Rate'] = df_nn_2_5_buy['Forex diff']/df_nn_2_5_buy['Forex rate lag'] * 100\n" ] } ], "source": [ "#Average rate of return for 2nd NN and holding the currency for 5 years when model says to buy\n", "\n", "df_nn_2_5 = df_14[['Year', 'Country Name', 'diff_NN_2_Predict_5', 'Forex rate','Forex rate lag', 'Forex diff']]\n", "\n", "df_nn_2_5_buy = df_nn_2_5[df_nn_2_5['diff_NN_2_Predict_5'].isin(['Buy'])]\n", "\n", "df_nn_2_5_buy['Return_Rate'] = df_nn_2_5_buy['Forex diff']/df_nn_2_5_buy['Forex rate lag'] * 100\n", "\n", "nn_2_5_return = df_nn_2_5_buy['Return_Rate'].mean()\n", "\n", "print(f\"The average return for the second neural network and 5 years holding period is {nn_2_5_return}\")" ] }, { "cell_type": "markdown", "id": "daedefed", "metadata": {}, "source": [ "Below we can see the plots of these returns." ] }, { "cell_type": "code", "execution_count": 83, "id": "77f522fd-0da3-42d5-91ab-ffc90a6df256", "metadata": {}, "outputs": [], "source": [ "#Preparing Returns plot\n", "df_return_rates = pd.DataFrame({\n", " 'Model':[\"Linear Model 1 Years Hold Return\", \"Lasso Model 1 Years Hold Return\", \"1st Neural Network 1 Years Hold Return\", \"2nd Neural Network 1 Years Hold Return\",\n", " \"Linear Model 3 Years Hold Return\", \"Lasso Model 3 Years Hold Return\", \"1st Neural Network 3 Years Hold Return\", \"2nd Neural Network 3 Years Hold Return\",\n", " \"Linear Model 5 Years Hold Return\", \"Lasso Model 5 Years Hold Return\", \"1st Neural Network 5 Years Hold Return\", \"2nd Neural Network 5 Years Hold Return\"],\n", " 'Return Rate': [linear_1_return, lasso_1_return, nn_1_1_return, nn_1_3_return,\n", " linear_3_return, lasso_3_return, nn_1_3_return, nn_2_3_return,\n", " linear_5_return, lasso_5_return, nn_1_5_return, nn_2_5_return]\n", "})" ] }, { "cell_type": "code", "execution_count": 84, "id": "6a8ed12e-8fbb-4d76-9601-a40fddb7b275", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ModelReturn Rate
0Linear Model 1 Years Hold Return-6.682002
1Lasso Model 1 Years Hold Return-2.203809
21st Neural Network 1 Years Hold Return-4.228953
32nd Neural Network 1 Years Hold Return-5.250983
4Linear Model 3 Years Hold Return-3.979680
5Lasso Model 3 Years Hold Return-1.892184
61st Neural Network 3 Years Hold Return-5.250983
72nd Neural Network 3 Years Hold Return-5.548735
8Linear Model 5 Years Hold Return-17.218054
9Lasso Model 5 Years Hold Return-5.925653
101st Neural Network 5 Years Hold Return-9.570827
112nd Neural Network 5 Years Hold Return-11.086692
\n", "
" ], "text/plain": [ " Model Return Rate\n", "0 Linear Model 1 Years Hold Return -6.682002\n", "1 Lasso Model 1 Years Hold Return -2.203809\n", "2 1st Neural Network 1 Years Hold Return -4.228953\n", "3 2nd Neural Network 1 Years Hold Return -5.250983\n", "4 Linear Model 3 Years Hold Return -3.979680\n", "5 Lasso Model 3 Years Hold Return -1.892184\n", "6 1st Neural Network 3 Years Hold Return -5.250983\n", "7 2nd Neural Network 3 Years Hold Return -5.548735\n", "8 Linear Model 5 Years Hold Return -17.218054\n", "9 Lasso Model 5 Years Hold Return -5.925653\n", "10 1st Neural Network 5 Years Hold Return -9.570827\n", "11 2nd Neural Network 5 Years Hold Return -11.086692" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_return_rates" ] }, { "cell_type": "code", "execution_count": 85, "id": "83aff6fc-348e-4e99-a143-610785391671", "metadata": {}, "outputs": [], "source": [ "df_return_rates_1 = df_return_rates.loc[df_return_rates['Model'].isin([\"Linear Model 1 Years Hold Return\", \"Lasso Model 1 Years Hold Return\", \"1st Neural Network 1 Years Hold Return\", \"2nd Neural Network 1 Years Hold Return\"])]\n", "df_return_rates_3 = df_return_rates.loc[df_return_rates['Model'].isin([\"Linear Model 3 Years Hold Return\", \"Lasso Model 3 Years Hold Return\", \"1st Neural Network 3 Years Hold Return\", \"2nd Neural Network 3 Years Hold Return\"])]\n", "df_return_rates_5 = df_return_rates.loc[df_return_rates['Model'].isin([ \"Linear Model 5 Years Hold Return\", \"Lasso Model 5 Years Hold Return\", \"1st Neural Network 5 Years Hold Return\", \"2nd Neural Network 5 Years Hold Return\"])]\n" ] }, { "cell_type": "code", "execution_count": 86, "id": "49e31cb2-e32b-430d-a119-43a621d4ee11", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/uddhav/opt/anaconda3/envs/Py_2/lib/python3.8/site-packages/altair/utils/core.py:283: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n", " for col_name, dtype in df.dtypes.iteritems():\n" ] }, { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.VConcatChart(...)" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#A plot for the rate of returns for the different ml models\n", "# Create a chart object and specify the data source\n", "chart_1 = alt.Chart(df_return_rates_1)\n", "chart_3 = alt.Chart(df_return_rates_3)\n", "chart_5 = alt.Chart(df_return_rates_5)\n", "\n", "# Add a bar chart layer to display the average return rates for each model\n", "bars_1 = chart_1.mark_bar().encode(\n", " y='Model:N',\n", " x='Return Rate:Q',\n", " tooltip=[alt.Tooltip('Model:N', title=\"Model\"),\n", " alt.Tooltip('Return Rate:Q', title=\"Return Rate\")] \n", ").properties(width=1000, height = 200, title = \"1 Year Holding Period Models\")\n", "\n", "bars_3 = chart_3.mark_bar().encode(\n", " y='Model:N',\n", " x='Return Rate:Q',\n", " tooltip=[alt.Tooltip('Model:N', title=\"Model\"),\n", " alt.Tooltip('Return Rate:Q', title=\"Return Rate\")] \n", ").properties(width=1000, height = 200, title = \"3 Year Holding Period Models\")\n", "\n", "bars_5 = chart_5.mark_bar().encode(\n", " y='Model:N',\n", " x='Return Rate:Q',\n", " tooltip=[alt.Tooltip('Model:N', title=\"Model\"),\n", " alt.Tooltip('Return Rate:Q', title=\"Return Rate\")]\n", ").properties(width=1000, height = 200, title = \"5 Year Holding Period Models\")\n", "\n", "# # Add a text layer to display the return rates as labels on the bars\n", "# labels = bars.mark_text(\n", "# align='right',\n", "# baseline='bottom',\n", "# dy=-2\n", "# ).encode(\n", "# text='Return Rate'\n", "# )\n", "\n", "# Combine the layers into a single chart and display it\n", "alt.vconcat(bars_1, bars_3, bars_5).properties(title=\"Model Return Comparison\")" ] }, { "cell_type": "markdown", "id": "41b5e258", "metadata": {}, "source": [ "As we can see across all holding periods, the lasso regression performs the best. The mediocre performance of the neural networks could be because they are overfitting and therefore performing worse out of sample, we can see this in the mean squared error results too.\n", "\n", "We can see from the plots above that the returns are negative for all our machine learning models, while they were positive for the PPP model. " ] }, { "cell_type": "markdown", "id": "21160438", "metadata": {}, "source": [ "It is hard to predict the exchange rate even in the long run. This project has shown that theory does perform better than machine learning (based on our measuring method) in predicting exchange rates." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.16" } }, "nbformat": 4, "nbformat_minor": 5 }