{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "LaTeX macros (hidden cell)\n", "$\n", "\\newcommand{\\Q}{\\mathcal{Q}}\n", "\\newcommand{\\ECov}{\\boldsymbol{\\Sigma}}\n", "\\newcommand{\\EMean}{\\boldsymbol{\\mu}}\n", "\\newcommand{\\EAlpha}{\\boldsymbol{\\alpha}}\n", "\\newcommand{\\EBeta}{\\boldsymbol{\\beta}}\n", "$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Imports and configuration" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [], "source": [ "import sys\n", "import os\n", "import re\n", "import datetime as dt\n", "\n", "import numpy as np\n", "import pandas as pd\n", "%matplotlib inline\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "from matplotlib.colors import LinearSegmentedColormap\n", "\n", "from mosek.fusion import *\n", "\n", "from notebook.services.config import ConfigManager\n", "\n", "from portfolio_tools import data_download, DataReader, compute_inputs" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.9.7 (default, Sep 16 2021, 13:09:58) \n", "[GCC 7.5.0]\n", "matplotlib: 3.7.2\n" ] } ], "source": [ "# Version checks\n", "print(sys.version)\n", "print('matplotlib: {}'.format(matplotlib.__version__))\n", "\n", "# Jupyter configuration\n", "c = ConfigManager()\n", "c.update('notebook', {\"CodeCell\": {\"cm_config\": {\"autoCloseBrackets\": False}}}) \n", "\n", "# Numpy options\n", "np.set_printoptions(precision=5, linewidth=120, suppress=True)\n", "\n", "# Pandas options\n", "pd.set_option('display.max_rows', None)\n", "\n", "# Matplotlib options\n", "plt.rcParams['figure.figsize'] = [12, 8]\n", "plt.rcParams['figure.dpi'] = 200" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Prepare input data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we load the raw data that will be used to compute the optimization input 
variables, the vector $\\EMean$ of expected returns and the covariance matrix $\\ECov$. The data consists of daily stock prices of $8$ stocks from the US market. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download data" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Data downloading:\n", "# If the user has an API key for alphavantage.co, then this code part will download the data. \n", "# The code can be modified to download from other sources. To be able to run the examples, \n", "# and reproduce results in the cookbook, the files have to have the following format and content:\n", "# - File name pattern: \"daily_adjusted_[TICKER].csv\", where TICKER is the symbol of a stock. \n", "# - The file contains at least columns \"timestamp\", \"adjusted_close\", and \"volume\".\n", "# - The data is daily price/volume, covering at least the period from 2016-03-18 until 2021-03-18, \n", "# - Files are for the stocks PM, LMT, MCD, MMM, AAPL, MSFT, TXN, CSCO.\n", "list_stocks = [\"PM\", \"LMT\", \"MCD\", \"MMM\", \"AAPL\", \"MSFT\", \"TXN\", \"CSCO\"]\n", "list_factors = []\n", "alphaToken = None\n", " \n", "list_tickers = list_stocks + list_factors\n", "if alphaToken is not None:\n", " data_download(list_tickers, alphaToken) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We load the daily stock price data from the downloaded CSV files. The data is adjusted for splits and dividends. Then a selected time period is taken from the data." 
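The helper `compute_inputs` comes from the accompanying `portfolio_tools` module, whose source is not shown in this notebook. As a rough, hedged sketch of the standard computation such a helper typically performs (the function name `annualized_log_moments`, the toy tickers, and the 252-trading-day year are our own illustrative assumptions, not the actual implementation):

```python
import numpy as np
import pandas as pd

def annualized_log_moments(df_prices, periods_per_year=252):
    # Daily log-returns from (adjusted) close prices
    log_ret = np.log(df_prices / df_prices.shift(1)).dropna()
    # Annualize mean and covariance by the number of trading periods per year
    m = log_ret.mean() * periods_per_year
    S = log_ret.cov() * periods_per_year
    return m.to_numpy(), S.to_numpy()

# Toy prices for two hypothetical tickers, standing in for df_prices
prices = pd.DataFrame({"AAA": [100.0, 101.0, 102.0, 103.0],
                       "BBB": [50.0, 50.5, 49.8, 50.9]})
m, S = annualized_log_moments(prices)
```

The covariance of log-returns scales (approximately) linearly with the horizon, which is why both moments are multiplied by the same factor.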
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "investment_start = \"2016-03-18\"\n", "investment_end = \"2021-03-18\"" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found data files: \n", "stock_data/daily_adjusted_AAPL.csv\n", "stock_data/daily_adjusted_PM.csv\n", "stock_data/daily_adjusted_CSCO.csv\n", "stock_data/daily_adjusted_TXN.csv\n", "stock_data/daily_adjusted_MMM.csv\n", "stock_data/daily_adjusted_IWM.csv\n", "stock_data/daily_adjusted_MCD.csv\n", "stock_data/daily_adjusted_SPY.csv\n", "stock_data/daily_adjusted_MSFT.csv\n", "stock_data/daily_adjusted_LMT.csv\n", "\n", "Using data files: \n", "stock_data/daily_adjusted_PM.csv\n", "stock_data/daily_adjusted_LMT.csv\n", "stock_data/daily_adjusted_MCD.csv\n", "stock_data/daily_adjusted_MMM.csv\n", "stock_data/daily_adjusted_AAPL.csv\n", "stock_data/daily_adjusted_MSFT.csv\n", "stock_data/daily_adjusted_TXN.csv\n", "stock_data/daily_adjusted_CSCO.csv\n", "\n" ] } ], "source": [ "# The files are in the \"stock_data\" folder, named \"daily_adjusted_[TICKER].csv\"\n", "dr = DataReader(folder_path=\"stock_data\", symbol_list=list_tickers)\n", "dr.read_data()\n", "df_prices, _ = dr.get_period(start_date=investment_start, end_date=investment_end)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Run the optimization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the optimization model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we generate return scenario data using the Monte Carlo method, based on the moments of the historical log-returns. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Number of scenarios\n", "T = 99999\n", "\n", "# Mean and covariance of historical log-returns. 
\n", "m_log, S_log = compute_inputs(df_prices, return_log=True)\n", "\n", "# Generate logarithmic return scenarios assuming normal distribution\n", "scenarios_log = np.random.default_rng().multivariate_normal(m_log, S_log, T)\n", "\n", "# Convert logarithmic return scenarios to linear return scenarios\n", "scenarios_lin = np.exp(scenarios_log) - 1\n", "\n", "# Scenario probabilities\n", "p = np.ones(T) / T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would like to optimize the 95% EVaR of the portfolio loss distribution." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Confidence level\n", "alpha = 0.95" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Primal CVaR formula (for testing)\n", "def CVaR(alpha, p, q):\n", "    # Sort the losses in increasing order, carrying the probabilities along\n", "    sort_idx = np.argsort(q)\n", "    sorted_q = q[sort_idx]\n", "    sorted_p = p[sort_idx]\n", "\n", "    # Starting index: smallest index with cumulative probability >= alpha\n", "    # (careful: math indexing starts from 1 but numpy from 0, which matters in formulas like ceil(alpha * T))\n", "    cum_p = np.cumsum(sorted_p)\n", "    i_alpha = np.nonzero(cum_p >= alpha)[0][0]\n", "\n", "    # Weight of the VaR component in CVaR\n", "    lambda_alpha = (cum_p[i_alpha] - alpha) / (1 - alpha)\n", "\n", "    # CVaR = weighted VaR term + expected loss beyond VaR\n", "    cvar = lambda_alpha * sorted_q[i_alpha] + np.dot(sorted_p[(i_alpha + 1):], sorted_q[(i_alpha + 1):]) / (1 - alpha)\n", "\n", "    return cvar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below we implement the optimization model in the Fusion API. We create it inside a function so we can call it later."
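The model below will handle EVaR through its exponential-cone representation. For intuition, the same quantity can be evaluated directly from the primal definition $\mathrm{EVaR}_\alpha(q) = \inf_{s>0}\, s\left(\log \sum_i p_i e^{q_i/s} - \log(1-\alpha)\right)$ by a one-dimensional search over $s$. A minimal NumPy sketch (the function name `evar_sample` and the grid bounds are our own choices), useful for sanity-checking the optimizer's risk values:

```python
import numpy as np

def evar_sample(alpha, p, q, s_grid=None):
    """Sample EVaR of losses q via grid minimization over the dual variable s."""
    if s_grid is None:
        s_grid = np.logspace(-3, 3, 2000)
    best = np.inf
    for s in s_grid:
        x = q / s
        x_max = x.max()
        # log-sum-exp shift for numerical stability
        lse = x_max + np.log(p @ np.exp(x - x_max))
        best = min(best, s * (lse - np.log(1.0 - alpha)))
    return best

# Toy loss scenarios with uniform probabilities
rng = np.random.default_rng(0)
q = rng.normal(size=10000)
p = np.ones(q.size) / q.size
evar = evar_sample(0.95, p, q)
var = np.quantile(q, 0.95)
```

Since EVaR dominates CVaR, which in turn dominates VaR, `evar` should come out above the empirical 95% quantile `var` here.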
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# s * log(sum_i(p_i * exp(x_i / s))) <= t\n", "def persplogsumexp(M, x, s, p, t):\n", " n = int(x.getSize())\n", " u = M.variable(n)\n", " M.constraint(Expr.hstack(u, Var.repeat(s, n), Expr.sub(x, Var.repeat(t, n))), Domain.inPExpCone())\n", " M.constraint(Expr.sub(Expr.dot(p, u), s), Domain.lessThan(0.0))\n", " \n", "\n", "def EfficientFrontier(N, T, m, R, p, alpha, deltas):\n", "\n", " with Model(\"Case study\") as M:\n", " # Settings\n", " #M.setLogHandler(sys.stdout)\n", " \n", " # Variables \n", " # The variable x is the fraction of holdings relative to the initial capital.\n", " # It is constrained to take only positive values. \n", " x = M.variable(\"x\", N, Domain.greaterThan(0.0))\n", " \n", " # Budget constraint\n", " M.constraint('budget', Expr.sum(x), Domain.equalsTo(1.0))\n", " \n", " # Auxiliary variables.\n", " z = M.variable(\"z\", 1, Domain.unbounded())\n", " t = M.variable(\"t\", 1, Domain.greaterThan(0.0))\n", " \n", " # Constraint modeling perspective of log-sum-exp\n", " persplogsumexp(M, Expr.mul(-R, x), t, p, z)\n", " \n", " # Objective\n", " delta = M.parameter()\n", " evar_term = Expr.sub(z, Expr.mul(t, np.log(1.0 - alpha)))\n", " M.objective('obj', ObjectiveSense.Maximize, Expr.sub(Expr.dot(m, x), Expr.mul(delta, evar_term)))\n", " \n", " # Create DataFrame to store the results.\n", " columns = [\"delta\", \"obj\", \"return\", \"risk\", \"evar\"] + df_prices.columns.tolist()\n", " df_result = pd.DataFrame(columns=columns)\n", " for d in deltas:\n", " # Update parameter\n", " delta.setValue(d);\n", " \n", " # Solve optimization\n", " M.solve()\n", " # Check if the solution is an optimal point\n", " solsta = M.getPrimalSolutionStatus()\n", " if (solsta != SolutionStatus.Optimal):\n", " # See https://docs.mosek.com/latest/pythonfusion/accessing-solution.html about handling solution statuses.\n", " raise Exception(\"Unexpected solution status!\")\n", 
"\n", "            # Save results\n", "            portfolio_return = m @ x.level()\n", "            portfolio_risk = z.level()[0] - t.level()[0] * np.log(1.0 - alpha)\n", "            cvar = CVaR(alpha, p, -R @ x.level())\n", "            row = pd.Series([d, M.primalObjValue(), portfolio_return, portfolio_risk, cvar] + list(x.level()), index=columns)\n", "            df_result = pd.concat([df_result, pd.DataFrame([row])], ignore_index=True)\n", "\n", "            # Compare the EVaR with the CVaR obtained from the primal formula\n", "            print(f\"Relative difference between EVaR and CVaR (%): {(cvar - portfolio_risk) / portfolio_risk * 100}\")\n", "\n", "        return df_result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute optimization input variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we use the loaded daily price data to compute the yearly mean return vector. (The covariance matrix is not needed here, because the risk is computed directly from the return scenarios.)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Number of securities\n", "N = df_prices.shape[1]\n", "\n", "# Get optimization parameters\n", "m, _ = compute_inputs(df_prices)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Call the optimizer function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We run the optimization for a range of risk aversion parameter values, $\\delta = 10^{-1},\\dots,10^{2}$, and obtain one point of the efficient frontier for each value. 
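The `persplogsumexp` helper above models $s \log \sum_i p_i e^{x_i/s} \le t$ with one exponential-cone member $(u_i, s, x_i - t)$ per scenario, i.e. $u_i \ge s\, e^{(x_i - t)/s}$, plus the linear inequality $p^\top u \le s$. The algebra behind this reformulation can be checked numerically in plain NumPy, independently of MOSEK (a small sketch with arbitrary toy data of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
p = np.ones(n) / n          # scenario probabilities
x = rng.normal(size=n)      # toy values standing in for -R @ x in the model
s = 0.7                     # a positive value of the perspective variable

# Tightest feasible t in  s * log(sum_i p_i * exp(x_i / s)) <= t
t = s * np.log(p @ np.exp(x / s))

# Exponential-cone members taken with equality: u_i = s * exp((x_i - t) / s)
u = s * np.exp((x - t) / s)

# Then p'u = s * exp(-t/s) * sum_i p_i exp(x_i/s) = s, so p'u <= s holds tightly
lhs = p @ u
```

At the optimum the linear constraint binds exactly, which is why `lhs` equals `s` up to floating-point error.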
" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Relative difference between EVaR and CVaR (%): -33.42378204398512\n", "Relative difference between EVaR and CVaR (%): -33.45646972266584\n", "Relative difference between EVaR and CVaR (%): -33.50204787771225\n", "Relative difference between EVaR and CVaR (%): -33.56488606024007\n", "Relative difference between EVaR and CVaR (%): -33.65252039086718\n", "Relative difference between EVaR and CVaR (%): -33.77210072922806\n", "Relative difference between EVaR and CVaR (%): -33.90833145658367\n", "Relative difference between EVaR and CVaR (%): -34.06647761773668\n", "Relative difference between EVaR and CVaR (%): -34.228864613910865\n", "Relative difference between EVaR and CVaR (%): -34.32076597469258\n", "Relative difference between EVaR and CVaR (%): -34.26903206118584\n", "Relative difference between EVaR and CVaR (%): -34.0913142099972\n", "Relative difference between EVaR and CVaR (%): -33.706820776915826\n", "Relative difference between EVaR and CVaR (%): -32.87550621993\n", "Relative difference between EVaR and CVaR (%): -32.13821514072266\n", "Relative difference between EVaR and CVaR (%): -31.414840069754284\n", "Relative difference between EVaR and CVaR (%): -29.94969115976413\n", "Relative difference between EVaR and CVaR (%): -27.216047490936095\n", "Relative difference between EVaR and CVaR (%): -23.044884017560292\n", "Relative difference between EVaR and CVaR (%): -20.845617388377192\n" ] } ], "source": [ "# Compute the efficient frontier\n", "deltas = np.logspace(start=-1, stop=2, num=20)[::-1]\n", "df_result = EfficientFrontier(N, T, m, scenarios_lin, p, alpha, deltas)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the results." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
|    | delta | obj | return | risk | evar | PM | LMT | MCD | MMM | AAPL | MSFT | TXN | CSCO |
|----|-------|-----|--------|------|------|----|-----|-----|-----|------|------|-----|------|
| 0 | 100.000000 | -14.809388 | 0.347111 | 0.151565 | 0.100906 | -2.712381e-10 | 1.494921e-02 | 1.363686e-01 | 2.281447e-10 | 0.090220 | 5.360870e-01 | 2.223747e-01 | 3.692416e-08 |
| 1 | 69.519280 | -10.189479 | 0.347760 | 0.151573 | 0.100862 | -1.494799e-10 | 1.380761e-02 | 1.347412e-01 | 8.227581e-10 | 0.090979 | 5.380318e-01 | 2.224405e-01 | 2.026158e-08 |
| 2 | 48.329302 | -6.977512 | 0.348694 | 0.151589 | 0.100804 | -2.945664e-10 | 1.216126e-02 | 1.323962e-01 | -2.890110e-10 | 0.092075 | 5.408365e-01 | 2.225308e-01 | 5.209527e-10 |
| 3 | 33.598183 | -4.744225 | 0.350042 | 0.151623 | 0.100731 | -1.268907e-10 | 9.786780e-03 | 1.290179e-01 | -1.236874e-10 | 0.093662 | 5.448811e-01 | 2.226519e-01 | -5.141085e-11 |
| 4 | 23.357215 | -3.191162 | 0.351989 | 0.151694 | 0.100645 | 9.499372e-11 | 6.357014e-03 | 1.241467e-01 | 1.313828e-09 | 0.095965 | 5.507230e-01 | 2.228084e-01 | 7.211346e-08 |
| 5 | 16.237767 | -2.110756 | 0.354809 | 0.151841 | 0.100561 | -3.079113e-10 | 1.390277e-03 | 1.171098e-01 | -2.952811e-10 | 0.099325 | 5.591803e-01 | 2.229945e-01 | -1.388656e-10 |
| 6 | 11.288379 | -1.358693 | 0.358261 | 0.152099 | 0.100525 | -4.289182e-10 | 1.324001e-07 | 1.038537e-01 | -3.300871e-10 | 0.104016 | 5.700917e-01 | 2.220389e-01 | 4.122753e-09 |
| 7 | 7.847600 | -0.834635 | 0.363036 | 0.152616 | 0.100625 | -7.075282e-10 | 1.695273e-08 | 8.366961e-02 | -2.133383e-10 | 0.110821 | 5.854145e-01 | 2.200944e-01 | 4.496131e-09 |
| 8 | 5.455595 | -0.468526 | 0.369963 | 0.153693 | 0.101086 | -6.042409e-10 | 5.839406e-08 | 5.453391e-02 | 1.396929e-10 | 0.120911 | 6.076218e-01 | 2.169333e-01 | 1.057438e-08 |
| 9 | 3.792690 | -0.211414 | 0.380061 | 0.155952 | 0.102428 | 4.562780e-09 | 1.048043e-07 | 1.233776e-02 | 9.746963e-09 | 0.135994 | 6.399893e-01 | 2.116789e-01 | 7.122438e-08 |
| 10 | 2.636651 | -0.030093 | 0.384901 | 0.157394 | 0.103457 | -2.675437e-10 | 8.275791e-09 | 1.358343e-07 | 7.311167e-10 | 0.153269 | 6.565735e-01 | 1.901573e-01 | 7.955828e-09 |
| 11 | 1.832981 | 0.096914 | 0.388271 | 0.158953 | 0.104764 | 1.299553e-09 | 4.993040e-08 | 4.729194e-07 | 2.645259e-09 | 0.176696 | 6.686228e-01 | 1.546808e-01 | 3.657430e-08 |
| 12 | 1.274275 | 0.186450 | 0.393112 | 0.162180 | 0.107514 | -2.211452e-10 | 6.756127e-10 | 4.488115e-08 | 9.512924e-10 | 0.211567 | 6.840754e-01 | 1.043575e-01 | 4.037568e-09 |
| 13 | 0.885867 | 0.250502 | 0.400035 | 0.168798 | 0.113305 | 2.755646e-09 | -1.266690e-11 | 2.424045e-08 | 3.777747e-10 | 0.264047 | 7.022571e-01 | 3.369572e-02 | 2.769004e-09 |
| 14 | 0.615848 | 0.297040 | 0.404439 | 0.174391 | 0.118345 | -2.052848e-11 | -1.011557e-11 | 1.175816e-11 | -1.748460e-11 | 0.319658 | 6.803416e-01 | 8.585560e-10 | -6.242794e-12 |
| 15 | 0.428133 | 0.330133 | 0.406807 | 0.179091 | 0.122830 | 5.774546e-10 | 3.875773e-10 | 2.384531e-08 | 1.150304e-09 | 0.385380 | 6.146204e-01 | 2.126529e-08 | 2.074802e-09 |
| 16 | 0.297635 | 0.354048 | 0.410417 | 0.189392 | 0.132669 | 6.395015e-10 | 6.525280e-10 | 1.226104e-09 | 2.599322e-09 | 0.485545 | 5.144548e-01 | 3.869882e-08 | 3.423996e-09 |
| 17 | 0.206914 | 0.372056 | 0.415926 | 0.212022 | 0.154318 | 6.981106e-10 | 1.975800e-09 | 5.781582e-09 | 3.504131e-10 | 0.638413 | 3.615865e-01 | 1.405329e-08 | 2.153144e-09 |
| 18 | 0.143845 | 0.386679 | 0.424172 | 0.260645 | 0.200580 | 2.219064e-09 | 6.181270e-09 | 1.355847e-08 | 4.917751e-09 | 0.867205 | 1.327948e-01 | 1.368773e-08 | 2.201932e-09 |
| 19 | 0.100000 | 0.399278 | 0.428958 | 0.296801 | 0.234931 | -1.725104e-12 | 1.257411e-11 | 2.114564e-11 | -4.253230e-13 | 1.000000 | 1.343903e-08 | 1.405113e-10 | 4.463059e-12 |
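The frontier stored in `df_result` can then be visualized by plotting the EVaR risk column against the expected return. A minimal sketch (the helper name `plot_frontier` is ours, and the toy DataFrame below merely stands in for `df_result`):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

def plot_frontier(df, risk_col="risk", ret_col="return"):
    """Plot the efficient frontier: EVaR on the x-axis, expected return on the y-axis."""
    df_sorted = df.sort_values(risk_col)
    fig, ax = plt.subplots()
    ax.plot(df_sorted[risk_col], df_sorted[ret_col], marker="o")
    ax.set_xlabel("EVaR")
    ax.set_ylabel("expected return")
    return ax

# Toy data with the same column names as the notebook's result table
toy = pd.DataFrame({"risk": [0.30, 0.15, 0.21], "return": [0.43, 0.35, 0.42]})
ax = plot_frontier(toy)
```

Sorting by the risk column first keeps the frontier a single left-to-right curve regardless of the order in which the $\delta$ values were solved.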