{ "cells": [ { "cell_type": "markdown", "id": "d760e6e6", "metadata": {}, "source": [ "# Gini Mean Difference Portfolio Optimization\n", "\n", "In this notebook we show how we can solve a hard problem using some reformulations.\n", "\n", "## 1. Gini Optimization\n", "\n", "### 1.1 Original formulation\n", "\n", "The Gini mean difference (GMD) is a measure of dispersion and it was introduced in the context of portfolio optimization by __[Yitzhaki (1982)](https://www.researchgate.net/publication/4900733_Stochastic_Dominance_Mean_Variance_and_Gini%27s_Mean_Difference)__. However, this model is not used by practitioners due to the original formulation having a number of variables that increases proportional to $T(T-1)/2$, where $T$ is the number of observations. The original model is presented as follows:\n", "\n", "$$\n", "\\begin{equation}\n", "\\begin{aligned}\n", "& \\underset{x,\\, d}{\\text{min}} & & \\frac{1}{T(T-1)} \\sum^{T}_{i=1} \\sum^{T}_{j > i} d_{i,j} \\\\\n", "& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n", "& & & d_{i,j} \\geq (r_{i} -r_{j})x \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\; ; \\; i < j \\\\\n", "& & & d_{i,j} \\geq -(r_{i} -r_{j})x \\\\\n", "& & & \\sum_{i=1}^{N} x_i = 1 \\\\\n", "& & & x_{i} \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n", "& & & d_{i,j} \\geq 0 \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\\\\n", "\\end{aligned}\n", "\\end{equation}\n", "$$\n", "\n", "Where $r_{i}$ is the vector of returns in period $i$ and $d$ is an auxiliary variable.\n", "\n", "### 1.2 Murray's reformulation\n", "\n", "To increase the efficiency of the problem above, __[Murray (2022)](https://github.com/cvxpy/cvxpy/issues/1585)__ proposed the following reformulation:\n", "\n", "$$\n", "\\begin{equation}\n", "\\begin{aligned}\n", "& \\underset{x,\\, d}{\\text{min}} & & \\frac{1}{T(T-1)} \\sum^{T}_{i=1} \\sum^{T}_{j > i} d_{i,j} \\\\\n", "& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n", "& & & y = r x \\\\\n", "& & & d \\geq M y \\\\\n", "& & & d \\geq -M y \\\\\n", "& & & \\sum_{i=1}^{N} x_i = 1 \\\\\n", "& & & x_{i} \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n", "& & & d_{i,j} \\geq 0 \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\\\\n", "\\end{aligned}\n", "\\end{equation}\n", "$$\n", "\n", "where \n", "$$\n", "M = \\begin{bmatrix}\n", "\\left. \\begin{matrix}\n", "-1 & 1 & 0 & 0 & \\ldots & 0 & 0\\\\\n", "-1 & 0 & 1 & 0 & \\ldots & 0 & 0\\\\\n", "\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\vdots \\\\\n", "-1 & 0 & 0 & 0 & \\ldots & 0 & 1 \\\\\n", "\\end{matrix} \\right \\} & T-1\\\\\n", "\\left. \\begin{matrix}\n", "0 & -1 & 1 & 0 & \\ldots & 0 & 0\\\\\n", "0 & -1 & 0 & 1 & \\ldots & 0 & 0\\\\\n", "\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\vdots \\\\\n", "0 & -1 & 0 & 0 & \\ldots & 0 & 1 \\\\\n", "\\end{matrix} \\right \\} & T-2\\\\ \n", "\\vdots \\\\\n", "\\underbrace{ \\left. \\begin{matrix}\n", "0 & 0 & 0 & 0 & \\ldots & -1 & 1 \\\\\n", "\\end{matrix} \\right \\} }_{T} & 1 \\\\\n", "\\end{bmatrix}\n", "$$\n", "\n", "This reformulation is more efficient for medium scale problems (T<800).\n", "\n", "### 1.3 Cajas's reformulation:\n", "\n", "__[Cajas (2021)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3988927)__ proposed an alternative reformulation based on the ordered weighted averaging (OWA) operator optimization model for monotonic weights proposed by __[Chassein and Goerigk (2015)](https://kluedo.ub.uni-kl.de/frontdoor/deliver/index/docId/3899/file/paper.pdf)__. This formulation works better for large scale problems (T>=800). This formulation is presented as follows:\n", "\n", "$$\n", "\\begin{equation}\n", "\\begin{aligned}\n", "& \\min_{\\alpha, \\, \\beta, \\, x, \\, y} & & \\sum^{T}_{i=1} \\alpha_{i} + \\beta_{i} \\\\\n", "& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n", "& & & r x = y \\\\\n", "& & & \\alpha_{i} + \\beta_{j} \\geq w_{i} y_{j} \\;\\;\\;\\; \\forall \\; i,j =1, \\ldots, T \\\\\n", "& & & \\sum_{i=1}^{N} x_i = 1 \\\\\n", "& & & x_i \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n", "\\end{aligned}\n", "\\end{equation}\n", "$$\n", "\n", "where $w_{i} = 2 \\left ( \\frac{2i - 1 - T}{T(T-1)} \\right )$." ] }, { "cell_type": "code", "execution_count": 1, "id": "2a19278a", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import cvxpy as cp\n", "import mosek\n", "import scipy.stats as st\n", "from timeit import default_timer as timer\n", "from datetime import timedelta\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "def gini(mu, returns, D, assets, lift=0):\n", " (T, N) = returns.shape\n", "\n", " d = cp.Variable((int(T * (T - 1) / 2),1))\n", " w = cp.Variable((N,1))\n", " constraints = []\n", "\n", " if lift in ['Murray', 'Yitzhaki']: # use Murray's reformulation\n", " if lift == 'Murray': \n", " ret_w = cp.Variable((T,1))\n", " constraints.append(ret_w == returns @ w)\n", " mat = np.zeros((d.shape[0], T))\n", " \"\"\" \n", " We need to create a vector that has the following entries:\n", " ret_w[i] - ret_w[j]\n", " for j in range(T), for i in range(j+1, T).\n", " We do this by building a numpy array of mostly 0's and 1's.\n", " (It would be better to use SciPy sparse matrix objects.)\n", " \"\"\"\n", " ell = 0\n", " for j in range(T):\n", " for i in range(j+1, T):\n", " # write to mat so that (mat @ ret_w)[ell] == var_i - var_j\n", " mat[ell, i] = 1\n", " mat[ell, j] = -1\n", " ell += 1\n", " all_pairs_ret_diff = mat @ ret_w\n", " elif lift == 'Yitzhaki': # use the original formulation\n", " all_pairs_ret_diff = D @ w\n", " \n", " constraints += [d >= all_pairs_ret_diff,\n", " d >= -all_pairs_ret_diff,\n", " w >= 0,\n", " cp.sum(w) == 1,\n", " ]\n", "\n", " risk = cp.sum(d) / ((T - 1) * T)\n", "\n", " elif lift == 'Cajas':\n", " a = cp.Variable((T,1))\n", " b = cp.Variable((T,1))\n", " y = cp.Variable((T,1))\n", "\n", " owa_w = []\n", " for i in range(1,T+1):\n", " owa_w.append(2*i - 1 - T)\n", " owa_w = np.array(owa_w) / (T * (T-1))\n", "\n", " constraints = [returns @ w == y,\n", " w >= 0,\n", " cp.sum(w) == 1]\n", " \n", " for i in range(T):\n", " constraints += [a[i] + b >= cp.multiply(owa_w[i], y)]\n", "\n", " risk = cp.sum(a + b)\n", "\n", " objective = cp.Minimize(risk * 1000)\n", "\n", " prob = cp.Problem(objective, constraints)\n", "\n", " try:\n", " prob.solve(solver=cp.MOSEK,\n", " mosek_params={'MSK_IPAR_NUM_THREADS': 2},\n", " verbose=False,)\n", " w = pd.DataFrame(w.value)\n", " w.index = assets\n", " w = w / w.sum()\n", " except:\n", " w = None\n", " return w\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "583f7d9a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Yitzhaki-200-100\n", "Murray-200-100\n", "Cajas-200-100\n", "Yitzhaki-500-100\n", "Murray-500-100\n", "Cajas-500-100\n", "Murray-700-100\n", "Cajas-700-100\n", "Murray-1000-100\n", "Cajas-1000-100\n" ] } ], "source": [ "####################################\n", "# Calculating the Gini Portfolio\n", "# using three formulations\n", "####################################\n", "\n", "rs = np.random.RandomState(123)\n", "\n", "sizes = [100]\n", "data = {}\n", "weights = {}\n", "lifts = ['Yitzhaki', 'Murray', 'Cajas']\n", "k = 0\n", "T = [200, 500, 700, 1000]\n", "\n", "for t in T:\n", " for n in sizes:\n", " \n", " cov = rs.rand(n,n) * 1.5 - 0.5\n", " cov = cov @ cov.T/1000 + np.diag(rs.rand(n) * 0.7 + 0.3)/1000\n", " mean = np.zeros(n) + 1/1000\n", "\n", " Y = st.multivariate_normal.rvs(mean=mean, cov=cov, size=t, random_state=rs)\n", " Y = pd.DataFrame(Y)\n", " assets = ['Asset '+ str(i) for i in range(1,n+1)]\n", " mu = Y.mean().to_numpy()\n", " returns = Y.to_numpy()\n", "\n", " D = np.array([]).reshape(0,len(assets))\n", " for j in range(0, returns.shape[0]-1):\n", " D = np.concatenate((D, returns[j+1:] - returns[j,:]), axis=0)\n", "\n", " for lift in lifts:\n", " name = str(lift) + '-' + str(t) + '-' + str(n)\n", " data[name] = []\n", " weights[name] = []\n", " if t >= 700 and lift == 'Yitzhaki':\n", " continue\n", " else: \n", " start = timer()\n", " w = gini(mu, returns, D, assets, lift=lift)\n", " end = timer()\n", " data[name].append(timedelta(seconds=end-start).total_seconds())\n", " weights[name].append(w)\n", " \n", " k += 1\n", " print(name)" ] }, { "cell_type": "code", "execution_count": 3, "id": "c5a1f62b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Time in Seconds
Yitzhaki-200-1009.55
Murray-200-1000.88
Cajas-200-1001.59
Yitzhaki-500-10099.33
Murray-500-10011.49
Cajas-500-10013.67
Murray-700-10027.19
Cajas-700-10029.12
Murray-1000-10096.62
Cajas-1000-10080.50
\n", "
" ], "text/plain": [ " Time in Seconds\n", "Yitzhaki-200-100 9.55\n", "Murray-200-100 0.88\n", "Cajas-200-100 1.59\n", "Yitzhaki-500-100 99.33\n", "Murray-500-100 11.49\n", "Cajas-500-100 13.67\n", "Murray-700-100 27.19\n", "Cajas-700-100 29.12\n", "Murray-1000-100 96.62\n", "Cajas-1000-100 80.50" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "keys = list(data.keys())\n", "for i in keys:\n", " if len(data[i]) == 0:\n", " del data[i]\n", "\n", "pd.options.display.float_format = '{:.2f}'.format\n", "\n", "a = pd.DataFrame(data).T\n", "a.columns = ['Time in Seconds']\n", "display(a)" ] }, { "cell_type": "markdown", "id": "9edea478", "metadata": {}, "source": [ "As we can see, as the number of observations $T$ increases the formulation proposed by Cajas (2021) becomes more efficient than Yitzhaki's and Murray's formulations." ] }, { "cell_type": "code", "execution_count": 4, "id": "ed94c8e2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
Yitzhaki-200-100-0.0413%
Murray-200-100-0.0413%
Cajas-200-100-0.0414%
Yitzhaki-500-100-0.2315%
Murray-500-100-0.2315%
Cajas-500-100-0.2313%
Murray-700-1000.1601%
Cajas-700-1000.1602%
Murray-1000-1000.0339%
Cajas-1000-1000.0339%
\n", "
" ], "text/plain": [ " 0\n", "Yitzhaki-200-100 -0.0413%\n", "Murray-200-100 -0.0413%\n", "Cajas-200-100 -0.0414%\n", "Yitzhaki-500-100 -0.2315%\n", "Murray-500-100 -0.2315%\n", "Cajas-500-100 -0.2313%\n", "Murray-700-100 0.1601%\n", "Cajas-700-100 0.1602%\n", "Murray-1000-100 0.0339%\n", "Cajas-1000-100 0.0339%" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "b = pd.DataFrame([])\n", "for i in weights.keys():\n", " if len(weights[i]) == 0:\n", " continue\n", " weights[i][0].columns = [i]\n", " b = pd.concat([b , mu @ weights[i][0]],axis=0)\n", " \n", "pd.options.display.float_format = '{:.4%}'.format\n", "\n", "display(b)" ] }, { "cell_type": "markdown", "id": "d035ab01", "metadata": {}, "source": [ "If we calculate the expected returns for each formulation, we can see that the three models give us the same results, which means that these formulations give us the same solution." ] }, { "cell_type": "markdown", "id": "de8cc7f7", "metadata": {}, "source": [ "For more portfolio optimization models and applications, you can see the CVXPY based library __[Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib)__.\n", "\n", "Finally, I hope you liked this example." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }