{
"cells": [
{
"cell_type": "markdown",
"id": "d760e6e6",
"metadata": {},
"source": [
"# Gini Mean Difference Portfolio Optimization\n",
"\n",
"In this notebook we show how we can solve a hard problem using some reformulations.\n",
"\n",
"## 1. Gini Optimization\n",
"\n",
"### 1.1 Original formulation\n",
"\n",
"The Gini mean difference (GMD) is a measure of dispersion and it was introduced in the context of portfolio optimization by __[Yitzhaki (1982)](https://www.researchgate.net/publication/4900733_Stochastic_Dominance_Mean_Variance_and_Gini%27s_Mean_Difference)__. However, this model is not used by practitioners due to the original formulation having a number of variables that increases proportional to $T(T-1)/2$, where $T$ is the number of observations. The original model is presented as follows:\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"\\begin{aligned}\n",
"& \\underset{x,\\, d}{\\text{min}} & & \\frac{1}{T(T-1)} \\sum^{T}_{i=1} \\sum^{T}_{j > i} d_{i,j} \\\\\n",
"& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n",
"& & & d_{i,j} \\geq (r_{i} -r_{j})x \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\; ; \\; i < j \\\\\n",
"& & & d_{i,j} \\geq -(r_{i} -r_{j})x \\\\\n",
"& & & \\sum_{i=1}^{N} x_i = 1 \\\\\n",
"& & & x_{i} \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n",
"& & & d_{i,j} \\geq 0 \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\\\\n",
"\\end{aligned}\n",
"\\end{equation}\n",
"$$\n",
"\n",
"Where $r_{i}$ is the vector of returns in period $i$ and $d$ is an auxiliary variable.\n",
"\n",
"### 1.2 Murray's reformulation\n",
"\n",
"To increase the efficiency of the problem above, __[Murray (2022)](https://github.com/cvxpy/cvxpy/issues/1585)__ proposed the following reformulation:\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"\\begin{aligned}\n",
"& \\underset{x,\\, d}{\\text{min}} & & \\frac{1}{T(T-1)} \\sum^{T}_{i=1} \\sum^{T}_{j > i} d_{i,j} \\\\\n",
"& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n",
"& & & y = r x \\\\\n",
"& & & d \\geq M y \\\\\n",
"& & & d \\geq -M y \\\\\n",
"& & & \\sum_{i=1}^{N} x_i = 1 \\\\\n",
"& & & x_{i} \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n",
"& & & d_{i,j} \\geq 0 \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\\\\n",
"\\end{aligned}\n",
"\\end{equation}\n",
"$$\n",
"\n",
"where \n",
"$$\n",
"M = \\begin{bmatrix}\n",
"\\left. \\begin{matrix}\n",
"-1 & 1 & 0 & 0 & \\ldots & 0 & 0\\\\\n",
"-1 & 0 & 1 & 0 & \\ldots & 0 & 0\\\\\n",
"\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\vdots \\\\\n",
"-1 & 0 & 0 & 0 & \\ldots & 0 & 1 \\\\\n",
"\\end{matrix} \\right \\} & T-1\\\\\n",
"\\left. \\begin{matrix}\n",
"0 & -1 & 1 & 0 & \\ldots & 0 & 0\\\\\n",
"0 & -1 & 0 & 1 & \\ldots & 0 & 0\\\\\n",
"\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\vdots \\\\\n",
"0 & -1 & 0 & 0 & \\ldots & 0 & 1 \\\\\n",
"\\end{matrix} \\right \\} & T-2\\\\ \n",
"\\vdots \\\\\n",
"\\underbrace{ \\left. \\begin{matrix}\n",
"0 & 0 & 0 & 0 & \\ldots & -1 & 1 \\\\\n",
"\\end{matrix} \\right \\} }_{T} & 1 \\\\\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"This reformulation is more efficient for medium scale problems (T<800).\n",
"\n",
"### 1.3 Cajas's reformulation:\n",
"\n",
"__[Cajas (2021)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3988927)__ proposed an alternative reformulation based on the ordered weighted averaging (OWA) operator optimization model for monotonic weights proposed by __[Chassein and Goerigk (2015)](https://kluedo.ub.uni-kl.de/frontdoor/deliver/index/docId/3899/file/paper.pdf)__. This formulation works better for large scale problems (T>=800). This formulation is presented as follows:\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"\\begin{aligned}\n",
"& \\min_{\\alpha, \\, \\beta, \\, x, \\, y} & & \\sum^{T}_{i=1} \\alpha_{i} + \\beta_{i} \\\\\n",
"& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n",
"& & & r x = y \\\\\n",
"& & & \\alpha_{i} + \\beta_{j} \\geq w_{i} y_{j} \\;\\;\\;\\; \\forall \\; i,j =1, \\ldots, T \\\\\n",
"& & & \\sum_{i=1}^{N} x_i = 1 \\\\\n",
"& & & x_i \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n",
"\\end{aligned}\n",
"\\end{equation}\n",
"$$\n",
"\n",
"where $w_{i} = 2 \\left ( \\frac{2i - 1 - T}{T(T-1)} \\right )$."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2a19278a",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import cvxpy as cp\n",
"import mosek\n",
"import scipy.stats as st\n",
"from timeit import default_timer as timer\n",
"from datetime import timedelta\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"def gini(mu, returns, D, assets, lift=0):\n",
" (T, N) = returns.shape\n",
"\n",
" d = cp.Variable((int(T * (T - 1) / 2),1))\n",
" w = cp.Variable((N,1))\n",
" constraints = []\n",
"\n",
" if lift in ['Murray', 'Yitzhaki']: # use Murray's reformulation\n",
" if lift == 'Murray': \n",
" ret_w = cp.Variable((T,1))\n",
" constraints.append(ret_w == returns @ w)\n",
" mat = np.zeros((d.shape[0], T))\n",
" \"\"\" \n",
" We need to create a vector that has the following entries:\n",
" ret_w[i] - ret_w[j]\n",
" for j in range(T), for i in range(j+1, T).\n",
" We do this by building a numpy array of mostly 0's and 1's.\n",
" (It would be better to use SciPy sparse matrix objects.)\n",
" \"\"\"\n",
" ell = 0\n",
" for j in range(T):\n",
" for i in range(j+1, T):\n",
" # write to mat so that (mat @ ret_w)[ell] == var_i - var_j\n",
" mat[ell, i] = 1\n",
" mat[ell, j] = -1\n",
" ell += 1\n",
" all_pairs_ret_diff = mat @ ret_w\n",
" elif lift == 'Yitzhaki': # use the original formulation\n",
" all_pairs_ret_diff = D @ w\n",
" \n",
" constraints += [d >= all_pairs_ret_diff,\n",
" d >= -all_pairs_ret_diff,\n",
" w >= 0,\n",
" cp.sum(w) == 1,\n",
" ]\n",
"\n",
" risk = cp.sum(d) / ((T - 1) * T)\n",
"\n",
" elif lift == 'Cajas':\n",
" a = cp.Variable((T,1))\n",
" b = cp.Variable((T,1))\n",
" y = cp.Variable((T,1))\n",
"\n",
" owa_w = []\n",
" for i in range(1,T+1):\n",
" owa_w.append(2*i - 1 - T)\n",
" owa_w = np.array(owa_w) / (T * (T-1))\n",
"\n",
" constraints = [returns @ w == y,\n",
" w >= 0,\n",
" cp.sum(w) == 1]\n",
" \n",
" for i in range(T):\n",
" constraints += [a[i] + b >= cp.multiply(owa_w[i], y)]\n",
"\n",
" risk = cp.sum(a + b)\n",
"\n",
" objective = cp.Minimize(risk * 1000)\n",
"\n",
" prob = cp.Problem(objective, constraints)\n",
"\n",
" try:\n",
" prob.solve(solver=cp.MOSEK,\n",
" mosek_params={'MSK_IPAR_NUM_THREADS': 2},\n",
" verbose=False,)\n",
" w = pd.DataFrame(w.value)\n",
" w.index = assets\n",
" w = w / w.sum()\n",
" except:\n",
" w = None\n",
" return w\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "583f7d9a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Yitzhaki-200-100\n",
"Murray-200-100\n",
"Cajas-200-100\n",
"Yitzhaki-500-100\n",
"Murray-500-100\n",
"Cajas-500-100\n",
"Murray-700-100\n",
"Cajas-700-100\n",
"Murray-1000-100\n",
"Cajas-1000-100\n"
]
}
],
"source": [
"####################################\n",
"# Calculating the Gini Portfolio\n",
"# using three formulations\n",
"####################################\n",
"\n",
"rs = np.random.RandomState(123)\n",
"\n",
"sizes = [100]\n",
"data = {}\n",
"weights = {}\n",
"lifts = ['Yitzhaki', 'Murray', 'Cajas']\n",
"k = 0\n",
"T = [200, 500, 700, 1000]\n",
"\n",
"for t in T:\n",
" for n in sizes:\n",
" \n",
" cov = rs.rand(n,n) * 1.5 - 0.5\n",
" cov = cov @ cov.T/1000 + np.diag(rs.rand(n) * 0.7 + 0.3)/1000\n",
" mean = np.zeros(n) + 1/1000\n",
"\n",
" Y = st.multivariate_normal.rvs(mean=mean, cov=cov, size=t, random_state=rs)\n",
" Y = pd.DataFrame(Y)\n",
" assets = ['Asset '+ str(i) for i in range(1,n+1)]\n",
" mu = Y.mean().to_numpy()\n",
" returns = Y.to_numpy()\n",
"\n",
" D = np.array([]).reshape(0,len(assets))\n",
" for j in range(0, returns.shape[0]-1):\n",
" D = np.concatenate((D, returns[j+1:] - returns[j,:]), axis=0)\n",
"\n",
" for lift in lifts:\n",
" name = str(lift) + '-' + str(t) + '-' + str(n)\n",
" data[name] = []\n",
" weights[name] = []\n",
" if t >= 700 and lift == 'Yitzhaki':\n",
" continue\n",
" else: \n",
" start = timer()\n",
" w = gini(mu, returns, D, assets, lift=lift)\n",
" end = timer()\n",
" data[name].append(timedelta(seconds=end-start).total_seconds())\n",
" weights[name].append(w)\n",
" \n",
" k += 1\n",
" print(name)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c5a1f62b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Time in Seconds | \n",
"
\n",
" \n",
" \n",
" \n",
" Yitzhaki-200-100 | \n",
" 9.55 | \n",
"
\n",
" \n",
" Murray-200-100 | \n",
" 0.88 | \n",
"
\n",
" \n",
" Cajas-200-100 | \n",
" 1.59 | \n",
"
\n",
" \n",
" Yitzhaki-500-100 | \n",
" 99.33 | \n",
"
\n",
" \n",
" Murray-500-100 | \n",
" 11.49 | \n",
"
\n",
" \n",
" Cajas-500-100 | \n",
" 13.67 | \n",
"
\n",
" \n",
" Murray-700-100 | \n",
" 27.19 | \n",
"
\n",
" \n",
" Cajas-700-100 | \n",
" 29.12 | \n",
"
\n",
" \n",
" Murray-1000-100 | \n",
" 96.62 | \n",
"
\n",
" \n",
" Cajas-1000-100 | \n",
" 80.50 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Time in Seconds\n",
"Yitzhaki-200-100 9.55\n",
"Murray-200-100 0.88\n",
"Cajas-200-100 1.59\n",
"Yitzhaki-500-100 99.33\n",
"Murray-500-100 11.49\n",
"Cajas-500-100 13.67\n",
"Murray-700-100 27.19\n",
"Cajas-700-100 29.12\n",
"Murray-1000-100 96.62\n",
"Cajas-1000-100 80.50"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"keys = list(data.keys())\n",
"for i in keys:\n",
" if len(data[i]) == 0:\n",
" del data[i]\n",
"\n",
"pd.options.display.float_format = '{:.2f}'.format\n",
"\n",
"a = pd.DataFrame(data).T\n",
"a.columns = ['Time in Seconds']\n",
"display(a)"
]
},
{
"cell_type": "markdown",
"id": "9edea478",
"metadata": {},
"source": [
"As we can see, as the number of observations $T$ increases the formulation proposed by Cajas (2021) becomes more efficient than Yitzhaki's and Murray's formulations."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ed94c8e2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
"
\n",
" \n",
" \n",
" \n",
" Yitzhaki-200-100 | \n",
" -0.0413% | \n",
"
\n",
" \n",
" Murray-200-100 | \n",
" -0.0413% | \n",
"
\n",
" \n",
" Cajas-200-100 | \n",
" -0.0414% | \n",
"
\n",
" \n",
" Yitzhaki-500-100 | \n",
" -0.2315% | \n",
"
\n",
" \n",
" Murray-500-100 | \n",
" -0.2315% | \n",
"
\n",
" \n",
" Cajas-500-100 | \n",
" -0.2313% | \n",
"
\n",
" \n",
" Murray-700-100 | \n",
" 0.1601% | \n",
"
\n",
" \n",
" Cajas-700-100 | \n",
" 0.1602% | \n",
"
\n",
" \n",
" Murray-1000-100 | \n",
" 0.0339% | \n",
"
\n",
" \n",
" Cajas-1000-100 | \n",
" 0.0339% | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0\n",
"Yitzhaki-200-100 -0.0413%\n",
"Murray-200-100 -0.0413%\n",
"Cajas-200-100 -0.0414%\n",
"Yitzhaki-500-100 -0.2315%\n",
"Murray-500-100 -0.2315%\n",
"Cajas-500-100 -0.2313%\n",
"Murray-700-100 0.1601%\n",
"Cajas-700-100 0.1602%\n",
"Murray-1000-100 0.0339%\n",
"Cajas-1000-100 0.0339%"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"b = pd.DataFrame([])\n",
"for i in weights.keys():\n",
" if len(weights[i]) == 0:\n",
" continue\n",
" weights[i][0].columns = [i]\n",
" b = pd.concat([b , mu @ weights[i][0]],axis=0)\n",
" \n",
"pd.options.display.float_format = '{:.4%}'.format\n",
"\n",
"display(b)"
]
},
{
"cell_type": "markdown",
"id": "d035ab01",
"metadata": {},
"source": [
"If we calculate the expected returns for each formulation, we can see that the three models give us the same results, which means that these formulations give us the same solution."
]
},
{
"cell_type": "markdown",
"id": "de8cc7f7",
"metadata": {},
"source": [
"For more portfolio optimization models and applications, you can see the CVXPY based library __[Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib)__.\n",
"\n",
"Finally, I hope you liked this example."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}