{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d760e6e6",
   "metadata": {},
   "source": [
    "# Gini Mean Difference Portfolio Optimization\n",
    "\n",
    "In this notebook we show how we can solve a hard problem using some reformulations.\n",
    "\n",
    "## 1. Gini Optimization\n",
    "\n",
    "### 1.1 Original formulation\n",
    "\n",
    "The Gini mean difference (GMD) is a measure of dispersion and it was introduced in the context of portfolio optimization by __[Yitzhaki (1982)](https://www.researchgate.net/publication/4900733_Stochastic_Dominance_Mean_Variance_and_Gini%27s_Mean_Difference)__. However, this model is not used by practitioners due to the original formulation having a number of variables that increases proportional to $T(T-1)/2$, where $T$ is the number of observations. The original model is presented as follows:\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\begin{aligned}\n",
    "& \\underset{x,\\, d}{\\text{min}} & & \\frac{1}{T(T-1)} \\sum^{T}_{i=1} \\sum^{T}_{j > i} d_{i,j} \\\\\n",
    "& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n",
    "& & &  d_{i,j} \\geq (r_{i} -r_{j})x \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\; ; \\; i < j \\\\\n",
    "& & &  d_{i,j} \\geq -(r_{i} -r_{j})x \\\\\n",
    "& & &  \\sum_{i=1}^{N} x_i = 1 \\\\\n",
    "& & &  x_{i} \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n",
    "& & &  d_{i,j} \\geq 0 \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\\\\n",
    "\\end{aligned}\n",
    "\\end{equation}\n",
    "$$\n",
    "\n",
    "Where $r_{i}$ is the vector of returns in period $i$ and $d$ is an auxiliary variable.\n",
    "\n",
    "### 1.2 Murray's reformulation\n",
    "\n",
    "To increase the efficiency of the problem above, __[Murray (2022)](https://github.com/cvxpy/cvxpy/issues/1585)__ proposed the following reformulation:\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\begin{aligned}\n",
    "& \\underset{x,\\, d}{\\text{min}} & & \\frac{1}{T(T-1)} \\sum^{T}_{i=1} \\sum^{T}_{j > i} d_{i,j} \\\\\n",
    "& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n",
    "& & & y = r x \\\\\n",
    "& & & d \\geq M y \\\\\n",
    "& & & d \\geq -M y \\\\\n",
    "& & &  \\sum_{i=1}^{N} x_i = 1 \\\\\n",
    "& & &  x_{i} \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n",
    "& & &  d_{i,j} \\geq 0 \\; ; \\; \\forall \\; i,j =1, \\ldots, T \\\\\n",
    "\\end{aligned}\n",
    "\\end{equation}\n",
    "$$\n",
    "\n",
    "where \n",
    "$$\n",
    "M = \\begin{bmatrix}\n",
    "\\left. \\begin{matrix}\n",
    "-1 & 1 & 0 & 0 & \\ldots & 0 & 0\\\\\n",
    "-1 & 0 & 1 & 0 & \\ldots & 0 & 0\\\\\n",
    "\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots & \\vdots \\\\\n",
    "-1 & 0 & 0 & 0 & \\ldots & 0 & 1 \\\\\n",
    "\\end{matrix} \\right \\} & T-1\\\\\n",
    "\\left. \\begin{matrix}\n",
    "0 & -1 & 1 & 0 & \\ldots & 0 & 0\\\\\n",
    "0 &  -1 & 0 & 1 & \\ldots & 0 & 0\\\\\n",
    "\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots &  \\vdots \\\\\n",
    "0 & -1 & 0 & 0 & \\ldots & 0 & 1 \\\\\n",
    "\\end{matrix} \\right \\} & T-2\\\\          \n",
    "\\vdots \\\\\n",
    "\\underbrace{ \\left. \\begin{matrix}\n",
    "0 & 0 & 0 & 0 & \\ldots & -1 & 1 \\\\\n",
    "\\end{matrix} \\right \\} }_{T} & 1 \\\\\n",
    "\\end{bmatrix}\n",
    "$$\n",
    "\n",
    "This reformulation is more efficient for medium scale problems (T<800).\n",
    "\n",
    "### 1.3 Cajas's reformulation:\n",
    "\n",
    "__[Cajas (2021)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3988927)__ proposed an alternative reformulation based on the ordered weighted averaging (OWA) operator optimization model for monotonic weights proposed by __[Chassein and Goerigk (2015)](https://kluedo.ub.uni-kl.de/frontdoor/deliver/index/docId/3899/file/paper.pdf)__. This formulation works better for large scale problems (T>=800). This formulation is presented as follows:\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\begin{aligned}\n",
    "& \\min_{\\alpha, \\, \\beta, \\, x, \\, y} & & \\sum^{T}_{i=1} \\alpha_{i} + \\beta_{i}  \\\\\n",
    "& \\text{s.t.} & & \\mu x \\geq \\bar{\\mu} \\\\\n",
    "& & & r x = y \\\\\n",
    "& & & \\alpha_{i} + \\beta_{j} \\geq w_{i} y_{j}  \\;\\;\\;\\; \\forall \\; i,j =1, \\ldots, T \\\\\n",
    "& & &  \\sum_{i=1}^{N} x_i = 1 \\\\\n",
    "& & &  x_i \\geq 0 \\; ; \\; \\forall \\; i =1, \\ldots, N \\\\\n",
    "\\end{aligned}\n",
    "\\end{equation}\n",
    "$$\n",
    "\n",
    "where $w_{i} =  2 \\left ( \\frac{2i - 1 - T}{T(T-1)} \\right )$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2a19278a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import cvxpy as cp\n",
    "import mosek\n",
    "import scipy.stats as st\n",
    "from timeit import default_timer as timer\n",
    "from datetime import timedelta\n",
    "import warnings\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "def gini(mu, returns, D, assets, lift=0):\n",
    "    (T, N) = returns.shape\n",
    "\n",
    "    d = cp.Variable((int(T * (T - 1) / 2),1))\n",
    "    w = cp.Variable((N,1))\n",
    "    constraints = []\n",
    "\n",
    "    if lift in ['Murray', 'Yitzhaki']:  # use Murray's reformulation\n",
    "        if lift == 'Murray':    \n",
    "            ret_w = cp.Variable((T,1))\n",
    "            constraints.append(ret_w == returns @ w)\n",
    "            mat = np.zeros((d.shape[0], T))\n",
    "            \"\"\" \n",
    "            We need to create a vector that has the following entries:\n",
    "                ret_w[i] - ret_w[j]\n",
    "            for j in range(T), for i in range(j+1, T).\n",
    "            We do this by building a numpy array of mostly 0's and 1's.\n",
    "            (It would be better to use SciPy sparse matrix objects.)\n",
    "            \"\"\"\n",
    "            ell = 0\n",
    "            for j in range(T):\n",
    "                for i in range(j+1, T):\n",
    "                    # write to mat so that (mat @ ret_w)[ell] == var_i - var_j\n",
    "                    mat[ell, i] = 1\n",
    "                    mat[ell, j] = -1\n",
    "                    ell += 1\n",
    "            all_pairs_ret_diff = mat @ ret_w\n",
    "        elif lift == 'Yitzhaki':  # use the original formulation\n",
    "            all_pairs_ret_diff = D @ w\n",
    "    \n",
    "        constraints += [d >= all_pairs_ret_diff,\n",
    "                        d >= -all_pairs_ret_diff,\n",
    "                        w >= 0,\n",
    "                        cp.sum(w) == 1,\n",
    "                       ]\n",
    "\n",
    "        risk = cp.sum(d) / ((T - 1) * T)\n",
    "\n",
    "    elif lift == 'Cajas':\n",
    "        a = cp.Variable((T,1))\n",
    "        b = cp.Variable((T,1))\n",
    "        y = cp.Variable((T,1))\n",
    "\n",
    "        owa_w = []\n",
    "        for i in range(1,T+1):\n",
    "            owa_w.append(2*i - 1 - T)\n",
    "        owa_w = np.array(owa_w) / (T * (T-1))\n",
    "\n",
    "        constraints = [returns @ w == y,\n",
    "                       w >= 0,\n",
    "                       cp.sum(w) == 1]\n",
    "   \n",
    "        for i in range(T):\n",
    "            constraints += [a[i] + b >= cp.multiply(owa_w[i], y)]\n",
    "\n",
    "        risk = cp.sum(a + b)\n",
    "\n",
    "    objective = cp.Minimize(risk * 1000)\n",
    "\n",
    "    prob = cp.Problem(objective, constraints)\n",
    "\n",
    "    try:\n",
    "        prob.solve(solver=cp.MOSEK,\n",
    "                   mosek_params={'MSK_IPAR_NUM_THREADS': 2},\n",
    "                   verbose=False,)\n",
    "        w = pd.DataFrame(w.value)\n",
    "        w.index = assets\n",
    "        w = w / w.sum()\n",
    "    except:\n",
    "        w = None\n",
    "    return w\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "583f7d9a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Yitzhaki-200-100\n",
      "Murray-200-100\n",
      "Cajas-200-100\n",
      "Yitzhaki-500-100\n",
      "Murray-500-100\n",
      "Cajas-500-100\n",
      "Murray-700-100\n",
      "Cajas-700-100\n",
      "Murray-1000-100\n",
      "Cajas-1000-100\n"
     ]
    }
   ],
   "source": [
    "####################################\n",
    "# Calculating the Gini Portfolio\n",
    "# using three formulations\n",
    "####################################\n",
    "\n",
    "rs = np.random.RandomState(123)\n",
    "\n",
    "sizes = [100]\n",
    "data = {}\n",
    "weights = {}\n",
    "lifts = ['Yitzhaki', 'Murray', 'Cajas']\n",
    "k = 0\n",
    "T = [200, 500, 700, 1000]\n",
    "\n",
    "for t in T:\n",
    "    for n in sizes:\n",
    "        \n",
    "        cov = rs.rand(n,n) * 1.5 - 0.5\n",
    "        cov = cov @ cov.T/1000 + np.diag(rs.rand(n) * 0.7 + 0.3)/1000\n",
    "        mean = np.zeros(n) + 1/1000\n",
    "\n",
    "        Y = st.multivariate_normal.rvs(mean=mean, cov=cov, size=t, random_state=rs)\n",
    "        Y = pd.DataFrame(Y)\n",
    "        assets = ['Asset '+ str(i) for i in range(1,n+1)]\n",
    "        mu = Y.mean().to_numpy()\n",
    "        returns = Y.to_numpy()\n",
    "\n",
    "        D = np.array([]).reshape(0,len(assets))\n",
    "        for j in range(0, returns.shape[0]-1):\n",
    "            D = np.concatenate((D, returns[j+1:] - returns[j,:]), axis=0)\n",
    "\n",
    "        for lift in lifts:\n",
    "            name = str(lift) + '-' + str(t) + '-' + str(n)\n",
    "            data[name] = []\n",
    "            weights[name] = []\n",
    "            if t >= 700 and lift == 'Yitzhaki':\n",
    "                continue\n",
    "            else: \n",
    "                start = timer()\n",
    "                w = gini(mu, returns, D, assets, lift=lift)\n",
    "                end = timer()\n",
    "                data[name].append(timedelta(seconds=end-start).total_seconds())\n",
    "                weights[name].append(w)\n",
    "                \n",
    "            k += 1\n",
    "            print(name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c5a1f62b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Time in Seconds</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Yitzhaki-200-100</th>\n",
       "      <td>9.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-200-100</th>\n",
       "      <td>0.88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-200-100</th>\n",
       "      <td>1.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yitzhaki-500-100</th>\n",
       "      <td>99.33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-500-100</th>\n",
       "      <td>11.49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-500-100</th>\n",
       "      <td>13.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-700-100</th>\n",
       "      <td>27.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-700-100</th>\n",
       "      <td>29.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-1000-100</th>\n",
       "      <td>96.62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-1000-100</th>\n",
       "      <td>80.50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Time in Seconds\n",
       "Yitzhaki-200-100             9.55\n",
       "Murray-200-100               0.88\n",
       "Cajas-200-100                1.59\n",
       "Yitzhaki-500-100            99.33\n",
       "Murray-500-100              11.49\n",
       "Cajas-500-100               13.67\n",
       "Murray-700-100              27.19\n",
       "Cajas-700-100               29.12\n",
       "Murray-1000-100             96.62\n",
       "Cajas-1000-100              80.50"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "keys = list(data.keys())\n",
    "for i in keys:\n",
    "    if len(data[i]) == 0:\n",
    "        del data[i]\n",
    "\n",
    "pd.options.display.float_format = '{:.2f}'.format\n",
    "\n",
    "a = pd.DataFrame(data).T\n",
    "a.columns = ['Time in Seconds']\n",
    "display(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9edea478",
   "metadata": {},
   "source": [
    "As we can see, as the number of observations $T$ increases the formulation proposed by Cajas (2021) becomes more efficient than Yitzhaki's and Murray's formulations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ed94c8e2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Yitzhaki-200-100</th>\n",
       "      <td>-0.0413%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-200-100</th>\n",
       "      <td>-0.0413%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-200-100</th>\n",
       "      <td>-0.0414%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yitzhaki-500-100</th>\n",
       "      <td>-0.2315%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-500-100</th>\n",
       "      <td>-0.2315%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-500-100</th>\n",
       "      <td>-0.2313%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-700-100</th>\n",
       "      <td>0.1601%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-700-100</th>\n",
       "      <td>0.1602%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Murray-1000-100</th>\n",
       "      <td>0.0339%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Cajas-1000-100</th>\n",
       "      <td>0.0339%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        0\n",
       "Yitzhaki-200-100 -0.0413%\n",
       "Murray-200-100   -0.0413%\n",
       "Cajas-200-100    -0.0414%\n",
       "Yitzhaki-500-100 -0.2315%\n",
       "Murray-500-100   -0.2315%\n",
       "Cajas-500-100    -0.2313%\n",
       "Murray-700-100    0.1601%\n",
       "Cajas-700-100     0.1602%\n",
       "Murray-1000-100   0.0339%\n",
       "Cajas-1000-100    0.0339%"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "b = pd.DataFrame([])\n",
    "for i in weights.keys():\n",
    "    if len(weights[i]) == 0:\n",
    "        continue\n",
    "    weights[i][0].columns = [i]\n",
    "    b = pd.concat([b , mu @ weights[i][0]],axis=0)\n",
    "    \n",
    "pd.options.display.float_format = '{:.4%}'.format\n",
    "\n",
    "display(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d035ab01",
   "metadata": {},
   "source": [
    "If we calculate the expected returns for each formulation, we can see that the three models give us the same results, which means that these formulations give us the same solution."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de8cc7f7",
   "metadata": {},
   "source": [
    "For more portfolio optimization models and applications, you can see the CVXPY based library __[Riskfolio-Lib](https://github.com/dcajasn/Riskfolio-Lib)__.\n",
    "\n",
    "Finally, I hope you liked this example."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}