{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center>Block 4: optimal assignments</center>\n",
    "### <center>Alfred Galichon (NYU & Sciences Po)</center>\n",
    "## <center>'math+econ+code' masterclass on optimal transport and economic applications</center>\n",
    "<center>© 2018-2021 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274, and contributions by Jules Baudet, Pauline Corblet, Gregory Dannay, and James Nesbit are acknowledged.</center>\n",
    "\n",
    "#### <center>With python code examples</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimal Transport I: Discrete Transport\n",
    "\n",
    "### Learning Objectives\n",
    "\n",
    "* Optimal assignment problem\n",
    "\n",
    "* Pairwise stability, Walrasian equilibrium\n",
    "\n",
    "* Computation\n",
    "\n",
    "### References\n",
    "\n",
    "* Galichon, *Optimal Transport Methods in Economics*, Ch. 3\n",
    "\n",
    "* Roth, Sotomayor(1990). *Two-Sided Matching*. Cambridge.\n",
    "\n",
    "* Koopmans and Beckmann (1957). \"Assignment problems and the location of economic activities.\" *Econometrica*.\n",
    "\n",
    "* Shapley and Shubik (1972). The assignment game I: The core.\" *IJGT*.\n",
    "\n",
    "* Becker (1993). *A Treatise of the Family*. Harvard.\n",
    "\n",
    "* Gretsky, Ostroy, and Zame (1992). \"The nonatomic assignment model.\" *Economic Theory*.\n",
    "\n",
    "* Burkard, Dell'Amico, and Martello (2012). *Assignment Problems*. SIAM.\n",
    "\n",
    "* Dupuy and Galichon (2014). \"Personality traits and the marriage market.\" *JPE*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Motivation\n",
    "\n",
    "### Optimal Transport\n",
    "\n",
    "Consider the problem of assigning a possibly infinite number of workers\n",
    "and firms.\n",
    "\n",
    "* Each worker should work for one firm, and each firm should hire one worker.\n",
    "\n",
    "* Workers and firms have heterogenous characteristics; let $x\\in \\mathcal{X}$ and $y\\in\\mathcal{Y}$ be the characteristics of workers and firms respectively.\n",
    "\n",
    "* Workers and firms are in equal mass, which is normalized to one. The distribution of worker's types is $P$, and the distribution of the firm's types is $Q$, where $P$ and $Q$ are probability measures on $\\mathcal{X}$ and $\\mathcal{Y}$.\n",
    "\n",
    "It is assumed that if a worker $x$ matches with a firm $y$, the total output generated is $\\Phi_{xy}$. The questions are then:\n",
    "\n",
    "* **Optimality:** what is the optimal assignment in the sense that it maximizes the overal output generated?\n",
    "\n",
    "* **Equilibrium:** what are the equilibrium assignment and the equilibrium wages?\n",
    "\n",
    "* **Efficiency:** do these two notions coincide?\n",
    "\n",
    "The same tools have been used by Gary Becker to study the heterosexual marriage market, where $x$ is the man's characteristics, and $y$ is the woman's characteristics, and \"wages\" are replaced by \"transfers\".\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data\n",
    "\n",
    "In this block, we shall take a first look at marriage data (while a worker-firm example will be seen in next block). Dupuy and Galichon (JPE, 2014) study a marriage dataset where, in addition to usual socio-demographic variables (such as education and age), measures of personality traits are reported.\n",
    "\n",
    "* The literature on quantitative psychology argues that one can capture relatively well an individual's personality along five dimensions, the \"big 5\" - consciousness, extraversion, agreableness, emotional stability, autonomy - assessed though a standardized questionaire.\n",
    "\n",
    "* In addition to this, we observed a (self-assessed) measure of health, risk-aversion, education, height and body mass index = weight in kg/ (height in m)$^2$. In total, the available characteristics $x_{i}$ of man $i$ and $y_{j}$ of woman $j$ are both $10$-dimensional vectors.\n",
    "\n",
    "* It is assumed that the surplus of interaction is given by $\\Phi\\left(x_{i},y_{j}\\right)  =x_{i}^{\\intercal}Ay_{j}$, where $A$ is a *given* $10 \\times 10$ matrix. (later in this course, we'll see how to estimate $A$ based on matched marital data).\n",
    "\n",
    "Today, we solve a central planner's problem (a stylized version of the problem OKCupids would solve): given a population of men and a population of women, how do we mutually assign these in order to:\n",
    "\n",
    "1. maximize matching surplus \n",
    "\n",
    "2. attain a (hopefully) stable assignment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import scipy.sparse as spr\n",
    "import os\n",
    "# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here\n",
    "import gurobipy as grb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "thepath = os.getcwd()\n",
    "\n",
    "#thepath = os.path.join(os.getcwd(),'data_mec_optim/marriage_personality-traits/')\n",
    "thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/marriage_personality-traits/'\n",
    "\n",
    "data_X = pd.read_csv(thepath + \"Xvals.csv\")\n",
    "data_Y = pd.read_csv(thepath + \"Yvals.csv\")\n",
    "affdf = pd.read_csv(thepath + \"affinitymatrix.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>educm</th>\n",
       "      <th>heightm</th>\n",
       "      <th>BMIm</th>\n",
       "      <th>healthm</th>\n",
       "      <th>consm</th>\n",
       "      <th>extram</th>\n",
       "      <th>agreem</th>\n",
       "      <th>emom</th>\n",
       "      <th>autom</th>\n",
       "      <th>riskym</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>186</td>\n",
       "      <td>28.905075</td>\n",
       "      <td>3</td>\n",
       "      <td>-0.752877</td>\n",
       "      <td>-0.360787</td>\n",
       "      <td>-0.711276</td>\n",
       "      <td>-0.291031</td>\n",
       "      <td>0.840217</td>\n",
       "      <td>0.479437</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>176</td>\n",
       "      <td>27.440599</td>\n",
       "      <td>3</td>\n",
       "      <td>0.345542</td>\n",
       "      <td>-0.805524</td>\n",
       "      <td>-0.251796</td>\n",
       "      <td>-0.305475</td>\n",
       "      <td>-0.064454</td>\n",
       "      <td>0.030303</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>187</td>\n",
       "      <td>23.163374</td>\n",
       "      <td>3</td>\n",
       "      <td>-0.759678</td>\n",
       "      <td>0.898007</td>\n",
       "      <td>-0.029462</td>\n",
       "      <td>-0.672859</td>\n",
       "      <td>-0.961691</td>\n",
       "      <td>-0.556598</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>184</td>\n",
       "      <td>29.241493</td>\n",
       "      <td>2</td>\n",
       "      <td>-0.455688</td>\n",
       "      <td>-1.053375</td>\n",
       "      <td>-0.041612</td>\n",
       "      <td>0.436133</td>\n",
       "      <td>0.121873</td>\n",
       "      <td>0.992084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>174</td>\n",
       "      <td>23.781214</td>\n",
       "      <td>4</td>\n",
       "      <td>-1.440239</td>\n",
       "      <td>1.163730</td>\n",
       "      <td>0.293750</td>\n",
       "      <td>-0.538922</td>\n",
       "      <td>0.782285</td>\n",
       "      <td>-1.401034</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   educm  heightm       BMIm  healthm     consm    extram    agreem      emom  \\\n",
       "0      2      186  28.905075        3 -0.752877 -0.360787 -0.711276 -0.291031   \n",
       "1      2      176  27.440599        3  0.345542 -0.805524 -0.251796 -0.305475   \n",
       "2      3      187  23.163374        3 -0.759678  0.898007 -0.029462 -0.672859   \n",
       "3      1      184  29.241493        2 -0.455688 -1.053375 -0.041612  0.436133   \n",
       "4      1      174  23.781214        4 -1.440239  1.163730  0.293750 -0.538922   \n",
       "\n",
       "      autom    riskym  \n",
       "0  0.840217  0.479437  \n",
       "1 -0.064454  0.030303  \n",
       "2 -0.961691 -0.556598  \n",
       "3  0.121873  0.992084  \n",
       "4  0.782285 -1.401034  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Discrete Monge-Kantorovich Theorem\n",
    "\n",
    "Assume that the type spaces $\\mathcal{X}$ and $\\mathcal{Y}$ are finite, so $\\mathcal{X=}\\left\\{  1,...,N\\right\\}  $, and $\\mathcal{Y}=\\left\\{1,...,M\\right\\}$. \n",
    "\n",
    "The mass of workers of type $x$ is $p_{x}$; the mass of jobs of type $y$ is $q_{y}$, with $\\sum_{x}p_{x}=\\sum_{y}q_{y}=1$.\n",
    "\n",
    "Let $\\pi_{xy}$ be the mass of workers of type $x$ assigned to jobs of type $y$. Every worker is busy and every job is filled, thus\n",
    "\n",
    "<a name=\"dicsr-constraints\"></a>\n",
    "\\begin{align*}\n",
    "\\pi\\in\\mathcal{M}\\left(p,q\\right) :=\\left\\{ \\pi_{xy}\\geq 0, \\sum_{y\\in\\mathcal{Y}}\\pi_{xy}=p_{x}\\text{ and }\\sum_{x\\in\\mathcal{X}}\\pi\n",
    "_{xy}=q_{y} \\right\\}. \n",
    "\\end{align*}\n",
    "\n",
    "(Note that this formulation allows for mixing, i.e. it allows for $\\pi_{xy}>0$ and $\\pi_{xy^{\\prime}}>0$ to hold simultaneously with $y\\neq y^{\\prime}$.)\n",
    "\n",
    "\n",
    "Assume the economic output created when assigning worker $x$ to job $y$ is $\\Phi_{xy}$. Hence, the optimal assignment is\n",
    "\n",
    "<a name=\"OAP\"></a>\n",
    "\\begin{align*}\n",
    "\\max_{\\pi\\geq0}  &  \\sum_{xy}\\pi_{xy}\\Phi_{xy} \\\\\n",
    "s.t.~  &  \\sum_{y\\in\\mathcal{Y}}\\pi_{xy}=p_{x} \\left[u_{x}\\right] \\\\\n",
    "&  \\sum_{x\\in\\mathcal{X}}\\pi_{xy}=q_{y}~\\left[v_{y}\\right]\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Duality\n",
    "\n",
    "---\n",
    "**Theorem**\n",
    "1. The value of the [primal problem](#OAP) *coincides with the value of the dual problem*\n",
    "\n",
    "    <a name=\"dual-discr\"></a>\n",
    "    \\begin{align*}\n",
    "    \\min_{u,v}  &  \\sum_{x\\in\\mathcal{X}}p_{x}u_{x}+\\sum_{y\\in\\mathcal{Y}}\n",
    "    q_{y}v_{y}\\\\\n",
    "    s.t.  &  u_{x}+v_{y}\\geq\\Phi_{xy}~\\left[\\pi_{xy}\\geq0\\right]\n",
    "    \\end{align*}\n",
    "\n",
    "2. Both the primal and the dual problems have optimal solutions. If $\\pi$ *is a solution to the primal problem* and $\\left(u,v\\right)$ is *a solution to the dual problem, then by complementary slackness*,\n",
    "\n",
    "    <a name=\"complSlack\"> </a>\n",
    "    \\begin{align*}\n",
    "    \\pi_{xy}>0\\text{ implies }u_{x}+v_{y}=\\Phi_{xy}.\n",
    "    \\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Proof**. The proof follows from the min-cost flow duality result, but let us rewrite it anyway. \n",
    "\n",
    "1. The value of the [primal problem](#OAP) can be written as $\\max_{\\pi\\geq0}\\min_{u,v}S\\left(  \\pi,u,v\\right)$, where\n",
    "\n",
    "    \\begin{align*}\n",
    "    S\\left(  \\pi,u,v\\right)  :=\\sum_{xy}\\pi_{xy}\\Phi_{xy}+\\sum_{x\\in\\mathcal{X}}u_{x}(p_{x}-\\sum_{y\\in\\mathcal{Y}}\\pi_{xy})+\\sum_{y\\in\\mathcal{Y}}v_{y}(q_{y}-\\sum_{x\\in\\mathcal{X}}\\pi_{xy})\n",
    "    \\end{align*}\n",
    "\n",
    "    but by the minmax theorem, this value is equal to $\\min_{u,v}\\max_{\\pi\\geq 0}S\\left(  \\pi,u,v\\right)$, which is the value of the [dual problem](#ual-discr}).\n",
    "\n",
    "2. Follows by noting that, for a primal solution $\\pi$ and a dual solution $\\left(  u,v\\right)  $, then $S\\left( \\pi,u,v\\right)  =\\sum_{xy}\\pi_{xy} \\Phi_{xy}$. $\\blacksquare$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Remarks**. Note that this result is the min-cost flow duality theorem in the bipartite case, as seen in lecture $2$, after setting transportation cost through $xy\\in\\mathcal{X}\\times\\mathcal{Y}$ to $c_{xy}=-\\Phi_{xy}$, and $n_{t}=-p_{t}1\\left\\{  t\\in\\mathcal{X}\\right\\}  +q_{t}\\mathbf{1}\\left\\{  t\\in\\mathcal{Y}\\right\\}$. We see various new interpretations of the result.\n",
    "\n",
    "The following statements are equivalent:\n",
    "\n",
    "* $\\pi$ is an optimal solution to the primal problem, and $\\left(\n",
    "u,v\\right)  $ is an optimal solution to the dual problem, and\n",
    "\n",
    "* (i) $\\pi\\in M\\left(  p,q\\right)  $\n",
    "\n",
    "* (ii) $u_{x}+v_{y}\\geq\\Phi_{xy}$\n",
    "\n",
    "* (iii) $\\pi_{xy}>0$ implies $u_{x}+v_{y}\\leq\\Phi_{xy}$.\n",
    "\n",
    "We saw the direct implication. But the converse is easy: take $\\pi$ and $\\left(  u,v\\right)  $ satisfying (i)--(iii), Then one has\n",
    "\n",
    "\\begin{align*}\n",
    "dual\\leq\\sum_{x}p_{x}u_{x}+\\sum_{y}q_{y}v_{y}=\\sum_{xy}\\pi_{xy}\\left(\n",
    "u_{x}+v_{y}\\right)  \\leq\\sum_{xy}\\pi_{xy}\\Phi_{xy}\\leq primal\n",
    "\\end{align*}\n",
    "\n",
    "but by the MK duality theorem, both ends coincide. Thus $\\pi$ is optimal for the primal and $\\left(  u,v\\right)  $ for the dual."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Unassigned Agents\n",
    "\n",
    "A important variant of the problem exists with $\\sum_{x\\in\\mathcal{X}}p_{x}\\neq\\sum_{y\\in\\mathcal{Y}}q_{y}$ and the primal constraints become inequality constraints. The duality then becomes\n",
    "\n",
    "\\begin{align*}\n",
    "\\begin{array}\n",
    "[c]{rrr}\n",
    "\\max_{\\pi\\geq0}\\sum\\pi_{xy}\\Phi_{xy} & = & \\min_{u,v}\\sum_{x\\in\\mathcal{X}\n",
    "}p_{x}u_{x}+\\sum_{y\\in\\mathcal{Y}}q_{y}v_{y}\\\\\n",
    "s.t.~\\sum_{y\\in\\mathcal{Y}}\\pi_{xy}\\leq p_{x} &  & u\\geq0,~v\\geq0 \\\\\n",
    "\\sum_{x\\in\\mathcal{X}}\\pi_{xy}\\leq q_{y} &  & u_{x}+v_{y}\\geq\\Phi_{xy}\n",
    "\\end{array}\n",
    "\\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pairwise stability\n",
    "\n",
    "In a marriage context, an important concept is stability:\n",
    "\n",
    "An outcome is a vector $\\left(  \\pi,u,v\\right)  $, where $u_{x}$ and $v_{y}$ are $x$'s and $y$'s payoffs, and $\\pi$ is a matching that is\n",
    "\n",
    "<a name=\"primFeas\"></a>\n",
    "\\begin{align*}\n",
    "\\pi\\in\\mathcal{M}\\left(  p,q\\right).\n",
    "\\end{align*}\n",
    "\n",
    "A pair $xy$ is blocking if $x$ and $y$ can find a way of sharing their joint surplus $\\Phi_{xy}$ in such a way that $x$ gets more than $u_{x}$ and $y$ gets more than $v_{y}$. Hence there is no blocking pair if and only if for every $x$ and $y$, one has\n",
    "\n",
    "<a name=\"noBlocking\"></a>\n",
    "\\begin{align*}\n",
    "u_{x}+v_{y}\\geq\\Phi_{xy}.\n",
    "\\end{align*}\n",
    "\n",
    "If $x$ and $y$ are actually matched, their utilities $u_{x}$ and $v_{y}$ need to be feasible, i.e. the above inequality should be saturated. Hence\n",
    "\n",
    "<a name=\"cplSlck\"></a>\n",
    "\\begin{align*}\n",
    "\\pi_{xy}>0\\text{ implies }u_{x}+v_{y}=\\Phi_{xy}.\n",
    "\\end{align*}\n",
    "\n",
    "---\n",
    "**Definition:**\n",
    "\n",
    "A matching that satisfies [primal feasbilitity](#primFeas), [no blocking](#noBlocking), and [complementary slackness](#cplSlck) is called a stable matching.\n",
    "\n",
    "As it turns out, these conditions are precisely the conditions that express complementarity slackness in the Monge-Kantorovich problem. Therefore, outcome$\\left(  \\pi,u,v\\right)  $ is stable if and only if $\\pi$ is a solution to the primal problem, and $\\left(  u,v\\right)  $ is a solution to the dual problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wage market interpretation 1\n",
    "\n",
    "Back to the workers / firms interpretation and assume for now that workers are indifferent between any two firms that offer the same salary. We argue that $u\\left(  x\\right)  $ can be interpreted as the equilibrium wage of worker $x$, while $v\\left(  y\\right)  $ can be interpreted as the equilibrium profit of firm $y$. Indeed:\n",
    "\n",
    "---\n",
    "**Proposition**\n",
    "If $\\left(u,v\\right)$ is a solution to the dual of the Kantorovich problem, then\n",
    "\n",
    "\\begin{align*}\n",
    "u_{x}  &  =\\sup_{y\\in\\mathcal{Y}}\\left(  \\Phi_{xy}-v_{y}\\right)\n",
    "\\label{conjug1}\\\\\n",
    "v_{y}  &  =\\sup_{x\\in\\mathcal{X}}\\left(  \\Phi_{xy}-u_{x}\\right)  .\n",
    "\\label{conjug2}\n",
    "\\end{align*}\n",
    "\n",
    "---\n",
    "\n",
    "Therefore, $u_{x}$ can be interpreted as equilibrium wage of worker $x$, and $v_{y}$ as equilibrium profit of firm $y$. In this interpretation, all workers get the same wage at equilibrium.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wage market interpretation 2\n",
    "\n",
    "Assume now that if a worker of type $x$ works for a firm of type $y$ for wage $w_{xy}$, then gets $\\alpha_{xy}+w_{xy}$, where $\\alpha_{xy}$ is the nonmonetary payoff associated with working with a firm of type $y$. The firm's profit is $\\gamma_{xy}-w_{xy}$, where $\\gamma_{xy}$ is the economic output.\n",
    "\n",
    "If an employee of type $x$ matches with a firm of type $y$, they generate joint surplus $\\Phi_{xy}$, given by%\n",
    "\n",
    "\\begin{align*}\n",
    "\\Phi_{xy}=\\underset{\\text{employee's payoff}}{\\underbrace{\\alpha_{xy}+w_{xy}}}+\\underset{\\text{firm's payoff}}{\\underbrace{\\gamma_{xy}-w_{xy}}}=\\alpha_{xy}+\\gamma_{xy}\n",
    "\\end{align*}\n",
    "\n",
    "which is independent from $w$.\n",
    "\n",
    "Workers choose firms which maximize their utility, i.e. solve \n",
    "\n",
    "<a name=\"equivWalrasStable1\"></a>\n",
    "\\begin{align*}\n",
    "u_{x}=\\max_{y}\\left\\{  \\alpha_{xy}+w_{xy}\\right\\}\n",
    "\\end{align*}\n",
    "\n",
    "and $u_{x}=\\alpha_{xy}+w_{xy}$ if $x$ and $y$ are matched. Similarly, the indirect payoff vector of firms is\n",
    "\n",
    "<a name=\"equivWalrasStable2\"></a>\n",
    "\\begin{align*}\n",
    "v_{y}=\\max_{x}\\left\\{  \\gamma_{xy}-w_{xy}\\right\\}\n",
    "\\end{align*}\n",
    "\n",
    "and, again, $v_{y}=\\gamma_{xy}-w_{xy}$ if $x$ and $y$ are matched."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a result,\n",
    "\n",
    "\\begin{align*}\n",
    "u_{x}+v_{y}\\geq\\alpha_{xy}+\\gamma_{xy}=\\Phi_{xy}\n",
    "\\end{align*}\n",
    "\n",
    "and equality holds if $x$ and $y$ are matched. Thus, if $w_{xy}$ is an equilibrium wage, then the triple $\\left(  \\pi,u,v\\right)$ where $\\pi$ is the corresponding matching, and $u_{x}$ and $v_{y}$ are defined by [this](#equivWalrasStable1) and [this](#equivWalrasStable2) defines a stable outcome.\n",
    "\n",
    "Conversely, let $\\left(\\pi,u,v\\right)$ be a stable outcome. Then let $\\bar{w}_{xx}$ and \\b{w}$_{xy}$ be defined by\n",
    "\n",
    "\\begin{align*}\n",
    "\\bar{w}_{xy}=u_{x}-\\alpha_{xy}\\text{ and }w^l_{xy}=\\gamma_{xy}-v_{y}.\n",
    "\\end{align*}\n",
    "\n",
    "One has $\\bar{w}_{xy}\\geq b^l_{xy}$. Any $w_{xy}$ such that $\\bar{w}_{xy}\\geq w_{xy}\\geq b^l_{xy}$ is an equilibrium wage. Indeed, $\\pi_{xy}>0$ implies $\\bar{w}_{xy} \\geq b^l_{xy}$, thus [this](#equivWalrasStable1) and [this](#equivWalrasStable2) hold. Given $u$ and $v$, $w_{xy}$ is uniquely defined on the equilibrium path (ie. when $x$ and $y$ are such that $\\pi_{xy}>0$), but there are multiple choices of $w$ outside the equilibrium path.\n",
    "\n",
    "Note that all workers of the same type get the same indirect utility, but not necessarly the same wage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Application\n",
    "\n",
    "We postulate that the form of the surplus function is\n",
    "\\begin{align*}\n",
    "\\Phi_{ij}=x_{i}^{\\intercal} Ay_{j}\n",
    "\\end{align*}\n",
    "where $x_{i}$ and $y_{j}$ are the 10-dimensional characteristics of man $i$ and woman $j$, and the form of $A$, a 10x10 matrix, is given (it is stored in the file `affinitymatrix.csv`). Again, we'll see later how to solve the econometrics problem of estimating $A$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "nbcar = 10\n",
    "nobs = 10\n",
    "affmat = affdf.iloc[0:nbcar,1:nbcar+1].to_numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "sdX = data_X.std().to_numpy()\n",
    "sdY = data_Y.std().to_numpy()\n",
    "mX = data_X.mean().to_numpy()\n",
    "mY = data_Y.mean().to_numpy()\n",
    "\n",
    "Xvals = ((data_X-mX)/sdX).to_numpy()[0:nobs, :]\n",
    "Yvals = ((data_Y-mY)/sdY).to_numpy()[0:nobs, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "nobs = Xvals.shape[0]\n",
    "Phi = Xvals @ affmat @ Yvals.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This problem of computation of the Optimal Assignment Problem, more specifically of $\\left(\\pi,u,v\\right)$, is arguably the most studied problem in Computer Science, and dozens, if not hundreds of algorithms exist, whose running time is polynomial in $\\max\\left(n,m\\right)$, typically a power less than three of the latter.\n",
    "\n",
    "Famous algorithms include: the Hungarian algorithm (Kuhn-Munkres); Bertsekas' auction algorithm; Goldberg and Kennedy's pseudoflow algorithm. For more on these, see the book by Burkard, Dell'Amico, and Martello, and a\n",
    "introductory presentation in http://www.assignmentproblems.com/doc/LSAPIntroduction.pdf.\n",
    "\n",
    "Here, we will show how to solve the problem with the help of a Linear Programming solver used as a black box; our challenge here will be to carefully set up the constraint matrix as a sparse matrix in order to let a large scale Linear Programming solvers such as Gurobi recognize and exploit the sparsity of the problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $\\Pi$ and $\\Phi$ be the matrices with typical elements $\\left(\\pi_{xy}\\right)  $ and $\\left(  \\Phi_{xy}\\right)  $. We let $p$, $q$, $u$,$v$, and $1$ the column vectors with entries $\\left(  p_{x}\\right)$, $\\left(  q_{y}\\right)  $, $\\left(  u_{x}\\right)  $, $\\left(  v_{y}\\right)$, and $1$, respectively. The optimal assignment problem\n",
    "\n",
    "\\begin{align*}\n",
    "\\max_{\\pi\\geq0}  &  \\sum_{xy}\\pi_{xy}\\Phi_{xy} \\\\\n",
    "s.t.~  &  \\sum_{y\\in\\mathcal{Y}}\\pi_{xy}=p_{x}~\\left[  u_{x}\\right]\\\\\n",
    "&  \\sum_{x\\in\\mathcal{X}}\\pi_{xy}=q_{y}~\\left[  v_{y}\\right] \n",
    "\\end{align*}\n",
    "\n",
    "Can be rewritten writes using matrix algebra as\n",
    "\n",
    "\\begin{align*}\n",
    "&  \\max_{\\Pi\\geq0}Tr\\left(  \\Pi^{\\top}\\Phi\\right) \\\\\n",
    "&  \\Pi1_{M}=p\\\\\n",
    "&  1_{N}^{\\top}\\Pi=q^{\\top}.\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "obj = Phi.flatten()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Recall that if $A$ is a $M\\times p$ matrix and $B$ a $N\\times q$ matrix, and if $vec(A)$ vectorizes in the row-major order (i.e. concatenates the rows of $A$) then the Kronecker product $A\\otimes B$ of $A$ and $B$ is a $mn\\times pq$ matrix such that\n",
    "\n",
    "\\begin{align*}\n",
    "vec\\left(  AXB^{\\top}\\right)  =\\left(  A\\otimes B\\right)  vec\\left(\n",
    "X\\right). \\label{VecAndKronecker}\n",
    "\\end{align*}\n",
    "\n",
    "In python, $A\\otimes B$ is implemented by `sparse.kron(A,B)` of the library `sparse` of scipy.\n",
    "\n",
    "The first constraint $I_{N}\\Pi1_{M}=p$,  therefore vectorizes under the row-major order as\n",
    "\n",
    "\\begin{align*}\n",
    "\\left( I_{N} \\otimes1_{M}^{\\top}\\right)  vec\\left(  \\Pi\\right)  =vec\\left(\n",
    "p\\right),\n",
    "\\end{align*}\n",
    "\n",
    "and similarly, the second constraint $1_{N}^{\\top}\\Pi I_{M}=q^{\\top}$, vectorizes as\n",
    "\n",
    "\\begin{align*}\n",
    "\\left( 1_{N}^{\\top} \\otimes I_{M}\\right)  vec\\left(  \\Pi\\right)  =vec\\left(\n",
    "q\\right)  .\n",
    "\\end{align*}\n",
    "\n",
    "Note that the matrix $I_{N} \\otimes1_{M}^{\\top}$ is of size $N\\times NM$, and the matrix $ 1_{N}^{\\top} \\otimes I_{M}$ is of size $M\\times NM$; hence the full matrix involved in the left-hand side of the constraints is of size $\\left(  N+M\\right)  \\times NM$. In spite of its large size, this matrix is *sparse*. In python, the identity matrix $I_{N}$ is coded as `scipy.sparse.identity(N)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = Phi.shape[0]\n",
    "M = Phi.shape[1]\n",
    "\n",
    "A1 = spr.kron(spr.identity(N), np.ones(M))\n",
    "A2 = spr.kron(np.ones(N),spr.identity(M))\n",
    "A = spr.vstack([A1, A2])\n",
    "\n",
    "p = 1/nobs * np.ones(nobs)\n",
    "q = 1/nobs * np.ones(nobs)\n",
    "d = np.concatenate((p,q), axis = None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting $z=vec\\left(  \\Pi\\right)$, the Linear Programming problem then becomes\n",
    "\n",
    "\\begin{align*}\n",
    "&  \\max_{z\\geq0}vec\\left(  \\Phi\\right)  ^{\\top}z\\label{LPvectorized}\\\\\n",
    "s.t.~  &  \\left(  1_{M}^{\\top}\\otimes I_{N}\\right)  z=vec\\left(  p\\right)\n",
    "\\nonumber\\\\\n",
    "&  \\left(  I_{M}\\otimes1_{N}^{\\top}\\right)  z=vec\\left(  q^{\\top}\\right)\n",
    "\\nonumber\n",
    "\\end{align*}\n",
    "\n",
    "which is ready to be passed on to a linear programming solver such as Gurobi.\n",
    "\n",
    "A LP solver typically computes programs of the form\n",
    "\n",
    "\\begin{align*}\n",
    "&  \\max_{z\\geq0}c^{\\top}z\\label{standardLP}\\\\\n",
    "&  s.t.~Az=d.\\nonumber\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Academic license - for non-commercial use only - expires 2022-11-14\n",
      "Using license file c:\\gurobi911\\gurobi.lic\n",
      "Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)\n",
      "Thread count: 6 physical cores, 12 logical processors, using up to 12 threads\n",
      "Optimize a model with 20 rows, 100 columns and 200 nonzeros\n",
      "Model fingerprint: 0x32e2b3fc\n",
      "Coefficient statistics:\n",
      "  Matrix range     [1e+00, 1e+00]\n",
      "  Objective range  [6e-03, 3e+00]\n",
      "  Bounds range     [0e+00, 0e+00]\n",
      "  RHS range        [1e-01, 1e-01]\n",
      "Presolve time: 0.00s\n",
      "Presolved: 20 rows, 100 columns, 200 nonzeros\n",
      "\n",
      "Iteration    Objective       Primal Inf.    Dual Inf.      Time\n",
      "       0    2.8601661e+31   8.600000e+31   2.860166e+01      0s\n",
      "      22    1.0705334e+00   0.000000e+00   0.000000e+00      0s\n",
      "\n",
      "Solved in 22 iterations and 0.01 seconds\n",
      "Optimal objective  1.070533447e+00\n"
     ]
    }
   ],
   "source": [
    "m=grb.Model('Marriage')\n",
    "x = m.addMVar(shape=len(obj), name=\"x\")\n",
    "m.setObjective(obj @ x, grb.GRB.MAXIMIZE)\n",
    "# we minus 1 for the nodes because of python's 0 indexing\n",
    "rhs = d\n",
    "m.addConstr(A @ x == rhs, name=\"Constr\")\n",
    "m.optimize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "if m.status == grb.GRB.Status.OPTIMAL:\n",
    "    solution = np.array(m.getAttr('x')).reshape(N,M)\n",
    "    pi = m.getAttr('pi')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Who does man $1$ match with?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Woman 7\n"
     ]
    }
   ],
   "source": [
    "print('Woman', np.argwhere(solution[0,:] != 0)[0][0] + 1)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}