{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Solving Low-rank Factor Analysis Problem using `NExOS.jl`\n",
        "**Shuvomoy Das Gupta**\n",
        "\n",
        "In this tutorial, we will show how to solve low-rank factor analysis problem using `NExOS`. \n",
        "\n",
        "Factor analysis is a very popular method in statistics that reduces linear dimensionality of a particular model. The best way to understand factor analysis is to consider a generative model.\n",
        "\n",
        "### Basic factor analysis model\n",
        "\n",
        "A basic factor\n",
        "analysis model is of the form\n",
        "$$\n",
        "\\begin{eqnarray}\n",
        "x & = & Lf+\\epsilon,\n",
        "\\end{eqnarray}\n",
        "$$\n",
        "\n",
        "where $x\\in\\mathbf{R}^{n}$ is the observed random vector, $f\\in\\mathbf{R}^{r}$\n",
        "(with $r\\leq n$) is a random vector of common factor variables or\n",
        "scores, $L\\in\\mathbf{R}^{n\\times r}$ is a matrix of factor loadings,\n",
        "and $\\epsilon\\in\\mathbf{R}^{n}$ is a vector of uncorrelated random\n",
        "variables. \n",
        "\n",
        "The observed random vector $x$ may contain series of achievement\n",
        "tests, psychological evaluation, intellectual performance etc. Without\n",
        "loss of generality, we assume that $x$ is mean-centered *i.e.*,\n",
        "$\\mathbf{E}(x)=0$, the vectors $f$ and $\\epsilon$ are uncorrelated,\n",
        "and the covariance matrix of $f$ is the identity matrix.\n",
        "\n",
        "We will denote $\\mathbf{cov}(\\epsilon)=D=\\mathbf{diag}(d_{1},d_{2},\\ldots,d_{n}).$\n",
        "Then the covariance matrix of $x$, denoted by $\\Sigma$ can be written\n",
        "as: \n",
        "$$\n",
        "\\Sigma=X+D,\n",
        "$$\n",
        " where $X=LL^{T}$ is the covariance matrix corresponding to the common\n",
        "factors. The statistical method of factor analysis involves looking\n",
        "at $N$ samples generated by the model (1), *i.e.*, given $x_{1},\\ldots,x_{N}$\n",
        "generated by (1) we want to estimate the matrices $X$ and $D$."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Optimization problem in consideration\n",
        "\n",
        "The optimization problem to determine the matrices $X,D$ can be written\n",
        "as: \n",
        "\n",
        "$$\n",
        "\\begin{equation}\n",
        "\\begin{array}{ll}\n",
        "\\textrm{minimize} & \\left\\Vert \\Sigma-X-D\\right\\Vert _{F}^{2}\\\\\n",
        "\\textrm{subject to} & D=\\mathbf{diag}(d)\\\\\n",
        " & d\\geq0\\\\\n",
        " & X\\succeq0\\\\\n",
        " & \\mathbf{rank}(X)\\leq r\\\\\n",
        " & \\Sigma-D\\succeq0\\\\\n",
        " & \\|X\\|_{2}\\leq M,\n",
        "\\end{array}\n",
        "\\end{equation}\n",
        "$$\n",
        "\n",
        "where $X\\in\\mathbf{S}^{n}$ and the diagonal matrix $D\\in\\mathbf{S}^{n}$\n",
        "with nonnegative diagonal entries are the decision variables, and\n",
        "$\\Sigma\\in\\mathbf{S}_{+}^{n}$ (a positive semidefinite matrix), $r\\in\\mathbf{Z}_{+},$\n",
        "and $M\\in\\mathbf{R}_{++}$ are the problem data. The term $r$ is called the number of factors for a factor analysis model. \n",
        "\n",
        "A proper solution for the optimization problem above requires that\n",
        "both $X$ and $D$ are positive semidefinite. Furthermore, when $\\Sigma-D$,\n",
        "which is the covariance matrix for the common parts of the variables,\n",
        "is not positive semidefinite, that would as embarrassing as having\n",
        "a negative unique variance in $D$, as noted by ten Berge [here, page 326](https://link.springer.com/chapter/10.1007/978-3-642-72253-0_44). To prevent the aforementioned undesriable situation we enforce the constraint $\\Sigma-D\\succeq0$.\n",
        "\n"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Next, we are going to show how to solve factor analysis problem using `NExOS`, which has a built in function for this purpose. \n",
        "\n",
        "First, we load the packages. "
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "using NExOS\n",
        "\n",
        "using LinearAlgebra, Convex, JuMP, MosekTools"
      ],
      "outputs": [],
      "execution_count": 1,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:39:28.510Z",
          "iopub.execute_input": "2020-12-03T16:39:28.986Z",
          "iopub.status.idle": "2020-12-03T16:39:39.025Z"
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Next, we load the data. You can change it to any other dataset as desired."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "\n",
        "Σ = [1.0 -0.34019653769952096 -0.263030887801514 -0.14349389289304187 -0.18605860686109255; -0.34019653769952096 1.0 0.4848473200092671 0.3421745595621214 0.38218138592185846; -0.263030887801514 0.4848473200092671 1.0 0.3768343949936584 0.5028863662242727; -0.14349389289304187 0.3421745595621214 0.3768343949936584 1.0 0.3150998750134158; -0.18605860686109255 0.38218138592185846 0.5028863662242727 0.3150998750134158 1.0]\n",
        "\n",
        "n, _ = size(Σ)\n",
        "\n",
        "r = convert(Int64, round(rank(Σ)/2))\n",
        "\n",
        "M = 2*opnorm(Σ ,2)"
      ],
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 2,
          "data": {
            "text/plain": "4.747866256448232"
          },
          "metadata": {}
        }
      ],
      "execution_count": 2,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:39:39.036Z",
          "iopub.execute_input": "2020-12-03T16:39:39.376Z",
          "iopub.status.idle": "2020-12-03T16:39:40.506Z"
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now, we construct this problem using `NExOS`."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "\n",
        "Z0 = Σ # Initial point in Z\n",
        "\n",
        "z0 = zeros(n) # Initial point in z\n",
        "\n",
        "problem =  NExOS.ProblemFactorAnalysisModel(Σ, r, M, Z0, z0)"
      ],
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 3,
          "data": {
            "text/plain": "ProblemFactorAnalysisModel{Array{Float64,2},Int64,Float64,Array{Float64,1}}([1.0 -0.34019653769952096 … -0.14349389289304187 -0.18605860686109255; -0.34019653769952096 1.0 … 0.3421745595621214 0.38218138592185846; … ; -0.14349389289304187 0.3421745595621214 … 1.0 0.3150998750134158; -0.18605860686109255 0.38218138592185846 … 0.3150998750134158 1.0], 2, 4.747866256448232, [1.0 -0.34019653769952096 … -0.14349389289304187 -0.18605860686109255; -0.34019653769952096 1.0 … 0.3421745595621214 0.38218138592185846; … ; -0.14349389289304187 0.3421745595621214 … 1.0 0.3150998750134158; -0.18605860686109255 0.38218138592185846 … 0.3150998750134158 1.0], [0.0, 0.0, 0.0, 0.0, 0.0])"
          },
          "metadata": {}
        }
      ],
      "execution_count": 3,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:39:40.519Z",
          "iopub.execute_input": "2020-12-03T16:39:40.527Z",
          "iopub.status.idle": "2020-12-03T16:39:40.988Z"
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now, we construct the settings."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "settings = NExOS.Settings(μ_max = 1, μ_mult_fact = 0.5, n_iter_min = 100, n_iter_max = 100, verbose = false, freq = 50, tol = 1e-3, γ_updt_rule = :adaptive)"
      ],
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 4,
          "data": {
            "text/plain": "Settings(1.0, 1.0e-8, 0.5, 100, 100, 1.0e99, 1.0e-10, 0.001, false, 50, :adaptive)"
          },
          "metadata": {}
        }
      ],
      "execution_count": 4,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:39:40.998Z",
          "iopub.execute_input": "2020-12-03T16:39:41.005Z",
          "iopub.status.idle": "2020-12-03T16:39:41.760Z"
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "state_final = NExOS.solve!(problem, settings)"
      ],
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 5,
          "data": {
            "text/plain": "StateFactorAnalysisModel{Float64,Array{Float64,2},Array{Float64,1},Int64}([0.9002146797772541 -0.5073098207885949 … -0.01162192244559973 -0.1504992835320832; -0.5073098207885949 0.6031485179871381 … 0.4175939168614036 0.5042520342917494; … ; -0.01162192244559973 0.4175939168614036 … 0.5330983234204246 0.5456366988923106; -0.1504992835320832 0.5042520342917494 … 0.5456366988923106 0.5800701078457016], [0.005250172082997496, 0.021187955387039055, 0.018516601263122615, 0.024938920128344446, 0.02242803582382049], [0.9001997771769747 -0.507340289128668 … -0.011601738359221986 -0.15049153558795636; -0.5073402889938 0.6030783235875922 … 0.41760808612095407 0.5042712190274384; … ; -0.01160173895085777 0.41760808617757567 … 0.533032993581096 0.5456672012480441; -0.1504915350227588 0.5042712188724042 … 0.5456672016930424 0.580001876846992], [0.005252094448759534, 0.021188541148083113, 0.01851724057698888, 0.02493945575366491, 0.022428602243515484], [0.9001657040093316 -0.5073953096772941 … -0.011563299637903743 -0.15047580889563483; -0.5073953095456232 0.6029405753767878 … 0.41763257751863764 0.5043046332629799; … ; -0.011563300228443932 0.4176325775757554 … 0.5329055096773659 0.5457216902191176; -0.15047580833462976 0.5043046331097865 … 0.5457216906628047 0.5798682066817082], [0.005250172082997496, 0.021187955387039055, 0.018516601263122615, 0.024938920128344446, 0.02242803582382049], 1, 7.211676530795817e-5, 1.7948927094613154e-5, 7.450580596923828e-9, 8.6316745750311e-8)"
          },
          "metadata": {}
        }
      ],
      "execution_count": 5,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:39:41.781Z",
          "iopub.execute_input": "2020-12-03T16:39:41.789Z",
          "iopub.status.idle": "2020-12-03T16:40:15.472Z"
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let us take a look at the quality of the found solution."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "println(\"fixed point gap = $(state_final.fxd_pnt_gap)\")"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "fixed point gap = 7.211676530795817e-5\n"
          ]
        }
      ],
      "execution_count": 6,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:40:15.481Z",
          "iopub.execute_input": "2020-12-03T16:40:15.489Z",
          "iopub.status.idle": "2020-12-03T16:40:15.611Z"
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "println(\"feasibility gap = $(state_final.fsblt_gap)\")"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "feasibility gap = 1.7948927094613154e-5\n"
          ]
        }
      ],
      "execution_count": 7,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:40:15.622Z",
          "iopub.execute_input": "2020-12-03T16:40:15.630Z",
          "iopub.status.idle": "2020-12-03T16:40:15.645Z"
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "println(\"objective value of the found solution = $(norm(vec(Σ-state_final.X-diagm(state_final.x)),2)^2)\")"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "objective value of the found solution = 0.9539784868101482\n"
          ]
        }
      ],
      "execution_count": 8,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:40:15.657Z",
          "iopub.execute_input": "2020-12-03T16:40:15.663Z",
          "iopub.status.idle": "2020-12-03T16:40:15.675Z"
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "U_3, svs_3, Vt_3 = svd(state_final.X)\n",
        "\n",
        "U_4, svs_4, Vt_4 = svd(Σ-diagm(state_final.x))\n",
        "\n",
        "explained_variance = sum(svs_3[1:r])/sum(svs_4[1:n])\n",
        "\n",
        "println(\"explained variance of the found solution = $explained_variance\")"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "explained variance of the found solution = 0.6661498849182806\n"
          ]
        }
      ],
      "execution_count": 9,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "execution": {
          "iopub.status.busy": "2020-12-03T16:40:15.682Z",
          "iopub.execute_input": "2020-12-03T16:40:15.687Z",
          "iopub.status.idle": "2020-12-03T16:40:15.700Z"
        }
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    }
  ],
  "metadata": {
    "kernel_info": {
      "name": "julia-1.5"
    },
    "language_info": {
      "file_extension": ".jl",
      "name": "julia",
      "mimetype": "application/julia",
      "version": "1.5.0"
    },
    "kernelspec": {
      "argv": [
        "C:\\Users\\shuvo\\AppData\\Local\\Programs\\Julia 1.5.0\\bin\\julia.exe",
        "-i",
        "--startup-file=yes",
        "--color=yes",
        "--project=@.",
        "C:\\Users\\shuvo\\.julia\\packages\\IJulia\\tOM8L\\src\\kernel.jl",
        "{connection_file}"
      ],
      "display_name": "Julia 1.5.0",
      "env": {},
      "interrupt_mode": "message",
      "language": "julia",
      "name": "julia-1.5"
    },
    "nteract": {
      "version": "0.28.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}