{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Biostat 257 Homework 5\n", "\n", "**Due June 3 @ 11:59PM**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "versioninfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this assignment, we continue with the linear mixed effects model (LMM) considered in HW3\n", "$$\n", " \\mathbf{Y}_i = \\mathbf{X}_i \\boldsymbol{\\beta} + \\mathbf{Z}_i \\boldsymbol{\\gamma}_i + \\boldsymbol{\\epsilon}_i, \\quad i=1,\\ldots,n,\n", "$$\n", "where \n", "- $\\mathbf{Y}_i \\in \\mathbb{R}^{n_i}$ is the response vector of $i$-th individual, \n", "- $\\mathbf{X}_i \\in \\mathbb{R}^{n_i \\times p}$ is the fixed effects predictor matrix of $i$-th individual, \n", "- $\\mathbf{Z}_i \\in \\mathbb{R}^{n_i \\times q}$ is the random effects predictor matrix of $i$-th individual, \n", "- $\\boldsymbol{\\epsilon}_i \\in \\mathbb{R}^{n_i}$ are multivariate normal $N(\\mathbf{0}_{n_i},\\sigma^2 \\mathbf{I}_{n_i})$, \n", "- $\\boldsymbol{\\beta} \\in \\mathbb{R}^p$ are fixed effects, and \n", "- $\\boldsymbol{\\gamma}_i \\in \\mathbb{R}^q$ are random effects assumed to be $N(\\mathbf{0}_q, \\boldsymbol{\\Sigma}_{q \\times q}$) independent of $\\boldsymbol{\\epsilon}_i$.\n", "\n", "The log-likelihood of the $i$-th datum $(\\mathbf{y}_i, \\mathbf{X}_i, \\mathbf{Z}_i)$ is \n", "$$\n", " \\ell_i(\\boldsymbol{\\beta}, \\mathbf{L}, \\sigma_0^2) = - \\frac{n_i}{2} \\log (2\\pi) - \\frac{1}{2} \\log \\det \\boldsymbol{\\Omega}_i - \\frac{1}{2} (\\mathbf{y} - \\mathbf{X}_i \\boldsymbol{\\beta})^T \\boldsymbol{\\Omega}_i^{-1} (\\mathbf{y} - \\mathbf{X}_i \\boldsymbol{\\beta}),\n", "$$\n", "where\n", "$$\n", " \\boldsymbol{\\Omega}_i = \\sigma^2 \\mathbf{I}_{n_i} + \\mathbf{Z}_i \\boldsymbol{\\Sigma} \\mathbf{Z}_i^T = \\sigma^2 \\mathbf{I}_{n_i} + \\mathbf{Z}_i \\mathbf{L} \\mathbf{L}^T \\mathbf{Z}_i^T.\n", "$$\n", "Because the variance component parameter $\\boldsymbol{\\Sigma}$ has to be positive semidefinite. We prefer to use its Cholesky factor $\\mathbf{L}$ as optimization variable. \n", "\n", "Given $m$ independent data tuples $(\\mathbf{y}_i, \\mathbf{X}_i, \\mathbf{Z}_i)$, $i=1,\\ldots,m$, we seek the maximum likelihood estimate (MLE) by maximizing the log-likelihood\n", "$$\n", "\\ell(\\boldsymbol{\\beta}, \\boldsymbol{\\Sigma}, \\sigma^2) = \\sum_{i=1}^m \\ell_i(\\boldsymbol{\\beta}, \\boldsymbol{\\Sigma}, \\sigma^2).\n", "$$\n", "In this assignment, we use the nonlinear programming (NLP) approach for optimization. In HW6, we will derive an EM (expectation-maximization) algorithm for the same problem. In HW7, we will derive an MM (minorization-maximization) algorithm for the same problem. I'm kidding 😁 There's no HW7. If you are curious about how to derive MM algorithm for this problem, see [this article](https://doi.org/10.1080/10618600.2018.1529601)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load necessary packages; make sure install them first\n", "using BenchmarkTools, CSV, DataFrames, DelimitedFiles, Distributions\n", "using Ipopt, LinearAlgebra, MathOptInterface, MixedModels, NLopt\n", "using PrettyTables, Random, RCall\n", "\n", "const MOI = MathOptInterface" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q1. (Optional, 30 bonus pts) Derivatives\n", "\n", "NLP optimization solvers expect users to provide at least a function for evaluating objective value. 
If users can provide further information such as gradient and Hessian, the NLP solvers will be more stable and converge faster. Automatic differentiation tools are becoming more powerful but cannot apply to all problems yet.\n", "\n", "1. Show that the gradient of $\\ell_i$ is\n", "\\begin{eqnarray*}\n", "\\nabla_{\\boldsymbol{\\beta}} \\ell_i(\\boldsymbol{\\beta}, \\mathbf{L}, \\sigma^2) &=& \\mathbf{X}_i^T \\boldsymbol{\\Omega}_i^{-1} \\mathbf{r}_i, \\\\\n", "\\nabla_{\\sigma^2} \\ell_i(\\boldsymbol{\\beta}, \\mathbf{L}, \\sigma^2) &=& - \\frac{1}{2} \\operatorname{tr} (\\boldsymbol{\\Omega}_i^{-1}) + \\frac{1}{2} \\mathbf{r}_i^T \\boldsymbol{\\Omega}_i^{-2} \\mathbf{r}_i, \\\\\n", "\\frac{\\partial}{\\partial \\mathbf{L}} \\ell_i(\\boldsymbol{\\beta}, \\mathbf{L}, \\sigma^2) &=& - \\mathbf{Z}_i^T \\boldsymbol{\\Omega}_i^{-1} \\mathbf{Z}_i \\mathbf{L} + \\mathbf{Z}_i^T \\boldsymbol{\\Omega}_i^{-1} \\mathbf{r}_i \\mathbf{r}_i^T \\boldsymbol{\\Omega}_i^{-1} \\mathbf{Z}_i \\mathbf{L},\n", "\\end{eqnarray*}\n", "where $\\mathbf{r}_i = \\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}$. \n", "\n", "2. Derive the observed information matrix and the expected (Fisher) information matrix.\n", "\n", "If you need a refresher on multivariate calculus, my [Biostat 216 lecture notes](https://ucla-biostat216-2019fall.github.io/slides/16-matrixcalc/16-matrixcalc.html) may be helpful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q2. (20 pts) Objective and gradient evaluator for a single datum\n", "\n", "We expand the code from HW3 to evaluate both objective and gradient. I provide my code for HW3 below as a starting point. You do _not_ have to use this code. If your come up faster code, that's even better. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define a type that holds an LMM datum\n", "struct LmmObs{T <: AbstractFloat}\n", " # data\n", " y :: Vector{T}\n", " X :: Matrix{T}\n", " Z :: Matrix{T}\n", " # arrays for holding gradient\n", " ∇β :: Vector{T}\n", " ∇σ² :: Vector{T}\n", " ∇Σ :: Matrix{T} \n", " # working arrays\n", " # TODO: whatever intermediate arrays you may want to pre-allocate\n", " yty :: T\n", " xty :: Vector{T}\n", " zty :: Vector{T}\n", " storage_p :: Vector{T}\n", " storage_q :: Vector{T}\n", " xtx :: Matrix{T}\n", " ztx :: Matrix{T}\n", " ztz :: Matrix{T}\n", " storage_qq :: Matrix{T}\n", "end\n", "\n", "\"\"\"\n", " LmmObs(y::Vector, X::Matrix, Z::Matrix)\n", "\n", "Create an LMM datum of type `LmmObs`.\n", "\"\"\"\n", "function LmmObs(\n", " y::Vector{T}, \n", " X::Matrix{T}, \n", " Z::Matrix{T}\n", " ) where T <: AbstractFloat\n", " n, p, q = size(X, 1), size(X, 2), size(Z, 2) \n", " ∇β = Vector{T}(undef, p)\n", " ∇σ² = Vector{T}(undef, 1)\n", " ∇Σ = Matrix{T}(undef, q, q) \n", " yty = abs2(norm(y))\n", " xty = transpose(X) * y\n", " zty = transpose(Z) * y \n", " storage_p = Vector{T}(undef, p)\n", " storage_q = Vector{T}(undef, q)\n", " xtx = transpose(X) * X\n", " ztx = transpose(Z) * X\n", " ztz = transpose(Z) * Z\n", " storage_qq = similar(ztz)\n", " LmmObs(y, X, Z, ∇β, ∇σ², ∇Σ, \n", " yty, xty, zty, storage_p, storage_q, \n", " xtx, ztx, ztz, storage_qq)\n", "end\n", "\n", "\"\"\"\n", " logl!(obs::LmmObs, β, L, σ², needgrad=false)\n", "\n", "Evaluate the log-likelihood of a single LMM datum at parameter values `β`, `L`, \n", "and `σ²`. 
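The implementation works with the q-by-q matrix `σ² I + Lᵀ Zᵀ Z L` (matrix determinant lemma / Woodbury identity), so each call costs O(q³) flops rather than O(nᵢ³).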
If `needgrad==true`, then `obs.∇β`, `obs.∇Σ`, and `obs.σ² are filled \n", "with the corresponding gradient.\n", "\"\"\"\n", "function logl!(\n", " obs :: LmmObs{T}, \n", " β :: Vector{T}, \n", " L :: Matrix{T}, \n", " σ² :: T,\n", " needgrad :: Bool = true\n", " ) where T <: AbstractFloat\n", " n, p, q = size(obs.X, 1), size(obs.X, 2), size(obs.Z, 2)\n", " ####################\n", " # Evaluate objective\n", " #################### \n", " # form the q-by-q matrix: M = σ² * I + Lt Zt Z L\n", " copy!(obs.storage_qq, obs.ztz)\n", " BLAS.trmm!('L', 'L', 'T', 'N', T(1), L, obs.storage_qq) # O(q^3)\n", " BLAS.trmm!('R', 'L', 'N', 'N', T(1), L, obs.storage_qq) # O(q^3)\n", " @inbounds for j in 1:q\n", " obs.storage_qq[j, j] += σ²\n", " end\n", " # cholesky on M = σ² * I + Lt Zt Z L\n", " LAPACK.potrf!('U', obs.storage_qq) # O(q^3)\n", " # storage_q = (Mchol.U') \\ (Lt * (Zt * res))\n", " BLAS.gemv!('N', T(-1), obs.ztx, β, T(1), copy!(obs.storage_q, obs.zty)) # O(pq)\n", " BLAS.trmv!('L', 'T', 'N', L, obs.storage_q) # O(q^2)\n", " BLAS.trsv!('U', 'T', 'N', obs.storage_qq, obs.storage_q) # O(q^3)\n", " # l2 norm of residual vector\n", " copy!(obs.storage_p, obs.xty)\n", " rtr = obs.yty +\n", " dot(β, BLAS.gemv!('N', T(1), obs.xtx, β, T(-2), obs.storage_p))\n", " # assemble pieces\n", " logl::T = n * log(2π) + (n - q) * log(σ²) # constant term\n", " @inbounds for j in 1:q\n", " logl += 2log(obs.storage_qq[j, j])\n", " end\n", " qf = abs2(norm(obs.storage_q)) # quadratic form term\n", " logl += (rtr - qf) / σ² \n", " logl /= -2\n", " ###################\n", " # Evaluate gradient\n", " ################### \n", " if needgrad\n", " # TODO: fill ∇β, ∇L, ∇σ² by gradients\n", " sleep(1e-3) # pretend this step takes 1ms\n", " end \n", " ###################\n", " # Return\n", " ################### \n", " return logl \n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is a good idea to test correctness and efficiency of the single datum objective/gradient evaluator here. First generate the same data set as in [HW3](https://ucla-biostat-257.github.io/2022spring/hw/hw3/hw03.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Random.seed!(257)\n", "# dimension\n", "n, p, q = 2000, 5, 3\n", "# predictors\n", "X = [ones(n) randn(n, p - 1)]\n", "Z = [ones(n) randn(n, q - 1)]\n", "# parameter values\n", "β = [2.0; -1.0; rand(p - 2)]\n", "σ² = 1.5\n", "Σ = fill(0.1, q, q) + 0.9I # compound symmetry \n", "L = Matrix(cholesky(Symmetric(Σ)).L)\n", "# generate y\n", "y = X * β + Z * rand(MvNormal(Σ)) + sqrt(σ²) * randn(n)\n", "\n", "# form the LmmObs object\n", "obs = LmmObs(y, X, Z);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Correctness" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@show logl = logl!(obs, β, L, σ², true)\n", "@show obs.∇β\n", "@show obs.∇σ²\n", "@show obs.∇Σ;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will lose all 20 points if following statement throws `AssertionError`." 
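, "\n",
"\n",
"For debugging the `needgrad` branch, a naive reference implementation of the Q1 gradient formulas is handy. The sketch below is a testing aid only (the helper name `grad_naive` is ad hoc): it forms the $n_i \\times n_i$ matrix $\\boldsymbol{\\Omega}_i$ explicitly, so it is far too slow and allocation-heavy for the graded evaluator.\n",
"\n",
"```julia\n",
"function grad_naive(obs::LmmObs, β, L, σ²)\n",
"    y, X, Z = obs.y, obs.X, obs.Z\n",
"    # form Ωᵢ = σ²I + Z L Lᵀ Zᵀ explicitly: O(nᵢ² q) work, for testing only\n",
"    Ω    = Symmetric(σ² * I + Z * L * transpose(L) * transpose(Z))\n",
"    r    = y - X * β\n",
"    Ωinv = inv(Ω)\n",
"    u    = Ωinv * r                     # Ωᵢ⁻¹ rᵢ\n",
"    ∇β   = transpose(X) * u\n",
"    ∇σ²  = (-tr(Ωinv) + dot(u, u)) / 2\n",
"    ∇L   = transpose(Z) * (u * transpose(u) - Ωinv) * Z * L\n",
"    ∇β, ∇σ², ∇L\n",
"end\n",
"\n",
"logl!(obs, β, L, σ², true)\n",
"∇β_ref, ∇σ²_ref, ∇L_ref = grad_naive(obs, β, L, σ²)\n",
"# once the `needgrad` branch is implemented, both differences should be ≈ 0\n",
"@show norm(∇β_ref - obs.∇β) abs(∇σ²_ref - obs.∇σ²[1]);\n",
"```"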
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@assert abs(logl - (-3256.1793358058258)) < 1e-4\n", "@assert norm(obs.∇β - [0.26698108057144054, 41.61418337067327, \n", " -34.34664962312689, 36.10898510707527, 27.913948208793144]) < 1e-4\n", "# @assert norm(obs.∇Σ - \n", "# [-0.9464482950697888 0.057792444809492895 -0.30244127639188767; \n", "# 0.057792444809492895 -1.00087164917123 0.2845116557144694; \n", "# -0.30244127639188767 0.2845116557144694 1.170040927259726]) < 1e-4\n", "@assert abs(obs.∇σ²[1] - (1.6283715138412163)) < 1e-4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Efficiency" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Benchmark for evaluating objective only. This is what we did in HW3." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@benchmark logl!($obs, $β, $L, $σ², false)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Benchmark for objective + gradient evaluation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bm_objgrad = @benchmark logl!($obs, $β, $L, $σ², true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My median runt time is 2.1μs. You will get full credit (10 pts) if the median run time is within 10μs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The points you will get are\n", "clamp(10 / (median(bm_objgrad).time / 1e3) * 10, 0, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q3. LmmModel type\n", "\n", "We create a `LmmModel` type to hold all data points and model parameters. Log-likelihood/gradient of a `LmmModel` object is simply the sum of log-likelihood/gradient of individual data points. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define a type that holds LMM model (data + parameters)\n", "struct LmmModel{T <: AbstractFloat} <: MOI.AbstractNLPEvaluator\n", " # data\n", " data :: Vector{LmmObs{T}}\n", " # parameters\n", " β :: Vector{T}\n", " L :: Matrix{T}\n", " σ² :: Vector{T} \n", " # arrays for holding gradient\n", " ∇β :: Vector{T}\n", " ∇σ² :: Vector{T}\n", " ∇L :: Matrix{T}\n", " # TODO: add whatever intermediate arrays you may want to pre-allocate\n", " xty :: Vector{T}\n", " ztr2 :: Vector{T}\n", " xtx :: Matrix{T}\n", " ztz2 :: Matrix{T}\n", "end\n", "\n", "\"\"\"\n", " LmmModel(data::Vector{LmmObs})\n", "\n", "Create an LMM model that contains data and parameters.\n", "\"\"\"\n", "function LmmModel(obsvec::Vector{LmmObs{T}}) where T <: AbstractFloat\n", " # dims\n", " p = size(obsvec[1].X, 2)\n", " q = size(obsvec[1].Z, 2)\n", " # parameters\n", " β = Vector{T}(undef, p)\n", " L = Matrix{T}(undef, q, q)\n", " σ² = Vector{T}(undef, 1) \n", " # gradients\n", " ∇β = similar(β) \n", " ∇σ² = similar(σ²)\n", " ∇L = similar(L)\n", " # intermediate arrays\n", " xty = Vector{T}(undef, p)\n", " ztr2 = Vector{T}(undef, abs2(q))\n", " xtx = Matrix{T}(undef, p, p)\n", " ztz2 = Matrix{T}(undef, abs2(q), abs2(q))\n", " LmmModel(obsvec, β, L, σ², ∇β, ∇σ², ∇L, xty, ztr2, xtx, ztz2)\n", "end\n", "\n", "\"\"\"\n", " logl!(m::LmmModel, needgrad=false)\n", "\n", "Evaluate the log-likelihood of an LMM model at parameter values `m.β`, `m.L`, \n", "and `m.σ²`. 
If `needgrad==true`, then `m.∇β`, `m.∇Σ`, and `m.σ² are filled \n", "with the corresponding gradient.\n", "\"\"\"\n", "function logl!(m::LmmModel{T}, needgrad::Bool = false) where T <: AbstractFloat\n", " logl = zero(T)\n", " if needgrad\n", " fill!(m.∇β , 0)\n", " fill!(m.∇L , 0)\n", " fill!(m.∇σ², 0) \n", " end\n", " @inbounds for i in 1:length(m.data)\n", " obs = m.data[i]\n", " logl += logl!(obs, m.β, m.L, m.σ²[1], needgrad)\n", " if needgrad\n", " BLAS.axpy!(T(1), obs.∇β, m.∇β)\n", " BLAS.axpy!(T(1), obs.∇Σ, m.∇L)\n", " m.∇σ²[1] += obs.∇σ²[1]\n", " end\n", " end\n", " logl\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q4. (20 pts) Test data\n", "\n", "Let's generate a fake longitudinal data set to test our algorithm." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Random.seed!(257)\n", "\n", "# dimension\n", "m = 1000 # number of individuals\n", "ns = rand(1500:2000, m) # numbers of observations per individual\n", "p = 5 # number of fixed effects, including intercept\n", "q = 3 # number of random effects, including intercept\n", "obsvec = Vector{LmmObs{Float64}}(undef, m)\n", "# true parameter values\n", "βtrue = [0.1; 6.5; -3.5; 1.0; 5; zeros(p - 5)]\n", "σ²true = 1.5\n", "σtrue = sqrt(σ²true)\n", "Σtrue = Matrix(Diagonal([2.0; 1.2; 1.0; zeros(q - 3)]))\n", "Ltrue = Matrix(cholesky(Symmetric(Σtrue), Val(true), check=false).L)\n", "# generate data\n", "for i in 1:m\n", " # first column intercept, remaining entries iid std normal\n", " X = Matrix{Float64}(undef, ns[i], p)\n", " X[:, 1] .= 1\n", " @views Distributions.rand!(Normal(), X[:, 2:p])\n", " # first column intercept, remaining entries iid std normal\n", " Z = Matrix{Float64}(undef, ns[i], q)\n", " Z[:, 1] .= 1\n", " @views Distributions.rand!(Normal(), Z[:, 2:q])\n", " # generate y\n", " y = X * βtrue .+ Z * (Ltrue * randn(q)) .+ σtrue * randn(ns[i])\n", " # form a LmmObs instance\n", " obsvec[i] = LmmObs(y, X, Z)\n", "end\n", "# form a LmmModel instance\n", "lmm = LmmModel(obsvec);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For later comparison with other software, we save the data into a text file `lmm_data.csv`. **Do not put this file in Git.** It takes 245.4MB storage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(isfile(\"lmm_data.csv\") && filesize(\"lmm_data.csv\") == 245369936) || \n", "open(\"lmm_data.csv\", \"w\") do io\n", " p = size(lmm.data[1].X, 2)\n", " q = size(lmm.data[1].Z, 2)\n", " # print header\n", " print(io, \"ID,Y,\")\n", " for j in 1:(p-1)\n", " print(io, \"X\" * string(j) * \",\")\n", " end\n", " for j in 1:(q-1)\n", " print(io, \"Z\" * string(j) * (j < q-1 ? \",\" : \"\\n\"))\n", " end\n", " # print data\n", " for i in eachindex(lmm.data)\n", " obs = lmm.data[i]\n", " for j in 1:length(obs.y)\n", " # id\n", " print(io, i, \",\")\n", " # Y\n", " print(io, obs.y[j], \",\")\n", " # X data\n", " for k in 2:p\n", " print(io, obs.X[j, k], \",\")\n", " end\n", " # Z data\n", " for k in 2:q-1\n", " print(io, obs.Z[j, k], \",\")\n", " end\n", " print(io, obs.Z[j, q], \"\\n\")\n", " end\n", " end\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Correctness\n", "\n", "Evaluate log-likelihood and gradient of whole data set at the true parameter values." 
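, "\n",
"\n",
"As an optional extra check, the analytic gradient can be compared against a central finite difference of the whole-data objective. A sketch for the $\\sigma^2$ component (the step size `h` is an arbitrary small value):\n",
"\n",
"```julia\n",
"copy!(lmm.β, βtrue)\n",
"copy!(lmm.L, Ltrue)\n",
"lmm.σ²[1] = σ²true\n",
"logl!(lmm, true)                 # fills the analytic gradients\n",
"g_analytic = lmm.∇σ²[1]\n",
"h = 1e-4\n",
"lmm.σ²[1] = σ²true + h; fp = logl!(lmm)\n",
"lmm.σ²[1] = σ²true - h; fm = logl!(lmm)\n",
"lmm.σ²[1] = σ²true               # restore\n",
"g_fd = (fp - fm) / (2h)          # finite-difference approximation\n",
"@show g_analytic g_fd;\n",
"```"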
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "copy!(lmm.β, βtrue)\n", "copy!(lmm.L, Ltrue)\n", "lmm.σ²[1] = σ²true\n", "@show obj = logl!(lmm, true)\n", "@show lmm.∇β\n", "@show lmm.∇σ²\n", "@show lmm.∇L;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test correctness. You will loss all 20 points if following code throws `AssertError`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@assert abs(obj - (-2.840068438369969e6)) < 1e-4\n", "@assert norm(lmm.∇β - [41.0659167074073, 445.75120353972426, \n", " 157.0133992249258, -335.09977360733626, -895.6257448385899]) < 1e-4\n", "@assert norm(lmm.∇L - [-3.3982575935824837 31.32103842086001 26.73645089732865; \n", " 40.43528672997116 61.86377650461202 -75.37427770754684; \n", " 37.811051468724486 -82.56838431216435 -56.45992542754974]) < 1e-4\n", "@assert abs(lmm.∇σ²[1] - (-489.5361730382465)) < 1e-4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Efficiency\n", "\n", "Test efficiency." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bm_model = @benchmark logl!($lmm, true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My median run time is 3.5ms. You will get full credit if your median run time is within 10ms. The points you will get are" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clamp(10 / (median(bm_model).time / 1e6) * 10, 0, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Memory\n", "\n", "You will lose 1 point for each 100 bytes memory allocation. So the points you will get is" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clamp(10 - median(bm_model).memory / 100, 0, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q5. (30 pts) Starting point\n", "\n", "For numerical optimization, a good starting point is critical. Let's start $\\boldsymbol{\\beta}$ and $\\sigma^2$ from the least sqaures solutions (ignoring intra-individual correlations)\n", "\\begin{eqnarray*}\n", "\\boldsymbol{\\beta}^{(0)} &=& \\left(\\sum_i \\mathbf{X}_i^T \\mathbf{X}_i\\right)^{-1} \\left(\\sum_i \\mathbf{X}_i^T \\mathbf{y}_i\\right) \\\\\n", "\\sigma^{2(0)} &=& \\frac{\\sum_i \\|\\mathbf{r}_i^{(0)}\\|_2^2}{\\sum_i n_i} = \\frac{\\sum_i \\|\\mathbf{y}_i - \\mathbf{X}_i \\boldsymbol{\\beta}^{(0)}\\|_2^2}{\\sum_i n_i}.\n", "\\end{eqnarray*}\n", "To get a reasonable starting point for $\\boldsymbol{\\Sigma} = \\mathbf{L} \\mathbf{L}^T$, we can minimize the least squares criterion (ignoring the noise variance component)\n", "$$\n", " \\text{minimize} \\sum_i \\| \\mathbf{r}_i^{(0)} \\mathbf{r}_i^{(0)T} - \\mathbf{Z}_i \\boldsymbol{\\Sigma} \\mathbf{Z}_i^T \\|_{\\text{F}}^2.\n", "$$\n", "Derive the minimizer $\\boldsymbol{\\Sigma}^{(0)}$ (10 pts). \n", "\n", "We implement this start point strategy in the function `init_ls()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", " init_ls!(m::LmmModel)\n", "\n", "Initialize parameters of a `LmmModel` object from the least squares estimate. 
\n", "`m.β`, `m.L`, and `m.σ²` are overwritten with the least squares estimates.\n", "\"\"\"\n", "function init_ls!(m::LmmModel{T}) where T <: AbstractFloat\n", " p, q = size(m.data[1].X, 2), size(m.data[1].Z, 2)\n", " # TODO: fill m.β, m.L, m.σ² by LS estimates\n", " sleep(1e-3) # pretend this takes 1ms\n", " m\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "init_ls!(lmm)\n", "@show logl!(lmm)\n", "@show lmm.β\n", "@show lmm.σ²\n", "@show lmm.L;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Correctness\n", "\n", "Your start points should have a log-likelihood larger than -3.352991e6 (10 pts). The points you get are" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# this is the points you get\n", "(logl!(lmm) > -3.3627e6) * 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Efficiency\n", "\n", "The start point should be computed quickly. Otherwise there is no point using it as a starting point. You get full credit (10 pts) if the median run time is within 1ms." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bm_init = @benchmark init_ls!($lmm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# this is the points you get\n", "clamp(1 / (median(bm_init).time / 1e6) * 10, 0, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q6. NLP via MathOptInterface.jl\n", "\n", "We define the NLP problem using the modelling tool [MathOptInterface.jl](https://github.com/jump-dev/MathOptInterface.jl). Start-up code is given below. Modify if necessary to accomodate your own code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", " fit!(m::LmmModel, solver=Ipopt.Optimizer())\n", "\n", "Fit an `LmmModel` object by MLE using a nonlinear programming solver. 
Start point \n", "should be provided in `m.β`, `m.σ²`, `m.L`.\n", "\"\"\"\n", "function fit!(\n", " m :: LmmModel{T},\n", " solver = Ipopt.Optimizer()\n", " ) where T <: AbstractFloat\n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2)\n", " npar = p + ((q * (q + 1)) >> 1) + 1\n", " # prep the MOI\n", " MOI.empty!(solver)\n", " # set lower bounds and upper bounds of parameters\n", " # q diagonal entries of Cholesky factor L should be >= 0\n", " # σ² should be >= 0\n", " lb = fill(0.0, q + 1)\n", " ub = fill(Inf, q + 1)\n", " NLPBlock = MOI.NLPBlockData(MOI.NLPBoundsPair.(lb, ub), m, true)\n", " MOI.set(solver, MOI.NLPBlock(), NLPBlock)\n", " # start point\n", " params = MOI.add_variables(solver, npar) \n", " par0 = Vector{T}(undef, npar)\n", " modelpar_to_optimpar!(par0, m) \n", " for i in 1:npar\n", " MOI.set(solver, MOI.VariablePrimalStart(), params[i], par0[i])\n", " end\n", " MOI.set(solver, MOI.ObjectiveSense(), MOI.MAX_SENSE)\n", " # optimize\n", " MOI.optimize!(solver)\n", " optstat = MOI.get(solver, MOI.TerminationStatus())\n", " optstat in (MOI.LOCALLY_SOLVED, MOI.ALMOST_LOCALLY_SOLVED) || \n", " @warn(\"Optimization unsuccesful; got $optstat\")\n", " # update parameters and refresh gradient\n", " xsol = [MOI.get(solver, MOI.VariablePrimal(), MOI.VariableIndex(i)) for i in 1:npar]\n", " optimpar_to_modelpar!(m, xsol)\n", " logl!(m, true)\n", " m\n", "end\n", "\n", "\"\"\"\n", " modelpar_to_optimpar!(par, m)\n", "\n", "Translate model parameters in `m` to optimization variables in `par`.\n", "\"\"\"\n", "function modelpar_to_optimpar!(\n", " par :: Vector,\n", " m :: LmmModel\n", " )\n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2)\n", " # β\n", " copyto!(par, m.β)\n", " # L\n", " offset = p + 1\n", " @inbounds for j in 1:q, i in j:q\n", " par[offset] = m.L[i, j]\n", " offset += 1\n", " end\n", " # σ²\n", " par[end] = m.σ²[1]\n", " par\n", "end\n", "\n", "\"\"\"\n", " optimpar_to_modelpar!(m, par)\n", "\n", "Translate optimization variables in `par` to the model parameters in `m`.\n", "\"\"\"\n", "function optimpar_to_modelpar!(\n", " m :: LmmModel, \n", " par :: Vector\n", " )\n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2)\n", " # β\n", " copyto!(m.β, 1, par, 1, p)\n", " # L\n", " fill!(m.L, 0)\n", " offset = p + 1\n", " @inbounds for j in 1:q, i in j:q\n", " m.L[i, j] = par[offset]\n", " offset += 1\n", " end\n", " # σ²\n", " m.σ²[1] = par[end] \n", " m\n", "end\n", "\n", "function MOI.initialize(\n", " m :: LmmModel, \n", " requested_features :: Vector{Symbol}\n", " )\n", " for feat in requested_features\n", " if !(feat in MOI.features_available(m))\n", " error(\"Unsupported feature $feat\")\n", " end\n", " end\n", "end\n", "\n", "MOI.features_available(m::LmmModel) = [:Grad, :Hess, :Jac]\n", "\n", "function MOI.eval_objective(\n", " m :: LmmModel, \n", " par :: Vector\n", " )\n", " optimpar_to_modelpar!(m, par)\n", " logl!(m, false) # don't need gradient here\n", "end\n", "\n", "function MOI.eval_objective_gradient(\n", " m :: LmmModel, \n", " grad :: Vector, \n", " par :: Vector\n", " )\n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2)\n", " optimpar_to_modelpar!(m, par) \n", " obj = logl!(m, true)\n", " # gradient wrt β\n", " copyto!(grad, m.∇β)\n", " # gradient wrt L\n", " offset = p + 1\n", " @inbounds for j in 1:q, i in j:q\n", " grad[offset] = m.∇L[i, j]\n", " offset += 1\n", " end\n", " # gradient with respect to σ²\n", " grad[end] = m.∇σ²[1]\n", " # return objective\n", " obj\n", "end\n", "\n", "function 
MOI.eval_constraint(m::LmmModel, g, par)\n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2)\n", " gidx = 1\n", " offset = p + 1\n", " @inbounds for j in 1:q, i in j:q\n", " if i == j\n", " g[gidx] = par[offset]\n", " gidx += 1\n", " end\n", " offset += 1\n", " end\n", " g[end] = par[end]\n", " g\n", "end\n", "\n", "function MOI.jacobian_structure(m::LmmModel)\n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2)\n", " row = collect(1:(q + 1))\n", " col = Int[]\n", " offset = p + 1\n", " for j in 1:q, i in j:q\n", " (i == j) && push!(col, offset)\n", " offset += 1\n", " end\n", " push!(col, offset)\n", " [(row[i], col[i]) for i in 1:length(row)]\n", "end\n", "\n", "MOI.eval_constraint_jacobian(m::LmmModel, J, par) = fill!(J, 1)\n", "\n", "function MOI.hessian_lagrangian_structure(m::LmmModel)\n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2) \n", " q◺ = ◺(q)\n", " # we work on the upper triangular part of the Hessian\n", " arr1 = Vector{Int}(undef, ◺(p) + ◺(q◺) + q◺ + 1)\n", " arr2 = Vector{Int}(undef, ◺(p) + ◺(q◺) + q◺ + 1)\n", " # Hββ block\n", " idx = 1 \n", " for j in 1:p, i in 1:j\n", " arr1[idx] = i\n", " arr2[idx] = j\n", " idx += 1\n", " end\n", " # HLL block\n", " for j in 1:q◺, i in 1:j\n", " arr1[idx] = p + i\n", " arr2[idx] = p + j\n", " idx += 1\n", " end\n", " # HLσ² block\n", " for i in (p + 1):(p + q◺)\n", " arr1[idx] = i\n", " arr2[idx] = p + q◺ + 1\n", " idx += 1\n", " end\n", " # Hσ²σ² block\n", " arr1[idx] = p + q◺ + 1\n", " arr2[idx] = p + q◺ + 1\n", " [(arr1[i], arr2[i]) for i in 1:length(arr1)]\n", "end\n", "\n", "function MOI.eval_hessian_lagrangian(\n", " m :: LmmModel, \n", " H :: AbstractVector{T},\n", " par :: AbstractVector{T}, \n", " σ :: T, \n", " μ :: AbstractVector{T}\n", " ) where {T} \n", " p = size(m.data[1].X, 2)\n", " q = size(m.data[1].Z, 2) \n", " q◺ = ◺(q)\n", " optimpar_to_modelpar!(m, par)\n", " logl!(m, true, true)\n", " # Hββ block\n", " idx = 1 \n", " @inbounds for j in 1:p, i in 1:j\n", " H[idx] = m.Hββ[i, j]\n", " idx += 1\n", " end\n", " # HLL block\n", " @inbounds for j in 1:q◺, i in 1:j\n", " H[idx] = m.HLL[i, j]\n", " idx += 1\n", " end\n", " # HLσ² block\n", " @inbounds for j in 1:q, i in j:q\n", " H[idx] = m.Hσ²L[i, j]\n", " idx += 1\n", " end\n", " # Hσ²σ² block\n", " H[idx] = m.Hσ²σ²[1, 1]\n", " lmul!(σ, H)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q7. (20 pts) Test drive\n", "\n", "Now we can run any NLP solver (supported by MathOptInterface.jl) to compute the MLE. For grading purpose, let's use the `:LD_MMA` ([Method of Moving Asymptotes](https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/#mma-method-of-moving-asymptotes-and-ccsa)) algorithm in NLopt.jl." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# initialize from least squares\n", "init_ls!(lmm)\n", "println(\"objective value at starting point: \", logl!(lmm)); println()\n", "\n", "# NLopt (LD_MMA) obj. 
val = -2.8400587866501934e6\n", "NLopt_solver = NLopt.Optimizer()\n", "MOI.set(NLopt_solver, MOI.RawOptimizerAttribute(\"algorithm\"), :LD_MMA)\n", "@time fit!(lmm, NLopt_solver)\n", "\n", "println(\"objective value at solution: $(logl!(lmm)))\")\n", "println(\"solution values:\")\n", "@show lmm.β\n", "@show lmm.σ²\n", "@show lmm.L * transpose(lmm.L)\n", "println(\"gradient @ solution:\")\n", "@show lmm.∇β\n", "@show lmm.∇σ²\n", "@show lmm.∇L\n", "@show norm([lmm.∇β; vec(LowerTriangular(lmm.∇L)); lmm.∇σ²])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Correctness\n", "\n", "You get 10 points if the following code does not throw `AssertError`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# objective at solution should be close enough to the optimal\n", "@assert logl!(lmm) > -2.840059e6\n", "# gradient at solution should be small enough\n", "@assert norm([lmm.∇β; vec(LowerTriangular(lmm.∇L)); lmm.∇σ²]) < 0.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Efficiency\n", "\n", "My median run time is 140ms. You get 10 points if your median time is within 1s(=1000ms)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "NLopt_solver = NLopt.Optimizer()\n", "MOI.set(NLopt_solver, MOI.RawOptimizerAttribute(\"algorithm\"), :LD_MMA)\n", "bm_mma = @benchmark fit!($lmm, $(NLopt_solver)) setup=(init_ls!(lmm))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# this is the points you get\n", "clamp(1 / (median(bm_mma).time / 1e9) * 10, 0, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q8. (10 pts) Gradient free vs gradient-based methods\n", "\n", "Advantage of using a modelling tool such as MathOptInterface.jl is that we can easily switch the backend solvers. For a research problem, we never know beforehand which solver works best. \n", "\n", "Try different solvers in the NLopt.jl and Ipopt.jl packages. Compare the results in terms of run times (the shorter the better), objective values at solution (the larger the better), and gradients at solution (closer to 0 the better). Summarize what you find.\n", "\n", "See this [page](https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/) for the descriptions of algorithms in NLopt.\n", "\n", "Documentation for the Ipopt can be found [here](https://coin-or.github.io/Ipopt/). 
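\n\nFor example, adding another NLopt algorithm to the comparison requires only a different algorithm name. A sketch using low-storage BFGS (`:LD_LBFGS` is one of several gradient-based options on that list):\n\n```julia\nsolver = NLopt.Optimizer()\nMOI.set(solver, MOI.RawOptimizerAttribute(\"algorithm\"), :LD_LBFGS)\ninit_ls!(lmm)\n@time fit!(lmm, solver)\nlogl!(lmm, true)\nnorm([lmm.∇β; vec(LowerTriangular(lmm.∇L)); lmm.∇σ²])\n```\n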
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# vector of solvers to compare\n", "solvers = [\"NLopt (LN_COBYLA, gradient free)\", \"NLopt (LD_MMA, gradient-based)\", \n", " \"Ipopt (L-BFGS)\"]\n", "\n", "function setup_solver(s::String)\n", " if s == \"NLopt (LN_COBYLA, gradient free)\"\n", " solver = NLopt.Optimizer()\n", " MOI.set(solver, MOI.RawOptimizerAttribute(\"algorithm\"), :LN_COBYLA)\n", " elseif s == \"NLopt (LD_MMA, gradient-based)\"\n", " solver = NLopt.Optimizer()\n", " MOI.set(solver, MOI.RawOptimizerAttribute(\"algorithm\"), :LD_MMA)\n", " elseif s == \"Ipopt (L-BFGS)\"\n", " solver = Ipopt.Optimizer()\n", " MOI.set(solver, MOI.RawOptimizerAttribute(\"print_level\"), 0)\n", " MOI.set(solver, MOI.RawOptimizerAttribute(\"hessian_approximation\"), \"limited-memory\")\n", " MOI.set(solver, MOI.RawOptimizerAttribute(\"tol\"), 1e-6)\n", " elseif s == \"Ipopt (use FIM)\"\n", " # Ipopt (use Hessian) obj val = -2.8400587866468e6\n", " solver = Ipopt.Optimizer()\n", " MOI.set(solver, MOI.RawOptimizerAttribute(\"print_level\"), 0) \n", " else\n", " error(\"unrecognized solver $s\")\n", " end\n", " solver\n", "end\n", "\n", "# containers for results\n", "runtime = zeros(length(solvers))\n", "objvals = zeros(length(solvers))\n", "gradnrm = zeros(length(solvers))\n", "\n", "for i in 1:length(solvers)\n", " solver = setup_solver(solvers[i])\n", " bm = @benchmark fit!($lmm, $solver) setup = (init_ls!(lmm))\n", " runtime[i] = median(bm).time / 1e9\n", " objvals[i] = logl!(lmm, true)\n", " gradnrm[i] = norm([lmm.∇β; vec(LowerTriangular(lmm.∇L)); lmm.∇σ²])\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# display results\n", "pretty_table(\n", " hcat(solvers, runtime, objvals, gradnrm),\n", " header = [\"Solver\", \"Runtime\", \"Log-Like\", \"Gradiant Norm\"],\n", " formatters = (ft_printf(\"%5.2f\", 2), ft_printf(\"%8.8f\", 3:4))\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q9. (10 pts) Compare with existing art\n", "\n", "Let's compare our method with lme4 package in R and MixedModels.jl package in Julia. Both lme4 and MixedModels.jl are developed mainly by Doug Bates. Summarize what you find." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "method = [\"My method\", \"lme4\", \"MixedModels.jl\"]\n", "runtime = zeros(3) # record the run times\n", "loglike = zeros(3); # record the log-likelihood at MLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Your approach" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "solver = setup_solver(\"NLopt (LD_MMA, gradient-based)\")\n", "bm_257 = @benchmark fit!($lmm, $solver) setup=(init_ls!(lmm))\n", "runtime[1] = (median(bm_257).time) / 1e9\n", "loglike[1] = logl!(lmm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### lme4 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "R\"\"\"\n", "library(lme4)\n", "library(readr)\n", "library(magrittr)\n", "\n", "testdata <- read_csv(\"lmm_data.txt\")\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "R\"\"\"\n", "rtime <- system.time(mmod <- \n", " lmer(Y ~ X1 + X2 + X3 + X4 + (1 + Z1 + Z2 | ID), testdata, REML = FALSE))\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "R\"\"\"\n", "rtime <- rtime[\"elapsed\"]\n", "summary(mmod)\n", "rlogl <- logLik(mmod)\n", "\"\"\"\n", "runtime[2] = @rget rtime\n", "loglike[2] = @rget rlogl;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MixedModels.jl\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "testdata = CSV.File(\"lmm_data.txt\", types = Dict(1=>String)) |> DataFrame!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mj = fit(MixedModel, @formula(Y ~ X1 + X2 + X3 + X4 + (1 + Z1 + Z2 | ID)), testdata)\n", "bm_mm = @benchmark fit(MixedModel, @formula(Y ~ X1 + X2 + X3 + X4 + (1 + Z1 + Z2 | ID)), $testdata)\n", "loglike[3] = loglikelihood(mj)\n", "runtime[3] = median(bm_mm).time / 1e9" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display(bm_mm)\n", "mj" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pretty_table(\n", " hcat(method, runtime, loglike),\n", " header = [\"Method\", \"Runtime\", \"Log-Like\"],\n", " formatters = (ft_printf(\"%5.2f\", 2), ft_printf(\"%8.6f\", 3))\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q9. Be proud of yourself\n", "\n", "Go to your resume/cv and claim you have experience performing analysis on complex longitudinal data sets with millions of records. And you beat current software by XXX fold." ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.7.1", "language": "julia", "name": "julia-1.7" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.7.1" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "87px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 4 }