{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "---\n", "title: Biostat/Biomath M257 Homework 5\n", "subtitle: 'Due May 26 @ 11:59PM'\n", "author: Student Name and UID\n", "date: today\n", "format:\n", " html:\n", " theme: cosmo\n", " embed-resources: true\n", " number-sections: true\n", " toc: true\n", " toc-depth: 4\n", " toc-location: left\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "System information (for reproducibility):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 1.9.0\n", "Commit 8e630552924 (2023-05-07 11:25 UTC)\n", "Platform Info:\n", " OS: macOS (arm64-apple-darwin22.4.0)\n", " CPU: 12 × Apple M2 Max\n", " WORD_SIZE: 64\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)\n", " Threads: 9 on 8 virtual cores\n", "Environment:\n", " JULIA_NUM_THREADS = 8\n", " JULIA_EDITOR = code\n" ] } ], "source": [ "versioninfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load packages:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32m\u001b[1m Activating\u001b[22m\u001b[39m project at `~/Documents/github.com/ucla-biostat-257/2023spring/hw/hw5`\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32m\u001b[1mStatus\u001b[22m\u001b[39m `~/Documents/github.com/ucla-biostat-257/2023spring/hw/hw5/Project.toml`\n", " \u001b[90m[f65535da] \u001b[39mConvex v0.15.3\n", " \u001b[90m[a93c6f00] \u001b[39mDataFrames v1.5.0\n", " \u001b[90m[87dc4568] \u001b[39mHiGHS v1.5.1\n", " \u001b[90m[b99e6be6] \u001b[39mHypatia v0.7.2\n", " \u001b[90m[4076af6c] \u001b[39mJuMP v1.11.0\n", " \u001b[90m[2f354839] \u001b[39mPajarito v0.8.2\n", " \u001b[90m[08abe8d2] \u001b[39mPrettyTables v2.2.4\n", " \u001b[90m[3eaba693] \u001b[39mStatsModels v0.7.2\n" ] } ], "source": [ "using Pkg\n", "\n", "Pkg.activate(pwd())\n", "Pkg.instantiate()\n", "Pkg.status()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise, we practice using disciplined convex programming (SDP in particular) to solve optimal design problems.\n", "\n", "## Introduction to optimal design\n", "\n", "Consider a linear model\n", "\\begin{eqnarray*}\n", "\ty_i = \\mathbf{x}_i^T \\boldsymbol{\\beta} + \\epsilon_i, \\quad i = 1,\\ldots, n,\n", "\\end{eqnarray*}\n", "where $\\epsilon_i$ are independent Gaussian noises with common variance $\\sigma^2$. It is well known that the least squares estimate $\\hat{\\boldsymbol{\\beta}}$ is unbiased and has covariance $\\sigma^2 (\\sum_{i=1}^n \\mathbf{x}_i \\mathbf{x}_i^T)^{-1}$. \n", "\n", "In **exact optimal design**, given total number of $n$ allowable experiments, we want to choose among a list of $m$ candidate design points $\\{\\mathbf{x}_1, \\ldots, \\mathbf{x}_m\\}$ such that the covariance matrix is minimized in some sense. In mathematical terms, we want to find an integer vector $\\mathbf{n} = (n_1, \\ldots, n_m)$ such that $n_i \\ge 0$, $\\sum_{i=1}^m n_i = n$, and the matrix $\\mathbf{V} = \\left( \\sum_{i=1}^m n_i \\mathbf{x}_i \\mathbf{x}_i^T \\right)^{-1}$ is \"small\".\n", "\n", "In **approximate optimal design**, we want to find a probability vector $\\mathbf{p} = (p_1, \\ldots, p_m)$ such that $p_i \\ge 0$, $\\sum_{i=1}^m p_i = 1$, and the matrix $\\mathbf{V} = \\left( \\sum_{i=1}^m p_i \\mathbf{x}_i \\mathbf{x}_i^T \\right)^{-1}$ is \"small\".\n", "\n", "Commonly used optimal design criteria include:\n", "\n", "- In **$D$-optimal design**, we minimize the determinant of $\\mathbf{V}$\n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& \\det \\left( \\sum_{i=1}^m p_i \\mathbf{x}_i \\mathbf{x}_i^T \\right)^{-1} \\\\\n", "\t&\\text{subject to}& p_i \\ge 0, \\sum_{i=1}^m p_i = 1.\n", "\\end{eqnarray*}\n", "\n", "- In **$E$-optimal design**, we minimize the spectral norm, i.e., the maximum eigenvalue of $\\mathbf{V}$\n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& \\lambda_{\\text{max}} \\left( \\sum_{i=1}^m p_i \\mathbf{x}_i \\mathbf{x}_i^T \\right)^{-1} \\\\\n", "\t&\\text{subject to}& p_i \\ge 0, \\sum_{i=1}^m p_i = 1.\t\n", "\\end{eqnarray*}\n", "Statistically we are minimizing the maximum variance of $\\sum_{j=1}^p a_j \\text{var}(\\hat \\beta_j)$ over all vectors $\\mathbf{a}$ with unit norm.\n", "\n", "- In **$A$-optimal design**, we minimize the trace of $\\mathbf{V}$\n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& \\text{tr} \\left( \\sum_{i=1}^m p_i \\mathbf{x}_i \\mathbf{x}_i^T \\right)^{-1} \\\\\n", "\t&\\text{subject to}& p_i \\ge 0, \\sum_{i=1}^m p_i = 1.\n", "\\end{eqnarray*}\n", "Statistically we are minimizing the total variance $\\sum_{j=1}^p \\text{var}(\\hat \\beta_j)$.\n", "\n", "## Q1 (10 pts) 3x4 factorial design\n", "\n", "A drug company ask you to help design a two factor clinical trial, in which treatment A has three levels (A1, A2, and A3) and treatment B has four levels (B1, B2, B3, and B4). Drug company also tells you that the treatment combination A3:B4 has undesirable side effects so we ignore this design point. \n", "\n", "Using dummy coding with A1 and B1 as the baseline levels, find the matrix $C$ with each row a unique design point.\n", "\n", "## Q2 (30 pts) Find approximate optimal designs\n", "\n", "Using semidefinite programming (SDP) software to find the approximate D-, E-, and A-optimal designs for this clinical trial.\n", "\n", "Hint: This is what I got, which may or may not be correct.\n", "\n", "```\n", "Approximate Optimal Design\n", "┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐\n", "│ design_pt │ D_opt │ E_opt │ A_opt │ D_opt_n │ E_opt_n │ A_opt_n │\n", "│ String │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │\n", "├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n", "│ A1B1 │ 0.082 │ 0.272 │ 0.200 │ 8 │ 27 │ 20 │\n", "│ A2B1 │ 0.082 │ 0.152 │ 0.101 │ 8 │ 15 │ 10 │\n", "│ A3B1 │ 0.097 │ 0.114 │ 0.104 │ 10 │ 11 │ 10 │\n", "│ A1B2 │ 0.082 │ 0.057 │ 0.086 │ 8 │ 6 │ 9 │\n", "│ A2B2 │ 0.082 │ 0.039 │ 0.051 │ 8 │ 4 │ 5 │\n", "│ A3B2 │ 0.097 │ 0.057 │ 0.068 │ 10 │ 6 │ 7 │\n", "│ A1B3 │ 0.082 │ 0.057 │ 0.086 │ 8 │ 6 │ 9 │\n", "│ A2B3 │ 0.082 │ 0.039 │ 0.051 │ 8 │ 4 │ 5 │\n", "│ A3B3 │ 0.097 │ 0.057 │ 0.068 │ 10 │ 6 │ 7 │\n", "│ A1B4 │ 0.109 │ 0.081 │ 0.106 │ 11 │ 8 │ 11 │\n", "│ A2B4 │ 0.109 │ 0.073 │ 0.080 │ 11 │ 7 │ 8 │\n", "└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘\n", "```\n", "\n", "## Q3 (30 pts) Find exact optimal designs\n", "\n", "Using mixed-integer semidefinite programming (SDP) software to find the exact D-, E-, and A-optimal designs for this clinical trial **with $n=100$**.\n", "\n", "Hint: This is what I got. Apparently I haven't got the E-optimal design right yet.\n", "\n", "```\n", "Exact Optimal Design\n", "┌───────────┬───────┬───────┬───────┐\n", "│ design_pt │ D_opt │ E_opt │ A_opt │\n", "│ String │ Int64 │ Int64 │ Int64 │\n", "├───────────┼───────┼───────┼───────┤\n", "│ A1B1 │ 8 │ 90 │ 20 │\n", "│ A2B1 │ 8 │ 1 │ 10 │\n", "│ A3B1 │ 10 │ 1 │ 10 │\n", "│ A1B2 │ 8 │ 1 │ 9 │\n", "│ A2B2 │ 8 │ 1 │ 5 │\n", "│ A3B2 │ 10 │ 1 │ 7 │\n", "│ A1B3 │ 8 │ 1 │ 9 │\n", "│ A2B3 │ 8 │ 1 │ 5 │\n", "│ A3B3 │ 10 │ 1 │ 7 │\n", "│ A1B4 │ 11 │ 1 │ 10 │\n", "│ A2B4 │ 11 │ 1 │ 8 │\n", "└───────────┴───────┴───────┴───────┘\n", "```\n", "\n", "## Q4 (30 bonus points) Optimal design with nuisance parameters\n", "\n", "Suppose the regression coefficients of linear model $\\boldsymbol{\\beta}$ is partitioned as $\\boldsymbol{\\beta} = (\\boldsymbol{\\beta}_0^T, \\boldsymbol{\\beta}_1^T)^T$, where $\\boldsymbol{\\beta}_0$ are nuisance parameters and $\\boldsymbol{\\beta}_1$ are parameters of primary interest. Given an approximate design $\\mathbf{p} = (p_1, \\ldots, p_m)$, let the information matrix be partitioned accordingly\n", "$$\n", "\\mathbf{I}(\\mathbf{p}) = \\sum_{i=1}^m p_i \\mathbf{x}_i \\mathbf{x}_i^T = \\begin{pmatrix}\n", "\\mathbf{I}_{00} & \\mathbf{I}_{01} \\\\\n", "\\mathbf{I}_{10} & \\mathbf{I}_{11}\n", "\\end{pmatrix}.\n", "$$\n", "Then the information matrix for $\\boldsymbol{\\beta}_1$ adjusted for nuisance parameter $\\boldsymbol{\\beta}_0$ is\n", "$$\n", "\\mathbf{I}_{1 \\mid 0}(\\mathbf{p}) = \\mathbf{I}_{11} - \\mathbf{I}_{10} \\mathbf{I}_{00}^{-1} \\mathbf{I}_{01}.\n", "$$\n", "\n", "Revisiting the 3x4 factorial design problem in Q1, suppose the drug company only cares about the estimation of A treatment effects. Find the approximate D-, E-, and A-optimal designs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "jupytext": { "formats": "ipynb,qmd" }, "kernelspec": { "display_name": "Julia 1.9.0", "language": "julia", "name": "julia-1.9" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.9.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "87px", "width": "252px" }, "navigate_menu": true, "number_sections": false, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 4 }