--- title: Biostat/Biomath M257 Homework 5 subtitle: 'Due May 26 @ 11:59PM' author: Student Name and UID date: today format: html: theme: cosmo embed-resources: true number-sections: true toc: true toc-depth: 4 toc-location: left jupyter: jupytext: formats: 'ipynb,qmd' text_representation: extension: .qmd format_name: quarto format_version: '1.0' jupytext_version: 1.14.5 kernelspec: display_name: Julia 1.9.0 language: julia name: julia-1.9 --- System information (for reproducibility): ```{julia} versioninfo() ``` Load packages: ```{julia} using Pkg Pkg.activate(pwd()) Pkg.instantiate() Pkg.status() ``` In this exercise, we practice using disciplined convex programming (SDP in particular) to solve optimal design problems. ## Introduction to optimal design Consider a linear model \begin{eqnarray*} y_i = \mathbf{x}_i^T \boldsymbol{\beta} + \epsilon_i, \quad i = 1,\ldots, n, \end{eqnarray*} where $\epsilon_i$ are independent Gaussian noises with common variance $\sigma^2$. It is well known that the least squares estimate $\hat{\boldsymbol{\beta}}$ is unbiased and has covariance $\sigma^2 (\sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^T)^{-1}$. In **exact optimal design**, given total number of $n$ allowable experiments, we want to choose among a list of $m$ candidate design points $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$ such that the covariance matrix is minimized in some sense. In mathematical terms, we want to find an integer vector $\mathbf{n} = (n_1, \ldots, n_m)$ such that $n_i \ge 0$, $\sum_{i=1}^m n_i = n$, and the matrix $\mathbf{V} = \left( \sum_{i=1}^m n_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}$ is "small". In **approximate optimal design**, we want to find a probability vector $\mathbf{p} = (p_1, \ldots, p_m)$ such that $p_i \ge 0$, $\sum_{i=1}^m p_i = 1$, and the matrix $\mathbf{V} = \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}$ is "small". Commonly used optimal design criteria include: - In **$D$-optimal design**, we minimize the determinant of $\mathbf{V}$ \begin{eqnarray*} &\text{minimize}& \det \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\ &\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1. \end{eqnarray*} - In **$E$-optimal design**, we minimize the spectral norm, i.e., the maximum eigenvalue of $\mathbf{V}$ \begin{eqnarray*} &\text{minimize}& \lambda_{\text{max}} \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\ &\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1. \end{eqnarray*} Statistically we are minimizing the maximum variance of $\sum_{j=1}^p a_j \text{var}(\hat \beta_j)$ over all vectors $\mathbf{a}$ with unit norm. - In **$A$-optimal design**, we minimize the trace of $\mathbf{V}$ \begin{eqnarray*} &\text{minimize}& \text{tr} \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\ &\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1. \end{eqnarray*} Statistically we are minimizing the total variance $\sum_{j=1}^p \text{var}(\hat \beta_j)$. ## Q1 (10 pts) 3x4 factorial design A drug company ask you to help design a two factor clinical trial, in which treatment A has three levels (A1, A2, and A3) and treatment B has four levels (B1, B2, B3, and B4). Drug company also tells you that the treatment combination A3:B4 has undesirable side effects so we ignore this design point. Using dummy coding with A1 and B1 as the baseline levels, find the matrix $C$ with each row a unique design point. ## Q2 (30 pts) Find approximate optimal designs Using semidefinite programming (SDP) software to find the approximate D-, E-, and A-optimal designs for this clinical trial. Hint: This is what I got, which may or may not be correct. ``` Approximate Optimal Design ┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐ │ design_pt │ D_opt │ E_opt │ A_opt │ D_opt_n │ E_opt_n │ A_opt_n │ │ String │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ ├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤ │ A1B1 │ 0.082 │ 0.272 │ 0.200 │ 8 │ 27 │ 20 │ │ A2B1 │ 0.082 │ 0.152 │ 0.101 │ 8 │ 15 │ 10 │ │ A3B1 │ 0.097 │ 0.114 │ 0.104 │ 10 │ 11 │ 10 │ │ A1B2 │ 0.082 │ 0.057 │ 0.086 │ 8 │ 6 │ 9 │ │ A2B2 │ 0.082 │ 0.039 │ 0.051 │ 8 │ 4 │ 5 │ │ A3B2 │ 0.097 │ 0.057 │ 0.068 │ 10 │ 6 │ 7 │ │ A1B3 │ 0.082 │ 0.057 │ 0.086 │ 8 │ 6 │ 9 │ │ A2B3 │ 0.082 │ 0.039 │ 0.051 │ 8 │ 4 │ 5 │ │ A3B3 │ 0.097 │ 0.057 │ 0.068 │ 10 │ 6 │ 7 │ │ A1B4 │ 0.109 │ 0.081 │ 0.106 │ 11 │ 8 │ 11 │ │ A2B4 │ 0.109 │ 0.073 │ 0.080 │ 11 │ 7 │ 8 │ └───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘ ``` ## Q3 (30 pts) Find exact optimal designs Using mixed-integer semidefinite programming (SDP) software to find the exact D-, E-, and A-optimal designs for this clinical trial **with $n=100$**. Hint: This is what I got. Apparently I haven't got the E-optimal design right yet. ``` Exact Optimal Design ┌───────────┬───────┬───────┬───────┐ │ design_pt │ D_opt │ E_opt │ A_opt │ │ String │ Int64 │ Int64 │ Int64 │ ├───────────┼───────┼───────┼───────┤ │ A1B1 │ 8 │ 90 │ 20 │ │ A2B1 │ 8 │ 1 │ 10 │ │ A3B1 │ 10 │ 1 │ 10 │ │ A1B2 │ 8 │ 1 │ 9 │ │ A2B2 │ 8 │ 1 │ 5 │ │ A3B2 │ 10 │ 1 │ 7 │ │ A1B3 │ 8 │ 1 │ 9 │ │ A2B3 │ 8 │ 1 │ 5 │ │ A3B3 │ 10 │ 1 │ 7 │ │ A1B4 │ 11 │ 1 │ 10 │ │ A2B4 │ 11 │ 1 │ 8 │ └───────────┴───────┴───────┴───────┘ ``` ## Q4 (30 bonus points) Optimal design with nuisance parameters Suppose the regression coefficients of linear model $\boldsymbol{\beta}$ is partitioned as $\boldsymbol{\beta} = (\boldsymbol{\beta}_0^T, \boldsymbol{\beta}_1^T)^T$, where $\boldsymbol{\beta}_0$ are nuisance parameters and $\boldsymbol{\beta}_1$ are parameters of primary interest. Given an approximate design $\mathbf{p} = (p_1, \ldots, p_m)$, let the information matrix be partitioned accordingly $$ \mathbf{I}(\mathbf{p}) = \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T = \begin{pmatrix} \mathbf{I}_{00} & \mathbf{I}_{01} \\ \mathbf{I}_{10} & \mathbf{I}_{11} \end{pmatrix}. $$ Then the information matrix for $\boldsymbol{\beta}_1$ adjusted for nuisance parameter $\boldsymbol{\beta}_0$ is $$ \mathbf{I}_{1 \mid 0}(\mathbf{p}) = \mathbf{I}_{11} - \mathbf{I}_{10} \mathbf{I}_{00}^{-1} \mathbf{I}_{01}. $$ Revisiting the 3x4 factorial design problem in Q1, suppose the drug company only cares about the estimation of A treatment effects. Find the approximate D-, E-, and A-optimal designs.