--- title: The geometry of the balanced ANOVA model (with fixed effects) date : 2014-01-14 --- &lead {r setup0, echo=FALSE} opts_chunk$set(fig.path="assets/fig/AV1fixed")  {r setup, echo=FALSE} library(shiny) html.box <- function(title, body, style.title = "") { html <- withTags(div(class = "title_box", div(id = "title", style = "%s", "%s"), div(id = "content", "%s"))) if (is.list(style.title)) { temp <- paste(c(names(style.title), ""), collapse = ":%s;") style.title <- do.call(function(...) sprintf(temp, ...), unname(style.title)) } sprintf(as.character(html), style.title, title, body) }  Most usually, the mathematical treatment of Gaussian linear models starts with the matricial writing$Y=X\beta+\sigma G$, where$Y$is a random vector modelling the$n$response values,$X$is a known matrix,$\beta$is a vector of unknown parameters, and$G$has the standard normal distribution on$\mathbb{R}^n$. There are good reasons to use this matricial writing, however it is cleaner to treat the theory with the equivalent vector space notation$Y = \mu + \sigma G$, where$\mu$is assumed to lie in a linear subspace$W$of$\mathbb{R}^n$, corresponding to$\text{Im}(X)$in the matricial notation. For example, denoting by$P_W$the orthogonal projection on$W$, the least-squares estimate$\hat\mu$of$\mu$is simply given by$\hat\mu=P_Wy$and the residuals are$P_W^\perp y$, denoting by$P^\perp_W$the projection on the orthogonal complement of$W$, and there is no need to consider$W=\text{Im}(X)$to derive the general principles of the theory. The balanced one-way ANOVA model, which is the topic of this article, illustrates this approach. ## Standard normal distribution on a vector space The main tool used to treat the theory of Gaussian linear models is the standard normal distribution on a linear space. r html.box("Theorem and definition", "Let$X$be a$\\mathbb{R}^n$-valued random vector, and$W \\subset \\mathbb{R}^n$be a linear space. Say that$X$has the standard normal distribution on the vector space$W$, and then note$X \\sim SN(W)$, if it takes its values in$W$and its characteristic function is given by $$\\mathbb{E} \\textrm{e}^{i\\langle w, X \\rangle} = \\textrm{e}^{-\\frac12{\\Vert w \\Vert}^2} \\quad \\text{for all } w \\in W.$$ The three following assertions are equivalent (and this is easy to prove): 1.$X \\sim SN(W)$; 2. the coordinates of$X$in some orthonormal basis of$W$are i.i.d. standard normal random variables; 3. the coordinates of$X$in any orthonormal basis of$W$are i.i.d. standard normal random variables. ", style.title=list(color="blue", "font-size"="30px")) Of course we retrieve the standard normal distribution on$\mathbb{R}^n$when taking$W=\mathbb{R}^n$. From this definition-theorem, the so-called *Cochran's theorem* is an obvious statement. More precisely, if$U \subset W$is a linear space, and$Z=U^\perp \cap W$is the orthogonal complement of$U$in$W$, then the projection$P_UX$of$X$on$U$has the standard normal distribution on$U$, similarly the projection$P_ZX$of$X$on$Z$has the standard normal distribution on$Z$, and moreover$P_UX$and$P_ZX$are independent. This is straightforward to see from the definition-theorem of$SN(W)$, and it is also easy to see that${\Vert P_UX\Vert}^2 \sim \chi^2_{\dim(U)}$. ## The balanced ANOVA model The balanced ANOVA model is used to model a sample$y=(y_{ij})$with a tabular structure: $$y=\begin{pmatrix} y_{11} & \ldots & y_{1J} \\ \vdots & y_{ij} & \vdots \\ y_{I1} & \ldots & y_{IJ} \end{pmatrix},$$$y_{ij}$denoting the$j$-th measurement in group$i$. It is assumed that the$y_{ij}$are independent and the population mean depends on the group index$i$. More precisely, the$y_{ij}$are modelled by random variables$Y_{ij} \sim_{\text{iid}} {\cal N}(\mu_i, \sigma^2)$. So, how to write this model as$Y=\mu + \sigma G$where$G \sim SN(\mathbb{R}^n)$and$\mu$lies in a linear space$W \subset \mathbb{R}^n$? ## Tensor product Here$n=IJ$and one should consider$Y$as the vector obtained by stacking the$Y_{ij}$. For example if$I=2$and$J=3$, we should write $$Y={(Y_{11}, Y_{12}, Y_{13}, Y_{21}, Y_{22}, Y_{23})}'.$$ Actually this is not a good idea to loose the tabular structure. The appropriate approach for writing the balanced ANOVA model involves the *tensor product*. We keep the tabular structure of the data: $$Y = \begin{pmatrix} Y_{11} & Y_{12} & Y_{13} \\ Y_{21} & Y_{22} & Y_{23} \end{pmatrix}$$ and we take $$G \sim SN(\mathbb{R}^I\otimes\mathbb{R}^J)$$ where the *tensor poduct*$\mathbb{R}^I\otimes\mathbb{R}^J$of$\mathbb{R}^I$and$\mathbb{R}^J$is nothing but the space of matrices with$I$rows and$J$columns. Here $$\mu = \begin{pmatrix} \mu_1 & \mu_1 & \mu_1 \\ \mu_2 & \mu_2 & \mu_2 \end{pmatrix},$$ lies in a linear space$W \subset \mathbb{R}^I\otimes\mathbb{R}^J$which is convenient to define with the help of the *tensor product*$x \otimes y$of two vectors$x \in \mathbb{R}^I$and$y \in \mathbb{R}^J$, defined as the element of$\mathbb{R}^I\otimes\mathbb{R}^J$given by $${(x \otimes y)}_{ij}=x_iy_j.$$ Indeed, one has $$\mu = (\mu_1, \mu_2) \otimes (1,1,1),$$ and then the linear space$W$in which$\mu$is assumed to lie is $$W = \mathbb{R}^I\otimes{\bf 1}_J.$$ Moreover, there is a nice orthogonal decomposition of$W$corresponding to the usual other parameterization of the model: $$\boxed{\mu_i = m + \alpha_i} \quad \text{with } \sum_{i=1}^I\alpha_i=0.$$ Indeed, writing$\mathbb{R}^I=[{\bf 1}_I] \oplus {[{\bf 1}_I]}^\perp$yields the following decomposition of$\mu: \begin{align*} \mu = (\mu_1, \ldots, \mu_I) \otimes {\bf 1}_J & = \begin{pmatrix} m & m & m \\ m & m & m \end{pmatrix} + \begin{pmatrix} \alpha_1 & \alpha_1 & \alpha_1 \\ \alpha_2 & \alpha_2 & \alpha_2 \end{pmatrix} \\ & = \underset{\in \bigl([{\bf 1}_I]\otimes[{\bf 1}_J]\bigr)}{\underbrace{m({\bf 1}_I\otimes{\bf 1}_J)}} + \underset{\in \bigl([{\bf 1}_I]^{\perp}\otimes[{\bf 1}_J] \bigr)}{\underbrace{(\alpha_1,\ldots,\alpha_I)\otimes{\bf 1}_J}} \end{align*} ## Least-squares estimates With the theory introduced above, the least-squares estimates ofm$and the$\alpha_i$are given by$\hat m({\bf 1}_I\otimes{\bf 1}_J) = P_U y$and$\hat\alpha\otimes{\bf 1}_J = P_Zy$where$U = [{\bf 1}_I]\otimes[{\bf 1}_J]$and$Z = {[{\bf 1}_I]}^{\perp}\otimes[{\bf 1}_J] = U^\perp \cap W$, and we also know that$\hat m$and the$\hat\alpha_i$are independent. The least-squares estimates of the$\mu_i$are given by$\hat\mu_i=\hat m +\hat\alpha_i\$. Deriving the expression of these estimates and their distribution is left as an exercise to the reader.