---
author: Stéphane Laurent
date: '2019-07-12'
highlighter: 'pandoc-solarized'
linenums: True
output:
  html_document:
    highlight: kate
    keep_md: False
  md_document:
    preserve_yaml: True
    variant: markdown
prettify: True
prettifycss: minimal
tags: 'maths, statistics'
title: 'The chi-square approximation of Pearson''s statistic'
---

Our goal is to derive the asymptotic distribution of Pearson's statistic for goodness-of-fit testing. We follow the method given in David Williams's book *Weighing the odds*, but we provide a bit more detail.

Let $Y_1$, $\ldots$, $Y_n$ be independent and identically distributed random variables in $\{1, \ldots, b\}$ whose common distribution is given by
$$
\Pr(Y_m = k) = p_k
$$
with $\sum_{k=1}^b p_k = 1$. Denoting by $N_k$ the number of $Y_m$ equal to $k$, and setting
$$
W_k = \frac{N_k - np_k}{\sqrt{np_k}},
$$
the *Pearson statistic* is $\sum_{k=1}^b W_k^2$. To derive its asymptotic distribution, we firstly derive that of the random vector ${(W_1, \ldots, W_b)}'$.

Define the random variables
$$
X_k^{(m)} = \begin{cases} 1 & \text{if } Y_m = k \\ 0 & \text{if } Y_m \neq k \end{cases},
$$
so that
$$
N_k = X_k^{(1)} + \cdots + X_k^{(n)}.
$$
For any $m_1$, $m_2$, the random vectors $(X_1^{(m_1)}, \ldots, X_b^{(m_1)})$ and $(X_1^{(m_2)}, \ldots, X_b^{(m_2)})$ have the same distribution. It is easy to see that
$$
\mathbb{E}X_k^{(m)} = p_k, \quad
\mathbb{E}X_k^{(m)}X_\ell^{(m)} =
\begin{cases}
p_k & \text{if } k = \ell \\
0 & \text{if } k \neq \ell
\end{cases},
$$
leading to
$$
V_{k\ell} := \text{Cov}\bigl(X_k^{(m)},X_\ell^{(m)}\bigr) =
\begin{cases}
p_k(1-p_k) & \text{if } k = \ell \\
-p_k p_\ell & \text{if } k \neq \ell
\end{cases}.
$$

One can write
$$
\begin{pmatrix} N_1 \\ \vdots \\ N_b \end{pmatrix}
=
\begin{pmatrix} X_1^{(1)} \\ \vdots \\ X_b^{(1)} \end{pmatrix}
+ \cdots +
\begin{pmatrix} X_1^{(n)} \\ \vdots \\ X_b^{(n)} \end{pmatrix},
$$
and this is a sum of independent and identically distributed random vectors. Therefore we have, for large $n$,
$$
\begin{pmatrix} N_1 \\ \vdots \\ N_b \end{pmatrix}
\approx
\mathcal{M}\mathcal{N}\left(\begin{pmatrix} n p_1 \\ \vdots \\ n p_b \end{pmatrix}, n V\right)
$$
by the multivariate central limit theorem. Recall that
$$
W_k = \frac{N_k - np_k}{\sqrt{np_k}}.
$$
Then one has
$$
\begin{pmatrix} W_1 \\ \vdots \\ W_b \end{pmatrix}
\approx
\mathcal{M}\mathcal{N}\left(\begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, C\right)
$$
where
$$
C_{k\ell} =
\begin{cases}
1-p_k & \text{if } k = \ell \\
-\sqrt{p_k p_\ell} & \text{if } k \neq \ell
\end{cases}.
$$

Now we are going to derive the Laplace transform $\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2}$ when ${(W_1, \ldots, W_b)}' \sim \mathcal{M}\mathcal{N}(\mathbf{0}, C)$. The covariance matrix $C$ is not a strictly positive definite matrix since $\sum_{k=1}^b \sqrt{np_k} W_k = 0$. But let us firstly give an expression of $\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2}$ when ${(W_1, \ldots, W_b)}' \sim \mathcal{M}\mathcal{N}(\mathbf{0}, C)$ in the case of a strictly positive definite covariance matrix $C$. Then we will argue that this expression still holds for our $C$.

In the strictly positive case, by using the *pdf* of the multivariate normal distribution, we get
\begin{align}
\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2}
& = \frac{1}{{(2\pi)}^\frac{b}{2}\sqrt{\det(C)}}
\int_{\mathbb{R}^b} \mathbf{e}^{-\frac12\mathbf{w}' (C^{-1} + 2 \alpha I) \mathbf{w}}\mathrm{d}\mathbf{w} \\
& = \frac{{\bigl(\det(C^{-1} + 2\alpha I)\bigr)}^{-\frac12}}{\sqrt{\det(C)}} \\
& = {\bigl(\det(I+2\alpha C)\bigr)}^{-\frac12}.
\end{align}
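Before going on, here is a quick numerical sanity check of this equality (a minimal Python sketch, not taken from Williams's book): we pick an arbitrary strictly positive definite covariance matrix, estimate $\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2}$ by Monte Carlo, and compare it with ${\bigl(\det(I+2\alpha C)\bigr)}^{-\frac12}$. The matrix, the value of $\alpha$ and the simulation size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# an arbitrary strictly positive definite covariance matrix (illustrative choice)
A = rng.normal(size=(4, 4))
C = A @ A.T + np.eye(4)

alpha = 0.3  # any alpha >= 0 would do

# Monte Carlo estimate of E[exp(-alpha * sum(W_k^2))] with W ~ MN(0, C)
W = rng.multivariate_normal(np.zeros(4), C, size=200_000)
mc_estimate = np.mean(np.exp(-alpha * np.sum(W**2, axis=1)))

# closed-form value det(I + 2*alpha*C)^(-1/2)
closed_form = np.linalg.det(np.eye(4) + 2 * alpha * C) ** (-0.5)

print(mc_estimate, closed_form)  # the two values should be close
```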
Now, by a continuity argument, this equality still holds when $C$ is only non-negative definite. Let us detail this point. Let $\mathbf{W} = {(W_1, \ldots, W_b)}' \sim \mathcal{M}\mathcal{N}(\mathbf{0}, C)$ where $C$ is non-negative definite but not strictly positive definite. Let $\mathbf{G} = {(G_1, \ldots, G_b)}'$ be a standard normal random vector on $\mathbb{R}^b$, independent of $\mathbf{W}$, and let $\epsilon > 0$. Then
$$
\mathbf{W} + \sqrt{\epsilon}\mathbf{G} \sim \mathcal{M}\mathcal{N}(\mathbf{0}, C + \epsilon I),
$$
and since $C + \epsilon I$ is strictly positive definite, we know by the previous result that
$$
\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b {(W_k + \sqrt{\epsilon}G_k)}^2}
= {\Bigl(\det\bigl(I+2\alpha (C+\epsilon I)\bigr)\Bigr)}^{-\frac12},
$$
and we get the announced result by letting $\epsilon \to 0$.

Thus we have to derive $\det(I+2\alpha C)$ now. Observe that
$$
I-C = \mathbf{u} \mathbf{u}'
$$
where $\mathbf{u} = {\bigl(\sqrt{p_1}, \ldots, \sqrt{p_b}\bigr)}'$. Since $\mathbf{u}' \mathbf{u} = 1$,
$$
I-C = \mathbf{u} {\bigl(\mathbf{u}'\mathbf{u}\bigr)}^{-1} \mathbf{u}'
$$
is the matrix of the orthogonal projection onto the line directed by $\mathbf{u}$, therefore $C$ is the matrix of the orthogonal projection onto ${[\mathbf{u}]}^\perp$. Thus $C$ has one eigenvalue $0$, with associated eigenvector $\mathbf{u}$, and $b-1$ eigenvalues equal to $1$. Consequently, for every real number $\alpha$, the matrix $I + 2\alpha C$ has one eigenvalue equal to $1$ and $b-1$ eigenvalues equal to $1+2\alpha$. Therefore
$$
\det(I+2\alpha C) = {(1+2\alpha)}^{b-1}.
$$
We finally get
$$
\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2} \approx {(1+2\alpha)}^{-\frac{b-1}{2}},
$$
and we recognize the Laplace transform of the $\chi^2_{b-1}$ distribution. Since the Laplace transform characterizes the distribution of a non-negative random variable, the Pearson statistic $\sum_{k=1}^b W_k^2$ approximately follows the $\chi^2_{b-1}$ distribution for large $n$.
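To illustrate this conclusion numerically, here is a small simulation sketch (again a Python sketch added for illustration, not part of the derivation): we simulate the Pearson statistic many times for a moderately large $n$ and compare a few empirical quantiles with the corresponding $\chi^2_{b-1}$ quantiles. The probabilities $p_k$, the sample size $n$ and the number of simulations are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2019)

p = np.array([0.1, 0.2, 0.3, 0.4])  # arbitrary p_k, so b = 4
b, n, nsim = len(p), 10_000, 50_000

# simulate the counts N_1, ..., N_b and the Pearson statistic nsim times
counts = rng.multinomial(n, p, size=nsim)
pearson = np.sum((counts - n * p) ** 2 / (n * p), axis=1)

# compare a few empirical quantiles with chi-square(b - 1) quantiles
probs = [0.5, 0.9, 0.95, 0.99]
print(np.quantile(pearson, probs))
print(stats.chi2.ppf(probs, df=b - 1))
```

The two printed rows of quantiles should be close, in line with the $\chi^2_{b-1}$ approximation derived above.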