# Variational Autoencoders

$\newcommand{\vec}{\mathbf}$
We have samples $\vec x^{(1)}, \vec x^{(2)}, \dots$ from an unknown distribution $p(\vec x)$, and we want to generate more samples from this distribution.

We parameterize the distribution by a set of parameters $\theta$, giving $p_\theta(\vec x)$. Assuming the samples are independent, the optimal choice of $\theta$ maximizes their joint probability (the likelihood):

$$
p_\theta(\vec x^{(1)}, \vec x^{(2)}, ...) = \prod_i p_\theta( \vec x^{(i)})
$$

or, equivalently, maximizes the log-likelihood $\sum_i \log( p_\theta( \vec x^{(i)}))$, since the logarithm is monotonic and turns the product into a sum.
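As a minimal sketch of maximum likelihood, assume (purely for illustration) that $p_\theta$ is a one-dimensional Gaussian with $\theta = (\mu, \sigma)$. The NumPy code below shows why one works with the sum of logs: the product of many densities underflows to zero in floating point, while the log-likelihood stays finite and is maximized by the sample mean and the (biased) sample standard deviation.

```python
import numpy as np

def log_likelihood(x, mu, sigma):
    """sum_i log N(x_i; mu, sigma^2) for a 1-D Gaussian model, theta = (mu, sigma)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=1000)  # stand-in for samples of the unknown p(x)

# The product of 1000 densities underflows in float64; the sum of logs does not.
densities = np.exp(-0.5 * np.log(2 * np.pi * 2.0**2) - (x - 3.0) ** 2 / (2 * 2.0**2))
print(np.prod(densities))            # underflows to 0.0
print(log_likelihood(x, 3.0, 2.0))   # finite

# For a Gaussian, the maximum-likelihood parameters have a closed form.
mu_hat, sigma_hat = x.mean(), x.std()  # ddof=0 gives the MLE of sigma
best = log_likelihood(x, mu_hat, sigma_hat)
for dmu, dsigma in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    # Moving away from the MLE can only lower the log-likelihood.
    assert log_likelihood(x, mu_hat + dmu, sigma_hat + dsigma) < best
```

In a VAE there is no such closed form, which is why the rest of this section builds a tractable bound on $\log( p_\theta(\vec x))$ instead.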


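We introduce a latent vector $\vec z$ with a prescribed prior distribution $p_\theta(\vec z)$; often $\vec z$ is chosen to be Gaussian distributed. The data are modeled through the conditional distribution $p_\theta(\vec x | \vec z)$, and we also introduce a distribution $q_\theta(\vec z | \vec x)$ that approximates the true posterior $p_\theta(\vec z | \vec x)$.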
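To make the latent-variable model concrete, here is a minimal sketch of how it generates data, assuming a standard Gaussian prior $p_\theta(\vec z) = \mathcal N(0, I)$ and a hypothetical linear-Gaussian decoder $p_\theta(\vec x | \vec z) = \mathcal N(W \vec z + \vec b, \sigma_x^2 I)$; in an actual VAE the decoder mean would be a neural network. The model density is the marginal $p_\theta(\vec x) = \int p_\theta(\vec x | \vec z)\, p_\theta(\vec z)\, d\vec z$, which for this linear case is $\mathcal N(\vec b, W W^T + \sigma_x^2 I)$, so the empirical covariance of the samples can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim = 2, 5

# Hypothetical decoder parameters theta = (W, b, sigma_x).
W = rng.normal(size=(data_dim, latent_dim))
b = np.zeros(data_dim)
sigma_x = 0.1

def sample_x(n):
    """Ancestral sampling: z ~ p(z) = N(0, I), then x ~ p_theta(x | z) = N(W z + b, sigma_x^2 I)."""
    z = rng.standard_normal((n, latent_dim))                      # draw from the prior
    mean = z @ W.T + b                                            # decoder mean
    return mean + sigma_x * rng.standard_normal((n, data_dim))    # add observation noise

samples = sample_x(50_000)

# For this linear-Gaussian model the marginal is p_theta(x) = N(b, W W^T + sigma_x^2 I).
true_cov = W @ W.T + sigma_x**2 * np.eye(data_dim)
empirical_cov = np.cov(samples, rowvar=False)
print(np.abs(empirical_cov - true_cov).max())
```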



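Since $\log( p_\theta( \vec x))$ does not depend on $\vec z$, it is equal to its own expectation under any distribution over $\vec z$, in particular under $q_\theta(\vec z | \vec x)$:

$\newcommand{\Eq}{E_{\vec z \sim q_\theta(\vec z | \vec x)}}$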
$$
\log( p_\theta( \vec x) ) = \Eq[ \log( p_\theta(\vec x)) ]
$$

By Bayes' rule, $p_\theta(\vec x) = p_\theta(\vec x | \vec z)\, p_\theta(\vec z) / p_\theta(\vec z | \vec x)$ for any value of $\vec z$. Substituting this inside the expectation:

$$
 \log( p_\theta(\vec x)) = 
\Eq\left[
\log\left(
\frac{ p_\theta(\vec x | \vec z) p_\theta(\vec z) } { p_\theta(\vec z | \vec x)}
\right)
\right]
$$
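Bayes' rule holds for every value of $\vec z$, which is what allows it to be placed inside the expectation. A quick check on a hypothetical discrete toy model (three latent states, a binary observation; the probability tables are made up for illustration):

```python
import numpy as np

# Hypothetical toy model: latent z in {0, 1, 2}, observation x in {0, 1}.
p_z = np.array([0.5, 0.3, 0.2])           # prior p(z)
p_x_given_z = np.array([[0.9, 0.1],       # p_x_given_z[z, x] = p(x | z)
                        [0.4, 0.6],
                        [0.2, 0.8]])

x = 1
joint = p_x_given_z[:, x] * p_z           # p(x, z) for each value of z
p_x = joint.sum()                         # p(x=1) = 0.05 + 0.18 + 0.16 = 0.39
posterior = joint / p_x                   # p(z | x)

# Bayes' rule recovers the same p(x) from every single value of z.
from_bayes = p_x_given_z[:, x] * p_z / posterior
print(p_x, from_bayes)
```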

As the expectation is computed over $\vec z \sim q_\theta(\vec z | \vec x)$, we make this expression appear by multiplying and dividing by $q_\theta(\vec z | \vec x)$:


$$
\log( p_\theta(\vec x)) = 
\Eq\left[
\log\left(
\frac{ p_\theta(\vec x | \vec z) p_\theta(\vec z) } { p_\theta(\vec z | \vec x)}
\frac{ q_\theta(\vec z | \vec x) } { q_\theta(\vec z | \vec x)}
\right)
\right]
$$

Splitting the logarithm of the product into a sum of three terms:

$$
\log( p_\theta(\vec x)) = 
\Eq\left[\log\left( p_\theta(\vec x | \vec z) \right) \right]
- \Eq\left[\log\left( \frac{ q_\theta(\vec z | \vec x) }{ p_\theta(\vec z) } \right) \right]
+ \Eq\left[\log\left( \frac{ q_\theta(\vec z | \vec x) }{p_\theta(\vec z | \vec x)} \right) \right]
$$
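The three-term split is an exact identity for any choice of $q_\theta(\vec z | \vec x)$; note that the denominator of the last term is the true posterior $p_\theta(\vec z | \vec x)$, which is what the algebra produces. A minimal numerical check, reusing a hypothetical discrete toy model and an arbitrary $q$:

```python
import numpy as np

# Hypothetical toy model: latent z in {0, 1, 2}, binary x.
p_z = np.array([0.5, 0.3, 0.2])
p_x_given_z = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
x = 1

joint = p_x_given_z[:, x] * p_z
p_x = joint.sum()
posterior = joint / p_x                                  # true p(z | x)

q = np.array([0.2, 0.3, 0.5])                            # arbitrary q(z | x), sums to 1

reconstruction = np.sum(q * np.log(p_x_given_z[:, x]))   # E_q[log p(x|z)]
kl_prior = np.sum(q * np.log(q / p_z))                   # E_q[log q(z|x) / p(z)]
kl_posterior = np.sum(q * np.log(q / posterior))         # E_q[log q(z|x) / p(z|x)]

print(np.log(p_x), reconstruction - kl_prior + kl_posterior)
```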


The last two terms are Kullback–Leibler divergences:

$$
D_{KL}( q_\theta(\vec z | \vec x) || p_\theta(\vec z)) = \Eq\left[\log\left( \frac{ q_\theta(\vec z | \vec x) }{ p_\theta(\vec z) } \right) \right]
$$
$$
D_{KL}(q_\theta(\vec z | \vec x) || p_\theta(\vec z | \vec x)) = 
\Eq\left[\log\left( \frac{ q_\theta(\vec z | \vec x) }{p_\theta(\vec z | \vec x)} \right) \right]
$$
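In the common VAE setting (an assumption here, not stated above) where $q_\theta(\vec z | \vec x) = \mathcal N(\vec\mu, \mathrm{diag}(\vec\sigma^2))$ and the prior is $p_\theta(\vec z) = \mathcal N(0, I)$, the first divergence has the closed form $\frac12 \sum_j (\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2)$. The sketch below compares it with a Monte Carlo estimate of the defining expectation; the values of $\vec\mu$ and $\log \vec\sigma^2$ are arbitrary.

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def log_normal_diag(z, mu, var):
    """log N(z; mu, diag(var)), summed over the last axis."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var, axis=-1)

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])          # arbitrary encoder mean
log_var = np.array([-0.5, 0.3])     # arbitrary encoder log-variance
var = np.exp(log_var)

# Monte Carlo estimate of E_q[log q(z|x) - log p(z)] using samples z ~ q.
z = mu + np.sqrt(var) * rng.standard_normal((200_000, 2))
mc = np.mean(log_normal_diag(z, mu, var) - log_normal_diag(z, 0.0, 1.0))
closed = kl_diag_gaussian_to_standard_normal(mu, log_var)
print(closed, mc)
```

The second divergence has no such formula in general, because it involves the unknown posterior $p_\theta(\vec z | \vec x)$.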

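Since the Kullback–Leibler divergence is always non-negative (a consequence of Jensen's inequality applied to the concave logarithm: $D_{KL}(q || p) = -E_q[\log(p/q)] \ge -\log E_q[p/q] = -\log \int p(\vec z)\, d\vec z = 0$), we obtain: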


$$
\log( p_\theta(\vec x)) \ge 
\Eq\left[\log\left( p_\theta(\vec x | \vec z) \right) \right]
- D_{KL}( q_\theta(\vec z | \vec x) || p_\theta(\vec z))
$$

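We dropped the (non-negative) term $D_{KL}(q_\theta(\vec z | \vec x) || p_\theta(\vec z | \vec x))$ because it is intractable: it involves the true posterior $p_\theta(\vec z | \vec x)$, whose normalization requires $p_\theta(\vec x)$ itself. The right-hand side of this inequality is called the variational lower bound (also known as the evidence lower bound, ELBO). The bound is tight exactly when $q_\theta(\vec z | \vec x) = p_\theta(\vec z | \vec x)$.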
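A minimal sketch that checks the bound on a hypothetical one-dimensional model where every quantity is tractable: $p_\theta(z) = \mathcal N(0, 1)$ and $p_\theta(x | z) = \mathcal N(z, 1)$, so that $p_\theta(x) = \mathcal N(0, 2)$ and the true posterior is $p_\theta(z | x) = \mathcal N(x/2, 1/2)$. For a Gaussian $q_\theta(z | x) = \mathcal N(m, s^2)$ both terms of the bound have closed forms, and the gap $\log( p_\theta(x)) - \text{ELBO}$ equals the divergence between $q_\theta(z | x)$ and the true posterior, vanishing when the two coincide.

```python
import numpy as np

# Hypothetical 1-D model: p(z) = N(0, 1), p(x|z) = N(z, 1).
# Then p(x) = N(0, 2) and the true posterior is p(z|x) = N(x/2, 1/2).

def log_evidence(x):
    """Exact log p(x) = log N(x; 0, 2)."""
    return -0.5 * np.log(2 * np.pi * 2.0) - x**2 / 4.0

def elbo(x, m, s2):
    """E_q[log p(x|z)] - KL(q || p(z)) for q(z|x) = N(m, s2), both in closed form."""
    expected_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + s2)
    kl_prior = 0.5 * (s2 + m**2 - 1.0 - np.log(s2))
    return expected_loglik - kl_prior

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians (v = variance)."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

x = 1.5
candidates = [(0.0, 1.0), (1.0, 0.3), (x / 2, 0.5)]  # the last one is the true posterior
for m, s2 in candidates:
    gap = log_evidence(x) - elbo(x, m, s2)
    print(f"q = N({m:.2f}, {s2:.2f}): ELBO = {elbo(x, m, s2):.4f}, "
          f"gap = {gap:.4f}, KL(q || posterior) = {kl_gauss(m, s2, x / 2, 0.5):.4f}")
```

Maximizing the ELBO over the parameters of $q$ therefore both tightens the bound and pushes $q$ toward the true posterior.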