{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Variational Autoencoders\n", "\n", "$\\newcommand{\\vec}{\\mathbf}$\n", "We have samples $\\vec x^{(1)}, \\vec x^{(2)}, \\dots$ from an (unknown) distribution $p(\\vec x)$, and we want to generate more samples from this distribution.\n", "\n", "We can parameterize the distribution by a set of parameters $\\theta$, writing $p_\\theta(\\vec x)$. The optimal choice of $\\theta$ maximizes the probability of the observed samples:\n", "\n", "$$\n", "p_\\theta(\\vec x^{(1)}, \\vec x^{(2)}, \\dots) = \\prod_i p_\\theta( \\vec x^{(i)})\n", "$$\n", "\n", "or, equivalently, maximizes the log-likelihood $\\sum_i \\log p_\\theta( \\vec x^{(i)})$.\n", "\n", "\n", "We introduce a latent vector $\\vec z$ with a prescribed distribution. Often $\\vec z$ is chosen to be Gaussian distributed, e.g. $\\vec z \\sim \\mathcal{N}(\\vec 0, I)$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$\\newcommand{\\Eq}{E_{\\vec z \\sim q_\\theta(\\vec z | \\vec x)}}$\n", "Let $q_\\theta(\\vec z | \\vec x)$ be an approximation of the posterior distribution of $\\vec z$ given $\\vec x$ (the encoder). Since $\\log( p_\\theta( \\vec x) )$ does not depend on $\\vec z$, we can write it as an expectation over $\\vec z \\sim q_\\theta(\\vec z | \\vec x)$:\n", "\n", "$$\n", "\\log( p_\\theta( \\vec x) ) = \\Eq[ \\log( p_\\theta(\\vec x)) ]\n", "$$\n", "\n", "Using Bayes' rule, $ p_\\theta(\\vec x) = p_\\theta(\\vec x | \\vec z) p_\\theta(\\vec z) / p_\\theta(\\vec z | \\vec x)$:\n", "\n", "$$\n", " \\log( p_\\theta(\\vec x)) = \n", "\\Eq\\left[\n", "\\log\\left(\n", "\\frac{ p_\\theta(\\vec x | \\vec z) p_\\theta(\\vec z) } { p_\\theta(\\vec z | \\vec x)}\n", "\\right)\n", "\\right]\n", "$$\n", "\n", "As the expectation is computed over $\\vec z \\sim q_\\theta(\\vec z | \\vec x)$, we make this distribution appear by multiplying and dividing by $q_\\theta(\\vec z | \\vec x)$:\n", "\n", "\n", "$$\n", "\\log( p_\\theta(\\vec x)) = \n", "\\Eq\\left[\n", "\\log\\left(\n", "\\frac{ p_\\theta(\\vec x | \\vec z) p_\\theta(\\vec z) } { p_\\theta(\\vec z | \\vec x)}\n", "\\frac{ q_\\theta(\\vec z | \\vec x) } { q_\\theta(\\vec z | \\vec x)}\n", "\\right)\n", "\\right]\n", "$$\n", "\n", "Expanding the logarithm and grouping terms:\n", "\n", "$$\n", "\\log( p_\\theta(\\vec x)) = \n", "\\Eq\\left[\\log\\left( p_\\theta(\\vec x | \\vec z) \\right) \\right]\n", "- \\Eq\\left[\\log\\left( \\frac{ 
q_\\theta(\\vec z | \\vec x) }{ p_\\theta(\\vec z) } \\right) \\right]\n", "+ \\Eq\\left[\\log\\left( \\frac{ q_\\theta(\\vec z | \\vec x) }{p_\\theta(\\vec z | \\vec x)} \\right) \\right]\n", "$$\n", "\n", "\n", "The last two terms are Kullback–Leibler divergences:\n", "\n", "$$\n", "D_{KL}( q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z)) = \\Eq\\left[\\log\\left( \\frac{ q_\\theta(\\vec z | \\vec x) }{ p_\\theta(\\vec z) } \\right) \\right]\n", "$$\n", "$$\n", "D_{KL}(q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z | \\vec x)) = \n", "\\Eq\\left[\\log\\left( \\frac{ q_\\theta(\\vec z | \\vec x) }{p_\\theta(\\vec z | \\vec x)} \\right) \\right]\n", "$$\n", "\n", "A Kullback–Leibler divergence is always non-negative, as can be shown from Jensen's inequality. Dropping the second divergence therefore gives:\n", "\n", "\n", "$$\n", "\\log( p_\\theta(\\vec x)) \\ge \n", "\\Eq\\left[\\log\\left( p_\\theta(\\vec x | \\vec z) \\right) \\right]\n", "- D_{KL}( q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z))\n", "$$\n", "\n", "We dropped the (non-negative) term $D_{KL}(q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z | \\vec x))$ as it is intractable: it involves the unknown posterior $p_\\theta(\\vec z | \\vec x)$. The right-hand side of this inequality is called the variational lower bound (or ELBO), and maximizing it over $\\theta$ is the training objective of the variational autoencoder." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch (an illustration, not prescribed by the text above) of the\n", "# variational lower bound for a Gaussian encoder q(z|x) = N(mu, sigma^2)\n", "# and a standard normal prior p(z) = N(0, I). The `decode` function passed\n", "# to `elbo` is hypothetical; any map from z to a reconstruction xhat works.\n", "\n", "# Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.\n", "kl_gauss(mu, sigma) = 0.5 * sum(sigma.^2 .+ mu.^2 .- 1 .- 2 .* log.(sigma))\n", "\n", "# Reparameterization trick: z = mu + sigma .* eps with eps ~ N(0, I).\n", "sample_z(mu, sigma) = mu .+ sigma .* randn(length(mu))\n", "\n", "# Assuming a Gaussian decoder p(x|z) with unit variance, log p(x|z) is a\n", "# negative squared error up to an additive constant.\n", "recon_loglik(x, xhat) = -0.5 * sum((x .- xhat).^2)\n", "\n", "# Single-sample Monte Carlo estimate of the variational lower bound.\n", "function elbo(x, mu, sigma, decode)\n", "    z = sample_z(mu, sigma)\n", "    return recon_loglik(x, decode(z)) - kl_gauss(mu, sigma)\n", "end\n" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.6.2", "language": "julia", "name": "julia-1.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.6.2" } }, "nbformat": 4, "nbformat_minor": 4 }