{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Variational Autoencoders\n", "\n", "$\\newcommand{\\vec}{\\mathbf}$\n", "We have samples $\\vec x^{(1)}, \\vec x^{(2)}, \\dots$ from an (unknown) distribution $p(\\vec x)$, and we want to generate more samples from this distribution.\n", "\n", "We can parameterize the distribution by a set of parameters $\\theta$, writing $p_\\theta(\\vec x)$. The optimal choice of $\\theta$ maximizes the probability of the observed samples:\n", "\n", "$$\n", "p_\\theta(\\vec x^{(1)}, \\vec x^{(2)}, \\dots) = \\prod_i p_\\theta( \\vec x^{(i)})\n", "$$\n", "\n", "or, equivalently, maximizes the log-likelihood $\\sum_i \\log p_\\theta( \\vec x^{(i)})$.\n", "\n", "\n", "We introduce a latent vector $\\vec z$ with a prescribed distribution. Often $\\vec z$ is chosen to be Gaussian distributed, e.g. $\\vec z \\sim \\mathcal{N}(\\vec 0, I)$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$\\newcommand{\\Eq}{E_{\\vec z \\sim q_\\theta(\\vec z | \\vec x)}}$\n", "Let $q_\\theta(\\vec z | \\vec x)$ be an approximation of the posterior distribution of $\\vec z$ given $\\vec x$ (the encoder). Since $\\log( p_\\theta( \\vec x) )$ does not depend on $\\vec z$, we can write it as an expectation over $\\vec z \\sim q_\\theta(\\vec z | \\vec x)$:\n", "\n", "$$\n", "\\log( p_\\theta( \\vec x) ) = \\Eq[ \\log( p_\\theta(\\vec x)) ]\n", "$$\n", "\n", "Using Bayes' rule, $ p_\\theta(\\vec x) = p_\\theta(\\vec x | \\vec z) p_\\theta(\\vec z) / p_\\theta(\\vec z | \\vec x)$:\n", "\n", "$$\n", " \\log( p_\\theta(\\vec x)) = \n", "\\Eq\\left[\n", "\\log\\left(\n", "\\frac{ p_\\theta(\\vec x | \\vec z) p_\\theta(\\vec z) } { p_\\theta(\\vec z | \\vec x)}\n", "\\right)\n", "\\right]\n", "$$\n", "\n", "As the expectation is computed over $\\vec z \\sim q_\\theta(\\vec z | \\vec x)$, we make this distribution appear by multiplying and dividing by $q_\\theta(\\vec z | \\vec x)$:\n", "\n", "\n", "$$\n", "\\log( p_\\theta(\\vec x)) = \n", "\\Eq\\left[\n", "\\log\\left(\n", "\\frac{ p_\\theta(\\vec x | \\vec z) p_\\theta(\\vec z) } { p_\\theta(\\vec z | \\vec x)}\n", "\\frac{ q_\\theta(\\vec z | \\vec x) } { q_\\theta(\\vec z | \\vec x)}\n", "\\right)\n", "\\right]\n", "$$\n", "\n", "Expanding the logarithm and grouping terms:\n", "\n", "$$\n", "\\log( p_\\theta(\\vec x)) = \n", "\\Eq\\left[\\log\\left( p_\\theta(\\vec x | \\vec z) \\right) \\right]\n", "- \\Eq\\left[\\log\\left( \\frac{ 
q_\\theta(\\vec z | \\vec x) }{ p_\\theta(\\vec z) } \\right) \\right]\n", "+ \\Eq\\left[\\log\\left( \\frac{ q_\\theta(\\vec z | \\vec x) }{p_\\theta(\\vec z | \\vec x)} \\right) \\right]\n", "$$\n", "\n", "\n", "The last two terms are Kullback–Leibler divergences:\n", "\n", "$$\n", "D_{KL}( q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z)) = \\Eq\\left[\\log\\left( \\frac{ q_\\theta(\\vec z | \\vec x) }{ p_\\theta(\\vec z) } \\right) \\right]\n", "$$\n", "$$\n", "D_{KL}(q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z | \\vec x)) = \n", "\\Eq\\left[\\log\\left( \\frac{ q_\\theta(\\vec z | \\vec x) }{p_\\theta(\\vec z | \\vec x)} \\right) \\right]\n", "$$\n", "\n", "A Kullback–Leibler divergence is always non-negative, as can be shown from Jensen's inequality. Dropping the second divergence therefore gives:\n", "\n", "\n", "$$\n", "\\log( p_\\theta(\\vec x)) \\ge \n", "\\Eq\\left[\\log\\left( p_\\theta(\\vec x | \\vec z) \\right) \\right]\n", "- D_{KL}( q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z))\n", "$$\n", "\n", "We dropped the (non-negative) term $D_{KL}(q_\\theta(\\vec z | \\vec x) || p_\\theta(\\vec z | \\vec x))$ as it is intractable: it involves the unknown posterior $p_\\theta(\\vec z | \\vec x)$. The right-hand side of this inequality is called the variational lower bound (or ELBO), and maximizing it over $\\theta$ is the training objective of the variational autoencoder." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch (an illustration, not prescribed by the text above) of the\n", "# variational lower bound for a Gaussian encoder q(z|x) = N(mu, sigma^2)\n", "# and a standard normal prior p(z) = N(0, I). The `decode` function passed\n", "# to `elbo` is hypothetical; any map from z to a reconstruction xhat works.\n", "\n", "# Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.\n", "kl_gauss(mu, sigma) = 0.5 * sum(sigma.^2 .+ mu.^2 .- 1 .- 2 .* log.(sigma))\n", "\n", "# Reparameterization trick: z = mu + sigma .* eps with eps ~ N(0, I).\n", "sample_z(mu, sigma) = mu .+ sigma .* randn(length(mu))\n", "\n", "# Assuming a Gaussian decoder p(x|z) with unit variance, log p(x|z) is a\n", "# negative squared error up to an additive constant.\n", "recon_loglik(x, xhat) = -0.5 * sum((x .- xhat).^2)\n", "\n", "# Single-sample Monte Carlo estimate of the variational lower bound.\n", "function elbo(x, mu, sigma, decode)\n", "    z = sample_z(mu, sigma)\n", "    return recon_loglik(x, decode(z)) - kl_gauss(mu, sigma)\n", "end\n" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.6.2", "language": "julia", "name": "julia-1.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.6.2" } }, "nbformat": 4, "nbformat_minor": 4 }