{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sampling from a multivariate normal\n", "\n", "To be able to tune the Gibbs sampler for a multivariate normal likelihood and multivariate normal prior, it helps to go back and look at the sampler. Eventually sampling from a multivariate normal works its way down to a function in the `mvnormal.jl` source file for the [Distributions](https://github.com/JuliaStats/Distributions) package.\n", "\n", "That function is a one-liner:\n", "```julia\n", "_rand!(d::MvNormal, x::VecOrMat{Float64}) = add!(unwhiten!(d.Σ, randn!(x)), d.μ)\n", "```\n", "\n", "The underscore at the beginning of the name indicates that you shouldn't expect to call this function directly. The trailing `!` on the function names indicates that these functions are _mutating functions_. That is, they modify one or more of their arguments.\n", "\n", "This is a common idiom in [Julia](http://julialang.org). A function that generates a vector or array is often defined to call a companion function that mutates one of its arguments. The `randn!` function operates on a `VecOrMat{Float64}`, overwriting the argument with i.i.d. standard normal variates. If you check how the vector-returning method of `randn` is defined, you will see that it simply allocates the result and passes it to `randn!`. The point is that, if you want to, you can overwrite an array in each step of an iteration and avoid needless allocation and garbage collection of vectors at every call to a function like `randn`.\n", "\n", "The `MvNormal` type is a parametric type, a subtype of the abstract `AbstractMvNormal`, encompassing many specialized cases." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "using Distributions, PDMats" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "search: " ] }, { "data": { "text/latex": [ "No documentation found.\n", "\\textbf{Summary:}\n", "\\begin{verbatim}\n", "immutable Distributions.MvNormal{Cov<:PDMats.AbstractPDMat{T<:AbstractFloat},Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}} <: Distributions.AbstractMvNormal\n", "\\end{verbatim}\n", "\\textbf{Fields:}\n", "\\begin{verbatim}\n", "μ :: Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}\n", "Σ :: Cov<:PDMats.AbstractPDMat{T<:AbstractFloat}\n", "\\end{verbatim}\n" ], "text/markdown": [ "No documentation found.\n", "\n", "**Summary:**\n", "\n", "```julia\n", "immutable Distributions.MvNormal{Cov<:PDMats.AbstractPDMat{T<:AbstractFloat},Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}} <: Distributions.AbstractMvNormal\n", "```\n", "\n", "**Fields:**\n", "\n", "```julia\n", "μ :: Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}\n", "Σ :: Cov<:PDMats.AbstractPDMat{T<:AbstractFloat}\n", "```\n" ], "text/plain": [ "No documentation found.\n", "\n", "**Summary:**\n", "\n", "```julia\n", "immutable Distributions.MvNormal{Cov<:PDMats.AbstractPDMat{T<:AbstractFloat},Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}} <: Distributions.AbstractMvNormal\n", "```\n", "\n", "**Fields:**\n", "\n", "```julia\n", "μ :: Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}\n", "Σ :: Cov<:PDMats.AbstractPDMat{T<:AbstractFloat}\n", "```\n" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "MvNormal MvNormalCanon MvNormalKnownCov gmvnormal MvLogNormal\n", "\n" ] } ], "source": [ 
"?MvNormal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is common to define a diffuse multivariate normal prior for unbounded vector parameters like coefficients. These end up being special cases of the `MvNormal` type according to the template arguments." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "ZeroMeanIsoNormal(\n", "dim: 2\n", "μ: [0.0,0.0]\n", "Σ: 2x2 Array{Float64,2}:\n", " 1.0e6 0.0 \n", " 0.0 1.0e6\n", ")\n" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = MvNormal(2, 1000.)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Distributions.ZeroVector{Float64}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(d.μ)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "PDMats.ScalMat{Float64}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(d.Σ)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "search: " ] }, { "data": { "text/latex": [ "No documentation found.\n", "\\textbf{Summary:}\n", "\\begin{verbatim}\n", "immutable PDMats.ScalMat{T<:AbstractFloat} <: PDMats.AbstractPDMat{T<:AbstractFloat}\n", "\\end{verbatim}\n", "\\textbf{Fields:}\n", "\\begin{verbatim}\n", "dim :: Int64\n", "value :: T<:AbstractFloat\n", "inv_value :: T<:AbstractFloat\n", "\\end{verbatim}\n" ], "text/markdown": [ "No documentation found.\n", "\n", "**Summary:**\n", "\n", "```julia\n", "immutable PDMats.ScalMat{T<:AbstractFloat} <: PDMats.AbstractPDMat{T<:AbstractFloat}\n", "```\n", "\n", "**Fields:**\n", "\n", "```julia\n", "dim :: Int64\n", "value :: T<:AbstractFloat\n", "inv_value :: T<:AbstractFloat\n", "```\n" ], "text/plain": [ "No documentation found.\n", "\n", "**Summary:**\n", "\n", "```julia\n", "immutable PDMats.ScalMat{T<:AbstractFloat} <: PDMats.AbstractPDMat{T<:AbstractFloat}\n", "```\n", "\n", "**Fields:**\n", "\n", "```julia\n", "dim :: Int64\n", "value :: T<:AbstractFloat\n", "inv_value :: T<:AbstractFloat\n", "```\n" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "ScalMat WalleniusNoncentralHypergeometric FisherNoncentralHypergeometric\n", "\n" ] } ], "source": [ "?ScalMat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This type of MvNormal could be of dimension 1,000,000 and would still only require storing these three numbers.\n", "\n", "The next question to ask is what does `unwhiten!` do. Essentially it is the multivariate equivalent of multiplying by $\\sigma$ in the univariate case. If you want to take a univariate standard normal, `z`, and convert it to have a $\\mathcal{N}(\\mu, \\sigma^2)$ distribution you would form $\\mu + \\sigma z$. The \"unwhitening\" here is essentially multiplying by a square root of `\\Sigma`, which is the Cholesky factor.\n", "\n", "If you have a sample from a $\\mathcal{N}(\\mu,\\sigma)$ distribution you convert it to \"white noise\" or \"whiten\" it with the Z transformation. 
The transformation in the other direction is called \"unwhitening\".\n", "\n", "## Creating a Gibbs sampler for coefficients\n", "\n", "The Gibbs sampler for $\beta$ in the [Mamba documentation](http://mambajl.readthedocs.org/en/latest/) uses the standard formulas for the mean and covariance of the full conditional distribution of $\beta$, which is again multivariate normal.\n", "\n", "```julia\n", "Gibbs_beta = Sampler([:beta],\n", "    (beta, s2, xmat, y) ->\n", "        begin\n", "            beta_mean = mean(beta.distr)\n", "            beta_invcov = invcov(beta.distr)\n", "            Sigma = inv(xmat' * xmat / s2 + beta_invcov)\n", "            mu = Sigma * (xmat' * y / s2 + beta_invcov * beta_mean)\n", "            rand(MvNormal(mu, Sigma))\n", "        end\n", ")\n", "```\n", "\n", "The anonymous function introduced by the `->` operator (sometimes called the \"stabby lambda\" syntax) produces the next value of $\beta$. This particular syntax is unusual in that it combines the \"one-liner\" form with a \"begin/end\" block. It would be more common to write this as\n", "\n", "```julia\n", "Gibbs_beta = Sampler([:beta],\n", "    function (beta, s2, xmat, y)\n", "        beta_mean = mean(beta.distr)\n", "        beta_invcov = invcov(beta.distr)\n", "        Sigma = inv(xmat' * xmat / s2 + beta_invcov)\n", "        mu = Sigma * (xmat' * y / s2 + beta_invcov * beta_mean)\n", "        rand(MvNormal(mu, Sigma))\n", "    end\n", ")\n", "```\n", "\n", "Notice that `xmat' * xmat` and `xmat' * y` are recomputed at each iteration. A slightly less obvious inefficiency comes from losing information about the structure of the matrices involved. The matrix `xmat' * xmat` is symmetric and positive semi-definite, but this information is lost." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5x2 Array{Float64,2}:\n", " 1.0 1.0\n", " 1.0 2.0\n", " 1.0 3.0\n", " 1.0 4.0\n", " 1.0 5.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xmat = hcat(ones(5), collect(1:5))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2x2 Array{Float64,2}:\n", " 5.0 15.0\n", " 15.0 55.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xmat' * xmat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also lose the special structure of the covariance of the prior." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2x2 Array{Float64,2}:\n", " 1.0e-6 0.0 \n", " 0.0 1.0e-6" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invcov(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "None of these issues is at all important for this example. In larger examples, however, one might want to be more careful and store `X'X` and `X'y` rather than recompute them at every iteration. 
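With the `ZeroMeanIsoNormal` prior used here (zero mean and covariance $\tau^2 I$, so that the `inv_value` field of its `ScalMat` is $1/\tau^2$), and writing $\Sigma_*$ and $\mu_*$ for the conditional covariance and mean, the standard formulas specialize to\n", "\n", "$$\n", "\Sigma_* = \left(\frac{X'X}{\sigma^2} + \frac{I}{\tau^2}\right)^{-1} = \sigma^2 A^{-1}, \qquad A = X'X + \frac{\sigma^2}{\tau^2} I, \qquad \mu_* = \Sigma_* \frac{X'y}{\sigma^2} = A^{-1} X'y .\n", "$$\n", "\n", "If $A = LL'$ is the (lower) Cholesky factorization, then $\sigma^2 A^{-1} = \bigl(\sigma (L')^{-1}\bigr)\bigl(\sigma (L')^{-1}\bigr)'$, so a draw from the full conditional of $\beta$ is $\mu_* + \sigma (L')^{-1} z$ with $z$ a vector of i.i.d. standard normals. 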
In code it would look like this:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "updateβ (generic function with 1 method)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function updateβ(β, σ², XtX, Xty)\n", "    d = β.distr\n", "    if isa(d, ZeroMeanIsoNormal)\n", "        # prior precision is 1/τ² = d.Σ.inv_value; form A = X'X + (σ²/τ²)I and factor A = L*L'\n", "        cfac = cholfact!(XtX + σ² * d.Σ.inv_value * I, :L)\n", "        # conditional mean A \ X'y plus noise √σ² * (L' \ z); the parentheses keep the\n", "        # scaling by √σ² outside the triangular solve\n", "        return cfac\Xty + √σ² * (cfac[:L]' \ randn(length(β)))\n", "    else\n", "        throw(ArgumentError(\n", "            \"prior for coefficients should be ZeroMeanIsoNormal\"))\n", "    end\n", "end" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.4.3", "language": "julia", "name": "julia-0.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }