[ { "objectID": "developers/transforms/bijectors/index.html", "href": "developers/transforms/bijectors/index.html", "title": "Bijectors in MCMC", "section": "", "text": "All the above has purely been a mathematical discussion of how distributions can be transformed. Now, we turn to their implementation in Julia, specifically using the Bijectors.jl package.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Bijectors in MCMC" ] }, { "objectID": "developers/transforms/bijectors/index.html#bijectors.jl", "href": "developers/transforms/bijectors/index.html#bijectors.jl", "title": "Bijectors in MCMC", "section": "Bijectors.jl", "text": "Bijectors.jl\n\nimport Random\nRandom.seed!(468);\n\nusing Distributions: Normal, LogNormal, logpdf\nusing Statistics: mean, var\nusing Plots: histogram\n\nA bijection between two sets (Wikipedia) is, essentially, a one-to-one mapping between the elements of these sets. That is to say, if we have two sets \\(X\\) and \\(Y\\), then a bijection maps each element of \\(X\\) to a unique element of \\(Y\\). To return to our univariate example, where we transformed \\(x\\) to \\(y\\) using \\(y = \\exp(x)\\), the exponentiation function is a bijection because every value of \\(x\\) maps to one unique value of \\(y\\). The input set (the domain) is \\((-\\infty, \\infty)\\), and the output set (the codomain) is \\((0, \\infty)\\). (Here, \\((a, b)\\) denotes the open interval from \\(a\\) to \\(b\\) but excluding \\(a\\) and \\(b\\) themselves.)\nSince bijections are a one-to-one mapping between elements, we can also reverse the direction of this mapping to create an inverse function. In the case of \\(y = \\exp(x)\\), the inverse function is \\(x = \\log(y)\\).\n\n\n\n\n\n\nNote\n\n\n\nTechnically, the bijections in Bijectors.jl are functions \\(f: X \\to Y\\) for which:\n\n\\(f\\) is continuously differentiable, i.e. the derivative \\(\\mathrm{d}f(x)/\\mathrm{d}x\\) exists and is continuous (over the domain of interest \\(X\\));\nIf \\(f^{-1}: Y \\to X\\) is the inverse of \\(f\\), then that is also continuously differentiable (over its own domain, i.e. \\(Y\\)).\n\nThe technical mathematical term for this is a diffeomorphism (Wikipedia), but we call them ‘bijectors’.\nWhen thinking about continuous differentiability, it’s important to be conscious of the domains or codomains that we care about. For example, taking the inverse function \\(\\log(y)\\) from above, its derivative is \\(1/y\\), which is not continuous at \\(y = 0\\). However, we specified that the bijection \\(y = \\exp(x)\\) maps values of \\(x \\in (-\\infty, \\infty)\\) to \\(y \\in (0, \\infty)\\), so the point \\(y = 0\\) is not within the domain of the inverse function.\n\n\nSpecifically, one of the primary purposes of Bijectors.jl is to construct bijections which map constrained distributions to unconstrained ones. For example, the log-normal distribution which we saw in the previous page is constrained: its support, i.e. the range over which \\(p(x) > 0\\), is \\((0, \\infty)\\). However, we can transform that to an unconstrained distribution (the normal distribution) using the transformation \\(y = \\log(x)\\).\n\n\n\n\n\n\nNote\n\n\n\nBijectors.jl, as well as DynamicPPL (which we’ll come to later), can work with a much broader class of bijective transformations of variables, not just ones that go to the entire real line. But for the purposes of MCMC, unconstraining is the most common transformation, so we’ll stick with that terminology.\n\n\nThe bijector function, when applied to a distribution, returns a bijection \\(f\\) that can be used to map the constrained distribution to an unconstrained one. Unsurprisingly, for the log-normal distribution, the bijection is (a broadcasted version of) the \\(\\log\\) function.\n\nimport Bijectors as B\n\nf = B.bijector(LogNormal())\n\n(::Base.Fix1{typeof(broadcast), typeof(log)}) (generic function with 1 method)\n\n\nWe can apply this transformation to samples from the original distribution, for example:\n\nsamples_lognormal = rand(LogNormal(), 5)\n\nsamples_normal = f(samples_lognormal)\n\n5-element Vector{Float64}:\n 0.07200886749732066\n -0.07404375655951738\n 0.6327762377562545\n -0.9799776018729268\n 1.6115229499167665\n\n\nWe can also obtain the inverse of a bijection, \\(f^{-1}\\):\n\nf_inv = B.inverse(f)\n\nf_inv(samples_normal) == samples_lognormal\n\ntrue\n\n\nWe know that the transformation \\(y = \\log(x)\\) changes the log-normal distribution to the normal distribution. Bijectors.jl also gives us a way to access that transformed distribution:\n\ntransformed_dist = B.transformed(LogNormal(), f)\n\nBijectors.UnivariateTransformed{Distributions.LogNormal{Float64}, Base.Fix1{typeof(broadcast), typeof(log)}}(\ndist: Distributions.LogNormal{Float64}(μ=0.0, σ=1.0)\ntransform: Base.Fix1{typeof(broadcast), typeof(log)}(broadcast, log)\n)\n\n\nThis type doesn’t immediately look like a Normal(), but it behaves in exactly the same way. For example, we can sample from it and plot a histogram:\n\nsamples_plot = rand(transformed_dist, 5000)\nhistogram(samples_plot, bins=50)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can also obtain the logpdf of the transformed distribution and check that it is the same as that of a normal distribution:\n\nprintln(\"Sample: $(samples_plot[1])\")\nprintln(\"Expected: $(logpdf(Normal(), samples_plot[1]))\")\nprintln(\"Actual: $(logpdf(transformed_dist, samples_plot[1]))\")\n\nSample: -0.2031149013821452\nExpected: -0.9395663647864121\nActual: -0.9395663647864121\n\n\nGiven the discussion in the previous sections, you might not be surprised to find that the logpdf of the transformed distribution is implemented using the Jacobian of the transformation. In particular, it directly uses the formula\n\\[\\log(q(\\mathbf{y})) = \\log(p(\\mathbf{x})) - \\log(|\\det(\\mathbf{J})|).\\]\nYou can access \\(\\log(|\\det(\\mathbf{J})|)\\) (evaluated at the point \\(\\mathbf{x}\\)) using the logabsdetjac function:\n\n# Reiterating the setup, just to be clear\noriginal_dist = LogNormal()\nx = rand(original_dist)\nf = B.bijector(original_dist)\ny = f(x)\ntransformed_dist = B.transformed(LogNormal(), f)\n\nprintln(\"log(q(y)) : $(logpdf(transformed_dist, y))\")\nprintln(\"log(p(x)) : $(logpdf(original_dist, x))\")\nprintln(\"log(|det(J)|) : $(B.logabsdetjac(f, x))\")\n\nlog(q(y)) : -0.9258400203646245\nlog(p(x)) : -0.8083539602557612\nlog(|det(J)|) : 0.11748606010886327\n\n\nfrom which you can see that the equation above holds. There are more functions available in the Bijectors.jl API; for full details do check out the documentation. For example, logpdf_with_trans can directly give us \\(\\log(q(\\mathbf{y}))\\) without going through the effort of constructing the bijector:\n\nB.logpdf_with_trans(original_dist, x, true)\n\n-0.9258400203646245", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Bijectors in MCMC" ] }, { "objectID": "developers/transforms/bijectors/index.html#the-case-for-bijectors-in-mcmc", "href": "developers/transforms/bijectors/index.html#the-case-for-bijectors-in-mcmc", "title": "Bijectors in MCMC", "section": "The case for bijectors in MCMC", "text": "The case for bijectors in MCMC\nConstraints pose a challenge for many numerical methods such as optimisation, and sampling is no exception to this. The problem is that for any value \\(x\\) outside of the support of a constrained distribution, \\(p(x)\\) will be zero, and the logpdf will be \\(-\\infty\\). Thus, any term that involves some ratio of probabilities (or equivalently, the logpdf) will be infinite.\n\nMetropolis with rejection\nTo see the practical impact of this on sampling, let’s attempt to sample from a log-normal distribution using a random walk Metropolis algorithm.\nOne way of handling constraints is to simply reject any steps that would take us out of bounds. This is a barebones implementation which does precisely that:\n\n# Take a step where the proposal is a normal distribution centred around\n# the current value. Return the new value, plus a flag to indicate whether\n# the new value was in bounds.\nfunction mh_step(logp, x, in_bounds)\n x_proposed = rand(Normal(x, 1))\n in_bounds(x_proposed) || return (x, false) # bounds check\n acceptance_logp = logp(x_proposed) - logp(x)\n return if log(rand()) < acceptance_logp\n (x_proposed, true) # successful step\n else\n (x, true) # failed step\n end\nend\n\n# Run a random walk Metropolis sampler.\n# `logp` : a function that takes `x` and returns the log pdf of the\n# distribution we're trying to sample from (up to a constant\n# additive factor)\n# `n_samples` : the number of samples to draw\n# `in_bounds` : a function that takes `x` and returns whether `x` is within\n# the support of the distribution\n# `x0` : the initial value\n# Returns a vector of samples, plus the number of times we went out of bounds.\nfunction mh(logp, n_samples, in_bounds; x0=1.0)\n samples = [x0]\n x = x0\n n_out_of_bounds = 0\n for _ in 2:n_samples\n x, inb = mh_step(logp, x, in_bounds)\n if !inb\n n_out_of_bounds += 1\n end\n push!(samples, x)\n end\n return (samples, n_out_of_bounds)\nend\n\nmh (generic function with 1 method)\n\n\n\n\n\n\n\n\nNote\n\n\n\nIn the MH algorithm, we technically do not need to explicitly check the proposal, because for any \\(x \\leq 0\\), we have that \\(p(x) = 0\\); thus, the acceptance probability will be zero. However, doing so here allows us to track how often this happens, and also illustrates the general principle of handling constraints by rejection.\n\n\nNow to actually perform the sampling:\n\nlogp(x) = logpdf(LogNormal(), x)\nsamples, n_out_of_bounds = mh(logp, 10000, x -> x > 0)\nhistogram(samples, bins=0:0.1:5; xlims=(0, 5))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHow do we know that this has sampled correctly? For one, we can check that the mean of the samples are what we expect them to be. From Wikipedia, the mean of a log-normal distribution is given by \\(\\exp[\\mu + (\\sigma^2/2)]\\). For our log-normal distribution, we set \\(\\mu = 0\\) and \\(\\sigma = 1\\), so:\n\nprintln(\"expected mean: $(exp(0 + (1^2/2)))\")\nprintln(\" actual mean: $(mean(samples))\")\n\nexpected mean: 1.6487212707001282\n actual mean: 1.3347941996487\n\n\n\n\nMetropolis with transformation\nThe issue with this is that many of the sampling steps are unproductive, in that they bring us to the region of \\(x \\leq 0\\) and get rejected:\n\nprintln(\"went out of bounds $n_out_of_bounds/10000 times\")\n\nwent out of bounds 1870/10000 times\n\n\nAnd this could have been even worse if we had chosen a wider proposal distribution in the Metropolis step, or if the support of the distribution was narrower! In general, we probably don’t want to have to re-parameterise our proposal distribution each time we sample from a distribution with different constraints.\nThis is where the transformation functions from Bijectors.jl come in: we can use them to map the distribution to an unconstrained one and sample from that instead. Since the sampler only ever sees an unconstrained distribution, it doesn’t have to worry about checking for bounds.\nTo make this happen, instead of passing \\(\\log(p(x))\\) to the sampler, we pass \\(\\log(q(y))\\). This can be obtained using the Bijectors.logpdf_with_trans function that was introduced above.\n\nd = LogNormal()\nf = B.bijector(d) # Transformation function\nf_inv = B.inverse(f) # Inverse transformation function\nfunction logq(y)\n x = f_inv(y)\n return B.logpdf_with_trans(d, x, true)\nend\nsamples_transformed, n_oob_transformed = mh(logq, 10000, x -> true);\n\nNow, this process gives us samples that have been transformed, so we need to un-transform them to get the samples from the original distribution:\n\nsamples_untransformed = f_inv(samples_transformed)\nhistogram(samples_untransformed, bins=0:0.1:5; xlims=(0, 5))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can check the mean of the samples too, to see that it is what we expect:\n\nprintln(\"expected mean: $(exp(0 + (1^2/2)))\")\nprintln(\" actual mean: $(mean(samples_untransformed))\")\n\nexpected mean: 1.6487212707001282\n actual mean: 1.7184757306010636\n\n\nOn top of that, we can also verify that we don’t ever go out of bounds:\n\nprintln(\"went out of bounds $n_oob_transformed/10000 times\")\n\nwent out of bounds 0/10000 times\n\n\n\n\nWhich one is better?\nIn the subsections above, we’ve seen two different methods of sampling from a constrained distribution:\n\nSample directly from the distribution and reject any samples outside of its support.\nTransform the distribution to an unconstrained one and sample from that instead.\n\n(Note that both of these methods are applicable to other samplers as well, such as Hamiltonian Monte Carlo.)\nOf course, a natural question to then ask is which one of these is better!\nOne option might be look at the sample means above to see which one is ‘closer’ to the expected mean. However, that’s not a very robust method because the sample mean is itself random, and if we were to use a different random seed we might well reach a different conclusion.\nAnother possibility we could look at the number of times the sample was rejected. Does a lower rejection rate (as in the transformed case) imply that the method is better? As it happens, this might seem like an intuitive conclusion, but it’s not necessarily the case: for example, the sampling in unconstrained space could be much less efficient, such that even though we’re not rejecting samples, the ones that we do get are overly correlated and thus not representative of the distribution.\nA robust comparison would involve performing both methods many times and seeing how reliable the sample mean is.\n\nfunction get_sample_mean(; transform)\n if transform\n # Sample from transformed distribution\n samples = f_inv(first(mh(logq, 10000, x -> true)))\n else\n # Sample from original distribution and reject if out of bounds\n samples = first(mh(logp, 10000, x -> x > 0))\n end\n return mean(samples)\nend\n\nget_sample_mean (generic function with 1 method)\n\n\n\nmeans_with_rejection = [get_sample_mean(; transform=false) for _ in 1:1000]\nmean(means_with_rejection), var(means_with_rejection)\n\n(1.652032684314151, 0.30454613712270745)\n\n\n\nmeans_with_transformation = [get_sample_mean(; transform=true) for _ in 1:1000]\nmean(means_with_transformation), var(means_with_transformation)\n\n(1.6489347143276902, 0.003945513418875533)\n\n\nWe can see from this small study that although both methods give us the correct mean (on average), the method with the transformation is more reliable, in that the variance is much lower!\n\n\n\n\n\n\nNote\n\n\n\nAlternatively, we could also try to directly measure how correlated the samples are. One way to do this is to calculate the effective sample size (ESS), which is described in the Stan documentation, and implemented in MCMCChains.jl. A larger ESS implies that the samples are less correlated, and thus more representative of the underlying distribution:\n\nusing MCMCChains: Chains, ess\n\nrejection = first(mh(logp, 10000, x -> x > 0))\ntransformation = f_inv(first(mh(logq, 10000, x -> true)))\nchn = Chains(hcat(rejection, transformation), [:rejection, :transformation])\ness(chn)\n\nESS\n parameters ess ess_per_sec\n Symbol Float64 Missing\n\n rejection 503.4349 missing\n transformation 1106.6909 missing\n\n\n\n\n\n\nWhat happens without the Jacobian?\nIn the transformation method above, we used Bijectors.logpdf_with_trans to calculate the log probability density of the transformed distribution. This function makes sure to include the Jacobian term when performing the transformation, and this is what makes sure that when we un-transform the samples, we get the correct distribution.\nThe next code block shows what happens if we don’t include the Jacobian term. In this logq_wrong, we’ve un-transformed y to x and calculated the logpdf with respect to its original distribution. This is exactly the same mistake that we made at the start of this article with naive_logpdf.\n\nfunction logq_wrong(y)\n x = f_inv(y)\n return logpdf(d, x) # no Jacobian term!\nend\nsamples_questionable, _ = mh(logq_wrong, 100000, x -> true)\nsamples_questionable_untransformed = f_inv(samples_questionable)\n\nprintln(\"mean: $(mean(samples_questionable_untransformed))\")\n\nmean: 0.5919166187308191\n\n\nYou can see that even though we used ten times more samples, the mean is quite wrong, which implies that our samples are not being drawn from the correct distribution.\nIn the next page, we’ll see how to use these transformations in the context of a probabilistic programming language, paying particular attention to their handling in DynamicPPL.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Bijectors in MCMC" ] }, { "objectID": "developers/transforms/dynamicppl/index.html", "href": "developers/transforms/dynamicppl/index.html", "title": "Variable transformations in DynamicPPL", "section": "", "text": "In the final part of this chapter, we’ll discuss the higher-level implications of constrained distributions in the Turing.jl framework.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#linked-and-unlinked-varinfos-in-dynamicppl", "href": "developers/transforms/dynamicppl/index.html#linked-and-unlinked-varinfos-in-dynamicppl", "title": "Variable transformations in DynamicPPL", "section": "Linked and unlinked VarInfos in DynamicPPL", "text": "Linked and unlinked VarInfos in DynamicPPL\n\nimport Random\nRandom.seed!(468);\n\n# Turing re-exports the entirety of Distributions\nusing Turing\n\nWhen we are performing Bayesian inference, we’re trying to sample from a joint probability distribution, which isn’t usually a single, well-defined distribution like in the rather simplified example above. However, each random variable in the model will have its own distribution, and often some of these will be constrained. For example, if b ~ LogNormal() is a random variable in a model, then \\(p(b)\\) will be zero for any \\(b \\leq 0\\). Consequently, any joint probability \\(p(b, c, \\ldots)\\) will also be zero for any combination of parameters where \\(b \\leq 0\\), and so that joint distribution is itself constrained.\nTo get around this, DynamicPPL allows the variables to be transformed in exactly the same way as above. For simplicity, consider the following model:\n\nusing DynamicPPL\n\n@model function demo()\n x ~ LogNormal()\nend\n\nmodel = demo()\nvi = VarInfo(model)\nvn_x = @varname(x)\n# Retrieve the 'internal' representation of x – we'll explain this later\nDynamicPPL.getindex_internal(vi, vn_x)\n\n1-element Vector{Float64}:\n 1.0746648736094493\n\n\nThe call to VarInfo executes the model once and stores the sampled value inside vi. By default, VarInfo itself stores un-transformed values. We can see this by comparing the value of the logpdf stored inside the VarInfo:\n\nDynamicPPL.getlogp(vi)\n\n-0.9935400392011169\n\n\nwith a manual calculation:\n\nlogpdf(LogNormal(), DynamicPPL.getindex_internal(vi, vn_x))\n\n1-element Vector{Float64}:\n -0.9935400392011169\n\n\nIn DynamicPPL, the link function can be used to transform the variables. This function does three things: firstly, it transforms the variables; secondly, it updates the value of logp (by adding the Jacobian term); and thirdly, it sets a flag on the variables to indicate that it has been transformed. Note that this acts on all variables in the model, including unconstrained ones. (Unconstrained variables just have an identity transformation.)\n\nvi_linked = DynamicPPL.link(vi, model)\nprintln(\"Transformed value: $(DynamicPPL.getindex_internal(vi_linked, vn_x))\")\nprintln(\"Transformed logp: $(DynamicPPL.getlogp(vi_linked))\")\nprintln(\"Transformed flag: $(DynamicPPL.istrans(vi_linked, vn_x))\")\n\nTransformed value: [0.07200886749732066]\nTransformed logp: -0.9215311717037962\nTransformed flag: true\n\n\nIndeed, we can see that the new logp value matches with\n\nlogpdf(Normal(), DynamicPPL.getindex_internal(vi_linked, vn_x))\n\n1-element Vector{Float64}:\n -0.9215311717037962\n\n\nThe reverse transformation, invlink, reverts all of the above steps:\n\nvi = DynamicPPL.invlink(vi_linked, model) # Same as the previous vi\nprintln(\"Un-transformed value: $(DynamicPPL.getindex_internal(vi, vn_x))\")\nprintln(\"Un-transformed logp: $(DynamicPPL.getlogp(vi))\")\nprintln(\"Un-transformed flag: $(DynamicPPL.istrans(vi, vn_x))\")\n\nUn-transformed value: [1.0746648736094493]\nUn-transformed logp: -0.9935400392011169\nUn-transformed flag: false\n\n\n\nModel and internal representations\nIn DynamicPPL, there is a difference between the value of a random variable and its ‘internal’ value. This is most easily seen by first transforming, and then comparing the output of getindex and getindex_internal. The former extracts the regular value, which we call the model representation (because it is consistent with the distribution specified in the model). The latter, as the name suggests, gets the internal representation of the variable, which is how it is actually stored in the VarInfo object.\n\nprintln(\" Model representation: $(getindex(vi_linked, vn_x))\")\nprintln(\"Internal representation: $(DynamicPPL.getindex_internal(vi_linked, vn_x))\")\n\n Model representation: 1.0746648736094493\nInternal representation: [0.07200886749732066]\n\n\n\n\n\n\n\n\nNote\n\n\n\nNote that vi_linked[vn_x] can also be used as shorthand for getindex(vi_linked, vn_x); this usage is common in the DynamicPPL/Turing codebase.\n\n\nWe can see (for this linked varinfo) that there are two differences between these outputs:\n\nThe internal representation has been transformed using the bijector (in this case, the log function). This means that the istrans() flag which we used above doesn’t modify the model representation: it only tells us whether the internal representation has been transformed or not.\nThe internal representation is a vector, whereas the model representation is a scalar. This is because in DynamicPPL, all internal values are vectorised (i.e. converted into some vector), regardless of distribution. On the other hand, since the model specifies a univariate distribution, the model representation is a scalar.\n\nOne might also ask, what is the internal representation for an unlinked varinfo?\n\nprintln(\" Model representation: $(getindex(vi, vn_x))\")\nprintln(\"Internal representation: $(DynamicPPL.getindex_internal(vi, vn_x))\")\n\n Model representation: 1.0746648736094493\nInternal representation: [1.0746648736094493]\n\n\nFor an unlinked VarInfo, the internal representation is vectorised, but not transformed. We call this an unlinked internal representation; conversely, when the VarInfo has been linked, each variable will have a corresponding linked internal representation.\nThis sequence of events is summed up in the following diagram, where f(..., args) indicates that the ... is to be replaced with the object at the beginning of the arrow:\n\n\n\nFunctions related to variable transforms in DynamicPPL\n\n\nIn the final part of this article, we’ll take a more in-depth look at the internal DynamicPPL machinery that allows us to convert between representations and obtain the correct probability densities. Before that, though, we’ll take a quick high-level look at how the HMC sampler in Turing.jl uses the functions introduced so far.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#case-study-hmc-in-turing.jl", "href": "developers/transforms/dynamicppl/index.html#case-study-hmc-in-turing.jl", "title": "Variable transformations in DynamicPPL", "section": "Case study: HMC in Turing.jl", "text": "Case study: HMC in Turing.jl\nWhile DynamicPPL provides the functionality for transforming variables, the transformation itself happens at an even higher level, i.e. in the sampler itself. The HMC sampler in Turing.jl is in this file. In the first step of sampling, it calls link on the sampler. This transformation is preserved throughout the sampling process, meaning that istrans() always returns true.\nWe can observe this by inserting print statements into the model. Here, __varinfo__ is the internal symbol for the VarInfo object used in model evaluation:\n\nsetprogress!(false)\n\n@model function demo2()\n x ~ LogNormal()\n if x isa AbstractFloat\n println(\"-----------\")\n println(\"model repn: $(DynamicPPL.getindex(__varinfo__, @varname(x)))\")\n println(\"internal repn: $(DynamicPPL.getindex_internal(__varinfo__, @varname(x)))\")\n println(\"istrans: $(istrans(__varinfo__, @varname(x)))\")\n end\nend\n\nsample(demo2(), HMC(0.1, 3), 3);\n\n[ Info: [Turing]: progress logging is disabled globally\n[ Info: [AdvancedVI]: global PROGRESS is set as false\n-----------\nmodel repn: 0.9286310592520649\ninternal repn: Real[0.9286310592520649]\nistrans: false\n-----------\nmodel repn: 1.0439687015916723\ninternal repn: Real[1.0439687015916723]\nistrans: false\n-----------\nmodel repn: 1.6788901959902727\ninternal repn: [0.5181329774995787]\nistrans: true\n-----------\nmodel repn: 1.6993713154851589\ninternal repn: [0.5302583682434592]\nistrans: true\n-----------\nmodel repn: 2.5879692292256236\ninternal repn: [0.9508734867787983]\nistrans: true\n\n\n(Here, the check on if x isa AbstractFloat prevents the printing from occurring during computation of the derivative.) You can see that during the three sampling steps, istrans is always kept as true.\n\n\n\n\n\n\nNote\n\n\n\nThe first two model evaluations where istrans is false occur prior to the actual sampling. One occurs when the model is checked for correctness (using DynamicPPL.check_model_and_trace). The second occurs because the model is evaluated once to generate a set of initial parameters inside DynamicPPL’s implementation of AbstractMCMC.step. Both of these steps occur with all samplers in Turing.jl, so are not specific to the HMC example shown here.\n\n\nWhat this means is that from the perspective of the HMC sampler, it never sees the constrained variable: it always thinks that it is sampling from an unconstrained distribution.\nThe biggest prerequisite for this to work correctly is that the potential energy term in the Hamiltonian—or in other words, the model log density—must be programmed correctly to include the Jacobian term. This is exactly the same as how we had to make sure to define logq(y) correctly in the toy HMC example above.\nWithin Turing.jl, this is correctly handled because a statement like x ~ LogNormal() in the model definition above is translated into assume(LogNormal(), @varname(x), __varinfo__), defined here. If you follow the trail of function calls, you can verify that the assume function does indeed check for the presence of the istrans flag and adds the Jacobian term accordingly.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#a-deeper-dive-into-dynamicppls-internal-machinery", "href": "developers/transforms/dynamicppl/index.html#a-deeper-dive-into-dynamicppls-internal-machinery", "title": "Variable transformations in DynamicPPL", "section": "A deeper dive into DynamicPPL’s internal machinery", "text": "A deeper dive into DynamicPPL’s internal machinery\nAs described above, DynamicPPL stores a (possibly linked) internal representation which is accessible via getindex_internal, but can also provide users with the original, untransformed, model representation via getindex. This abstraction allows the user to obtain samples from constrained distributions without having to perform the transformation themselves.\n\n\n\nMore functions related to variable transforms in DynamicPPL\n\n\nThe conversion between these representations is done using several internal functions in DynamicPPL, as depicted in the above diagram. The following operations are labelled:\n\nThis is linking, i.e. transforming a constrained variable to an unconstrained one.\nThis is vectorisation: for example, converting a scalar value to a 1-element vector.\nThis arrow brings us from the model representation to the linked internal representation. This is the composition of (1) and (2): linking and then vectorising.\nThis arrow brings us from the model representation to the unlinked internal representation. This only requires a single step, vectorisation.\n\nEach of these steps can be accomplished using the following functions.\n\n\n\n\n\n\n\n\n\nTo get the function\nTo get the inverse function\n\n\n\n\n(1)\nlink_transform(dist)\ninvlink_transform(dist)\n\n\n(2)\nto_vec_transform(dist)\nfrom_vec_transform(dist)\n\n\n(3)\nto_linked_internal_transform(vi, vn[, dist])\nfrom_linked_internal_transform(vi, vn[, dist])\n\n\n(4)\nto_internal_transform(vi, vn[, dist])\nfrom_internal_transform(vi, vn[, dist])\n\n\n\nNote that these functions do not perform the actual transformation; rather, they return the transformation function itself. For example, let’s take a look at the VarInfo from the previous section, which contains a single variable x ~ LogNormal().\n\nmodel_repn = vi[vn_x]\n\n1.0746648736094493\n\n\n\n# (1) Get the link function\nf_link = DynamicPPL.link_transform(LogNormal())\n# (2) Get the vectorise function\nf_vec = DynamicPPL.to_vec_transform(LogNormal())\n\n# Apply it to the model representation\nlinked_internal_repn = f_vec(f_link(model_repn))\n\n1-element Vector{Float64}:\n 0.07200886749732066\n\n\nEquivalently, we could have done:\n\n# (3) Get the linked internal transform function\nf_linked_internal = DynamicPPL.to_linked_internal_transform(vi, vn_x, LogNormal())\n\n# Apply it to the model representation\nlinked_internal_repn = f_linked_internal(model_repn)\n\n1-element Vector{Float64}:\n 0.07200886749732066\n\n\nAnd let’s confirm that this is the same as the linked internal representation, using the VarInfo that we linked earlier:\n\nDynamicPPL.getindex_internal(vi_linked, vn_x)\n\n1-element Vector{Float64}:\n 0.07200886749732066\n\n\nThe purpose of having all of these machinery is to allow other parts of DynamicPPL, such as the tilde pipeline, to handle transformed variables correctly. The following diagram shows how assume first checks whether the variable is transformed (using istrans), and then applies the appropriate transformation function.\n\n\n\n\n\n\n%%{ init: { 'themeVariables': { 'lineColor': '#000000' } } }%%\n%%{ init: { 'flowchart': { 'curve': 'linear', 'wrappingWidth': -1 } } }%%\ngraph TD\n A[\"x ~ LogNormal()\"]:::boxStyle\n B[\"vn = <span style='color:#3B6EA8 !important;'>@varname</span>(x)<br>dist = LogNormal()<br>x, vi = ...\"]:::boxStyle\n C[\"assume(vn, dist, vi)\"]:::boxStyle\n D([\"<span style='color:#3B6EA8 !important;'>if</span> istrans(vi, vn)\"]):::boxStyle\n E[\"f = from_internal_transform(vi, vn, dist)\"]:::boxStyle\n F[\"f = from_linked_internal_transform(vi, vn, dist)\"]:::boxStyle\n G[\"x, logjac = with_logabsdet_jacobian(f, getindex_internal(vi, vn, dist))\"]:::boxStyle\n H[\"<span style='color:#3B6EA8 !important;'>return</span> x, logpdf(dist, x) - logjac, vi\"]:::boxStyle\n \n A -.->|<span style='color:#3B6EA8 ; background-color:#ffffff;'>@model</span>| B\n B -.->|<span style='color:#000000 ; background-color:#ffffff;'>tilde-pipeline</span>| C\n C --> D\n D -->|<span style='color:#97365B ; background-color:#ffffff;'>false</span>| E\n D -->|<span style='color:#97365B ; background-color:#ffffff;'>true</span>| F\n E --> G\n F --> G\n G --> H\n\n classDef boxStyle fill:#ffffff,stroke:#000000,font-family:Courier,color:#000000\n linkStyle default stroke:#000000,stroke-width:1px,color:#000000\n\n\n\n\n\n\nHere, with_logabsdet_jacobian is defined in the ChangesOfVariables.jl package, and returns both the effect of the transformation f as well as the log Jacobian term.\nBecause we chose f appropriately, we find here that x is always the model representation; furthermore, if the variable was not linked (i.e. istrans was false), the log Jacobian term will be zero. However, if it was linked, then the Jacobian term would be appropriately included, making sure that sampling proceeds correctly.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#why-do-we-need-to-do-this-at-runtime", "href": "developers/transforms/dynamicppl/index.html#why-do-we-need-to-do-this-at-runtime", "title": "Variable transformations in DynamicPPL", "section": "Why do we need to do this at runtime?", "text": "Why do we need to do this at runtime?\nGiven that we know whether a VarInfo is linked or not, one might wonder why we need both from_internal_transform and from_linked_internal_transform at the point where the model is evaluated. Could we not, for example, store the required transformation inside the VarInfo when we link it, and simply reuse that each time?\nThat is, why can’t we just do\n\n\n\n\n\n%%{ init: { 'flowchart': { 'curve': 'linear', 'wrappingWidth': -1 } } }%%\n%%{ init: { 'themeVariables': { 'lineColor': '#000000' } } }%%\ngraph TD\n A[\"assume(varinfo, <span style='color:#3B6EA8 !important;'>@varname</span>(x), Normal())\"]:::boxStyle\n B[\"f = from_internal_transform(varinfo, varname, dist)\"]:::boxStyle\n C[\"x, logjac = with_logabsdet_jacobian(f, getindex_internal(varinfo, varname))\"]:::boxStyle\n D[\"<span style='color:#3B6EA8 !important;'>return</span> x, logpdf(dist, x) - logjac, varinfo\"]:::dashedBox\n \n A --> B\n B --> C\n C --> D\n\n classDef dashedBox fill:#ffffff,stroke:#000000,stroke-dasharray: 5 5,font-family:Courier,color:#000000\n classDef boxStyle fill:#ffffff,stroke:#000000,font-family:Courier,color:#000000\n\n linkStyle default stroke:#000000,stroke-width:1px,color:#000000\n\n\n\n\n\n\nwhere from_internal_transform here only looks up a stored transformation function?\nUnfortunately, this is not possible in general, because the transformation function might not be a constant between different model evaluations. Consider, for example, the following model:\n\n@model function demo_dynamic_constraint()\n m ~ Normal()\n x ~ truncated(Normal(); lower=m)\n return (m=m, x=x)\nend\n\ndemo_dynamic_constraint (generic function with 2 methods)\n\n\nHere, m is distributed according to a plain Normal(), whereas the variable x is constrained to be in the domain (m, Inf). Because of this, we expect that any time we sample from the model, we should have that m < x (in terms of their model representations):\n\nmodel = demo_dynamic_constraint()\nvi = VarInfo(model)\nvn_m, vn_x = @varname(m), @varname(x)\n\nvi[vn_m], vi[vn_x]\n\n(-0.20318141265857553, 0.07028870940645648)\n\n\n(Note that vi[vn] is a shorthand for getindex(vi, vn), so this retrieves the model representations of m and x.) So far, so good. Let’s now link this VarInfo so that we end up working in an ‘unconstrained’ space, where both m and x can take on any values in (-Inf, Inf). First, we should check that the model representations are unchanged when linking:\n\nvi_linked = link(vi, model)\nvi_linked[vn_m], vi_linked[vn_x]\n\n(-0.20318141265857553, 0.07028870940645648)\n\n\nBut if we change the value of m, to, say, a bit larger than x:\n\n# Update the model representation for `m` in `vi_linked`.\nvi_linked[vn_m] = vi_linked[vn_x] + 1\nvi_linked[vn_m], vi_linked[vn_x]\n\n(1.0702887094064564, 0.07028870940645648)\n\n\n\n\n\n\n\n\nWarning\n\n\n\nThis is just for demonstration purposes! You shouldn’t be directly setting variables in a linked varinfo like this unless you know for a fact that the value will be compatible with the constraints of the model.\n\n\nNow, we see that the constraint m < x is no longer satisfied. Hence, one might expect that if we try to evaluate the model using this VarInfo, we should obtain an error. Here, evaluate!! returns two things: the model’s return value itself (which we defined above to be a NamedTuple), and the resulting VarInfo post-evaluation.\n\nretval, ret_varinfo = DynamicPPL.evaluate!!(model, vi_linked, DefaultContext())\ngetlogp(ret_varinfo)\n\n-2.6598362786308956\n\n\nBut we don’t get any errors! Indeed, we could even calculate the ‘log probability density’ for this evaluation.\nTo understand this, we need to look at the actual value which was used during the model evaluation. We can glean this from the return value (or from the returned VarInfo, but the former is easier):\n\nretval\n\n(m = 1.0702887094064564, x = 1.3437588314714883)\n\n\nWe can see here that the model evaluation used the value of m that we provided, but the value of x was ‘updated’.\nThe reason for this is that internally in a model evaluation, we construct the transformation function from the internal to the model representation based on the current realizations in the model! That is, we take the dist in a x ~ dist expression at model evaluation time and use that to construct the transformation, thus allowing it to change between model evaluations without invalidating the transformation.\nKnowing that the distribution of x depends on the value of m, we can now understand how the model representation of x got updated. The linked VarInfo does not store the model representation of x, but only its linked internal representation. So, what happened during the model evaluation above was that the linked internal representation of x – which was constructed using the original value of m – was transformed back into a new model representation using a different value of m.\nWe can reproduce the ‘new’ value of x by performing these transformations manually:\n\n# Generate a fresh linked VarInfo (without the new / 'corrupted' values)\nvi_linked = link(vi, model)\n# See the linked internal representations\nDynamicPPL.getindex_internal(vi_linked, vn_m), DynamicPPL.getindex_internal(vi_linked, vn_x)\n\n([-0.20318141265857553], [-1.2965629059941892])\n\n\nNow we update the value of m like we did before:\n\nvi_linked[vn_m] = vi_linked[vn_x] + 1\nvi_linked[vn_m]\n\n1.0702887094064564\n\n\nWhen evaluating the model, the distribution of x is now changed, and so is the corresponding inverse bijector:\n\nnew_dist_x = truncated(Normal(); lower=vi_linked[vn_m])\nnew_f_inv = DynamicPPL.invlink_transform(new_dist_x)\n\nBijectors.Inverse{Bijectors.TruncatedBijector{Float64, Float64}}(Bijectors.TruncatedBijector{Float64, Float64}(1.0702887094064564, Inf))\n\n\nand if we apply this to the internal representation of x:\n\nnew_f_inv(DynamicPPL.getindex_internal(vi_linked, vn_x))\n\n1-element Vector{Float64}:\n 1.3437588314714883\n\n\nwhich is the same value as we got above in retval.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#conclusion", "href": "developers/transforms/dynamicppl/index.html#conclusion", "title": "Variable transformations in DynamicPPL", "section": "Conclusion", "text": "Conclusion\nIn this chapter of the Turing docs, we’ve looked at:\n\nwhy variables might need to be transformed;\nhow this is accounted for mathematically with the Jacobian term;\nthe basic API and functionality of Bijectors.jl; and\nthe higher-level usage of transforms in DynamicPPL and Turing.\n\nThis will hopefully have equipped you with a better understanding of how constrained variables are handled in the Turing framework. With this knowledge, you should especially find it easier to navigate DynamicPPL’s VarInfo type, which forms the backbone of model evaluation.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/inference/abstractmcmc-interface/index.html", "href": "developers/inference/abstractmcmc-interface/index.html", "title": "Interface Guide", "section": "", "text": "Turing implements a sampling interface (hosted at AbstractMCMC) that is intended to provide a common framework for Markov chain Monte Carlo samplers. The interface presents several structures and functions that one needs to overload in order to implement an interface-compatible sampler.\nThis guide will demonstrate how to implement the interface without Turing.\n\n\nAny implementation of an inference method that uses the AbstractMCMC interface should implement a subset of the following types and functions:\n\nA subtype of AbstractSampler, defined as a mutable struct containing state information or sampler parameters.\nA function sample_init! which performs any necessary set-up (default: do not perform any set-up).\nA function step! which returns a transition that represents a single draw from the sampler.\nA function transitions_init which returns a container for the transitions obtained from the sampler (default: return a Vector{T} of length N where T is the type of the transition obtained in the first step and N is the number of requested samples).\nA function transitions_save! which saves transitions to the container (default: save the transition of iteration i at position i in the vector of transitions).\nA function sample_end! which handles any sampler wrap-up (default: do not perform any wrap-up).\nA function bundle_samples which accepts the container of transitions and returns a collection of samples (default: return the vector of transitions).\n\nThe interface methods with exclamation points are those that are intended to allow for state mutation. Any mutating function is meant to allow mutation where needed – you might use:\n\nsample_init! to run some kind of sampler preparation, before sampling begins. This could mutate a sampler’s state.\nstep! might mutate a sampler flag after each sample.\nsample_end! contains any wrap-up you might need to do. If you were sampling in a transformed space, this might be where you convert everything back to a constrained space.\n\n\n\n\nThe motivation for the interface is to allow Julia’s fantastic probabilistic programming language community to have a set of standards and common implementations so we can all thrive together. Markov chain Monte Carlo methods tend to have a very similar framework to one another, and so a common interface should help more great inference methods built in single-purpose packages to experience more use among the community.\n\n\n\nMetropolis-Hastings is often the first sampling method that people are exposed to. It is a very straightforward algorithm and is accordingly the easiest to implement, so it makes for a good example. In this section, you will learn how to use the types and functions listed above to implement the Metropolis-Hastings sampler using the MCMC interface.\nThe full code for this implementation is housed in AdvancedMH.jl.\n\n\nLet’s begin by importing the relevant libraries. We’ll import AbstractMCMC, which contains the interface framework we’ll fill out. We also need Distributions and Random.\n\n# Import the relevant libraries.\nusing AbstractMCMC: AbstractMCMC\nusing Distributions\nusing Random\n\nAn interface extension (like the one we’re writing right now) typically requires that you overload or implement several functions. Specifically, you should import the functions you intend to overload. This next code block accomplishes that.\nFrom Distributions, we need Sampleable, VariateForm, and ValueSupport, three abstract types that define a distribution. Models in the interface are assumed to be subtypes of Sampleable{VariateForm, ValueSupport}. In this section our model is going be be extremely simple, so we will not end up using these except to make sure that the inference functions are dispatching correctly.\n\n\n\nLet’s begin our sampler definition by defining a sampler called MetropolisHastings which is a subtype of AbstractSampler. Correct typing is very important for proper interface implementation – if you are missing a subtype, your method may not be dispatched to when you call sample.\n\n# Define a sampler type.\nstruct MetropolisHastings{T,D} <: AbstractMCMC.AbstractSampler\n init_θ::T\n proposal::D\nend\n\n# Default constructors.\nMetropolisHastings(init_θ::Real) = MetropolisHastings(init_θ, Normal(0, 1))\nfunction MetropolisHastings(init_θ::Vector{<:Real})\n return MetropolisHastings(init_θ, MvNormal(zero(init_θ), I))\nend\n\nMetropolisHastings\n\n\nAbove, we have defined a sampler that stores the initial parameterization of the prior, and a distribution object from which proposals are drawn. You can have a struct that has no fields, and simply use it for dispatching onto the relevant functions, or you can store a large amount of state information in your sampler.\nThe general intuition for what to store in your sampler struct is that anything you may need to perform inference between samples but you don’t want to store in a transition should go into the sampler struct. It’s the only way you can carry non-sample related state information between step! calls.\n\n\n\nNext, we need to have a model of some kind. A model is a struct that’s a subtype of AbstractModel that contains whatever information is necessary to perform inference on your problem. In our case we want to know the mean and variance parameters for a standard Normal distribution, so we can keep our model to the log density of a Normal.\nNote that we only have to do this because we are not yet integrating the sampler with Turing – Turing has a very sophisticated modelling engine that removes the need to define custom model structs.\n\n# Define a model type. Stores the log density function.\nstruct DensityModel{F<:Function} <: AbstractMCMC.AbstractModel\n ℓπ::F\nend\n\n\n\n\nThe next step is to define some transition which we will return from each step! call. We’ll keep it simple by just defining a wrapper struct that contains the parameter draws and the log density of that draw:\n\n# Create a very basic Transition type, only stores the \n# parameter draws and the log probability of the draw.\nstruct Transition{T,L}\n θ::T\n lp::L\nend\n\n# Store the new draw and its log density.\nTransition(model::DensityModel, θ) = Transition(θ, ℓπ(model, θ))\n\nTransition\n\n\nTransition can now store any type of parameter, whether it’s a vector of draws from multiple parameters or a single univariate draw.\n\n\n\nNow it’s time to get into the actual inference. We’ve defined all of the core pieces we need, but we need to implement the step! function which actually performs inference.\nAs a refresher, Metropolis-Hastings implements a very basic algorithm:\n\nPick some initial state, \\theta_0.\nFor t in [1,N], do\n\nGenerate a proposal parameterization \\theta^\\prime_t \\sim q(\\theta^\\prime_t \\mid \\theta_{t-1}).\nCalculate the acceptance probability, \\alpha = \\text{min}\\left[1,\\frac{\\pi(\\theta'_t)}{\\pi(\\theta_{t-1})} \\frac{q(\\theta_{t-1} \\mid \\theta'_t)}{q(\\theta'_t \\mid \\theta_{t-1})}) \\right].\nIf U \\le \\alpha where U \\sim [0,1], then \\theta_t = \\theta'_t. Otherwise, \\theta_t = \\theta_{t-1}.\n\n\nOf course, it’s much easier to do this in the log space, so the acceptance probability is more commonly written as\n\\log \\alpha = \\min\\left[0, \\log \\pi(\\theta'_t) - \\log \\pi(\\theta_{t-1}) + \\log q(\\theta_{t-1} \\mid \\theta^\\prime_t) - \\log q(\\theta\\prime_t \\mid \\theta_{t-1}) \\right].\nIn interface terms, we should do the following:\n\nMake a new transition containing a proposed sample.\nCalculate the acceptance probability.\nIf we accept, return the new transition, otherwise, return the old one.\n\n\n\n\nThe step! function is the function that performs the bulk of your inference. In our case, we will implement two step! functions – one for the very first iteration, and one for every subsequent iteration.\n\n# Define the first step! function, which is called at the \n# beginning of sampling. Return the initial parameter used\n# to define the sampler.\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n N::Integer,\n ::Nothing;\n kwargs...,\n)\n return Transition(model, spl.init_θ)\nend\n\nThe first step! function just packages up the initial parameterization inside the sampler, and returns it. We implicitly accept the very first parameterization.\nThe other step! function performs the usual steps from Metropolis-Hastings. Included are several helper functions, proposal and q, which are designed to replicate the functions in the pseudocode above.\n\nproposal generates a new proposal in the form of a Transition, which can be univariate if the value passed in is univariate, or it can be multivariate if the Transition given is multivariate. Proposals use a basic Normal or MvNormal proposal distribution.\nq returns the log density of one parameterization conditional on another, according to the proposal distribution.\nstep! generates a new proposal, checks the acceptance probability, and then returns either the previous transition or the proposed transition.\n\n\n# Define a function that makes a basic proposal depending on a univariate\n# parameterization or a multivariate parameterization.\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Real)\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Vector{<:Real})\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, t::Transition)\n return propose(spl, model, t.θ)\nend\n\n# Calculates the probability `q(θ|θcond)`, using the proposal distribution `spl.proposal`.\nq(spl::MetropolisHastings, θ::Real, θcond::Real) = logpdf(spl.proposal, θ - θcond)\nfunction q(spl::MetropolisHastings, θ::Vector{<:Real}, θcond::Vector{<:Real})\n return logpdf(spl.proposal, θ - θcond)\nend\nq(spl::MetropolisHastings, t1::Transition, t2::Transition) = q(spl, t1.θ, t2.θ)\n\n# Calculate the density of the model given some parameterization.\nℓπ(model::DensityModel, θ) = model.ℓπ(θ)\nℓπ(model::DensityModel, t::Transition) = t.lp\n\n# Define the other step function. Returns a Transition containing\n# either a new proposal (if accepted) or the previous proposal \n# (if not accepted).\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n ::Integer,\n θ_prev::Transition;\n kwargs...,\n)\n # Generate a new proposal.\n θ = propose(spl, model, θ_prev)\n\n # Calculate the log acceptance probability.\n α = ℓπ(model, θ) - ℓπ(model, θ_prev) + q(spl, θ_prev, θ) - q(spl, θ, θ_prev)\n\n # Decide whether to return the previous θ or the new one.\n if log(rand(rng)) < min(α, 0.0)\n return θ\n else\n return θ_prev\n end\nend\n\n\n\n\nIn the default implementation, sample just returns a vector of all transitions. If instead you would like to obtain a Chains object (e.g., to simplify downstream analysis), you have to implement the bundle_samples function as well. It accepts the vector of transitions and returns a collection of samples. Fortunately, our Transition is incredibly simple, and we only need to build a little bit of functionality to accept custom parameter names passed in by the user.\n\n# A basic chains constructor that works with the Transition struct we defined.\nfunction AbstractMCMC.bundle_samples(\n rng::AbstractRNG,\n ℓ::DensityModel,\n s::MetropolisHastings,\n N::Integer,\n ts::Vector{<:Transition},\n chain_type::Type{Any};\n param_names=missing,\n kwargs...,\n)\n # Turn all the transitions into a vector-of-vectors.\n vals = copy(reduce(hcat, [vcat(t.θ, t.lp) for t in ts])')\n\n # Check if we received any parameter names.\n if ismissing(param_names)\n param_names = [\"Parameter $i\" for i in 1:(length(first(vals)) - 1)]\n end\n\n # Add the log density field to the parameter names.\n push!(param_names, \"lp\")\n\n # Bundle everything up and return a Chains struct.\n return Chains(vals, param_names, (internals=[\"lp\"],))\nend\n\nAll done!\nYou can even implement different output formats by implementing bundle_samples for different chain_types, which can be provided as keyword argument to sample. As default sample uses chain_type = Any.\n\n\n\nNow that we have all the pieces, we should test the implementation by defining a model to calculate the mean and variance parameters of a Normal distribution. We can do this by constructing a target density function, providing a sample of data, and then running the sampler with sample.\n\n# Generate a set of data from the posterior we want to estimate.\ndata = rand(Normal(5, 3), 30)\n\n# Define the components of a basic model.\ninsupport(θ) = θ[2] >= 0\ndist(θ) = Normal(θ[1], θ[2])\ndensity(θ) = insupport(θ) ? sum(logpdf.(dist(θ), data)) : -Inf\n\n# Construct a DensityModel.\nmodel = DensityModel(density)\n\n# Set up our sampler with initial parameters.\nspl = MetropolisHastings([0.0, 0.0])\n\n# Sample from the posterior.\nchain = sample(model, spl, 100000; param_names=[\"μ\", \"σ\"])\n\nIf all the interface functions have been extended properly, you should get an output from display(chain) that looks something like this:\nObject of type Chains, with data of type 100000×3×1 Array{Float64,3}\n\nIterations = 1:100000\nThinning interval = 1\nChains = 1\nSamples per chain = 100000\ninternals = lp\nparameters = μ, σ\n\n2-element Array{ChainDataFrame,1}\n\nSummary Statistics\n\n│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │\n├─────┼────────────┼─────────┼──────────┼────────────┼────────────┼─────────┼─────────┤\n│ 1 │ μ │ 5.33157 │ 0.854193 │ 0.0027012 │ 0.00893069 │ 8344.75 │ 1.00009 │\n│ 2 │ σ │ 4.54992 │ 0.632916 │ 0.00200146 │ 0.00534942 │ 14260.8 │ 1.00005 │\n\nQuantiles\n\n│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │\n├─────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n│ 1 │ μ │ 3.6595 │ 4.77754 │ 5.33182 │ 5.89509 │ 6.99651 │\n│ 2 │ σ │ 3.5097 │ 4.09732 │ 4.47805 │ 4.93094 │ 5.96821 │\nIt looks like we’re extremely close to our true parameters of Normal(5,3), though with a fairly high variance due to the low sample size.\n\n\n\n\nWe’ve seen how to implement the sampling interface for general projects. Turing’s interface methods are ever-evolving, so please open an issue at AbstractMCMC with feature requests or problems.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Interface Guide" ] }, { "objectID": "developers/inference/abstractmcmc-interface/index.html#interface-overview", "href": "developers/inference/abstractmcmc-interface/index.html#interface-overview", "title": "Interface Guide", "section": "", "text": "Any implementation of an inference method that uses the AbstractMCMC interface should implement a subset of the following types and functions:\n\nA subtype of AbstractSampler, defined as a mutable struct containing state information or sampler parameters.\nA function sample_init! which performs any necessary set-up (default: do not perform any set-up).\nA function step! which returns a transition that represents a single draw from the sampler.\nA function transitions_init which returns a container for the transitions obtained from the sampler (default: return a Vector{T} of length N where T is the type of the transition obtained in the first step and N is the number of requested samples).\nA function transitions_save! which saves transitions to the container (default: save the transition of iteration i at position i in the vector of transitions).\nA function sample_end! which handles any sampler wrap-up (default: do not perform any wrap-up).\nA function bundle_samples which accepts the container of transitions and returns a collection of samples (default: return the vector of transitions).\n\nThe interface methods with exclamation points are those that are intended to allow for state mutation. Any mutating function is meant to allow mutation where needed – you might use:\n\nsample_init! to run some kind of sampler preparation, before sampling begins. This could mutate a sampler’s state.\nstep! might mutate a sampler flag after each sample.\nsample_end! contains any wrap-up you might need to do. If you were sampling in a transformed space, this might be where you convert everything back to a constrained space.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Interface Guide" ] }, { "objectID": "developers/inference/abstractmcmc-interface/index.html#why-do-you-have-an-interface", "href": "developers/inference/abstractmcmc-interface/index.html#why-do-you-have-an-interface", "title": "Interface Guide", "section": "", "text": "The motivation for the interface is to allow Julia’s fantastic probabilistic programming language community to have a set of standards and common implementations so we can all thrive together. Markov chain Monte Carlo methods tend to have a very similar framework to one another, and so a common interface should help more great inference methods built in single-purpose packages to experience more use among the community.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Interface Guide" ] }, { "objectID": "developers/inference/abstractmcmc-interface/index.html#implementing-metropolis-hastings-without-turing", "href": "developers/inference/abstractmcmc-interface/index.html#implementing-metropolis-hastings-without-turing", "title": "Interface Guide", "section": "", "text": "Metropolis-Hastings is often the first sampling method that people are exposed to. It is a very straightforward algorithm and is accordingly the easiest to implement, so it makes for a good example. In this section, you will learn how to use the types and functions listed above to implement the Metropolis-Hastings sampler using the MCMC interface.\nThe full code for this implementation is housed in AdvancedMH.jl.\n\n\nLet’s begin by importing the relevant libraries. We’ll import AbstractMCMC, which contains the interface framework we’ll fill out. We also need Distributions and Random.\n\n# Import the relevant libraries.\nusing AbstractMCMC: AbstractMCMC\nusing Distributions\nusing Random\n\nAn interface extension (like the one we’re writing right now) typically requires that you overload or implement several functions. Specifically, you should import the functions you intend to overload. This next code block accomplishes that.\nFrom Distributions, we need Sampleable, VariateForm, and ValueSupport, three abstract types that define a distribution. Models in the interface are assumed to be subtypes of Sampleable{VariateForm, ValueSupport}. In this section our model is going be be extremely simple, so we will not end up using these except to make sure that the inference functions are dispatching correctly.\n\n\n\nLet’s begin our sampler definition by defining a sampler called MetropolisHastings which is a subtype of AbstractSampler. Correct typing is very important for proper interface implementation – if you are missing a subtype, your method may not be dispatched to when you call sample.\n\n# Define a sampler type.\nstruct MetropolisHastings{T,D} <: AbstractMCMC.AbstractSampler\n init_θ::T\n proposal::D\nend\n\n# Default constructors.\nMetropolisHastings(init_θ::Real) = MetropolisHastings(init_θ, Normal(0, 1))\nfunction MetropolisHastings(init_θ::Vector{<:Real})\n return MetropolisHastings(init_θ, MvNormal(zero(init_θ), I))\nend\n\nMetropolisHastings\n\n\nAbove, we have defined a sampler that stores the initial parameterization of the prior, and a distribution object from which proposals are drawn. You can have a struct that has no fields, and simply use it for dispatching onto the relevant functions, or you can store a large amount of state information in your sampler.\nThe general intuition for what to store in your sampler struct is that anything you may need to perform inference between samples but you don’t want to store in a transition should go into the sampler struct. It’s the only way you can carry non-sample related state information between step! calls.\n\n\n\nNext, we need to have a model of some kind. A model is a struct that’s a subtype of AbstractModel that contains whatever information is necessary to perform inference on your problem. In our case we want to know the mean and variance parameters for a standard Normal distribution, so we can keep our model to the log density of a Normal.\nNote that we only have to do this because we are not yet integrating the sampler with Turing – Turing has a very sophisticated modelling engine that removes the need to define custom model structs.\n\n# Define a model type. Stores the log density function.\nstruct DensityModel{F<:Function} <: AbstractMCMC.AbstractModel\n ℓπ::F\nend\n\n\n\n\nThe next step is to define some transition which we will return from each step! call. We’ll keep it simple by just defining a wrapper struct that contains the parameter draws and the log density of that draw:\n\n# Create a very basic Transition type, only stores the \n# parameter draws and the log probability of the draw.\nstruct Transition{T,L}\n θ::T\n lp::L\nend\n\n# Store the new draw and its log density.\nTransition(model::DensityModel, θ) = Transition(θ, ℓπ(model, θ))\n\nTransition\n\n\nTransition can now store any type of parameter, whether it’s a vector of draws from multiple parameters or a single univariate draw.\n\n\n\nNow it’s time to get into the actual inference. We’ve defined all of the core pieces we need, but we need to implement the step! function which actually performs inference.\nAs a refresher, Metropolis-Hastings implements a very basic algorithm:\n\nPick some initial state, \\theta_0.\nFor t in [1,N], do\n\nGenerate a proposal parameterization \\theta^\\prime_t \\sim q(\\theta^\\prime_t \\mid \\theta_{t-1}).\nCalculate the acceptance probability, \\alpha = \\text{min}\\left[1,\\frac{\\pi(\\theta'_t)}{\\pi(\\theta_{t-1})} \\frac{q(\\theta_{t-1} \\mid \\theta'_t)}{q(\\theta'_t \\mid \\theta_{t-1})}) \\right].\nIf U \\le \\alpha where U \\sim [0,1], then \\theta_t = \\theta'_t. Otherwise, \\theta_t = \\theta_{t-1}.\n\n\nOf course, it’s much easier to do this in the log space, so the acceptance probability is more commonly written as\n\\log \\alpha = \\min\\left[0, \\log \\pi(\\theta'_t) - \\log \\pi(\\theta_{t-1}) + \\log q(\\theta_{t-1} \\mid \\theta^\\prime_t) - \\log q(\\theta\\prime_t \\mid \\theta_{t-1}) \\right].\nIn interface terms, we should do the following:\n\nMake a new transition containing a proposed sample.\nCalculate the acceptance probability.\nIf we accept, return the new transition, otherwise, return the old one.\n\n\n\n\nThe step! function is the function that performs the bulk of your inference. In our case, we will implement two step! functions – one for the very first iteration, and one for every subsequent iteration.\n\n# Define the first step! function, which is called at the \n# beginning of sampling. Return the initial parameter used\n# to define the sampler.\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n N::Integer,\n ::Nothing;\n kwargs...,\n)\n return Transition(model, spl.init_θ)\nend\n\nThe first step! function just packages up the initial parameterization inside the sampler, and returns it. We implicitly accept the very first parameterization.\nThe other step! function performs the usual steps from Metropolis-Hastings. Included are several helper functions, proposal and q, which are designed to replicate the functions in the pseudocode above.\n\nproposal generates a new proposal in the form of a Transition, which can be univariate if the value passed in is univariate, or it can be multivariate if the Transition given is multivariate. Proposals use a basic Normal or MvNormal proposal distribution.\nq returns the log density of one parameterization conditional on another, according to the proposal distribution.\nstep! generates a new proposal, checks the acceptance probability, and then returns either the previous transition or the proposed transition.\n\n\n# Define a function that makes a basic proposal depending on a univariate\n# parameterization or a multivariate parameterization.\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Real)\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Vector{<:Real})\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, t::Transition)\n return propose(spl, model, t.θ)\nend\n\n# Calculates the probability `q(θ|θcond)`, using the proposal distribution `spl.proposal`.\nq(spl::MetropolisHastings, θ::Real, θcond::Real) = logpdf(spl.proposal, θ - θcond)\nfunction q(spl::MetropolisHastings, θ::Vector{<:Real}, θcond::Vector{<:Real})\n return logpdf(spl.proposal, θ - θcond)\nend\nq(spl::MetropolisHastings, t1::Transition, t2::Transition) = q(spl, t1.θ, t2.θ)\n\n# Calculate the density of the model given some parameterization.\nℓπ(model::DensityModel, θ) = model.ℓπ(θ)\nℓπ(model::DensityModel, t::Transition) = t.lp\n\n# Define the other step function. Returns a Transition containing\n# either a new proposal (if accepted) or the previous proposal \n# (if not accepted).\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n ::Integer,\n θ_prev::Transition;\n kwargs...,\n)\n # Generate a new proposal.\n θ = propose(spl, model, θ_prev)\n\n # Calculate the log acceptance probability.\n α = ℓπ(model, θ) - ℓπ(model, θ_prev) + q(spl, θ_prev, θ) - q(spl, θ, θ_prev)\n\n # Decide whether to return the previous θ or the new one.\n if log(rand(rng)) < min(α, 0.0)\n return θ\n else\n return θ_prev\n end\nend\n\n\n\n\nIn the default implementation, sample just returns a vector of all transitions. If instead you would like to obtain a Chains object (e.g., to simplify downstream analysis), you have to implement the bundle_samples function as well. It accepts the vector of transitions and returns a collection of samples. Fortunately, our Transition is incredibly simple, and we only need to build a little bit of functionality to accept custom parameter names passed in by the user.\n\n# A basic chains constructor that works with the Transition struct we defined.\nfunction AbstractMCMC.bundle_samples(\n rng::AbstractRNG,\n ℓ::DensityModel,\n s::MetropolisHastings,\n N::Integer,\n ts::Vector{<:Transition},\n chain_type::Type{Any};\n param_names=missing,\n kwargs...,\n)\n # Turn all the transitions into a vector-of-vectors.\n vals = copy(reduce(hcat, [vcat(t.θ, t.lp) for t in ts])')\n\n # Check if we received any parameter names.\n if ismissing(param_names)\n param_names = [\"Parameter $i\" for i in 1:(length(first(vals)) - 1)]\n end\n\n # Add the log density field to the parameter names.\n push!(param_names, \"lp\")\n\n # Bundle everything up and return a Chains struct.\n return Chains(vals, param_names, (internals=[\"lp\"],))\nend\n\nAll done!\nYou can even implement different output formats by implementing bundle_samples for different chain_types, which can be provided as keyword argument to sample. As default sample uses chain_type = Any.\n\n\n\nNow that we have all the pieces, we should test the implementation by defining a model to calculate the mean and variance parameters of a Normal distribution. We can do this by constructing a target density function, providing a sample of data, and then running the sampler with sample.\n\n# Generate a set of data from the posterior we want to estimate.\ndata = rand(Normal(5, 3), 30)\n\n# Define the components of a basic model.\ninsupport(θ) = θ[2] >= 0\ndist(θ) = Normal(θ[1], θ[2])\ndensity(θ) = insupport(θ) ? sum(logpdf.(dist(θ), data)) : -Inf\n\n# Construct a DensityModel.\nmodel = DensityModel(density)\n\n# Set up our sampler with initial parameters.\nspl = MetropolisHastings([0.0, 0.0])\n\n# Sample from the posterior.\nchain = sample(model, spl, 100000; param_names=[\"μ\", \"σ\"])\n\nIf all the interface functions have been extended properly, you should get an output from display(chain) that looks something like this:\nObject of type Chains, with data of type 100000×3×1 Array{Float64,3}\n\nIterations = 1:100000\nThinning interval = 1\nChains = 1\nSamples per chain = 100000\ninternals = lp\nparameters = μ, σ\n\n2-element Array{ChainDataFrame,1}\n\nSummary Statistics\n\n│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │\n├─────┼────────────┼─────────┼──────────┼────────────┼────────────┼─────────┼─────────┤\n│ 1 │ μ │ 5.33157 │ 0.854193 │ 0.0027012 │ 0.00893069 │ 8344.75 │ 1.00009 │\n│ 2 │ σ │ 4.54992 │ 0.632916 │ 0.00200146 │ 0.00534942 │ 14260.8 │ 1.00005 │\n\nQuantiles\n\n│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │\n├─────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n│ 1 │ μ │ 3.6595 │ 4.77754 │ 5.33182 │ 5.89509 │ 6.99651 │\n│ 2 │ σ │ 3.5097 │ 4.09732 │ 4.47805 │ 4.93094 │ 5.96821 │\nIt looks like we’re extremely close to our true parameters of Normal(5,3), though with a fairly high variance due to the low sample size.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Interface Guide" ] }, { "objectID": "developers/inference/abstractmcmc-interface/index.html#conclusion", "href": "developers/inference/abstractmcmc-interface/index.html#conclusion", "title": "Interface Guide", "section": "", "text": "We’ve seen how to implement the sampling interface for general projects. Turing’s interface methods are ever-evolving, so please open an issue at AbstractMCMC with feature requests or problems.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Interface Guide" ] }, { "objectID": "developers/inference/implementing-samplers/index.html", "href": "developers/inference/implementing-samplers/index.html", "title": "Implementing Samplers", "section": "", "text": "In this tutorial, we’ll go through step-by-step how to implement a “simple” sampler in AbstractMCMC.jl in such a way that it can be easily applied to Turing.jl models.\nIn particular, we’re going to implement a version of Metropolis-adjusted Langevin (MALA).\nNote that we will implement this sampler in the AbstractMCMC.jl framework, completely “ignoring” Turing.jl until the very end of the tutorial, at which point we’ll use a single line of code to make the resulting sampler available to Turing.jl. This is to really drive home the point that one can implement samplers in a way that is accessible to all of Turing.jl’s users without having to use Turing.jl yourself.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#quick-overview-of-mala", "href": "developers/inference/implementing-samplers/index.html#quick-overview-of-mala", "title": "Implementing Samplers", "section": "Quick overview of MALA", "text": "Quick overview of MALA\nWe can view MALA as a single step of the leapfrog intergrator with resampling of momentum \\(p\\) at every step.1 To make that statement a bit more concrete, we first define the extended target \\(\\bar{\\gamma}(x, p)\\) as\n\\[\\begin{equation*}\n\\log \\bar{\\gamma}(x, p) \\propto \\log \\gamma(x) + \\log \\gamma_{\\mathcal{N}(0, M)}(p)\n\\end{equation*}\\]\nwhere \\(\\gamma_{\\mathcal{N}(0, M)}\\) denotes the density for a zero-centered Gaussian with covariance matrix \\(M\\). We then consider targeting this joint distribution over both \\(x\\) and \\(p\\) as follows. First we define the map\n\\[\\begin{equation*}\n\\begin{split}\n L_{\\epsilon}: \\quad & \\mathbb{R}^d \\times \\mathbb{R}^d \\to \\mathbb{R}^d \\times \\mathbb{R}^d \\\\\n & (x, p) \\mapsto (\\tilde{x}, \\tilde{p}) := L_{\\epsilon}(x, p)\n\\end{split}\n\\end{equation*}\\]\nas\n\\[\\begin{equation*}\n\\begin{split}\n p_{1 / 2} &:= p + \\frac{\\epsilon}{2} \\nabla \\log \\gamma(x) \\\\\n \\tilde{x} &:= x + \\epsilon M^{-1} p_{1 /2 } \\\\\n p_1 &:= p_{1 / 2} + \\frac{\\epsilon}{2} \\nabla \\log \\gamma(\\tilde{x}) \\\\\n \\tilde{p} &:= - p_1\n\\end{split}\n\\end{equation*}\\]\nThis might be familiar for some readers as a single step of the Leapfrog integrator. We then define the MALA kernel as follows: given the current iterate \\(x_i\\), we sample the next iterate \\(x_{i + 1}\\) as\n\\[\\begin{equation*}\n\\begin{split}\n p &\\sim \\mathcal{N}(0, M) \\\\\n (\\tilde{x}, \\tilde{p}) &:= L_{\\epsilon}(x_i, p) \\\\\n \\alpha &:= \\min \\left\\{ 1, \\frac{\\bar{\\gamma}(\\tilde{x}, \\tilde{p})}{\\bar{\\gamma}(x_i, p)} \\right\\} \\\\\n x_{i + 1} &:=\n \\begin{cases}\n \\tilde{x} \\quad & \\text{ with prob. } \\alpha \\\\\n x_i \\quad & \\text{ with prob. } 1 - \\alpha\n \\end{cases}\n\\end{split}\n\\end{equation*}\\]\ni.e. we accept the proposal \\(\\tilde{x}\\) with probability \\(\\alpha\\) and reject it, thus sticking with our current iterate, with probability \\(1 - \\alpha\\).", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#what-we-need-from-a-model-logdensityproblems.jl", "href": "developers/inference/implementing-samplers/index.html#what-we-need-from-a-model-logdensityproblems.jl", "title": "Implementing Samplers", "section": "What we need from a model: LogDensityProblems.jl", "text": "What we need from a model: LogDensityProblems.jl\nThere are a few things we need from the “target” / “model” / density that we want to sample from:\n\nWe need access to log-density evaluations \\(\\log \\gamma(x)\\) so we can compute the acceptance ratio involving \\(\\log \\bar{\\gamma}(x, p)\\).\nWe need access to log-density gradients \\(\\nabla \\log \\gamma(x)\\) so we can compute the Leapfrog steps \\(L_{\\epsilon}(x, p)\\).\nWe also need access to the “size” of the model so we can determine the size of \\(M\\).\n\nLuckily for us, there is a package called LogDensityProblems.jl which provides an interface for exactly this!\nTo demonstrate how one can implement the “LogDensityProblems.jl interface”2 we will use a simple Gaussian model as an example:\n\nusing LogDensityProblems: LogDensityProblems;\n\n# Let's define some type that represents the model.\nstruct IsotropicNormalModel{M<:AbstractVector{<:Real}}\n \"mean of the isotropic Gaussian\"\n mean::M\nend\n\n# Specifies what input length the model expects.\nLogDensityProblems.dimension(model::IsotropicNormalModel) = length(model.mean)\n# Implementation of the log-density evaluation of the model.\nfunction LogDensityProblems.logdensity(model::IsotropicNormalModel, x::AbstractVector{<:Real})\n return - sum(abs2, x .- model.mean) / 2\nend\n\nThis gives us all of the properties we want for our MALA sampler with the exception of the computation of the gradient \\(\\nabla \\log \\gamma(x)\\). There is the method LogDensityProblems.logdensity_and_gradient which should return a 2-tuple where the first entry is the evaluation of the logdensity \\(\\log \\gamma(x)\\) and the second entry is the gradient \\(\\nabla \\log \\gamma(x)\\).\nThere are two ways to “implement” this method: 1) we implement it by hand, which is feasible in the case of our IsotropicNormalModel, or b) we defer the implementation of this to a automatic differentiation backend.\nTo implement it by hand we can simply do\n\n# Tell LogDensityProblems.jl that first-order, i.e. gradient information, is available.\nLogDensityProblems.capabilities(model::IsotropicNormalModel) = LogDensityProblems.LogDensityOrder{1}()\n\n# Implement `logdensity_and_gradient`.\nfunction LogDensityProblems.logdensity_and_gradient(model::IsotropicNormalModel, x)\n logγ_x = LogDensityProblems.logdensity(model, x)\n ∇logγ_x = -x .* (x - model.mean)\n return logγ_x, ∇logγ_x\nend\n\nLet’s just try it out:\n\n# Instantiate the problem.\nmodel = IsotropicNormalModel([-5., 0., 5.])\n# Create some example input that we can test on.\nx_example = randn(LogDensityProblems.dimension(model))\n# Evaluate!\nLogDensityProblems.logdensity(model, x_example)\n\n-17.309512855870754\n\n\nTo defer it to an automatic differentiation backend, we can do\n\n# Tell LogDensityProblems.jl we only have access to 0-th order information.\nLogDensityProblems.capabilities(model::IsotropicNormalModel) = LogDensityProblems.LogDensityOrder{0}()\n\n# Use `LogDensityProblemsAD`'s `ADgradient` in combination with some AD backend to implement `logdensity_and_gradient`.\nusing LogDensityProblemsAD, ADTypes, ForwardDiff\nmodel_with_grad = ADgradient(AutoForwardDiff(), model)\nLogDensityProblems.logdensity(model_with_grad, x_example)\n\n-17.309512855870754\n\n\nWe’ll continue with the second approach in this tutorial since this is typically what one does in practice, because there are better hobbies to spend time on than deriving gradients by hand.\nAt this point, one might wonder how we’re going to tie this back to Turing.jl in the end. Effectively, when working with inference methods that only require log-density evaluations and / or higher-order information of the log-density, Turing.jl actually converts the user-provided Model into an object implementing the above methods for LogDensityProblems.jl. As a result, most samplers provided by Turing.jl are actually implemented to work with LogDensityProblems.jl, enabling their use both within Turing.jl and outside of Turing.jl! Morever, there exists similar conversions for Stan through BridgeStan and StanLogDensityProblems.jl, which means that a sampler supporting the LogDensityProblems.jl interface can easily be used on both Turing.jl and Stan models (in addition to user-provided models, as our IsotropicNormalModel above)!\nAnyways, let’s move on to actually implementing the sampler.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#implementing-mala-in-abstractmcmc.jl", "href": "developers/inference/implementing-samplers/index.html#implementing-mala-in-abstractmcmc.jl", "title": "Implementing Samplers", "section": "Implementing MALA in AbstractMCMC.jl", "text": "Implementing MALA in AbstractMCMC.jl\nNow that we’ve established that a model implementing the LogDensityProblems.jl interface provides us with all the information we need from \\(\\log \\gamma(x)\\), we can address the question: given an object that implements the LogDensityProblems.jl interface, how can we define a sampler for it?\nWe’re going to do this by making our sampler a sub-type of AbstractMCMC.AbstractSampler in addition to implementing a few methods from AbstractMCMC.jl. Why? Because it gets us a lot of functionality for free, as we will see later.\nMoreover, AbstractMCMC.jl provides a very natural interface for MCMC algorithms.\nFirst, we’ll define our MALA type\n\nusing AbstractMCMC\n\nstruct MALA{T,A} <: AbstractMCMC.AbstractSampler\n \"stepsize used in the leapfrog step\"\n ϵ_init::T\n \"covariance matrix used for the momentum\"\n M_init::A\nend\n\nNotice how we’ve added the suffix _init to both the stepsize and the covariance matrix. We’ve done this because a AbstractMCMC.AbstractSampler should be immutable. Of course there might be many scenarios where we want to allow something like the stepsize and / or the covariance matrix to vary between iterations, e.g. during the burn-in / adaptation phase of the sampling process we might want to adjust the parameters using statistics computed from these initial iterations. But information which can change between iterations should not go in the sampler itself! Instead, this information should go in the sampler state.\nThe sampler state should at the very least contain all the necessary information to perform the next MCMC iteration, but usually contains further information, e.g. quantities and statistics useful for evaluating whether the sampler has converged.\nWe will use the following sampler state for our MALA sampler:\n\nstruct MALAState{A<:AbstractVector{<:Real}}\n \"current position\"\n x::A\nend\n\nThis might seem overly redundant: we’re defining a type MALAState and it only contains a simple vector of reals. In this particular case we indeed could have dropped this and simply used a AbstractVector{<:Real} as our sampler state, but typically, as we will see later, one wants to include other quantities in the sampler state. For example, if we also wanted to adapt the parameters of our MALA, e.g. alter the stepsize depending on acceptance rates, in which case we should also put ϵ in the state, but for now we’ll keep things simple.\nMoreover, we also want a sample type, which is a type meant for “public consumption”, i.e. the end-user. This is generally going to contain a subset of the information present in the state. But in such a simple scenario as this, we similarly only have a AbstractVector{<:Real}:\n\nstruct MALASample{A<:AbstractVector{<:Real}}\n \"current position\"\n x::A\nend\n\nWe currently have three things:\n\nA AbstractMCMC.AbstractSampler implementation called MALA.\nA state MALAState for our sampler MALA.\nA sample MALASample for our sampler MALA.\n\nThat means that we’re ready to implement the only thing that really matters: AbstractMCMC.step.\nAbstractMCMC.step defines the MCMC iteration of our MALA given the current MALAState. Specifically, the signature of the function is as follows:\n\nfunction AbstractMCMC.step(\n # The RNG to ensure reproducibility.\n rng::Random.AbstractRNG,\n # The model that defines our target.\n model::AbstractMCMC.AbstractModel,\n # The sampler for which we're taking a `step`.\n sampler::AbstractMCMC.AbstractSampler,\n # The current sampler `state`.\n state;\n # Additional keyword arguments that we may or may not need.\n kwargs...\n)\n\nMoreover, there is a specific AbstractMCMC.AbstractModel which is used to indicate that the model that is provided implements the LogDensityProblems.jl interface: AbstractMCMC.LogDensityModel.\nSince, as we discussed earlier, in our case we’re indeed going to work with types that support the LogDensityProblems.jl interface, we’ll define AbstractMCMC.step for such a AbstractMCMC.LogDensityModel.\nNote that AbstractMCMC.LogDensityModel has no other purpose; it has a single field called logdensity, and it does nothing else. But by wrapping the model in AbstractMCMC.LogDensityModel, it allows samplers that want to work with LogDensityProblems.jl to define their AbstractMCMC.step on this type without running into method ambiguities.\nAll in all, that means that the signature for our AbstractMCMC.step is going to be the following:\n\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n # `LogDensityModel` so we know we're working with LogDensityProblems.jl model.\n model::AbstractMCMC.LogDensityModel,\n # Our sampler.\n sampler::MALA,\n # Our sampler state.\n state::MALAState;\n kwargs...\n)\n\nGreat! Now let’s actually implement the full AbstractMCMC.step for our MALA.\nLet’s remind ourselves what we’re going to do:\n\nSample a new momentum \\(p\\).\nCompute the log-density of the extended target \\(\\log \\bar{\\gamma}(x, p)\\).\nTake a single leapfrog step \\((\\tilde{x}, \\tilde{p}) = L_{\\epsilon}(x, p)\\).\nAccept or reject the proposed \\((\\tilde{x}, \\tilde{p})\\).\n\nAll in all, this results in the following:\n\nusing Random: Random\nusing Distributions # so we get the `MvNormal`\n\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model_wrapper::AbstractMCMC.LogDensityModel,\n sampler::MALA,\n state::MALAState;\n kwargs...\n)\n # Extract the wrapped model which implements LogDensityProblems.jl.\n model = model_wrapper.logdensity\n # Let's just extract the sampler parameters to make our lives easier.\n ϵ = sampler.ϵ_init\n M = sampler.M_init\n # Extract the current parameters.\n x = state.x\n # Sample the momentum.\n p_dist = MvNormal(zeros(LogDensityProblems.dimension(model)), M)\n p = rand(rng, p_dist)\n # Propose using a single leapfrog step.\n x̃, p̃ = leapfrog_step(model, x, p, ϵ, M)\n # Accept or reject proposal.\n logp = LogDensityProblems.logdensity(model, x) + logpdf(p_dist, p)\n logp̃ = LogDensityProblems.logdensity(model, x̃) + logpdf(p_dist, p̃)\n logα = logp̃ - logp\n state_new = if log(rand(rng)) < logα\n # Accept.\n MALAState(x̃)\n else\n # Reject.\n MALAState(x)\n end\n # Return the \"sample\" and the sampler state.\n return MALASample(state_new.x), state_new\nend\n\nFairly straight-forward.\nOf course, we haven’t defined the leapfrog_step method yet, so let’s do that:\n\nfunction leapfrog_step(model, x, p, ϵ, M)\n # Update momentum `p` using \"position\" `x`.\n ∇logγ_x = last(LogDensityProblems.logdensity_and_gradient(model, x))\n p1 = p + (ϵ / 2) .* ∇logγ_x\n # Update the \"position\" `x` using momentum `p1`.\n x̃ = x + ϵ .* (M \\ p1)\n # Update momentum `p1` using position `x̃`\n ∇logγ_x̃ = last(LogDensityProblems.logdensity_and_gradient(model, x̃))\n p2 = p1 + (ϵ / 2) .* ∇logγ_x̃\n # Flip momentum `p2`.\n p̃ = -p2\n return x̃, p̃\nend\n\nleapfrog_step (generic function with 1 method)\n\n\nWith all of this, we’re technically ready to sample!\n\nusing Random, LinearAlgebra\n\nrng = Random.default_rng()\nsampler = MALA(1, I)\nstate = MALAState(zeros(LogDensityProblems.dimension(model)))\n\nx_next, state_next = AbstractMCMC.step(\n rng,\n AbstractMCMC.LogDensityModel(model),\n sampler,\n state\n)\n\n(MALASample{Vector{Float64}}([-0.014181186578520641, -0.4995799260361332, 1.6741711458091737]), MALAState{Vector{Float64}}([-0.014181186578520641, -0.4995799260361332, 1.6741711458091737]))\n\n\nGreat, it works!\nAnd I promised we would get quite some functionality for free if we implemented AbstractMCMC.step, and so we can now simply call sample to perform standard MCMC sampling:\n\n# Perform 1000 iterations with our `MALA` sampler.\nsamples = sample(model_with_grad, sampler, 10_000; initial_state=state, progress=false)\n# Concatenate into a matrix.\nsamples_matrix = stack(sample -> sample.x, samples)\n\n3×10000 Matrix{Float64}:\n -2.51762 -4.65618 -4.65618 … -6.83431 -6.46115 -6.46115\n -0.589032 -0.388995 -0.388995 -0.0333378 -0.0920903 -0.0920903\n 2.04763 4.20722 4.20722 5.69681 4.71154 4.71154\n\n\n\n# Compute the marginal means and standard deviations.\nhcat(mean(samples_matrix; dims=2), std(samples_matrix; dims=2))\n\n3×2 Matrix{Float64}:\n -5.01393 0.993994\n -0.00152073 0.988607\n 4.98185 1.00721\n\n\nLet’s visualize the samples\n\nusing StatsPlots\nplot(transpose(samples_matrix[:, 1:10:end]), alpha=0.5, legend=false)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLook at that! Things are working; amazin’.\nWe can also exploit AbstractMCMC.jl’s parallel sampling capabilities:\n\n# Run separate 4 chains for 10 000 iterations using threads to parallelize.\nnum_chains = 4\nsamples = sample(\n model_with_grad,\n sampler,\n MCMCThreads(),\n 10_000,\n num_chains;\n # Note we need to provide an initial state for every chain.\n initial_state=fill(state, num_chains),\n progress=false\n)\nsamples_array = stack(map(Base.Fix1(stack, sample -> sample.x), samples))\n\n3×10000×4 Array{Float64, 3}:\n[:, :, 1] =\n -2.46778 -4.10989 -4.66516 -4.19931 … -4.25622 -5.128 -5.128\n 0.160427 1.64987 0.414399 -0.0237751 0.364159 -1.0844 -1.0844\n 1.91749 3.37866 4.97705 5.99319 4.19123 5.09599 5.09599\n\n[:, :, 2] =\n -2.1967 -3.97957 -3.37711 -3.55371 … -6.35208 -4.92601 -6.07605\n 0.600203 -0.653079 0.113986 -1.15682 -0.215006 1.85535 0.297156\n 3.61576 5.32722 5.18044 4.76653 4.07807 3.43816 5.17154\n\n[:, :, 3] =\n -3.4197 -4.55159 -4.54857 … -5.32109 -5.32109 -4.17888\n -0.848826 -1.47657 0.187833 0.360103 0.360103 0.798691\n 0.842563 2.54597 2.88359 7.33507 7.33507 5.69377\n\n[:, :, 4] =\n -1.56173 -3.40162 -3.45972 -4.42636 … -5.07761 -5.07761 -4.69455\n -0.30085 0.745361 1.07527 1.38408 0.467855 0.467855 -0.0243259\n 2.12185 2.81538 4.39249 6.16617 4.31493 4.31493 4.80613\n\n\nBut the fact that we have to provide the AbstractMCMC.sample call, etc. with an initial_state to get started is a bit annoying. We can avoid this by also defining a AbstractMCMC.step without the state argument:\n\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model_wrapper::AbstractMCMC.LogDensityModel,\n ::MALA;\n # NOTE: No state provided!\n kwargs...\n)\n model = model_wrapper.logdensity\n # Let's just create the initial state by sampling using a Gaussian.\n x = randn(rng, LogDensityProblems.dimension(model))\n\n return MALASample(x), MALAState(x)\nend\n\nEquipped with this, we no longer need to provide the initial_state everywhere:\n\nsamples = sample(model_with_grad, sampler, 10_000; progress=false)\nsamples_matrix = stack(sample -> sample.x, samples)\nhcat(mean(samples_matrix; dims=2), std(samples_matrix; dims=2))\n\n3×2 Matrix{Float64}:\n -5.0088 1.02237\n 0.00582071 1.00677\n 4.97753 1.01923", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#using-our-sampler-with-turing.jl", "href": "developers/inference/implementing-samplers/index.html#using-our-sampler-with-turing.jl", "title": "Implementing Samplers", "section": "Using our sampler with Turing.jl", "text": "Using our sampler with Turing.jl\nAs we promised, all of this hassle of implementing our MALA sampler in a way that uses LogDensityProblems.jl and AbstractMCMC.jl gets us something more than just an “automatic” implementation of AbstractMCMC.sample.\nIt also enables use with Turing.jl through the externalsampler, but we need to do one final thing first: we need to tell Turing.jl how to extract a vector of parameters from the “sample” returned in our implementation of AbstractMCMC.step. In our case, the “sample” is a MALASample, so we just need the following line:\n\n# Load Turing.jl.\nusing Turing\n\n# Overload the `getparams` method for our \"sample\" type, which is just a vector.\nTuring.Inference.getparams(::Turing.Model, sample::MALASample) = sample.x\n\nAnd with that, we’re good to go!\n\n# Our previous model defined as a Turing.jl model.\n@model mvnormal_model() = x ~ MvNormal([-5., 0., 5.], I)\n# Instantiate our model.\nturing_model = mvnormal_model()\n# Call `sample` but now we're passing in a Turing.jl `model` and wrapping\n# our `MALA` sampler in the `externalsampler` to tell Turing.jl that the sampler\n# expects something that implements LogDensityProblems.jl.\nchain = sample(turing_model, externalsampler(sampler), 10_000; progress=false)\n\nChains MCMC chain (10000×4×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 3.0 seconds\nCompute duration = 3.0 seconds\nparameters = x[1], x[2], x[3]\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n x[1] -5.0362 0.9956 0.0180 3069.9868 5217.8566 1.0020 ⋯\n x[2] -0.0261 1.0025 0.0174 3326.8189 5590.9041 0.9999 ⋯\n x[3] 5.0228 1.0053 0.0175 3312.2689 4905.5010 1.0011 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n x[1] -7.0081 -5.7048 -5.0288 -4.3561 -3.1000\n x[2] -1.9832 -0.7082 -0.0242 0.6625 1.9035\n x[3] 3.0113 4.3597 5.0372 5.6895 6.9795\n\n\nPretty neat, eh?\n\nModels with constrained parameters\nOne thing we’ve sort of glossed over in all of the above is that MALA, at least how we’ve implemented it, requires \\(x\\) to live in \\(\\mathbb{R}^d\\) for some \\(d > 0\\). If some of the parameters were in fact constrained, e.g. we were working with a Beta distribution which has support on the interval \\((0, 1)\\), not on \\(\\mathbb{R}^d\\), we could easily end up outside of the valid range \\((0, 1)\\).\n\n@model beta_model() = x ~ Beta(3, 3)\nturing_model = beta_model()\nchain = sample(turing_model, externalsampler(sampler), 10_000; progress=false)\n\nChains MCMC chain (10000×2×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 1.93 seconds\nCompute duration = 1.93 seconds\nparameters = x\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n x 0.4950 0.1866 0.0028 4496.8823 6103.2233 1.0001 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n x 0.1457 0.3575 0.4969 0.6322 0.8407\n\n\nYep, that still works, but only because Turing.jl actually transforms the turing_model from constrained to unconstrained, so that the sampler provided to externalsampler is actually always working in unconstrained space! This is not always desirable, so we can turn this off:\n\nchain = sample(turing_model, externalsampler(sampler; unconstrained=false), 10_000; progress=false)\n\nChains MCMC chain (10000×2×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 0.18 seconds\nCompute duration = 0.18 seconds\nparameters = x\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n x 0.5470 0.1712 0.0172 82.3328 47.8154 1.0023 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n x 0.2742 0.4029 0.5504 0.7142 0.7889\n\n\nThe fun thing is that this still sort of works because\n\nlogpdf(Beta(3, 3), 10.0)\n\n-Inf\n\n\nand so the samples that fall outside of the range are always rejected. But do notice how much worse all the diagnostics are, e.g. ess_tail is very poor compared to when we use unconstrained=true. Moreover, in more complex cases this won’t just result in a “nice” -Inf log-density value, but instead will error:\n\n@model function demo()\n σ² ~ truncated(Normal(), lower=0)\n # If we end up with negative values for `σ²`, the `Normal` will error.\n x ~ Normal(0, σ²)\nend\nsample(demo(), externalsampler(sampler; unconstrained=false), 10_000; progress=false)\n\nDomainError: DomainError(-0.9822264302135183, \"Normal: the condition σ >= zero(σ) is not satisfied.\")\nDomainError with -0.9822264302135183:\nNormal: the condition σ >= zero(σ) is not satisfied.\nStacktrace:\n [1] #371\n @ ~/.julia/packages/Distributions/fi8Qd/src/univariate/continuous/normal.jl:37 [inlined]\n [2] check_args\n @ ~/.julia/packages/Distributions/fi8Qd/src/utils.jl:89 [inlined]\n [3] #Normal#370\n @ ~/.julia/packages/Distributions/fi8Qd/src/univariate/continuous/normal.jl:37 [inlined]\n [4] Normal\n @ ~/.julia/packages/Distributions/fi8Qd/src/univariate/continuous/normal.jl:36 [inlined]\n [5] Normal\n @ ~/.julia/packages/Distributions/fi8Qd/src/univariate/continuous/normal.jl:42 [inlined]\n [6] macro expansion\n @ ~/.julia/packages/DynamicPPL/DxBhI/src/compiler.jl:574 [inlined]\n [7] demo(__model__::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, __varinfo__::DynamicPPL.ThreadSafeVarInfo{DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}}}, Float64}, Vector{Base.RefValue{Float64}}}, __context__::DynamicPPL.ValuesAsInModelContext{DynamicPPL.DefaultContext})\n @ Main.Notebook ~/work/docs/docs/developers/inference/implementing-samplers/index.qmd:595\n [8] _evaluate!!\n @ ~/.julia/packages/DynamicPPL/DxBhI/src/model.jl:936 [inlined]\n [9] evaluate_threadsafe!!(model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, varinfo::DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}}}, Float64}, context::DynamicPPL.ValuesAsInModelContext{DynamicPPL.DefaultContext})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/DxBhI/src/model.jl:925\n [10] evaluate!!\n @ ~/.julia/packages/DynamicPPL/DxBhI/src/model.jl:855 [inlined]\n [11] values_as_in_model\n @ ~/.julia/packages/DynamicPPL/DxBhI/src/values_as_in_model.jl:173 [inlined]\n [12] values_as_in_model\n @ ~/.julia/packages/DynamicPPL/DxBhI/src/values_as_in_model.jl:172 [inlined]\n [13] getparams(model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, vi::DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}}}, Float64})\n @ Turing.Inference ~/.julia/packages/Turing/IXycm/src/mcmc/Inference.jl:354\n [14] Turing.Inference.Transition(model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, vi::DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}}}, Float64}, t::MALASample{Vector{Float64}})\n @ Turing.Inference ~/.julia/packages/Turing/IXycm/src/mcmc/Inference.jl:231\n [15] transition_to_turing(f::LogDensityFunction{DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}}}, Float64}, DynamicPPL.DefaultContext, AutoForwardDiff{2, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}}, transition::MALASample{Vector{Float64}})\n @ Turing.Inference ~/.julia/packages/Turing/IXycm/src/mcmc/abstractmcmc.jl:12\n [16] step(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, sampler_wrapper::DynamicPPL.Sampler{Turing.Inference.ExternalSampler{MALA{Int64, UniformScaling{Bool}}, AutoForwardDiff{nothing, Nothing}, false}}; initial_state::Nothing, initial_params::Nothing, kwargs::@Kwargs{})\n @ Turing.Inference ~/.julia/packages/Turing/IXycm/src/mcmc/abstractmcmc.jl:77\n [17] macro expansion\n @ ~/.julia/packages/AbstractMCMC/FSyVk/src/sample.jl:0 [inlined]\n [18] macro expansion\n @ ~/.julia/packages/AbstractMCMC/FSyVk/src/logging.jl:16 [inlined]\n [19] mcmcsample(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, sampler::DynamicPPL.Sampler{Turing.Inference.ExternalSampler{MALA{Int64, UniformScaling{Bool}}, AutoForwardDiff{nothing, Nothing}, false}}, N::Int64; progress::Bool, progressname::String, callback::Nothing, num_warmup::Int64, discard_initial::Int64, thinning::Int64, chain_type::Type, initial_state::Nothing, kwargs::@Kwargs{})\n @ AbstractMCMC ~/.julia/packages/AbstractMCMC/FSyVk/src/sample.jl:142\n [20] sample(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, sampler::DynamicPPL.Sampler{Turing.Inference.ExternalSampler{MALA{Int64, UniformScaling{Bool}}, AutoForwardDiff{nothing, Nothing}, false}}, N::Int64; chain_type::Type, resume_from::Nothing, initial_state::Nothing, kwargs::@Kwargs{progress::Bool})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/DxBhI/src/sampler.jl:102\n [21] sample\n @ ~/.julia/packages/DynamicPPL/DxBhI/src/sampler.jl:92 [inlined]\n [22] #sample#8\n @ ~/.julia/packages/Turing/IXycm/src/mcmc/Inference.jl:278 [inlined]\n [23] sample\n @ ~/.julia/packages/Turing/IXycm/src/mcmc/Inference.jl:269 [inlined]\n [24] #sample#7\n @ ~/.julia/packages/Turing/IXycm/src/mcmc/Inference.jl:266 [inlined]\n [25] top-level scope\n @ ~/work/docs/docs/developers/inference/implementing-samplers/index.qmd:597\n\n\nAs expected, we run into a DomainError at some point, while if we set unconstrained=true, letting Turing.jl transform the model to a unconstrained form behind the scenes, everything works as expected:\n\nsample(demo(), externalsampler(sampler; unconstrained=true), 10_000; progress=false)\n\nChains MCMC chain (10000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 1.98 seconds\nCompute duration = 1.98 seconds\nparameters = σ², x\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n σ² 0.6750 0.5638 0.0751 61.9247 103.8590 1.1232 ⋯\n x -0.2348 1.0370 0.0990 95.5229 254.4186 1.0532 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ² 0.1468 0.2488 0.4233 0.9967 2.1376\n x -1.8574 -0.7418 -0.4658 0.1008 2.2114\n\n\nNeat!\nSimilarly, which automatic differentiation backend one should use can be specified through the adtype keyword argument too. For example, if we want to use ReverseDiff.jl instead of the default ForwardDiff.jl:\n\nusing ReverseDiff: ReverseDiff\n# Specify that we want to use `AutoReverseDiff`.\nsample(\n demo(),\n externalsampler(sampler; unconstrained=true, adtype=AutoReverseDiff()),\n 10_000;\n progress=false\n)\n\nChains MCMC chain (10000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 2.82 seconds\nCompute duration = 2.82 seconds\nparameters = σ², x\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n σ² 0.8363 0.5832 0.0302 271.8479 118.1400 1.0083 ⋯\n x -0.0223 1.0545 0.0718 300.6832 110.7829 1.0032 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ² 0.1322 0.3380 0.7190 1.1940 2.2021\n x -2.3721 -0.4056 -0.0202 0.3788 2.6245\n\n\nDouble-neat.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#summary", "href": "developers/inference/implementing-samplers/index.html#summary", "title": "Implementing Samplers", "section": "Summary", "text": "Summary\nAt this point it’s worth maybe reminding ourselves what we did and also why we did it:\n\nWe define our models in the LogDensityProblems.jl interface because it makes the sampler agnostic to how the underlying model is implemented.\nWe implement our sampler in the AbstractMCMC.jl interface, which just means that our sampler is a subtype of AbstractMCMC.AbstractSampler and we implement the MCMC transition in AbstractMCMC.step.\nPoints 1 and 2 makes it so our sampler can be used with a wide range of model implementations, amongst them being models implemented in both Turing.jl and Stan. This gives you, the inference implementer, a large collection of models to test your inference method on, in addition to allowing users of Turing.jl and Stan to try out your inference method with minimal effort.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#footnotes", "href": "developers/inference/implementing-samplers/index.html#footnotes", "title": "Implementing Samplers", "section": "Footnotes", "text": "Footnotes\n\n\nWe’re going with the leapfrog formulation because in a future version of this tutorial we’ll add a section extending this simple “baseline” MALA sampler to more complex versions. See issue #479 for progress on this.↩︎\nThere is no such thing as a proper interface in Julia (at least not officially), and so we use the word “interface” here to mean a few minimal methods that needs to be implemented by any type that we treat as a target model.↩︎", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Implementing Samplers" ] }, { "objectID": "developers/compiler/minituring-contexts/index.html", "href": "developers/compiler/minituring-contexts/index.html", "title": "A Mini Turing Implementation II: Contexts", "section": "", "text": "In the Mini Turing tutorial we developed a miniature version of the Turing language, to illustrate its core design. A passing mention was made of contexts. In this tutorial we develop that aspect of our mini Turing language further to demonstrate how and why contexts are an important part of Turing’s design.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "A Mini Turing Implementation II: Contexts" ] }, { "objectID": "developers/compiler/minituring-contexts/index.html#contexts-within-contexts", "href": "developers/compiler/minituring-contexts/index.html#contexts-within-contexts", "title": "A Mini Turing Implementation II: Contexts", "section": "Contexts within contexts", "text": "Contexts within contexts\nLet’s use the above two contexts to provide a slightly more general definition of the SamplingContext and the Metropolis-Hastings sampler we wrote in the mini Turing tutorial.\n\nstruct SamplingContext{S<:AbstractMCMC.AbstractSampler,R<:Random.AbstractRNG}\n rng::R\n sampler::S\n subcontext::Union{PriorContext, JointContext}\nend\n\nThe new aspect here is the subcontext field. Note that this is a context within a context! The idea is that we don’t need to hard code how the MCMC sampler evaluates the log probability, but rather can pass that work onto the subcontext. This way the same sampler can be used to sample from either the joint or the prior distribution.\nThe methods for SamplingContext are largely as in the our earlier mini Turing case, except they now pass some of the work onto the subcontext:\n\nfunction observe(context::SamplingContext, args...)\n # Sampling doesn't affect the observed values, so nothing to do here other than pass to\n # the subcontext.\n return observe(context.subcontext, args...)\nend\n\nstruct PriorSampler <: AbstractMCMC.AbstractSampler end\n\nfunction assume(context::SamplingContext{PriorSampler}, varinfo, dist, var_id)\n sample = Random.rand(context.rng, dist)\n varinfo[var_id] = (sample, NaN)\n # Once the value has been sampled, let the subcontext handle evaluating the log\n # probability.\n return assume(context.subcontext, varinfo, dist, var_id)\nend;\n\n# The subcontext field of the MHSampler determines which distribution this sampler\n# samples from.\nstruct MHSampler{D, T<:Real} <: AbstractMCMC.AbstractSampler\n sigma::T\n subcontext::D\nend\n\nMHSampler(subcontext) = MHSampler(1, subcontext)\n\nfunction assume(context::SamplingContext{<:MHSampler}, varinfo, dist, var_id)\n sampler = context.sampler\n old_value = varinfo.values[var_id]\n\n # propose a random-walk step, i.e, add the current value to a random \n # value sampled from a Normal distribution centered at 0\n value = rand(context.rng, Normal(old_value, sampler.sigma))\n varinfo[var_id] = (value, NaN)\n # Once the value has been sampled, let the subcontext handle evaluating the log\n # probability.\n return assume(context.subcontext, varinfo, dist, var_id)\nend;\n\n# The following three methods are identical to before, except for passing\n# `sampler.subcontext` to the context SamplingContext.\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG, model::MiniModel, sampler::MHSampler; kwargs...\n)\n vi = VarInfo()\n ctx = SamplingContext(rng, PriorSampler(), sampler.subcontext)\n model.f(vi, ctx, values(model.data)...)\n return vi, vi\nend\n\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model::MiniModel,\n sampler::MHSampler,\n prev_state::VarInfo; # is just the old trace\n kwargs...,\n)\n vi = prev_state\n new_vi = deepcopy(vi)\n ctx = SamplingContext(rng, sampler, sampler.subcontext)\n model.f(new_vi, ctx, values(model.data)...)\n\n # Compute log acceptance probability\n # Since the proposal is symmetric the computation can be simplified\n logα = sum(values(new_vi.logps)) - sum(values(vi.logps))\n\n # Accept proposal with computed acceptance probability\n if -Random.randexp(rng) < logα\n return new_vi, new_vi\n else\n return prev_state, prev_state\n end\nend;\n\nfunction AbstractMCMC.bundle_samples(\n samples, model::MiniModel, ::MHSampler, ::Any, ::Type{Chains}; kwargs...\n)\n # We get a vector of traces\n values = [sample.values for sample in samples]\n params = [key for key in keys(values[1]) if key ∉ keys(model.data)]\n vals = reduce(hcat, [value[p] for value in values] for p in params)\n # Composing the `Chains` data-structure, of which analyzing infrastructure is provided\n chains = Chains(vals, params)\n return chains\nend;\n\nWe can use this to sample from the joint distribution just like before:\n\nsample(MiniModel(m, (x=3.0,)), MHSampler(JointContext()), 1_000_000; chain_type=Chains, progress=false)\n\nChains MCMC chain (1000000×2×1 Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nparameters = a, b\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rh ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float ⋯\n\n a 0.9773 0.8979 0.0032 79716.2245 119350.8088 1.00 ⋯\n b 2.8801 0.4880 0.0012 172647.8763 215439.4075 1.00 ⋯\n 2 columns omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n a -0.7841 0.3732 0.9769 1.5840 2.7355\n b 1.9261 2.5510 2.8815 3.2087 3.8399\n\n\nor we can choose to sample from the prior instead\n\nsample(MiniModel(m, (x=3.0,)), MHSampler(PriorContext()), 1_000_000; chain_type=Chains, progress=false)\n\nChains MCMC chain (1000000×2×1 Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nparameters = a, b\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rha ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float6 ⋯\n\n a 0.4951 0.9976 0.0039 64163.7201 127802.6530 1.000 ⋯\n b 0.4853 2.2301 0.0135 27098.4255 53246.7333 1.000 ⋯\n 2 columns omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n a -1.4596 -0.1791 0.4969 1.1702 2.4479\n b -3.9056 -1.0234 0.4903 1.9962 4.8402\n\n\nOf course, using an MCMC algorithm to sample from the prior is unnecessary and silly (PriorSampler exists, after all), but the point is to illustrate the flexibility of the context system. We could, for instance, use the same setup to implement an Approximate Bayesian Computation (ABC) algorithm.\nThe use of contexts also goes far beyond just evaluating log probabilities and sampling. Some examples from Turing are\n\nFixedContext, which fixes some variables to given values and removes them completely from the evaluation of any log probabilities. They power the Turing.fix and Turing.unfix functions.\nConditionContext conditions the model on fixed values for some parameters. They are used by Turing.condition and Turing.uncondition, i.e. the model | (parameter=value,) syntax. The difference between fix and condition is whether the log probability for the corresponding variable is included in the overall log density.\nPriorExtractorContext collects information about what the prior distribution of each variable is.\nPrefixContext adds prefixes to variable names, allowing models to be used within other models without variable name collisions.\nPointwiseLikelihoodContext records the log likelihood of each individual variable.\nDebugContext collects useful debugging information while executing the model.\n\nAll of the above are what Turing calls parent contexts, which is to say that they all keep a subcontext just like our above SamplingContext did. Their implementations of assume and observe call the implementation of the subcontext once they are done doing their own work of fixing/conditioning/prefixing/etc. Contexts are often chained, so that e.g. a DebugContext may wrap within it a PrefixContext, which may in turn wrap a ConditionContext, etc. The only contexts that don’t have a subcontext in the Turing are the ones for evaluating the prior, likelihood, and joint distributions. These are called leaf contexts.\nThe above version of mini Turing is still much simpler than the full Turing language, but the principles of how contexts are used are the same.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "A Mini Turing Implementation II: Contexts" ] }, { "objectID": "developers/compiler/minituring-compiler/index.html", "href": "developers/compiler/minituring-compiler/index.html", "title": "A Mini Turing Implementation I: Compiler", "section": "", "text": "In this tutorial we develop a very simple probabilistic programming language. The implementation is similar to DynamicPPL. This is intentional as we want to demonstrate some key ideas from Turing’s internal implementation.\nTo make things easy to understand and to implement we restrict our language to a very simple subset of the language that Turing actually supports. Defining an accurate syntax description is not our goal here, instead, we give a simple example and all similar programs should work.\n\nConsider a probabilistic model defined by\n\\[\n\\begin{aligned}\na &\\sim \\operatorname{Normal}(0.5, 1^2) \\\\\nb &\\sim \\operatorname{Normal}(a, 2^2) \\\\\nx &\\sim \\operatorname{Normal}(b, 0.5^2)\n\\end{aligned}\n\\]\nWe assume that x is data, i.e., an observed variable. In our small language this model will be defined as\n\n@mini_model function m(x)\n a ~ Normal(0.5, 1)\n b ~ Normal(a, 2)\n x ~ Normal(b, 0.5)\n return nothing\nend\n\nSpecifically, we demand that\n\nall observed variables are arguments of the program,\nthe model definition does not contain any control flow,\nall variables are scalars, and\nthe function returns nothing.\n\nFirst, we import some required packages:\n\nusing MacroTools, Distributions, Random, AbstractMCMC, MCMCChains\n\nBefore getting to the actual “compiler”, we first build the data structure for the program trace. A program trace for a probabilistic programming language needs to at least record the values of stochastic variables and their log-probabilities.\n\nstruct VarInfo{V,L}\n values::V\n logps::L\nend\n\nVarInfo() = VarInfo(Dict{Symbol,Float64}(), Dict{Symbol,Float64}())\n\nfunction Base.setindex!(varinfo::VarInfo, (value, logp), var_id)\n varinfo.values[var_id] = value\n varinfo.logps[var_id] = logp\n return varinfo\nend\n\nInternally, our probabilistic programming language works with two main functions:\n\nassume for sampling unobserved variables and computing their log-probabilities, and\nobserve for computing log-probabilities of observed variables (but not sampling them).\n\nFor different inference algorithms we may have to use different sampling procedures and different log-probability computations. For instance, in some cases we might want to sample all variables from their prior distributions and in other cases we might only want to compute the log-likelihood of the observations based on a given set of values for the unobserved variables. Thus depending on the inference algorithm we want to use different assume and observe implementations. We can achieve this by providing this context information as a function argument to assume and observe.\nNote: Although the context system in this tutorial is inspired by DynamicPPL, it is very simplistic. We expand this mini Turing example in the contexts tutorial with some more complexity, to illustrate how and why contexts are central to Turing’s design. For the full details one still needs to go to the actual source of DynamicPPL though.\nHere we can see the implementation of a sampler that draws values of unobserved variables from the prior and computes the log-probability for every variable.\n\nstruct SamplingContext{S<:AbstractMCMC.AbstractSampler,R<:Random.AbstractRNG}\n rng::R\n sampler::S\nend\n\nstruct PriorSampler <: AbstractMCMC.AbstractSampler end\n\nfunction observe(context::SamplingContext, varinfo, dist, var_id, var_value)\n logp = logpdf(dist, var_value)\n varinfo[var_id] = (var_value, logp)\n return nothing\nend\n\nfunction assume(context::SamplingContext{PriorSampler}, varinfo, dist, var_id)\n sample = Random.rand(context.rng, dist)\n logp = logpdf(dist, sample)\n varinfo[var_id] = (sample, logp)\n return sample\nend;\n\nNext we define the “compiler” for our simple programming language. The term compiler is actually a bit misleading here since its only purpose is to transform the function definition in the @mini_model macro by\n\nadding the context information (context) and the tracing data structure (varinfo) as additional arguments, and\nreplacing tildes with calls to assume and observe.\n\nAfterwards, as usual the Julia compiler will just-in-time compile the model function when it is called.\nThe manipulation of Julia expressions is an advanced part of the Julia language. The Julia documentation provides an introduction to and more details about this so-called metaprogramming.\n\nmacro mini_model(expr)\n return esc(mini_model(expr))\nend\n\nfunction mini_model(expr)\n # Split the function definition into a dictionary with its name, arguments, body etc.\n def = MacroTools.splitdef(expr)\n\n # Replace tildes in the function body with calls to `assume` or `observe`\n def[:body] = MacroTools.postwalk(def[:body]) do sub_expr\n if MacroTools.@capture(sub_expr, var_ ~ dist_)\n if var in def[:args]\n # If the variable is an argument of the model function, it is observed\n return :($(observe)(context, varinfo, $dist, $(Meta.quot(var)), $var))\n else\n # Otherwise it is unobserved\n return :($var = $(assume)(context, varinfo, $dist, $(Meta.quot(var))))\n end\n else\n return sub_expr\n end\n end\n\n # Add `context` and `varinfo` arguments to the model function\n def[:args] = vcat(:varinfo, :context, def[:args])\n\n # Reassemble the function definition from its name, arguments, body etc.\n return MacroTools.combinedef(def)\nend;\n\nFor inference, we make use of the AbstractMCMC interface. It provides a default implementation of a sample function for sampling a Markov chain. The default implementation already supports e.g. sampling of multiple chains in parallel, thinning of samples, or discarding initial samples.\nThe AbstractMCMC interface requires us to at least\n\ndefine a model that is a subtype of AbstractMCMC.AbstractModel,\ndefine a sampler that is a subtype of AbstractMCMC.AbstractSampler,\nimplement AbstractMCMC.step for our model and sampler.\n\nThus here we define a MiniModel model. In this model we store the model function and the observed data.\n\nstruct MiniModel{F,D} <: AbstractMCMC.AbstractModel\n f::F\n data::D # a NamedTuple of all the data\nend\n\nIn the Turing compiler, the model-specific DynamicPPL.Model is constructed automatically when calling the model function. But for the sake of simplicity here we construct the model manually.\nTo illustrate probabilistic inference with our mini language we implement an extremely simplistic Random-Walk Metropolis-Hastings sampler. We hard-code the proposal step as part of the sampler and only allow normal distributions with zero mean and fixed standard deviation. The Metropolis-Hastings sampler in Turing is more flexible.\n\nstruct MHSampler{T<:Real} <: AbstractMCMC.AbstractSampler\n sigma::T\nend\n\nMHSampler() = MHSampler(1)\n\nfunction assume(context::SamplingContext{<:MHSampler}, varinfo, dist, var_id)\n sampler = context.sampler\n old_value = varinfo.values[var_id]\n\n # propose a random-walk step, i.e, add the current value to a random\n # value sampled from a Normal distribution centered at 0\n value = rand(context.rng, Normal(old_value, sampler.sigma))\n logp = Distributions.logpdf(dist, value)\n varinfo[var_id] = (value, logp)\n\n return value\nend;\n\nWe need to define two step functions, one for the first step and the other for the following steps. In the first step we sample values from the prior distributions and in the following steps we sample with the random-walk proposal. The two functions are identified by the different arguments they take.\n\n# The fist step: Sampling from the prior distributions\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG, model::MiniModel, sampler::MHSampler; kwargs...\n)\n vi = VarInfo()\n ctx = SamplingContext(rng, PriorSampler())\n model.f(vi, ctx, values(model.data)...)\n return vi, vi\nend\n\n# The following steps: Sampling with random-walk proposal\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model::MiniModel,\n sampler::MHSampler,\n prev_state::VarInfo; # is just the old trace\n kwargs...,\n)\n vi = prev_state\n new_vi = deepcopy(vi)\n ctx = SamplingContext(rng, sampler)\n model.f(new_vi, ctx, values(model.data)...)\n\n # Compute log acceptance probability\n # Since the proposal is symmetric the computation can be simplified\n logα = sum(values(new_vi.logps)) - sum(values(vi.logps))\n\n # Accept proposal with computed acceptance probability\n if -randexp(rng) < logα\n return new_vi, new_vi\n else\n return prev_state, prev_state\n end\nend;\n\nTo make it easier to analyze the samples and compare them with results from Turing, additionally we define a version of AbstractMCMC.bundle_samples for our model and sampler that returns a MCMCChains.Chains object of samples.\n\nfunction AbstractMCMC.bundle_samples(\n samples, model::MiniModel, ::MHSampler, ::Any, ::Type{Chains}; kwargs...\n)\n # We get a vector of traces\n values = [sample.values for sample in samples]\n params = [key for key in keys(values[1]) if key ∉ keys(model.data)]\n vals = reduce(hcat, [value[p] for value in values] for p in params)\n # Composing the `Chains` data-structure, of which analyzing infrastructure is provided\n chains = Chains(vals, params)\n return chains\nend;\n\nLet us check how our mini probabilistic programming language works. We define the probabilistic model:\n\n@mini_model function m(x)\n a ~ Normal(0.5, 1)\n b ~ Normal(a, 2)\n x ~ Normal(b, 0.5)\n return nothing\nend;\n\nWe perform inference with data x = 3.0:\n\nsample(MiniModel(m, (x=3.0,)), MHSampler(), 1_000_000; chain_type=Chains, progress=false)\n\nChains MCMC chain (1000000×2×1 Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nparameters = a, b\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rh ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float ⋯\n\n a 0.9767 0.8969 0.0031 82477.4089 124151.3914 1.00 ⋯\n b 2.8812 0.4876 0.0012 173693.2110 215974.5794 1.00 ⋯\n 2 columns omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n a -0.7798 0.3735 0.9748 1.5805 2.7428\n b 1.9231 2.5532 2.8800 3.2096 3.8391\n\n\nWe compare these results with Turing.\n\nusing Turing\nusing PDMats\n\n@model function turing_m(x)\n a ~ Normal(0.5, 1)\n b ~ Normal(a, 2)\n x ~ Normal(b, 0.5)\n return nothing\nend\n\nsample(turing_m(3.0), MH(ScalMat(2, 1.0)), 1_000_000, progress=false)\n\nChains MCMC chain (1000000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nWall duration = 25.3 seconds\nCompute duration = 25.3 seconds\nparameters = a, b\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rh ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float ⋯\n\n a 0.9772 0.8996 0.0032 80881.5473 124110.2138 1.00 ⋯\n b 2.8825 0.4891 0.0012 171047.4469 217271.5244 1.00 ⋯\n 2 columns omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n a -0.7848 0.3687 0.9784 1.5859 2.7335\n b 1.9224 2.5530 2.8822 3.2129 3.8396\n\n\nAs you can see, with our simple probabilistic programming language and custom samplers we get similar results as Turing.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "A Mini Turing Implementation I: Compiler" ] }, { "objectID": "core-functionality/index.html", "href": "core-functionality/index.html", "title": "Core Functionality", "section": "", "text": "This article provides an overview of the core functionality in Turing.jl, which are likely to be used across a wide range of models.", "crumbs": [ "Get Started", "Core Functionality" ] }, { "objectID": "core-functionality/index.html#basics", "href": "core-functionality/index.html#basics", "title": "Core Functionality", "section": "Basics", "text": "Basics\n\nIntroduction\nA probabilistic program is Julia code wrapped in a @model macro. It can use arbitrary Julia code, but to ensure correctness of inference it should not have external effects or modify global state. Stack-allocated variables are safe, but mutable heap-allocated objects may lead to subtle bugs when using task copying. By default Libtask deepcopies Array and Dict objects when copying task to avoid bugs with data stored in mutable structure in Turing models.\nTo specify distributions of random variables, Turing programs should use the ~ notation:\nx ~ distr where x is a symbol and distr is a distribution. If x is undefined in the model function, inside the probabilistic program, this puts a random variable named x, distributed according to distr, in the current scope. distr can be a value of any type that implements rand(distr), which samples a value from the distribution distr. If x is defined, this is used for conditioning in a style similar to Anglican (another PPL). In this case, x is an observed value, assumed to have been drawn from the distribution distr. The likelihood is computed using logpdf(distr,y). The observe statements should be arranged so that every possible run traverses all of them in exactly the same order. This is equivalent to demanding that they are not placed inside stochastic control flow.\nAvailable inference methods include Importance Sampling (IS), Sequential Monte Carlo (SMC), Particle Gibbs (PG), Hamiltonian Monte Carlo (HMC), Hamiltonian Monte Carlo with Dual Averaging (HMCDA) and The No-U-Turn Sampler (NUTS).\n\n\nSimple Gaussian Demo\nBelow is a simple Gaussian demo illustrate the basic usage of Turing.jl.\n\n# Import packages.\nusing Turing\nusing StatsPlots\n\n# Define a simple Normal model with unknown mean and variance.\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\ngdemo (generic function with 2 methods)\n\n\nNote: As a sanity check, the prior expectation of s² is mean(InverseGamma(2, 3)) = 3/(2 - 1) = 3 and the prior expectation of m is 0. This can be easily checked using Prior:\n\nsetprogress!(false)\n\n\np1 = sample(gdemo(missing, missing), Prior(), 100000)\n\nChains MCMC chain (100000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:100000\nNumber of chains = 1\nSamples per chain = 100000\nWall duration = 1.14 seconds\nCompute duration = 1.14 seconds\nparameters = s², m, x, y\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rha ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float6 ⋯\n\n s² 3.0035 5.9215 0.0187 100395.5883 99089.6699 1.000 ⋯\n m -0.0073 1.7329 0.0054 101651.0798 95280.3028 1.000 ⋯\n x -0.0004 2.4576 0.0077 101138.0695 99666.6524 1.000 ⋯\n y -0.0048 2.4322 0.0077 100500.6795 99197.9677 1.000 ⋯\n 2 columns omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5394 1.1143 1.7886 3.1116 12.5982\n m -3.4321 -0.9144 -0.0024 0.9065 3.3806\n x -4.8241 -1.2922 -0.0010 1.2879 4.7971\n y -4.8173 -1.2851 0.0050 1.2758 4.7671\n\n\nWe can perform inference by using the sample function, the first argument of which is our probabilistic program and the second of which is a sampler.\n\n# Run sampler, collect results.\nc1 = sample(gdemo(1.5, 2), SMC(), 1000)\nc2 = sample(gdemo(1.5, 2), PG(10), 1000)\nc3 = sample(gdemo(1.5, 2), HMC(0.1, 5), 1000)\nc4 = sample(gdemo(1.5, 2), Gibbs(:m => PG(10), :s² => HMC(0.1, 5)), 1000)\nc5 = sample(gdemo(1.5, 2), HMCDA(0.15, 0.65), 1000)\nc6 = sample(gdemo(1.5, 2), NUTS(0.65), 1000)\n\n┌ Info: Found initial step size\n└ ϵ = 1.6\n┌ Info: Found initial step size\n└ ϵ = 1.6\n\n\nChains MCMC chain (1000×14×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 1.77 seconds\nCompute duration = 1.77 seconds\nparameters = s², m\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 2.1643 2.4692 0.1514 382.8233 367.1666 1.0017 ⋯\n m 1.1201 0.8528 0.0508 300.5970 225.0328 1.0030 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5619 1.0355 1.6164 2.4920 7.2330\n m -0.5799 0.5888 1.1024 1.6522 2.7382\n\n\nThe arguments for each sampler are:\n\nSMC: number of particles.\nPG: number of particles, number of iterations.\nHMC: leapfrog step size, leapfrog step numbers.\nGibbs: component sampler 1, component sampler 2, …\nHMCDA: total leapfrog length, target accept ratio.\nNUTS: number of adaptation steps (optional), target accept ratio.\n\nMore information about each sampler can be found in Turing.jl’s API docs.\nThe MCMCChains module (which is re-exported by Turing) provides plotting tools for the Chain objects returned by a sample function. See the MCMCChains repository for more information on the suite of tools available for diagnosing MCMC chains.\n\n# Summarise results\ndescribe(c3)\n\n# Plot results\nplot(c3)\nsavefig(\"gdemo-plot.png\")\n\n\n\nModelling Syntax Explained\nUsing this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model onto data. Subsequently, the sample function can be used to generate samples from the posterior distribution.\nIn the following example, the defined model is conditioned to the data (arg1 = 1, arg2 = 2) by passing (1, 2) to the model function.\n\n@model function model_name(arg_1, arg_2)\n return ...\nend\n\nThe conditioned model can then be passed onto the sample function to run posterior inference.\n\nmodel_func = model_name(1, 2)\nchn = sample(model_func, HMC(..)) # Perform inference by sampling using HMC.\n\nThe returned chain contains samples of the variables in the model.\n\nvar_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.\n\nThe key (:var_1) can be a Symbol or a String. For example, to fetch x[1], one can use chn[Symbol(\"x[1]\")] or chn[\"x[1]\"]. If you want to retrieve all parameters associated with a specific symbol, you can use group. As an example, if you have the parameters \"x[1]\", \"x[2]\", and \"x[3]\", calling group(chn, :x) or group(chn, \"x\") will return a new chain with only \"x[1]\", \"x[2]\", and \"x[3]\".\nTuring does not have a declarative form. More generally, the order in which you place the lines of a @model macro matters. For example, the following example works:\n\n# Define a simple Normal model with unknown mean and variance.\n@model function model_function(y)\n s ~ Poisson(1)\n y ~ Normal(s, 1)\n return y\nend\n\nsample(model_function(10), SMC(), 100)\n\nChains MCMC chain (100×3×1 Array{Float64, 3}):\n\nLog evidence = -22.829452582957096\nIterations = 1:1:100\nNumber of chains = 1\nSamples per chain = 100\nWall duration = 2.45 seconds\nCompute duration = 2.45 seconds\nparameters = s\ninternals = lp, weight\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s 4.0000 0.0000 NaN NaN NaN NaN ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s 4.0000 4.0000 4.0000 4.0000 4.0000\n\n\nBut if we switch the s ~ Poisson(1) and y ~ Normal(s, 1) lines, the model will no longer sample correctly:\n\n# Define a simple Normal model with unknown mean and variance.\n@model function model_function(y)\n y ~ Normal(s, 1)\n s ~ Poisson(1)\n return y\nend\n\nsample(model_function(10), SMC(), 100)\n\n\n\nSampling Multiple Chains\nTuring supports distributed and threaded parallel sampling. To do so, call sample(model, sampler, parallel_type, n, n_chains), where parallel_type can be either MCMCThreads() or MCMCDistributed() for thread and parallel sampling, respectively.\nHaving multiple chains in the same object is valuable for evaluating convergence. Some diagnostic functions like gelmandiag require multiple chains.\nIf you do not want parallelism or are on an older version Julia, you can sample multiple chains with the mapreduce function:\n\n# Replace num_chains below with however many chains you wish to sample.\nchains = mapreduce(c -> sample(model_fun, sampler, 1000), chainscat, 1:num_chains)\n\nThe chains variable now contains a Chains object which can be indexed by chain. To pull out the first chain from the chains object, use chains[:,:,1]. The method is the same if you use either of the below parallel sampling methods.\n\nMultithreaded sampling\nIf you wish to perform multithreaded sampling, you can call sample with the following signature:\n\nusing Turing\n\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\nmodel = gdemo([1.5, 2.0])\n\n# Sample four chains using multiple threads, each with 1000 samples.\nsample(model, NUTS(), MCMCThreads(), 1000, 4)\n\nBe aware that Turing cannot add threads for you – you must have started your Julia instance with multiple threads to experience any kind of parallelism. See the Julia documentation for details on how to achieve this.\n\n\nDistributed sampling\nTo perform distributed sampling (using multiple processes), you must first import Distributed.\nProcess parallel sampling can be done like so:\n\n# Load Distributed to add processes and the @everywhere macro.\nusing Distributed\n\n# Load Turing.\nusing Turing\n\n# Add four processes to use for sampling.\naddprocs(4; exeflags=\"--project=$(Base.active_project())\")\n\n# Initialize everything on all the processes.\n# Note: Make sure to do this after you've already loaded Turing,\n# so each process does not have to precompile.\n# Parallel sampling may fail silently if you do not do this.\n@everywhere using Turing\n\n# Define a model on all processes.\n@everywhere @model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\n# Declare the model instance everywhere.\n@everywhere model = gdemo([1.5, 2.0])\n\n# Sample four chains using multiple processes, each with 1000 samples.\nsample(model, NUTS(), MCMCDistributed(), 1000, 4)\n\n\n\n\nSampling from an Unconditional Distribution (The Prior)\nTuring allows you to sample from a declared model’s prior. If you wish to draw a chain from the prior to inspect your prior distributions, you can simply run\n\nchain = sample(model, Prior(), n_samples)\n\nYou can also run your model (as if it were a function) from the prior distribution, by calling the model without specifying inputs or a sampler. In the below example, we specify a gdemo model which returns two variables, x and y. The model includes x and y as arguments, but calling the function without passing in x or y means that Turing’s compiler will assume they are missing values to draw from the relevant distribution. The return statement is necessary to retrieve the sampled x and y values. Assign the function with missing inputs to a variable, and Turing will produce a sample from the prior distribution.\n\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n y ~ Normal(m, sqrt(s²))\n return x, y\nend\n\ngdemo (generic function with 2 methods)\n\n\nAssign the function with missing inputs to a variable, and Turing will produce a sample from the prior distribution.\n\n# Samples from p(x,y)\ng_prior_sample = gdemo(missing, missing)\ng_prior_sample()\n\n(2.325745793024842, 1.6718491597598966)\n\n\n\n\nSampling from a Conditional Distribution (The Posterior)\n\nTreating observations as random variables\nInputs to the model that have a value missing are treated as parameters, aka random variables, to be estimated/sampled. This can be useful if you want to simulate draws for that parameter, or if you are sampling from a conditional distribution. Turing supports the following syntax:\n\n@model function gdemo(x, ::Type{T}=Float64) where {T}\n if x === missing\n # Initialize `x` if missing\n x = Vector{T}(undef, 2)\n end\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\n# Construct a model with x = missing\nmodel = gdemo(missing)\nc = sample(model, HMC(0.05, 20), 500)\n\nChains MCMC chain (500×14×1 Array{Float64, 3}):\n\nIterations = 1:1:500\nNumber of chains = 1\nSamples per chain = 500\nWall duration = 3.14 seconds\nCompute duration = 3.14 seconds\nparameters = s², m, x[1], x[2]\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 2.3934 2.0763 0.1353 205.7801 253.6508 1.0018 ⋯\n m 0.0433 1.3762 0.2484 30.5134 60.8764 1.0231 ⋯\n x[1] 0.0008 1.8245 0.3209 32.8440 90.2795 1.0182 ⋯\n x[2] 0.1271 2.0478 0.4178 24.0741 59.3932 1.0111 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5334 1.1529 1.8088 2.8315 8.5208\n m -2.4016 -0.9314 -0.0628 1.0304 2.6589\n x[1] -3.3737 -1.2456 -0.0508 1.3019 3.5384\n x[2] -3.8526 -1.2242 0.0876 1.6242 4.0125\n\n\nNote the need to initialize x when missing since we are iterating over its elements later in the model. The generated values for x can be extracted from the Chains object using c[:x].\nTuring also supports mixed missing and non-missing values in x, where the missing ones will be treated as random variables to be sampled while the others get treated as observations. For example:\n\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\n# x[1] is a parameter, but x[2] is an observation\nmodel = gdemo([missing, 2.4])\nc = sample(model, HMC(0.01, 5), 500)\n\nChains MCMC chain (500×13×1 Array{Float64, 3}):\n\nIterations = 1:1:500\nNumber of chains = 1\nSamples per chain = 500\nWall duration = 2.1 seconds\nCompute duration = 2.1 seconds\nparameters = s², m, x[1]\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 0.5699 0.1798 0.1417 1.6486 10.8746 1.6610 ⋯\n m 0.9980 0.3031 0.0965 9.7826 21.1813 0.9992 ⋯\n x[1] 0.7506 0.3611 0.1347 7.5424 11.9953 1.1674 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.1981 0.4223 0.5849 0.6986 0.8868\n m 0.3722 0.8393 1.0245 1.1995 1.5965\n x[1] 0.0196 0.5030 0.8114 0.9859 1.4309\n\n\n\n\nDefault Values\nArguments to Turing models can have default values much like how default values work in normal Julia functions. For instance, the following will assign missing to x and treat it as a random variable. If the default value is not missing, x will be assigned that value and will be treated as an observation instead.\n\nusing Turing\n\n@model function generative(x=missing, ::Type{T}=Float64) where {T<:Real}\n if x === missing\n # Initialize x when missing\n x = Vector{T}(undef, 10)\n end\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n for i in 1:length(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\n return s², m\nend\n\nm = generative()\nchain = sample(m, HMC(0.01, 5), 1000)\n\nChains MCMC chain (1000×22×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 1.95 seconds\nCompute duration = 1.95 seconds\nparameters = s², m, x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], x[9], x[10]\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 1.7645 0.6716 0.2628 8.5593 20.9179 1.1204 ⋯\n m 1.4942 0.3068 0.1612 3.6359 21.2823 1.4443 ⋯\n x[1] 0.4054 0.7196 0.2676 7.9875 25.9697 1.0160 ⋯\n x[2] 0.1575 0.8634 0.4877 3.3110 12.4449 1.4353 ⋯\n x[3] 0.8821 1.0202 0.6259 2.8937 12.9892 1.6403 ⋯\n x[4] 2.4551 0.3376 0.1176 8.4608 61.6580 1.0840 ⋯\n x[5] 0.7661 0.4464 0.2098 4.2432 20.5311 1.3227 ⋯\n x[6] 1.0855 0.5325 0.3168 3.0566 22.1362 1.6635 ⋯\n x[7] 1.0272 0.6561 0.3863 2.9850 12.9844 1.6083 ⋯\n x[8] 3.4716 0.6020 0.3832 2.7087 21.7706 1.8399 ⋯\n x[9] 1.1359 0.3224 0.1585 4.0939 28.6566 1.3930 ⋯\n x[10] 1.1215 0.7266 0.4406 2.7508 15.1411 1.8662 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.9309 1.3149 1.5580 1.9825 3.5523\n m 0.8422 1.3035 1.5079 1.6701 2.0617\n x[1] -0.7666 -0.1324 0.2663 1.0166 1.7107\n x[2] -1.8565 -0.3543 0.3654 0.8257 1.4013\n x[3] -0.4734 0.0426 0.6107 1.5135 3.0500\n x[4] 1.7122 2.2472 2.5149 2.7360 2.9414\n x[5] -0.2942 0.5515 0.8900 1.0961 1.3579\n x[6] 0.2463 0.5153 1.1975 1.5154 2.0108\n x[7] 0.2752 0.5624 0.8105 1.3365 2.6598\n x[8] 2.5148 2.9188 3.5218 4.0421 4.5160\n x[9] 0.5938 0.8875 1.0937 1.3525 1.8724\n x[10] -0.4909 0.8492 1.1354 1.6528 2.2394\n\n\n\n\nAccess Values inside Chain\nYou can access the values inside a chain several ways:\n\nTurn them into a DataFrame object\nUse their raw AxisArray form\nCreate a three-dimensional Array object\n\nFor example, let c be a Chain:\n\nDataFrame(c) converts c to a DataFrame,\nc.value retrieves the values inside c as an AxisArray, and\nc.value.data retrieves the values inside c as a 3D Array.\n\n\n\nVariable Types and Type Parameters\nThe element type of a vector (or matrix) of random variables should match the eltype of its prior distribution, <: Integer for discrete distributions and <: AbstractFloat for continuous distributions. Moreover, if the continuous random variable is to be sampled using a Hamiltonian sampler, the vector’s element type needs to either be:\n\nReal to enable auto-differentiation through the model which uses special number types that are sub-types of Real, or\nSome type parameter T defined in the model header using the type parameter syntax, e.g. function gdemo(x, ::Type{T} = Float64) where {T}.\n\nSimilarly, when using a particle sampler, the Julia variable used should either be:\n\nAn Array, or\nAn instance of some type parameter T defined in the model header using the type parameter syntax, e.g. function gdemo(x, ::Type{T} = Vector{Float64}) where {T}.\n\n\n\n\nQuerying Probabilities from Model or Chain\nTuring offers three functions: loglikelihood, logprior, and logjoint to query the log-likelihood, log-prior, and log-joint probabilities of a model, respectively.\nLet’s look at a simple model called gdemo:\n\n@model function gdemo0()\n s ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s))\n return x ~ Normal(m, sqrt(s))\nend\n\ngdemo0 (generic function with 2 methods)\n\n\nIf we observe x to be 1.0, we can condition the model on this datum using the condition syntax:\n\nmodel = gdemo0() | (x=1.0,)\n\nDynamicPPL.Model{typeof(gdemo0), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{x::Float64}, DynamicPPL.DefaultContext}}(gdemo0, NamedTuple(), NamedTuple(), ConditionContext((x = 1.0,), DynamicPPL.DefaultContext()))\n\n\nNow, let’s compute the log-likelihood of the observation given specific values of the model parameters, s and m:\n\nloglikelihood(model, (s=1.0, m=1.0))\n\n-0.9189385332046728\n\n\nWe can easily verify that value in this case:\n\nlogpdf(Normal(1.0, 1.0), 1.0)\n\n-0.9189385332046728\n\n\nWe can also compute the log-prior probability of the model for the same values of s and m:\n\nlogprior(model, (s=1.0, m=1.0))\n\n-2.221713955868453\n\n\n\nlogpdf(InverseGamma(2, 3), 1.0) + logpdf(Normal(0, sqrt(1.0)), 1.0)\n\n-2.221713955868453\n\n\nFinally, we can compute the log-joint probability of the model parameters and data:\n\nlogjoint(model, (s=1.0, m=1.0))\n\n-3.1406524890731258\n\n\n\nlogpdf(Normal(1.0, 1.0), 1.0) +\nlogpdf(InverseGamma(2, 3), 1.0) +\nlogpdf(Normal(0, sqrt(1.0)), 1.0)\n\n-3.1406524890731258\n\n\nQuerying with Chains object is easy as well:\n\nchn = sample(model, Prior(), 10)\n\nChains MCMC chain (10×3×1 Array{Float64, 3}):\n\nIterations = 1:1:10\nNumber of chains = 1\nSamples per chain = 10\nWall duration = 0.03 seconds\nCompute duration = 0.03 seconds\nparameters = s, m\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s 2.0591 1.0635 0.3625 6.4313 10.0000 1.1155 ⋯\n m 0.4162 1.0718 0.3389 10.0000 10.0000 1.1211 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s 0.5890 1.4826 1.8581 2.7022 3.6370\n m -1.1383 -0.0184 0.1959 1.1159 2.1792\n\n\n\nloglikelihood(model, chn)\n\n10×1 Matrix{Float64}:\n -1.2956021610513948\n -1.587270348078016\n -1.1402328698321083\n -1.5766585948382827\n -1.6125549684890501\n -1.7588147688446933\n -1.9735590131990954\n -1.4190745744833648\n -5.6546292532850915\n -0.9537213800718106\n\n\n\n\nMaximum likelihood and maximum a posterior estimates\nTuring also has functions for estimating the maximum aposteriori and maximum likelihood parameters of a model. This can be done with\n\nmle_estimate = maximum_likelihood(model)\nmap_estimate = maximum_a_posteriori(model)\n\nModeResult with maximized lp of -2.81\n[0.8125000000039697, 0.5000000000092388]\n\n\nFor more details see the mode estimation page.", "crumbs": [ "Get Started", "Core Functionality" ] }, { "objectID": "core-functionality/index.html#beyond-the-basics", "href": "core-functionality/index.html#beyond-the-basics", "title": "Core Functionality", "section": "Beyond the Basics", "text": "Beyond the Basics\n\nCompositional Sampling Using Gibbs\nTuring.jl provides a Gibbs interface to combine different samplers. For example, one can combine an HMC sampler with a PG sampler to run inference for different parameters in a single model as below.\n\n@model function simple_choice(xs)\n p ~ Beta(2, 2)\n z ~ Bernoulli(p)\n for i in 1:length(xs)\n if z == 1\n xs[i] ~ Normal(0, 1)\n else\n xs[i] ~ Normal(2, 1)\n end\n end\nend\n\nsimple_choice_f = simple_choice([1.5, 2.0, 0.3])\n\nchn = sample(simple_choice_f, Gibbs(:p => HMC(0.2, 3), :z => PG(20)), 1000)\n\nChains MCMC chain (1000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 21.65 seconds\nCompute duration = 21.65 seconds\nparameters = p, z\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n p 0.4460 0.1927 0.0212 82.3721 210.3250 1.0107 ⋯\n z 0.1760 0.3810 0.0170 502.6703 NaN 0.9997 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n p 0.1193 0.2952 0.4358 0.5839 0.8229\n z 0.0000 0.0000 0.0000 0.0000 1.0000\n\n\nThe Gibbs sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the Automatic Differentiation article for more.\nFor more details of compositional sampling in Turing.jl, please check the corresponding paper.\n\n\nWorking with filldist and arraydist\nTuring provides filldist(dist::Distribution, n::Int) and arraydist(dists::AbstractVector{<:Distribution}) as a simplified interface to construct product distributions, e.g., to model a set of variables that share the same structure but vary by group.\n\nConstructing product distributions with filldist\nThe function filldist provides a general interface to construct product distributions over distributions of the same type and parameterisation. Note that, in contrast to the product distribution interface provided by Distributions.jl (Product), filldist supports product distributions over univariate or multivariate distributions.\nExample usage:\n\n@model function demo(x, g)\n k = length(unique(g))\n a ~ filldist(Exponential(), k) # = Product(fill(Exponential(), k))\n mu = a[g]\n for i in eachindex(x)\n x[i] ~ Normal(mu[i])\n end\n return mu\nend\n\ndemo (generic function with 2 methods)\n\n\n\n\nConstructing product distributions with arraydist\nThe function arraydist provides a general interface to construct product distributions over distributions of varying type and parameterisation. Note that in contrast to the product distribution interface provided by Distributions.jl (Product), arraydist supports product distributions over univariate or multivariate distributions.\nExample usage:\n\n@model function demo(x, g)\n k = length(unique(g))\n a ~ arraydist([Exponential(i) for i in 1:k])\n mu = a[g]\n for i in eachindex(x)\n x[i] ~ Normal(mu[i])\n end\n return mu\nend\n\ndemo (generic function with 2 methods)\n\n\n\n\n\nWorking with MCMCChains.jl\nTuring.jl wraps its samples using MCMCChains.Chain so that all the functions working for MCMCChains.Chain can be re-used in Turing.jl. Two typical functions are MCMCChains.describe and MCMCChains.plot, which can be used as follows for an obtained chain chn. For more information on MCMCChains, please see the GitHub repository.\n\ndescribe(chn) # Lists statistics of the samples.\nplot(chn) # Plots statistics of the samples.\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThere are numerous functions in addition to describe and plot in the MCMCChains package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.\n\n\nChanging Default Settings\nSome of Turing.jl’s default settings can be changed for better usage.\n\nAD Chunk Size\nForwardDiff (Turing’s default AD backend) uses forward-mode chunk-wise AD. The chunk size can be set manually by AutoForwardDiff(;chunksize=new_chunk_size).\n\n\nAD Backend\nTuring supports four automatic differentiation (AD) packages in the back end during sampling. The default AD backend is ForwardDiff for forward-mode AD. Three reverse-mode AD backends are also supported, namely Mooncake, Zygote and ReverseDiff. Mooncake, Zygote, and ReverseDiff also require the user to explicitly load them using import Mooncake, import Zygote, or import ReverseDiff next to using Turing.\nFor more information on Turing’s automatic differentiation backend, please see the Automatic Differentiation article.\n\n\nProgress Logging\nTuring.jl uses ProgressLogging.jl to log the sampling progress. Progress logging is enabled as default but might slow down inference. It can be turned on or off by setting the keyword argument progress of sample to true or false. Moreover, you can enable or disable progress logging globally by calling setprogress!(true) or setprogress!(false), respectively.\nTuring uses heuristics to select an appropriate visualization backend. If you use Jupyter notebooks, the default backend is ConsoleProgressMonitor.jl. In all other cases, progress logs are displayed with TerminalLoggers.jl. Alternatively, if you provide a custom visualization backend, Turing uses it instead of the default backend.", "crumbs": [ "Get Started", "Core Functionality" ] }, { "objectID": "versions.html", "href": "versions.html", "title": "Latest Version", "section": "", "text": "Latest Version\n\n\n\nv0.37\nDocumentation\nChangelog\n\n\n\n\n\nPrevious Versions\n\n\n\nv0.36\nDocumentation\n\n\nv0.35\nDocumentation\n\n\nv0.34\nDocumentation\n\n\nv0.33\nDocumentation\n\n\nv0.32\nDocumentation\n\n\n\n\n\nArchived Versions\nDocumentation for archived versions is available on our deprecated documentation site.\n\n\n\nv0.31\nDocumentation\n\n\nv0.30\nDocumentation\n\n\nv0.29\nDocumentation\n\n\nv0.28\nDocumentation\n\n\nv0.27\nDocumentation\n\n\nv0.26\nDocumentation\n\n\nv0.25\nDocumentation\n\n\nv0.24\nDocumentation\n\n\n\n\n\n\n\n Back to top" }, { "objectID": "tutorials/gaussian-mixture-models/index.html", "href": "tutorials/gaussian-mixture-models/index.html", "title": "Gaussian Mixture Models", "section": "", "text": "The following tutorial illustrates the use of Turing for an unsupervised task, namely, clustering data using a Bayesian mixture model. The aim of this task is to infer a latent grouping (hidden structure) from unlabelled data.", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#synthetic-data", "href": "tutorials/gaussian-mixture-models/index.html#synthetic-data", "title": "Gaussian Mixture Models", "section": "Synthetic Data", "text": "Synthetic Data\nWe generate a synthetic dataset of \\(N = 60\\) two-dimensional points \\(x_i \\in \\mathbb{R}^2\\) drawn from a Gaussian mixture model. For simplicity, we use \\(K = 2\\) clusters with\n\nequal weights, i.e., we use mixture weights \\(w = [0.5, 0.5]\\), and\nisotropic Gaussian distributions of the points in each cluster.\n\nMore concretely, we use the Gaussian distributions \\(\\mathcal{N}([\\mu_k, \\mu_k]^\\mathsf{T}, I)\\) with parameters \\(\\mu_1 = -3.5\\) and \\(\\mu_2 = 0.5\\).\n\nusing Distributions\nusing FillArrays\nusing StatsPlots\n\nusing LinearAlgebra\nusing Random\n\n# Set a random seed.\nRandom.seed!(3)\n\n# Define Gaussian mixture model.\nw = [0.5, 0.5]\nμ = [-3.5, 0.5]\nmixturemodel = MixtureModel([MvNormal(Fill(μₖ, 2), I) for μₖ in μ], w)\n\n# We draw the data points.\nN = 60\nx = rand(mixturemodel, N);\n\nThe following plot shows the dataset.\n\nscatter(x[1, :], x[2, :]; legend=false, title=\"Synthetic Dataset\")", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#gaussian-mixture-model-in-turing", "href": "tutorials/gaussian-mixture-models/index.html#gaussian-mixture-model-in-turing", "title": "Gaussian Mixture Models", "section": "Gaussian Mixture Model in Turing", "text": "Gaussian Mixture Model in Turing\nWe are interested in recovering the grouping from the dataset. More precisely, we want to infer the mixture weights, the parameters \\(\\mu_1\\) and \\(\\mu_2\\), and the assignment of each datum to a cluster for the generative Gaussian mixture model.\nIn a Bayesian Gaussian mixture model with \\(K\\) components each data point \\(x_i\\) (\\(i = 1,\\ldots,N\\)) is generated according to the following generative process. First we draw the model parameters, i.e., in our example we draw parameters \\(\\mu_k\\) for the mean of the isotropic normal distributions and the mixture weights \\(w\\) of the \\(K\\) clusters. We use standard normal distributions as priors for \\(\\mu_k\\) and a Dirichlet distribution with parameters \\(\\alpha_1 = \\cdots = \\alpha_K = 1\\) as prior for \\(w\\): \\[\n\\begin{aligned}\n\\mu_k &\\sim \\mathcal{N}(0, 1) \\qquad (k = 1,\\ldots,K)\\\\\nw &\\sim \\operatorname{Dirichlet}(\\alpha_1, \\ldots, \\alpha_K)\n\\end{aligned}\n\\] After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters \\[\nz_i \\sim \\operatorname{Categorical}(w) \\qquad (i = 1,\\ldots,N),\n\\] and then drawing the datum accordingly, i.e., in our example drawing \\[\nx_i \\sim \\mathcal{N}([\\mu_{z_i}, \\mu_{z_i}]^\\mathsf{T}, I) \\qquad (i=1,\\ldots,N).\n\\] For more details on Gaussian mixture models, we refer to Christopher M. Bishop, Pattern Recognition and Machine Learning, Section 9.\nWe specify the model with Turing.\n\nusing Turing\n\n@model function gaussian_mixture_model(x)\n # Draw the parameters for each of the K=2 clusters from a standard normal distribution.\n K = 2\n μ ~ MvNormal(Zeros(K), I)\n\n # Draw the weights for the K clusters from a Dirichlet distribution with parameters αₖ = 1.\n w ~ Dirichlet(K, 1.0)\n # Alternatively, one could use a fixed set of weights.\n # w = fill(1/K, K)\n\n # Construct categorical distribution of assignments.\n distribution_assignments = Categorical(w)\n\n # Construct multivariate normal distributions of each cluster.\n D, N = size(x)\n distribution_clusters = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n\n # Draw assignments for each datum and generate it from the multivariate normal distribution.\n k = Vector{Int}(undef, N)\n for i in 1:N\n k[i] ~ distribution_assignments\n x[:, i] ~ distribution_clusters[k[i]]\n end\n\n return k\nend\n\nmodel = gaussian_mixture_model(x);\n\nWe run a MCMC simulation to obtain an approximation of the posterior distribution of the parameters \\(\\mu\\) and \\(w\\) and assignments \\(k\\). We use a Gibbs sampler that combines a particle Gibbs sampler for the discrete parameters (assignments \\(k\\)) and a Hamiltonian Monte Carlo sampler for the continuous parameters (\\(\\mu\\) and \\(w\\)). We generate multiple chains in parallel using multi-threading.\n\nsampler = Gibbs(:k => PG(100), (:μ, :w) => HMC(0.05, 10))\nnsamples = 150\nnchains = 4\nburn = 10\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);\n\n\n\n\n\n\n\nSampling With Multiple Threads\n\n\n\n\n\nThe sample() call above assumes that you have at least nchains threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. For more information, see the Turing documentation on sampling multiple chains.", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#inferred-mixture-model", "href": "tutorials/gaussian-mixture-models/index.html#inferred-mixture-model", "title": "Gaussian Mixture Models", "section": "Inferred Mixture Model", "text": "Inferred Mixture Model\nAfter sampling we can visualize the trace and density of the parameters of interest.\nWe consider the samples of the location parameters \\(\\mu_1\\) and \\(\\mu_2\\) for the two clusters.\n\nplot(chains[[\"μ[1]\", \"μ[2]\"]]; legend=true)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nIt can happen that the modes of \\(\\mu_1\\) and \\(\\mu_2\\) switch between chains. For more information see the Stan documentation. This is because it’s possible for either model parameter \\(\\mu_k\\) to be assigned to either of the corresponding true means, and this assignment need not be consistent between chains.\nThat is, the posterior is fundamentally multimodal, and different chains can end up in different modes, complicating inference. One solution here is to enforce an ordering on our \\(\\mu\\) vector, requiring \\(\\mu_k > \\mu_{k-1}\\) for all \\(k\\). Bijectors.jl provides an easy transformation (ordered()) for this purpose:\n\nusing Bijectors: ordered\n\n@model function gaussian_mixture_model_ordered(x)\n # Draw the parameters for each of the K=2 clusters from a standard normal distribution.\n K = 2\n μ ~ ordered(MvNormal(Zeros(K), I))\n # Draw the weights for the K clusters from a Dirichlet distribution with parameters αₖ = 1.\n w ~ Dirichlet(K, 1.0)\n # Alternatively, one could use a fixed set of weights.\n # w = fill(1/K, K)\n # Construct categorical distribution of assignments.\n distribution_assignments = Categorical(w)\n # Construct multivariate normal distributions of each cluster.\n D, N = size(x)\n distribution_clusters = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n # Draw assignments for each datum and generate it from the multivariate normal distribution.\n k = Vector{Int}(undef, N)\n for i in 1:N\n k[i] ~ distribution_assignments\n x[:, i] ~ distribution_clusters[k[i]]\n end\n return k\nend\n\nmodel = gaussian_mixture_model_ordered(x);\n\nNow, re-running our model, we can see that the assigned means are consistent across chains:\n\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);\n\n\nplot(chains[[\"μ[1]\", \"μ[2]\"]]; legend=true)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe also inspect the samples of the mixture weights \\(w\\).\n\nplot(chains[[\"w[1]\", \"w[2]\"]]; legend=true)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAs the distributions of the samples for the parameters \\(\\mu_1\\), \\(\\mu_2\\), \\(w_1\\), and \\(w_2\\) are unimodal, we can safely visualize the density region of our model using the average values.\n\n# Model with mean of samples as parameters.\nμ_mean = [mean(chains, \"μ[$i]\") for i in 1:2]\nw_mean = [mean(chains, \"w[$i]\") for i in 1:2]\nmixturemodel_mean = MixtureModel([MvNormal(Fill(μₖ, 2), I) for μₖ in μ_mean], w_mean)\ncontour(\n range(-7.5, 3; length=1_000),\n range(-6.5, 3; length=1_000),\n (x, y) -> logpdf(mixturemodel_mean, [x, y]);\n widen=false,\n)\nscatter!(x[1, :], x[2, :]; legend=false, title=\"Synthetic Dataset\")", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#inferred-assignments", "href": "tutorials/gaussian-mixture-models/index.html#inferred-assignments", "title": "Gaussian Mixture Models", "section": "Inferred Assignments", "text": "Inferred Assignments\nFinally, we can inspect the assignments of the data points inferred using Turing. As we can see, the dataset is partitioned into two distinct groups.\n\nassignments = [mean(chains, \"k[$i]\") for i in 1:N]\nscatter(\n x[1, :],\n x[2, :];\n legend=false,\n title=\"Assignments on Synthetic Dataset\",\n zcolor=assignments,\n)", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#marginalizing-out-the-assignments", "href": "tutorials/gaussian-mixture-models/index.html#marginalizing-out-the-assignments", "title": "Gaussian Mixture Models", "section": "Marginalizing Out The Assignments", "text": "Marginalizing Out The Assignments\nWe can write out the marginal posterior of (continuous) \\(w, \\mu\\) by summing out the influence of our (discrete) assignments \\(z_i\\) from our likelihood: \\[\np(y \\mid w, \\mu ) = \\sum_{k=1}^K w_k p_k(y \\mid \\mu_k)\n\\] In our case, this gives us: \\[\np(y \\mid w, \\mu) = \\sum_{k=1}^K w_k \\cdot \\operatorname{MvNormal}(y \\mid \\mu_k, I)\n\\]\n\nMarginalizing By Hand\nWe could implement the above version of the Gaussian mixture model in Turing as follows: First, Turing uses log-probabilities, so the likelihood above must be converted into log-space: \\[\n\\log \\left( p(y \\mid w, \\mu) \\right) = \\text{logsumexp} \\left[\\log (w_k) + \\log(\\operatorname{MvNormal}(y \\mid \\mu_k, I)) \\right]\n\\]\nWhere we sum the components with logsumexp from the LogExpFunctions.jl package. The manually incremented likelihood can be added to the log-probability with Turing.@addlogprob!, giving us the following model:\n\nusing LogExpFunctions\n\n@model function gmm_marginalized(x)\n K = 2\n D, N = size(x)\n μ ~ ordered(MvNormal(Zeros(K), I))\n w ~ Dirichlet(K, 1.0)\n dists = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n for i in 1:N\n lvec = Vector(undef, K)\n for k in 1:K\n lvec[k] = (w[k] + logpdf(dists[k], x[:, i]))\n end\n Turing.@addlogprob! logsumexp(lvec)\n end\nend\n\n\n\n\n\n\n\nManually Incrementing Probablity\n\n\n\n\n\nWhen possible, use of Turing.@addlogprob! should be avoided, as it exists outside the usual structure of a Turing model. In most cases, a custom distribution should be used instead.\nHere, the next section demonstrates the perfered method — using the MixtureModel distribution we have seen already to perform the marginalization automatically.\n\n\n\n\n\nMarginalizing For Free With Distribution.jl’s MixtureModel Implementation\nWe can use Turing’s ~ syntax with anything that Distributions.jl provides logpdf and rand methods for. It turns out that the MixtureModel distribution it provides has, as its logpdf method, logpdf(MixtureModel([Component_Distributions], weight_vector), Y), where Y can be either a single observation or vector of observations.\nIn fact, Distributions.jl provides many convenient constructors for mixture models, allowing further simplification in common special cases.\nFor example, when mixtures distributions are of the same type, one can write: ~ MixtureModel(Normal, [(μ1, σ1), (μ2, σ2)], w), or when the weight vector is known to allocate probability equally, it can be ommited.\nThe logpdf implementation for a MixtureModel distribution is exactly the marginalization defined above, and so our model becomes simply:\n\n@model function gmm_marginalized(x)\n K = 2\n D, _ = size(x)\n μ ~ ordered(MvNormal(Zeros(K), I))\n w ~ Dirichlet(K, 1.0)\n x ~ MixtureModel([MvNormal(Fill(μₖ, D), I) for μₖ in μ], w)\nend\nmodel = gmm_marginalized(x);\n\nAs we’ve summed out the discrete components, we can perform inference using NUTS() alone.\n\nsampler = NUTS()\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains; discard_initial = burn);\n\nNUTS() significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to the marginalization of our assignment parameter.\n\nplot(chains[[\"μ[1]\", \"μ[2]\"]], legend=true)", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#inferred-assignments---marginalized-model", "href": "tutorials/gaussian-mixture-models/index.html#inferred-assignments---marginalized-model", "title": "Gaussian Mixture Models", "section": "Inferred Assignments - Marginalized Model", "text": "Inferred Assignments - Marginalized Model\nAs we’ve summed over possible assignments, the associated parameter is no longer available in our chain. This is not a problem, however, as given any fixed sample \\((\\mu, w)\\), the assignment probability — \\(p(z_i \\mid y_i)\\) — can be recovered using Bayes rule: \\[\np(z_i \\mid y_i) = \\frac{p(y_i \\mid z_i) p(z_i)}{\\sum_{k = 1}^K \\left(p(y_i \\mid z_i) p(z_i) \\right)}\n\\]\nThis quantity can be computed for every \\(p(z = z_i \\mid y_i)\\), resulting in a probability vector, which is then used to sample posterior predictive assignments from a categorial distribution. For details on the mathematics here, see the Stan documentation on latent discrete parameters.\n\nfunction sample_class(xi, dists, w)\n lvec = [(logpdf(d, xi) + log(w[i])) for (i, d) in enumerate(dists)]\n rand(Categorical(softmax(lvec)))\nend\n\n@model function gmm_recover(x)\n K = 2\n D, N = size(x)\n μ ~ ordered(MvNormal(Zeros(K), I))\n w ~ Dirichlet(K, 1.0)\n dists = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n x ~ MixtureModel(dists, w)\n # Return assignment draws for each datapoint.\n return [sample_class(x[:, i], dists, w) for i in 1:N]\nend\n\nWe sample from this model as before:\n\nmodel = gmm_recover(x)\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);\n\nGiven a sample from the marginalized posterior, these assignments can be recovered with:\n\nassignments = mean(generated_quantities(gmm_recover(x), chains));\n\n\nscatter(\n x[1, :],\n x[2, :];\n legend=false,\n title=\"Assignments on Synthetic Dataset - Recovered\",\n zcolor=assignments,\n)", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/coin-flipping/index.html", "href": "tutorials/coin-flipping/index.html", "title": "Introduction: Coin Flipping", "section": "", "text": "This is the first of a series of guided tutorials on the Turing language. In this tutorial, we will use Bayesian inference to estimate the probability that a coin flip will result in heads, given a series of observations.\n\nSetup\nFirst, let us load some packages that we need to simulate a coin flip:\n\nusing Distributions\n\nusing Random\nRandom.seed!(12); # Set seed for reproducibility\n\nand to visualize our results.\n\nusing StatsPlots\n\nNote that Turing is not loaded here — we do not use it in this example. Next, we configure the data generating model. Let us set the true probability that a coin flip turns up heads\n\np_true = 0.5;\n\nand set the number of coin flips we will show our model.\n\nN = 100;\n\nWe simulate N coin flips by drawing N random samples from the Bernoulli distribution with success probability p_true. The draws are collected in a variable called data:\n\ndata = rand(Bernoulli(p_true), N);\n\nHere are the first five coin flips:\n\ndata[1:5]\n\n5-element Vector{Bool}:\n 1\n 0\n 0\n 0\n 1\n\n\n\n\nCoin Flipping Without Turing\nThe following example illustrates the effect of updating our beliefs with every piece of new evidence we observe.\nAssume that we are unsure about the probability of heads in a coin flip. To get an intuitive understanding of what “updating our beliefs” is, we will visualize the probability of heads in a coin flip after each observed evidence.\nWe begin by specifying a prior belief about the distribution of heads and tails in a coin toss. Here we choose a Beta distribution as prior distribution for the probability of heads. Before any coin flip is observed, we assume a uniform distribution \\(\\operatorname{U}(0, 1) = \\operatorname{Beta}(1, 1)\\) of the probability of heads. I.e., every probability is equally likely initially.\n\nprior_belief = Beta(1, 1);\n\nWith our priors set and our data at hand, we can perform Bayesian inference.\nThis is a fairly simple process. We expose one additional coin flip to our model every iteration, such that the first run only sees the first coin flip, while the last iteration sees all the coin flips. In each iteration we update our belief to an updated version of the original Beta distribution that accounts for the new proportion of heads and tails. The update is particularly simple since our prior distribution is a conjugate prior. Note that a closed-form expression for the posterior (implemented in the updated_belief expression below) is not accessible in general and usually does not exist for more interesting models.\n\nfunction updated_belief(prior_belief::Beta, data::AbstractArray{Bool})\n # Count the number of heads and tails.\n heads = sum(data)\n tails = length(data) - heads\n\n # Update our prior belief in closed form (this is possible because we use a conjugate prior).\n return Beta(prior_belief.α + heads, prior_belief.β + tails)\nend\n\n# Show updated belief for increasing number of observations\n@gif for n in 0:N\n plot(\n updated_belief(prior_belief, data[1:n]);\n size=(500, 250),\n title=\"Updated belief after $n observations\",\n xlabel=\"probability of heads\",\n ylabel=\"\",\n legend=nothing,\n xlim=(0, 1),\n fill=0,\n α=0.3,\n w=3,\n )\n vline!([p_true])\nend\n\nGKS: cannot open display - headless operation mode active\n[ Info: Saved animation to /tmp/jl_gnwfmxwtwP.gif\n\n\n\n\n\nThe animation above shows that with increasing evidence our belief about the probability of heads in a coin flip slowly adjusts towards the true value. The orange line in the animation represents the true probability of seeing heads on a single coin flip, while the mode of the distribution shows what the model believes the probability of a heads is given the evidence it has seen.\nFor the mathematically inclined, the \\(\\operatorname{Beta}\\) distribution is updated by adding each coin flip to the parameters \\(\\alpha\\) and \\(\\beta\\) of the distribution. Initially, the parameters are defined as \\(\\alpha = 1\\) and \\(\\beta = 1\\). Over time, with more and more coin flips, \\(\\alpha\\) and \\(\\beta\\) will be approximately equal to each other as we are equally likely to flip a heads or a tails.\nThe mean of the \\(\\operatorname{Beta}(\\alpha, \\beta)\\) distribution is\n\\[\\operatorname{E}[X] = \\dfrac{\\alpha}{\\alpha+\\beta}.\\]\nThis implies that the plot of the distribution will become centered around 0.5 for a large enough number of coin flips, as we expect \\(\\alpha \\approx \\beta\\).\nThe variance of the \\(\\operatorname{Beta}(\\alpha, \\beta)\\) distribution is\n\\[\\operatorname{var}[X] = \\dfrac{\\alpha\\beta}{(\\alpha + \\beta)^2 (\\alpha + \\beta + 1)}.\\]\nThus the variance of the distribution will approach 0 with more and more samples, as the denominator will grow faster than will the numerator. More samples means less variance. This implies that the distribution will reflect less uncertainty about the probability of receiving a heads and the plot will become more tightly centered around 0.5 for a large enough number of coin flips.\n\n\nCoin Flipping With Turing\nWe now move away from the closed-form expression above. We use Turing to specify the same model and to approximate the posterior distribution with samples. To do so, we first need to load Turing.\n\nusing Turing\n\nAdditionally, we load MCMCChains, a library for analyzing and visualizing the samples with which we approximate the posterior distribution.\n\nusing MCMCChains\n\nFirst, we define the coin-flip model using Turing.\n\n# Unconditioned coinflip model with `N` observations.\n@model function coinflip(; N::Int)\n # Our prior belief about the probability of heads in a coin toss.\n p ~ Beta(1, 1)\n\n # Heads or tails of a coin are drawn from `N` independent and identically\n # distributed Bernoulli distributions with success rate `p`.\n y ~ filldist(Bernoulli(p), N)\n\n return y\nend;\n\nIn the Turing model the prior distribution of the variable p, the probability of heads in a coin toss, and the distribution of the observations y are specified on the right-hand side of the ~ expressions. The @model macro modifies the body of the Julia function coinflip and, e.g., replaces the ~ statements with internal function calls that are used for sampling.\nHere we defined a model that is not conditioned on any specific observations as this allows us to easily obtain samples of both p and y with\n\nrand(coinflip(; N))\n\n(p = 0.9520583115441003, y = Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])\n\n\nThe model can be conditioned on some observations with |. See the documentation of the condition syntax in DynamicPPL.jl for more details. In the conditioned model the observations y are fixed to data.\n\ncoinflip(y::AbstractVector{<:Real}) = coinflip(; N=length(y)) | (; y)\n\nmodel = coinflip(data);\n\nAfter defining the model, we can approximate the posterior distribution by drawing samples from the distribution. In this example, we use a Hamiltonian Monte Carlo sampler to draw these samples. Other tutorials give more information on the samplers available in Turing and discuss their use for different models.\n\nsampler = NUTS();\n\nWe approximate the posterior distribution with 1000 samples:\n\nchain = sample(model, sampler, 2_000, progress=false);\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\nThe sample function and common keyword arguments are explained more extensively in the documentation of AbstractMCMC.jl.\nAfter finishing the sampling process, we can visually compare the closed-form posterior distribution with the approximation obtained with Turing.\n\nhistogram(chain)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNow we can build our plot:\n\n# Visualize a blue density plot of the approximate posterior distribution using HMC (see Chain 1 in the legend).\ndensity(chain; xlim=(0, 1), legend=:best, w=2, c=:blue)\n\n# Visualize a green density plot of the posterior distribution in closed-form.\nplot!(\n 0:0.01:1,\n pdf.(updated_belief(prior_belief, data), 0:0.01:1);\n xlabel=\"probability of heads\",\n ylabel=\"\",\n title=\"\",\n xlim=(0, 1),\n label=\"Closed-form\",\n fill=0,\n α=0.3,\n w=3,\n c=:lightgreen,\n)\n\n# Visualize the true probability of heads in red.\nvline!([p_true]; label=\"True probability\", c=:red)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAs we can see, the samples obtained with Turing closely approximate the true posterior distribution. Hopefully this tutorial has provided an easy-to-follow, yet informative introduction to Turing’s simpler applications. More advanced usage is demonstrated in other tutorials.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Introduction: Coin Flipping" ] }, { "objectID": "tutorials/infinite-mixture-models/index.html", "href": "tutorials/infinite-mixture-models/index.html", "title": "Infinite Mixture Models", "section": "", "text": "In many applications it is desirable to allow the model to adjust its complexity to the amount of data. Consider for example the task of assigning objects into clusters or groups. This task often involves the specification of the number of groups. However, often times it is not known beforehand how many groups exist. Moreover, in some applictions, e.g. modelling topics in text documents or grouping species, the number of examples per group is heavy tailed. This makes it impossible to predefine the number of groups and requiring the model to form new groups when data points from previously unseen groups are observed.\nA natural approach for such applications is the use of non-parametric models. This tutorial will introduce how to use the Dirichlet process in a mixture of infinitely many Gaussians using Turing. For further information on Bayesian nonparametrics and the Dirichlet process we refer to the introduction by Zoubin Ghahramani and the book “Fundamentals of Nonparametric Bayesian Inference” by Subhashis Ghosal and Aad van der Vaart.\nusing Turing", "crumbs": [ "Get Started", "Tutorials", "Infinite Mixture Models" ] }, { "objectID": "tutorials/infinite-mixture-models/index.html#mixture-model", "href": "tutorials/infinite-mixture-models/index.html#mixture-model", "title": "Infinite Mixture Models", "section": "Mixture Model", "text": "Mixture Model\nBefore introducing infinite mixture models in Turing, we will briefly review the construction of finite mixture models. Subsequently, we will define how to use the Chinese restaurant process construction of a Dirichlet process for non-parametric clustering.\n\nTwo-Component Model\nFirst, consider the simple case of a mixture model with two Gaussian components with fixed covariance. The generative process of such a model can be written as:\n\\[\\begin{equation*}\n\\begin{aligned}\n\\pi_1 &\\sim \\mathrm{Beta}(a, b) \\\\\n\\pi_2 &= 1-\\pi_1 \\\\\n\\mu_1 &\\sim \\mathrm{Normal}(\\mu_0, \\Sigma_0) \\\\\n\\mu_2 &\\sim \\mathrm{Normal}(\\mu_0, \\Sigma_0) \\\\\nz_i &\\sim \\mathrm{Categorical}(\\pi_1, \\pi_2) \\\\\nx_i &\\sim \\mathrm{Normal}(\\mu_{z_i}, \\Sigma)\n\\end{aligned}\n\\end{equation*}\\]\nwhere \\(\\pi_1, \\pi_2\\) are the mixing weights of the mixture model, i.e. \\(\\pi_1 + \\pi_2 = 1\\), and \\(z_i\\) is a latent assignment of the observation \\(x_i\\) to a component (Gaussian).\nWe can implement this model in Turing for 1D data as follows:\n\n@model function two_model(x)\n # Hyper-parameters\n μ0 = 0.0\n σ0 = 1.0\n\n # Draw weights.\n π1 ~ Beta(1, 1)\n π2 = 1 - π1\n\n # Draw locations of the components.\n μ1 ~ Normal(μ0, σ0)\n μ2 ~ Normal(μ0, σ0)\n\n # Draw latent assignment.\n z ~ Categorical([π1, π2])\n\n # Draw observation from selected component.\n if z == 1\n x ~ Normal(μ1, 1.0)\n else\n x ~ Normal(μ2, 1.0)\n end\nend\n\ntwo_model (generic function with 2 methods)\n\n\n\n\nFinite Mixture Model\nIf we have more than two components, this model can elegantly be extended using a Dirichlet distribution as prior for the mixing weights \\(\\pi_1, \\dots, \\pi_K\\). Note that the Dirichlet distribution is the multivariate generalization of the beta distribution. The resulting model can be written as:\n\\[\n\\begin{align}\n(\\pi_1, \\dots, \\pi_K) &\\sim Dirichlet(K, \\alpha) \\\\\n\\mu_k &\\sim \\mathrm{Normal}(\\mu_0, \\Sigma_0), \\;\\; \\forall k \\\\\nz &\\sim Categorical(\\pi_1, \\dots, \\pi_K) \\\\\nx &\\sim \\mathrm{Normal}(\\mu_z, \\Sigma)\n\\end{align}\n\\]\nwhich resembles the model in the Gaussian mixture model tutorial with a slightly different notation.", "crumbs": [ "Get Started", "Tutorials", "Infinite Mixture Models" ] }, { "objectID": "tutorials/infinite-mixture-models/index.html#infinite-mixture-model", "href": "tutorials/infinite-mixture-models/index.html#infinite-mixture-model", "title": "Infinite Mixture Models", "section": "Infinite Mixture Model", "text": "Infinite Mixture Model\nThe question now arises, is there a generalization of a Dirichlet distribution for which the dimensionality \\(K\\) is infinite, i.e. \\(K = \\infty\\)?\nBut first, to implement an infinite Gaussian mixture model in Turing, we first need to load the Turing.RandomMeasures module. RandomMeasures contains a variety of tools useful in nonparametrics.\n\nusing Turing.RandomMeasures\n\nWe now will utilize the fact that one can integrate out the mixing weights in a Gaussian mixture model allowing us to arrive at the Chinese restaurant process construction. See Carl E. Rasmussen: The Infinite Gaussian Mixture Model, NIPS (2000) for details.\nIn fact, if the mixing weights are integrated out, the conditional prior for the latent variable \\(z\\) is given by:\n\\[\np(z_i = k \\mid z_{\\not i}, \\alpha) = \\frac{n_k + \\alpha K}{N - 1 + \\alpha}\n\\]\nwhere \\(z_{\\not i}\\) are the latent assignments of all observations except observation \\(i\\). Note that we use \\(n_k\\) to denote the number of observations at component \\(k\\) excluding observation \\(i\\). The parameter \\(\\alpha\\) is the concentration parameter of the Dirichlet distribution used as prior over the mixing weights.\n\nChinese Restaurant Process\nTo obtain the Chinese restaurant process construction, we can now derive the conditional prior if \\(K \\rightarrow \\infty\\).\nFor \\(n_k > 0\\) we obtain:\n\\[\np(z_i = k \\mid z_{\\not i}, \\alpha) = \\frac{n_k}{N - 1 + \\alpha}\n\\]\nand for all infinitely many clusters that are empty (combined) we get:\n\\[\np(z_i = k \\mid z_{\\not i}, \\alpha) = \\frac{\\alpha}{N - 1 + \\alpha}\n\\]\nThose equations show that the conditional prior for component assignments is proportional to the number of such observations, meaning that the Chinese restaurant process has a rich get richer property.\nTo get a better understanding of this property, we can plot the cluster choosen by for each new observation drawn from the conditional prior.\n\n# Concentration parameter.\nα = 10.0\n\n# Random measure, e.g. Dirichlet process.\nrpm = DirichletProcess(α)\n\n# Cluster assignments for each observation.\nz = Vector{Int}()\n\n# Maximum number of observations we observe.\nNmax = 500\n\nfor i in 1:Nmax\n # Number of observations per cluster.\n K = isempty(z) ? 0 : maximum(z)\n nk = Vector{Int}(map(k -> sum(z .== k), 1:K))\n\n # Draw new assignment.\n push!(z, rand(ChineseRestaurantProcess(rpm, nk)))\nend\n\n\nusing Plots\n\n# Plot the cluster assignments over time\n@gif for i in 1:Nmax\n scatter(\n collect(1:i),\n z[1:i];\n markersize=2,\n xlabel=\"observation (i)\",\n ylabel=\"cluster (k)\",\n legend=false,\n )\nend\n\nGKS: cannot open display - headless operation mode active\n[ Info: Saved animation to /tmp/jl_fq3qPSB0VI.gif\n\n\n\n\n\nFurther, we can see that the number of clusters is logarithmic in the number of observations and data points. This is a side-effect of the “rich-get-richer” phenomenon, i.e. we expect large clusters and thus the number of clusters has to be smaller than the number of observations.\n\\[\n\\mathbb{E}[K \\mid N] \\approx \\alpha \\cdot log \\big(1 + \\frac{N}{\\alpha}\\big)\n\\]\nWe can see from the equation that the concentration parameter \\(\\alpha\\) allows us to control the number of clusters formed a priori.\nIn Turing we can implement an infinite Gaussian mixture model using the Chinese restaurant process construction of a Dirichlet process as follows:\n\n@model function infiniteGMM(x)\n # Hyper-parameters, i.e. concentration parameter and parameters of H.\n α = 1.0\n μ0 = 0.0\n σ0 = 1.0\n\n # Define random measure, e.g. Dirichlet process.\n rpm = DirichletProcess(α)\n\n # Define the base distribution, i.e. expected value of the Dirichlet process.\n H = Normal(μ0, σ0)\n\n # Latent assignment.\n z = zeros(Int, length(x))\n\n # Locations of the infinitely many clusters.\n μ = zeros(Float64, 0)\n\n for i in 1:length(x)\n\n # Number of clusters.\n K = maximum(z)\n nk = Vector{Int}(map(k -> sum(z .== k), 1:K))\n\n # Draw the latent assignment.\n z[i] ~ ChineseRestaurantProcess(rpm, nk)\n\n # Create a new cluster?\n if z[i] > K\n push!(μ, 0.0)\n\n # Draw location of new cluster.\n μ[z[i]] ~ H\n end\n\n # Draw observation.\n x[i] ~ Normal(μ[z[i]], 1.0)\n end\nend\n\ninfiniteGMM (generic function with 2 methods)\n\n\nWe can now use Turing to infer the assignments of some data points. First, we will create some random data that comes from three clusters, with means of 0, -5, and 10.\n\nusing Plots, Random\n\n# Generate some test data.\nRandom.seed!(1)\ndata = vcat(randn(10), randn(10) .- 5, randn(10) .+ 10)\ndata .-= mean(data)\ndata /= std(data);\n\nNext, we’ll sample from our posterior using SMC.\n\nsetprogress!(false)\n\n\n# MCMC sampling\nRandom.seed!(2)\niterations = 1000\nmodel_fun = infiniteGMM(data);\nchain = sample(model_fun, SMC(), iterations);\n\nFinally, we can plot the number of clusters in each sample.\n\n# Extract the number of clusters for each sample of the Markov chain.\nk = map(\n t -> length(unique(vec(chain[t, MCMCChains.namesingroup(chain, :z), :].value))),\n 1:iterations,\n);\n\n# Visualize the number of clusters.\nplot(k; xlabel=\"Iteration\", ylabel=\"Number of clusters\", label=\"Chain 1\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nIf we visualize the histogram of the number of clusters sampled from our posterior, we observe that the model seems to prefer 3 clusters, which is the true number of clusters. Note that the number of clusters in a Dirichlet process mixture model is not limited a priori and will grow to infinity with probability one. However, if conditioned on data the posterior will concentrate on a finite number of clusters enforcing the resulting model to have a finite amount of clusters. It is, however, not given that the posterior of a Dirichlet process Gaussian mixture model converges to the true number of clusters, given that data comes from a finite mixture model. See Jeffrey Miller and Matthew Harrison: A simple example of Dirichlet process mixture inconsitency for the number of components for details.\n\nhistogram(k; xlabel=\"Number of clusters\", legend=false)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nOne issue with the Chinese restaurant process construction is that the number of latent parameters we need to sample scales with the number of observations. It may be desirable to use alternative constructions in certain cases. Alternative methods of constructing a Dirichlet process can be employed via the following representations:\nSize-Biased Sampling Process\n\\[\nj_k \\sim \\mathrm{Beta}(1, \\alpha) \\cdot \\mathrm{surplus}\n\\]\nStick-Breaking Process \\[\nv_k \\sim \\mathrm{Beta}(1, \\alpha)\n\\]\nChinese Restaurant Process \\[\np(z_n = k | z_{1:n-1}) \\propto \\begin{cases}\n\\frac{m_k}{n-1+\\alpha}, \\text{ if } m_k > 0\\\\\\\n\\frac{\\alpha}{n-1+\\alpha}\n\\end{cases}\n\\]\nFor more details see this article.", "crumbs": [ "Get Started", "Tutorials", "Infinite Mixture Models" ] }, { "objectID": "tutorials/bayesian-neural-networks/index.html", "href": "tutorials/bayesian-neural-networks/index.html", "title": "Bayesian Neural Networks", "section": "", "text": "In this tutorial, we demonstrate how one can implement a Bayesian Neural Network using a combination of Turing and Lux, a suite of machine learning tools. We will use Lux to specify the neural network’s layers and Turing to implement the probabilistic inference, with the goal of implementing a classification algorithm.\nWe will begin with importing the relevant libraries.\nusing Turing\nusing FillArrays\nusing Lux\nusing Plots\nimport Mooncake\nusing Functors\n\nusing LinearAlgebra\nusing Random\nOur goal here is to use a Bayesian neural network to classify points in an artificial dataset. The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with.\n# Number of points to generate\nN = 80\nM = round(Int, N / 4)\nrng = Random.default_rng()\nRandom.seed!(rng, 1234)\n\n# Generate artificial data\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nxt1s = Array([[x1s[i] + 0.5f0; x2s[i] + 0.5f0] for i in 1:M])\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nappend!(xt1s, Array([[x1s[i] - 5.0f0; x2s[i] - 5.0f0] for i in 1:M]))\n\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nxt0s = Array([[x1s[i] + 0.5f0; x2s[i] - 5.0f0] for i in 1:M])\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nappend!(xt0s, Array([[x1s[i] - 5.0f0; x2s[i] + 0.5f0] for i in 1:M]))\n\n# Store all the data for later\nxs = [xt1s; xt0s]\nts = [ones(2 * M); zeros(2 * M)]\n\n# Plot data points.\nfunction plot_data()\n x1 = map(e -> e[1], xt1s)\n y1 = map(e -> e[2], xt1s)\n x2 = map(e -> e[1], xt0s)\n y2 = map(e -> e[2], xt0s)\n\n Plots.scatter(x1, y1; color=\"red\", clim=(0, 1))\n return Plots.scatter!(x2, y2; color=\"blue\", clim=(0, 1))\nend\n\nplot_data()", "crumbs": [ "Get Started", "Tutorials", "Bayesian Neural Networks" ] }, { "objectID": "tutorials/bayesian-neural-networks/index.html#building-a-neural-network", "href": "tutorials/bayesian-neural-networks/index.html#building-a-neural-network", "title": "Bayesian Neural Networks", "section": "Building a Neural Network", "text": "Building a Neural Network\nThe next step is to define a feedforward neural network where we express our parameters as distributions, and not single points as with traditional neural networks. For this we will use Dense to define liner layers and compose them via Chain, both are neural network primitives from Lux. The network nn_initial we created has two hidden layers with tanh activations and one output layer with sigmoid (σ) activation, as shown below.\n\n\n\n\n\n\n\nG\n\nInput layer Hidden layers Output layer\n\ncluster_input\n\n\n\ncluster_hidden1\n\n\n\ncluster_hidden2\n\n\n\ncluster_output\n\n\n\n\ninput1\n\n\n\n\nhidden11\n\n\n\n\ninput1--hidden11\n\n\n\n\nhidden12\n\n\n\n\ninput1--hidden12\n\n\n\n\nhidden13\n\n\n\n\ninput1--hidden13\n\n\n\n\ninput2\n\n\n\n\ninput2--hidden11\n\n\n\n\ninput2--hidden12\n\n\n\n\ninput2--hidden13\n\n\n\n\nhidden21\n\n\n\n\nhidden11--hidden21\n\n\n\n\nhidden22\n\n\n\n\nhidden11--hidden22\n\n\n\n\nhidden12--hidden21\n\n\n\n\nhidden12--hidden22\n\n\n\n\nhidden13--hidden21\n\n\n\n\nhidden13--hidden22\n\n\n\n\noutput1\n\n\n\n\nhidden21--output1\n\n\n\n\nhidden22--output1\n\n\n\n\n\n\n\n\n\nThe nn_initial is an instance that acts as a function and can take data as inputs and output predictions. We will define distributions on the neural network parameters.\n\n# Construct a neural network using Lux\nnn_initial = Chain(Dense(2 => 3, tanh), Dense(3 => 2, tanh), Dense(2 => 1, σ))\n\n# Initialize the model weights and state\nps, st = Lux.setup(rng, nn_initial)\n\nLux.parameterlength(nn_initial) # number of parameters in NN\n\n20\n\n\nThe probabilistic model specification below creates a parameters variable, which has IID normal variables. The parameters vector represents all parameters of our neural net (weights and biases).\n\n# Create a regularization term and a Gaussian prior variance term.\nalpha = 0.09\nsigma = sqrt(1.0 / alpha)\n\n3.3333333333333335\n\n\nWe also define a function to construct a named tuple from a vector of sampled parameters. (We could use ComponentArrays here and broadcast to avoid doing this, but this way avoids introducing an extra dependency.)\n\nfunction vector_to_parameters(ps_new::AbstractVector, ps::NamedTuple)\n @assert length(ps_new) == Lux.parameterlength(ps)\n i = 1\n function get_ps(x)\n z = reshape(view(ps_new, i:(i + length(x) - 1)), size(x))\n i += length(x)\n return z\n end\n return fmap(get_ps, ps)\nend\n\nvector_to_parameters (generic function with 1 method)\n\n\nTo interface with external libraries it is often desirable to use the StatefulLuxLayer to automatically handle the neural network states.\n\nconst nn = StatefulLuxLayer{true}(nn_initial, nothing, st)\n\n# Specify the probabilistic model.\n@model function bayes_nn(xs, ts; sigma = sigma, ps = ps, nn = nn)\n # Sample the parameters\n nparameters = Lux.parameterlength(nn_initial)\n parameters ~ MvNormal(zeros(nparameters), Diagonal(abs2.(sigma .* ones(nparameters))))\n\n # Forward NN to make predictions\n preds = Lux.apply(nn, xs, f32(vector_to_parameters(parameters, ps)))\n\n # Observe each prediction.\n for i in eachindex(ts)\n ts[i] ~ Bernoulli(preds[i])\n end\nend\n\nbayes_nn (generic function with 2 methods)\n\n\nInference can now be performed by calling sample. We use the NUTS Hamiltonian Monte Carlo sampler here.\n\nsetprogress!(false)\n\n\n# Perform inference.\nn_iters = 2_000\nch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoMooncake(; config=nothing)), n_iters);\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\nNow we extract the parameter samples from the sampled chain as θ (this is of size 5000 x 20 where 5000 is the number of iterations and 20 is the number of parameters). We’ll use these primarily to determine how good our model’s classifier is.\n\n# Extract all weight and bias parameters.\nθ = MCMCChains.group(ch, :parameters).value;", "crumbs": [ "Get Started", "Tutorials", "Bayesian Neural Networks" ] }, { "objectID": "tutorials/bayesian-neural-networks/index.html#prediction-visualization", "href": "tutorials/bayesian-neural-networks/index.html#prediction-visualization", "title": "Bayesian Neural Networks", "section": "Prediction Visualization", "text": "Prediction Visualization\nWe can use MAP estimation to classify our population by using the set of weights that provided the highest log posterior.\n\n# A helper to run the nn through data `x` using parameters `θ`\nnn_forward(x, θ) = nn(x, vector_to_parameters(θ, ps))\n\n# Plot the data we have.\nfig = plot_data()\n\n# Find the index that provided the highest log posterior in the chain.\n_, i = findmax(ch[:lp])\n\n# Extract the max row value from i.\ni = i.I[1]\n\n# Plot the posterior distribution with a contour plot\nx1_range = collect(range(-6; stop=6, length=25))\nx2_range = collect(range(-6; stop=6, length=25))\nZ = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)\nfig\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe contour plot above shows that the MAP method is not too bad at classifying our data.\nNow we can visualize our predictions.\n\\[\np(\\tilde{x} | X, \\alpha) = \\int_{\\theta} p(\\tilde{x} | \\theta) p(\\theta | X, \\alpha) \\approx \\sum_{\\theta \\sim p(\\theta | X, \\alpha)}f_{\\theta}(\\tilde{x})\n\\]\nThe nn_predict function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain.\n\n# Return the average predicted value across\n# multiple weights.\nfunction nn_predict(x, θ, num)\n num = min(num, size(θ, 1)) # make sure num does not exceed the number of samples\n return mean([first(nn_forward(x, view(θ, i, :))) for i in 1:10:num])\nend\n\nnn_predict (generic function with 1 method)\n\n\nNext, we use the nn_predict function to predict the value at a sample of points where the x1 and x2 coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data, and more importantly, we can also see where the neural network is uncertain about its predictions much easier—those regions between cluster boundaries.\n\n# Plot the average prediction.\nfig = plot_data()\n\nn_end = 1500\nx1_range = collect(range(-6; stop=6, length=25))\nx2_range = collect(range(-6; stop=6, length=25))\nZ = [nn_predict([x1, x2], θ, n_end)[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)\nfig\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSuppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. In that case, the following graph displays an animation of the contour plot generated from the network weights in samples 1 to 1,000.\n\n# Number of iterations to plot.\nn_end = 500\n\nanim = @gif for i in 1:n_end\n plot_data()\n Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]\n contour!(x1_range, x2_range, Z; title=\"Iteration $i\", clim=(0, 1))\nend every 5\n\n[ Info: Saved animation to /tmp/jl_X0R698f6py.gif\n\n\n\n\n\nThis has been an introduction to the applications of Turing and Lux in defining Bayesian neural networks.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Neural Networks" ] }, { "objectID": "tutorials/bayesian-poisson-regression/index.html", "href": "tutorials/bayesian-poisson-regression/index.html", "title": "Bayesian Poisson Regression", "section": "", "text": "This notebook is ported from the example notebook of PyMC3 on Poisson Regression.\nPoisson Regression is a technique commonly used to model count data. Some of the applications include predicting the number of people defaulting on their loans or the number of cars running on a highway on a given day. This example describes a method to implement the Bayesian version of this technique using Turing.\nWe will generate the dataset that we will be working on which describes the relationship between number of times a person sneezes during the day with his alcohol consumption and medicinal intake.\nWe start by importing the required libraries.\n\n#Import Turing, Distributions and DataFrames\nusing Turing, Distributions, DataFrames, Distributed\n\n# Import MCMCChain, Plots, and StatsPlots for visualizations and diagnostics.\nusing MCMCChains, Plots, StatsPlots\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(12);\n\n\nGenerating data\nWe start off by creating a toy dataset. We take the case of a person who takes medicine to prevent excessive sneezing. Alcohol consumption increases the rate of sneezing for that person. Thus, the two factors affecting the number of sneezes in a given day are alcohol consumption and whether the person has taken his medicine. Both these variable are taken as boolean valued while the number of sneezes will be a count valued variable. We also take into consideration that the interaction between the two boolean variables will affect the number of sneezes\n5 random rows are printed from the generated data to get a gist of the data generated.\n\ntheta_noalcohol_meds = 1 # no alcohol, took medicine\ntheta_alcohol_meds = 3 # alcohol, took medicine\ntheta_noalcohol_nomeds = 6 # no alcohol, no medicine\ntheta_alcohol_nomeds = 36 # alcohol, no medicine\n\n# no of samples for each of the above cases\nq = 100\n\n#Generate data from different Poisson distributions\nnoalcohol_meds = Poisson(theta_noalcohol_meds)\nalcohol_meds = Poisson(theta_alcohol_meds)\nnoalcohol_nomeds = Poisson(theta_noalcohol_nomeds)\nalcohol_nomeds = Poisson(theta_alcohol_nomeds)\n\nnsneeze_data = vcat(\n rand(noalcohol_meds, q),\n rand(alcohol_meds, q),\n rand(noalcohol_nomeds, q),\n rand(alcohol_nomeds, q),\n)\nalcohol_data = vcat(zeros(q), ones(q), zeros(q), ones(q))\nmeds_data = vcat(zeros(q), zeros(q), ones(q), ones(q))\n\ndf = DataFrame(;\n nsneeze=nsneeze_data,\n alcohol_taken=alcohol_data,\n nomeds_taken=meds_data,\n product_alcohol_meds=meds_data .* alcohol_data,\n)\ndf[sample(1:nrow(df), 5; replace=false), :]\n\n5×4 DataFrame\n\n\n\nRow\nnsneeze\nalcohol_taken\nnomeds_taken\nproduct_alcohol_meds\n\n\n\nInt64\nFloat64\nFloat64\nFloat64\n\n\n\n\n1\n2\n1.0\n0.0\n0.0\n\n\n2\n1\n0.0\n0.0\n0.0\n\n\n3\n2\n0.0\n0.0\n0.0\n\n\n4\n30\n1.0\n1.0\n1.0\n\n\n5\n0\n1.0\n0.0\n0.0\n\n\n\n\n\n\n\n\nVisualisation of the dataset\nWe plot the distribution of the number of sneezes for the 4 different cases taken above. As expected, the person sneezes the most when he has taken alcohol and not taken his medicine. He sneezes the least when he doesn’t consume alcohol and takes his medicine.\n\n# Data Plotting\n\np1 = Plots.histogram(\n df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 0), 1];\n title=\"no_alcohol+meds\",\n)\np2 = Plots.histogram(\n (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 0), 1]);\n title=\"alcohol+meds\",\n)\np3 = Plots.histogram(\n (df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 1), 1]);\n title=\"no_alcohol+no_meds\",\n)\np4 = Plots.histogram(\n (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 1), 1]);\n title=\"alcohol+no_meds\",\n)\nplot(p1, p2, p3, p4; layout=(2, 2), legend=false)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe must convert our DataFrame data into the Matrix form as the manipulations that we are about are designed to work with Matrix data. We also separate the features from the labels which will be later used by the Turing sampler to generate samples from the posterior.\n\n# Convert the DataFrame object to matrices.\ndata = Matrix(df[:, [:alcohol_taken, :nomeds_taken, :product_alcohol_meds]])\ndata_labels = df[:, :nsneeze]\ndata\n\n400×3 Matrix{Float64}:\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n ⋮ \n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n\n\nWe must recenter our data about 0 to help the Turing sampler in initialising the parameter estimates. So, normalising the data in each column by subtracting the mean and dividing by the standard deviation:\n\n# Rescale our matrices.\ndata = (data .- mean(data; dims=1)) ./ std(data; dims=1)\n\n400×3 Matrix{Float64}:\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n ⋮ \n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n\n\n\n\nDeclaring the Model: Poisson Regression\nOur model, poisson_regression takes four arguments:\n\nx is our set of independent variables;\ny is the element we want to predict;\nn is the number of observations we have; and\nσ² is the standard deviation we want to assume for our priors.\n\nWithin the model, we create four coefficients (b0, b1, b2, and b3) and assign a prior of normally distributed with means of zero and standard deviations of σ². We want to find values of these four coefficients to predict any given y.\nIntuitively, we can think of the coefficients as:\n\nb1 is the coefficient which represents the effect of taking alcohol on the number of sneezes;\nb2 is the coefficient which represents the effect of taking in no medicines on the number of sneezes;\nb3 is the coefficient which represents the effect of interaction between taking alcohol and no medicine on the number of sneezes;\n\nThe for block creates a variable theta which is the weighted combination of the input features. We have defined the priors on these weights above. We then observe the likelihood of calculating theta given the actual label, y[i].\n\n# Bayesian poisson regression (LR)\n@model function poisson_regression(x, y, n, σ²)\n b0 ~ Normal(0, σ²)\n b1 ~ Normal(0, σ²)\n b2 ~ Normal(0, σ²)\n b3 ~ Normal(0, σ²)\n for i in 1:n\n theta = b0 + b1 * x[i, 1] + b2 * x[i, 2] + b3 * x[i, 3]\n y[i] ~ Poisson(exp(theta))\n end\nend;\n\n\n\nSampling from the posterior\nWe use the NUTS sampler to sample values from the posterior. We run multiple chains using the MCMCThreads() function to nullify the effect of a problematic chain. We then use the Gelman, Rubin, and Brooks Diagnostic to check the convergence of these multiple chains.\n\n# Retrieve the number of observations.\nn, _ = size(data)\n\n# Sample using NUTS.\n\nnum_chains = 4\nm = poisson_regression(data, data_labels, n, 10)\nchain = sample(m, NUTS(), MCMCThreads(), 2_500, num_chains; discard_adapt=false, progress=false)\n\n\n\nChains MCMC chain (2500×16×4 Array{Float64, 3}):\n\nIterations = 1:1:2500\nNumber of chains = 4\nSamples per chain = 2500\nWall duration = 18.13 seconds\nCompute duration = 15.82 seconds\nparameters = b0, b1, b2, b3\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n b0 1.5134 0.6703 0.0515 694.9424 165.3650 1.0044 ⋯\n b1 0.7947 1.0439 0.0956 487.9605 155.6397 1.0063 ⋯\n b2 1.1525 1.1600 0.1022 518.5761 149.7213 1.0078 ⋯\n b3 0.0935 1.0676 0.0924 484.7824 162.8272 1.0073 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n b0 0.9883 1.5595 1.5835 1.6064 1.6491\n b1 0.5406 0.6215 0.6642 0.7092 1.8414\n b2 0.8842 0.9598 1.0004 1.0430 2.2189\n b3 -0.5332 0.1490 0.1894 0.2285 0.3043\n\n\n\n\n\n\n\n\nSampling With Multiple Threads\n\n\n\n\n\nThe sample() call above assumes that you have at least nchains threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. For more information, see the Turing documentation on sampling multiple chains.\n\n\n\n\n\nViewing the Diagnostics\nWe use the Gelman, Rubin, and Brooks Diagnostic to check whether our chains have converged. Note that we require multiple chains to use this diagnostic which analyses the difference between these multiple chains.\nWe expect the chains to have converged. This is because we have taken sufficient number of iterations (1500) for the NUTS sampler. However, in case the test fails, then we will have to take a larger number of iterations, resulting in longer computation time.\n\ngelmandiag(chain)\n\nGelman, Rubin, and Brooks diagnostic\n parameters psrf psrfci\n Symbol Float64 Float64\n\n b0 1.0546 1.0749\n b1 1.1892 1.2841\n b2 1.1159 1.1506\n b3 1.1641 1.2571\n\n\nFrom the above diagnostic, we can conclude that the chains have converged because the PSRF values of the coefficients are close to 1.\nSo, we have obtained the posterior distributions of the parameters. We transform the coefficients and recover theta values by taking the exponent of the meaned values of the coefficients b0, b1, b2 and b3. We take the exponent of the means to get a better comparison of the relative values of the coefficients. We then compare this with the intuitive meaning that was described earlier.\n\n# Taking the first chain\nc1 = chain[:, :, 1]\n\n# Calculating the exponentiated means\nb0_exp = exp(mean(c1[:b0]))\nb1_exp = exp(mean(c1[:b1]))\nb2_exp = exp(mean(c1[:b2]))\nb3_exp = exp(mean(c1[:b3]))\n\nprint(\"The exponent of the meaned values of the weights (or coefficients are): \\n\")\nprintln(\"b0: \", b0_exp)\nprintln(\"b1: \", b1_exp)\nprintln(\"b2: \", b2_exp)\nprintln(\"b3: \", b3_exp)\nprint(\"The posterior distributions obtained after sampling can be visualised as :\\n\")\n\nThe exponent of the meaned values of the weights (or coefficients are): \nb0: 4.981298053727232\nb1: 2.046506896456444\nb2: 3.0427981403371183\nb3: 1.1753029983667895\nThe posterior distributions obtained after sampling can be visualised as :\n\n\nVisualising the posterior by plotting it:\n\nplot(chain)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInterpreting the Obtained Mean Values\nThe exponentiated mean of the coefficient b1 is roughly half of that of b2. This makes sense because in the data that we generated, the number of sneezes was more sensitive to the medicinal intake as compared to the alcohol consumption. We also get a weaker dependence on the interaction between the alcohol consumption and the medicinal intake as can be seen from the value of b3.\n\n\nRemoving the Warmup Samples\nAs can be seen from the plots above, the parameters converge to their final distributions after a few iterations. The initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions. Thus, we remove these warmup values and once again view the diagnostics. To remove these warmup values, we take all values except the first 200. This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200.\n\nchains_new = chain[201:end, :, :]\n\nChains MCMC chain (2300×16×4 Array{Float64, 3}):\n\nIterations = 201:1:2500\nNumber of chains = 4\nSamples per chain = 2300\nWall duration = 18.13 seconds\nCompute duration = 15.82 seconds\nparameters = b0, b1, b2, b3\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n b0 1.5845 0.0328 0.0007 2474.8214 3278.3788 1.0007 ⋯\n b1 0.6628 0.0619 0.0014 1852.6038 2449.5814 1.0024 ⋯\n b2 0.9991 0.0583 0.0014 1832.6878 2713.3240 1.0026 ⋯\n b3 0.1901 0.0566 0.0013 1856.6890 2421.6592 1.0023 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n b0 1.5202 1.5620 1.5846 1.6070 1.6477\n b1 0.5439 0.6203 0.6619 0.7042 0.7874\n b2 0.8887 0.9585 0.9981 1.0388 1.1153\n b3 0.0758 0.1525 0.1908 0.2289 0.2985\n\n\n\nplot(chains_new)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAs can be seen from the numeric values and the plots above, the standard deviation values have decreased and all the plotted values are from the estimated posteriors. The exponentiated mean values, with the warmup samples removed, have not changed by much and they are still in accordance with their intuitive meanings as described earlier.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Bayesian Poisson Regression" ] }, { "objectID": "tutorials/probabilistic-pca/index.html", "href": "tutorials/probabilistic-pca/index.html", "title": "Probabilistic PCA", "section": "", "text": "Principal component analysis (PCA) is a fundamental technique to analyse and visualise data. It is an unsupervised learning method mainly used for dimensionality reduction.\nFor example, we have a data matrix \\(\\mathbf{X} \\in \\mathbb{R}^{N \\times D}\\), and we would like to extract \\(k \\ll D\\) principal components which captures most of the information from the original matrix. The goal is to understand \\(\\mathbf{X}\\) through a lower dimensional subspace (e.g. two-dimensional subspace for visualisation convenience) spanned by the principal components.\nIn order to project the original data matrix into low dimensions, we need to find the principal directions where most of the variations of \\(\\mathbf{X}\\) lie in. Traditionally, this is implemented via singular value decomposition (SVD) which provides a robust and accurate computational framework for decomposing matrix into products of rotation-scaling-rotation matrices, particularly for large datasets(see an illustration here):\n\\[\n\\mathbf{X}_{N \\times D} = \\mathbf{U}_{N \\times r} \\times \\boldsymbol{\\Sigma}_{r \\times r} \\times \\mathbf{V}^T_{r \\times D}\n\\]\nwhere \\(\\Sigma_{r \\times r}\\) contains only \\(r := \\operatorname{rank} \\mathbf{X} \\leq \\min\\{N,D\\}\\) non-zero singular values of \\(\\mathbf{X}\\). If we pad \\(\\Sigma\\) with zeros and add arbitrary orthonormal columns to \\(\\mathbf{U}\\) and \\(\\mathbf{V}\\), we obtain the more compact form:1\n\\[\n\\mathbf{X}_{N \\times D} = \\mathbf{U}_{N \\times N} \\mathbf{\\Sigma}_{N \\times D} \\mathbf{V}_{D \\times D}^T\n\\]\nwhere \\(\\mathbf{U}\\) and \\(\\mathbf{V}\\) are unitary matrices (i.e. with orthonormal columns). Such a decomposition always exists for any matrix. Columns of \\(\\mathbf{V}\\) are the principal directions/axes. The percentage of variations explained can be calculated using the ratios of singular values.2\nHere we take a probabilistic perspective. For more details and a mathematical derivation, we recommend Bishop’s textbook (Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006). The idea of proabilistic PCA is to find a latent variable \\(z\\) that can be used to describe the hidden structure in a dataset.3 Consider a data set \\(\\mathbf{X}_{D \\times N}=\\{x_i\\}\\) with \\(i=1,2,...,N\\) data points, where each data point \\(x_i\\) is \\(D\\)-dimensional (i.e. \\(x_i \\in \\mathcal{R}^D\\)). Note that, here we use the flipped version of the data matrix. We aim to represent the original \\(n\\) dimensional vector using a lower dimensional a latent variable \\(z_i \\in \\mathcal{R}^k\\).\nWe first assume that each latent variable \\(z_i\\) is normally distributed:\n\\[\nz_i \\sim \\mathcal{N}(0, I)\n\\]\nand the corresponding data point is generated via projection:\n\\[\nx_i | z_i \\sim \\mathcal{N}(\\mathbf{W} z_i + \\boldsymbol{μ}, \\sigma^2 \\mathbf{I})\n\\]\nwhere the projection matrix \\(\\mathbf{W}_{D \\times k}\\) accommodates the principal axes. The above formula expresses \\(x_i\\) as a linear combination of the basis columns in the projection matrix W, where the combination coefficients sit in z_i (they are the coordinats of x_i in the new \\(k\\)-dimensional space.). We can also express the above formula in matrix form: \\(\\mathbf{X}_{D \\times N} \\approx \\mathbf{W}_{D \\times k} \\mathbf{Z}_{k \\times N}\\). We are interested in inferring \\(\\mathbf{W}\\), \\(μ\\) and \\(\\sigma\\).\nClassical PCA is the specific case of probabilistic PCA when the covariance of the noise becomes infinitesimally small, i.e. \\(\\sigma^2 \\to 0\\). Probabilistic PCA generalizes classical PCA, this can be seen by marginalizing out the the latent variable.4", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#overview-of-pca", "href": "tutorials/probabilistic-pca/index.html#overview-of-pca", "title": "Probabilistic PCA", "section": "", "text": "Principal component analysis (PCA) is a fundamental technique to analyse and visualise data. It is an unsupervised learning method mainly used for dimensionality reduction.\nFor example, we have a data matrix \\(\\mathbf{X} \\in \\mathbb{R}^{N \\times D}\\), and we would like to extract \\(k \\ll D\\) principal components which captures most of the information from the original matrix. The goal is to understand \\(\\mathbf{X}\\) through a lower dimensional subspace (e.g. two-dimensional subspace for visualisation convenience) spanned by the principal components.\nIn order to project the original data matrix into low dimensions, we need to find the principal directions where most of the variations of \\(\\mathbf{X}\\) lie in. Traditionally, this is implemented via singular value decomposition (SVD) which provides a robust and accurate computational framework for decomposing matrix into products of rotation-scaling-rotation matrices, particularly for large datasets(see an illustration here):\n\\[\n\\mathbf{X}_{N \\times D} = \\mathbf{U}_{N \\times r} \\times \\boldsymbol{\\Sigma}_{r \\times r} \\times \\mathbf{V}^T_{r \\times D}\n\\]\nwhere \\(\\Sigma_{r \\times r}\\) contains only \\(r := \\operatorname{rank} \\mathbf{X} \\leq \\min\\{N,D\\}\\) non-zero singular values of \\(\\mathbf{X}\\). If we pad \\(\\Sigma\\) with zeros and add arbitrary orthonormal columns to \\(\\mathbf{U}\\) and \\(\\mathbf{V}\\), we obtain the more compact form:1\n\\[\n\\mathbf{X}_{N \\times D} = \\mathbf{U}_{N \\times N} \\mathbf{\\Sigma}_{N \\times D} \\mathbf{V}_{D \\times D}^T\n\\]\nwhere \\(\\mathbf{U}\\) and \\(\\mathbf{V}\\) are unitary matrices (i.e. with orthonormal columns). Such a decomposition always exists for any matrix. Columns of \\(\\mathbf{V}\\) are the principal directions/axes. The percentage of variations explained can be calculated using the ratios of singular values.2\nHere we take a probabilistic perspective. For more details and a mathematical derivation, we recommend Bishop’s textbook (Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006). The idea of proabilistic PCA is to find a latent variable \\(z\\) that can be used to describe the hidden structure in a dataset.3 Consider a data set \\(\\mathbf{X}_{D \\times N}=\\{x_i\\}\\) with \\(i=1,2,...,N\\) data points, where each data point \\(x_i\\) is \\(D\\)-dimensional (i.e. \\(x_i \\in \\mathcal{R}^D\\)). Note that, here we use the flipped version of the data matrix. We aim to represent the original \\(n\\) dimensional vector using a lower dimensional a latent variable \\(z_i \\in \\mathcal{R}^k\\).\nWe first assume that each latent variable \\(z_i\\) is normally distributed:\n\\[\nz_i \\sim \\mathcal{N}(0, I)\n\\]\nand the corresponding data point is generated via projection:\n\\[\nx_i | z_i \\sim \\mathcal{N}(\\mathbf{W} z_i + \\boldsymbol{μ}, \\sigma^2 \\mathbf{I})\n\\]\nwhere the projection matrix \\(\\mathbf{W}_{D \\times k}\\) accommodates the principal axes. The above formula expresses \\(x_i\\) as a linear combination of the basis columns in the projection matrix W, where the combination coefficients sit in z_i (they are the coordinats of x_i in the new \\(k\\)-dimensional space.). We can also express the above formula in matrix form: \\(\\mathbf{X}_{D \\times N} \\approx \\mathbf{W}_{D \\times k} \\mathbf{Z}_{k \\times N}\\). We are interested in inferring \\(\\mathbf{W}\\), \\(μ\\) and \\(\\sigma\\).\nClassical PCA is the specific case of probabilistic PCA when the covariance of the noise becomes infinitesimally small, i.e. \\(\\sigma^2 \\to 0\\). Probabilistic PCA generalizes classical PCA, this can be seen by marginalizing out the the latent variable.4", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#the-gene-expression-example", "href": "tutorials/probabilistic-pca/index.html#the-gene-expression-example", "title": "Probabilistic PCA", "section": "The gene expression example", "text": "The gene expression example\nIn the first example, we illustrate:\n\nhow to specify the probabilistic model and\nhow to perform inference on \\(\\mathbf{W}\\), \\(\\boldsymbol{\\mu}\\) and \\(\\sigma\\) using MCMC.\n\nWe use simulated gemnome data to demonstrate these. The simulation is inspired by biological measurement of expression of genes in cells, and each cell is characterized by different gene features. While the human genome is (mostly) identical between all the cells in the body, there exist interesting differences in gene expression in different human tissues and disease conditions. One way to investigate certain diseases is to look at differences in gene expression in cells from patients and healthy controls (usually from the same tissue).\nUsually, we can assume that the changes in gene expression only affect a subset of all genes (and these can be linked to diseases in some way). One of the challenges for this kind of data is to explore the underlying structure, e.g. to make the connection between a certain state (healthy/disease) and gene expression. This becomes difficult when the dimensions is very large (up to 20000 genes across 1000s of cells). So in order to find structure in this data, it is useful to project the data into a lower dimensional space.\nRegardless of the biological background, the more abstract problem formulation is to project the data living in high-dimensional space onto a representation in lower-dimensional space where most of the variation is concentrated in the first few dimensions. We use PCA to explore underlying structure or pattern which may not necessarily be obvious from looking at the raw data itself.\n\nStep 1: configuration of dependencies\nFirst, we load the dependencies used.\n\nusing Turing\nusing ReverseDiff\nusing LinearAlgebra, FillArrays\n\n# Packages for visualization\nusing DataFrames, StatsPlots, Measures\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(1789);\n\nAll packages used in this tutorial are listed here. You can install them via using Pkg; Pkg.add(\"package_name\").\n\n\n\n\n\n\nPackage usages:\n\n\n\nWe use DataFrames for instantiating matrices, LinearAlgebra and FillArrays to perform matrix operations; Turing for model specification and MCMC sampling, ReverseDiff for setting the automatic differentiation backend when sampling. StatsPlots for visualising the resutls. , Measures for setting plot margin units. As all examples involve sampling, for reproducibility we set a fixed seed using the Random standard library.\n\n\n\n\nStep 2: Data generation\nHere, we simulate the biological gene expression problem described earlier. We simulate 60 cells, each cell has 9 gene features. This is a simplified problem with only a few cells and genes for demonstration purpose, which is not comparable to the complexity in real-life (e.g. thousands of features for each individual). Even so, spotting the structures or patterns in a 9-feature space would be a challenging task; it would be nice to reduce the dimentionality using p-PCA.\nBy design, we mannually divide the 60 cells into two groups. the first 3 gene features of the first 30 cells have mean 10, while those of the last 30 cells have mean 10. These two groups of cells differ in the expression of genes.\n\nn_genes = 9 # D\nn_cells = 60 # N\n\n# create a diagonal block like expression matrix, with some non-informative genes;\n# not all features/genes are informative, some might just not differ very much between cells)\nmat_exp = randn(n_genes, n_cells)\nmat_exp[1:(n_genes ÷ 3), 1:(n_cells ÷ 2)] .+= 10\nmat_exp[(2 * (n_genes ÷ 3) + 1):end, (n_cells ÷ 2 + 1):end] .+= 10\n\n3×30 view(::Matrix{Float64}, 7:9, 31:60) with eltype Float64:\n 11.7413 10.5735 11.3817 8.50923 … 7.38716 8.7734 11.4717\n 9.28533 11.1225 9.43421 10.8904 11.6846 10.7264 9.64063\n 9.92113 8.42122 9.59885 9.90799 9.40715 8.40956 10.2522\n\n\nTo visualize the \\((D=9) \\times (N=60)\\) data matrix mat_exp, we use the heatmap plot.\n\nheatmap(\n mat_exp;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote that:\n\nWe have made distinct feature differences between these two groups of cells (it is fairly obvious from looking at the raw data), in practice and with large enough data sets, it is often impossible to spot the differences from the raw data alone.\nIf you have some patience and compute resources you can increase the size of the dataset, or play around with the noise levels to make the problem increasingly harder.\n\n\n\nStep 3: Create the pPCA model\nHere we construct the probabilistic model pPCA(). As per the p-PCA formula, we think of each row (i.e. each gene feature) following a \\(N=60\\) dimensional multivariate normal distribution centered around the corresponding row of \\(\\mathbf{W}_{D \\times k} \\times \\mathbf{Z}_{k \\times N} + \\boldsymbol{\\mu}_{D \\times N}\\).\n\n@model function pPCA(X::AbstractMatrix{<:Real}, k::Int)\n # retrieve the dimension of input matrix X.\n N, D = size(X)\n\n # weights/loadings W\n W ~ filldist(Normal(), D, k)\n\n # latent variable z\n Z ~ filldist(Normal(), k, N)\n\n # mean offset\n μ ~ MvNormal(Eye(D))\n genes_mean = W * Z .+ reshape(μ, n_genes, 1)\n return X ~ arraydist([MvNormal(m, Eye(N)) for m in eachcol(genes_mean')])\nend;\n\nThe function pPCA() accepts:\n\nan data array \\(\\mathbf{X}\\) (with no. of instances x dimension no. of features, NB: it is a transpose of the original data matrix);\nan integer \\(k\\) which indicates the dimension of the latent space (the space the original feature matrix is projected onto).\n\nSpecifically:\n\nit first extracts the dimension \\(D\\) and number of instances \\(N\\) of the input matrix;\ndraw samples of each entries of the projection matrix \\(\\mathbf{W}\\) from a standard normal;\ndraw samples of the latent variable \\(\\mathbf{Z}_{k \\times N}\\) from an MND;\ndraw samples of the offset \\(\\boldsymbol{\\mu}\\) from an MND, assuming uniform offset for all instances;\nFinally, we iterate through each gene dimension in \\(\\mathbf{X}\\), and define an MND for the sampling distribution (i.e. likelihood).\n\n\n\nStep 4: Sampling-based inference of the pPCA model\nHere we aim to perform MCMC sampling to infer the projection matrix \\(\\mathbf{W}_{D \\times k}\\), the latent variable matrix \\(\\mathbf{Z}_{k \\times N}\\), and the offsets \\(\\boldsymbol{\\mu}_{N \\times 1}\\).\nWe run the inference using the NUTS sampler, of which the chain length is set to be 500, target accept ratio 0.65 and initial stepsize 0.1. By default, the NUTS sampler samples 1 chain. You are free to try different samplers.\n\nsetprogress!(false)\n\n\nk = 2 # k is the dimension of the projected space, i.e. the number of principal components/axes of choice\nppca = pPCA(mat_exp', k) # instantiate the probabilistic model\nchain_ppca = sample(ppca, NUTS(;adtype=AutoReverseDiff()), 500);\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\nThe samples are saved in the Chains struct chain_ppca, whose shape can be checked:\n\nsize(chain_ppca) # (no. of iterations, no. of vars, no. of chains) = (500, 159, 1)\n\n(500, 159, 1)\n\n\nThe Chains struct chain_ppca also contains the sampling info such as r-hat, ess, mean estimates, etc. You can print it to check these quantities.\n\n\nStep 5: posterior predictive checks\nWe try to reconstruct the input data using the posterior mean as parameter estimates. We first retrieve the samples for the projection matrix W from chain_ppca. This can be done using the Julia group(chain, parameter_name) function. Then we calculate the mean value for each element in \\(W\\), averaging over the whole chain of samples.\n\n# Extract parameter estimates for predicting x - mean of posterior\nW = reshape(mean(group(chain_ppca, :W))[:, 2], (n_genes, k))\nZ = reshape(mean(group(chain_ppca, :Z))[:, 2], (k, n_cells))\nμ = mean(group(chain_ppca, :μ))[:, 2]\n\nmat_rec = W * Z .+ repeat(μ; inner=(1, n_cells))\n\n9×60 Matrix{Float64}:\n 5.37269 5.3775 5.41309 … 4.2672 4.24075 4.2059\n 5.68997 5.68703 5.74404 4.59195 4.51976 4.44857\n 5.35789 5.37079 5.38524 4.21663 4.23659 4.23827\n -0.120669 -0.131465 -0.104823 -0.0376943 -0.0978681 -0.14477\n 0.147504 0.130951 0.174503 0.213836 0.118547 0.0434642\n -0.0211434 -0.0197489 -0.0208076 … -0.085526 -0.080426 -0.0771657\n 4.4598 4.47051 4.3961 5.45785 5.57138 5.67427\n 4.45833 4.4483 4.42439 5.63131 5.63 5.64358\n 4.27425 4.27471 4.22443 5.3774 5.43509 5.4948\n\n\n\nheatmap(\n mat_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can quantitatively check the absolute magnitudes of the column average of the gap between mat_exp and mat_rec:\n\ndiff_matrix = mat_exp .- mat_rec\nfor col in 4:6\n @assert abs(mean(diff_matrix[:, col])) <= 0.5\nend\n\nWe observe that, using posterior mean, the recovered data matrix mat_rec has values align with the original data matrix - particularly the same pattern in the first and last 3 gene features are captured, which implies the inference and p-PCA decomposition are successful. This is satisfying as we have just projected the original 9-dimensional space onto a 2-dimensional space - some info has been cut off in the projection process, but we haven’t lost any important info, e.g. the key differences between the two groups. The is the desirable property of PCA: it picks up the principal axes along which most of the (original) data variations cluster, and remove those less relevant. If we choose the reduced space dimension \\(k\\) to be exactly \\(D\\) (the original data dimension), we would recover exactly the same original data matrix mat_exp, i.e. all information will be preserved.\nNow we have represented the original high-dimensional data in two dimensions, without lossing the key information about the two groups of cells in the input data. Finally, the benefits of performing PCA is to analyse and visualise the dimension-reduced data in the projected, low-dimensional space. we save the dimension-reduced matrix \\(\\mathbf{Z}\\) as a DataFrame, rename the columns and visualise the first two dimensions.\n\ndf_pca = DataFrame(Z', :auto)\nrename!(df_pca, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pca[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\n\nscatter(df_pca[:, :z1], df_pca[:, :z2]; xlabel=\"z1\", ylabel=\"z2\", group=df_pca[:, :type])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe see the two groups are well separated in this 2-D space. As an unsupervised learning method, performing PCA on this dataset gives membership for each cell instance. Another way to put it: 2 dimensions is enough to capture the main structure of the data.\n\n\nFurther extension: automatic choice of the number of principal components with ARD\nA direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data? This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space. In the case of PCA, there exist a lot of heuristics to make that choice. For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained 5.\nFor p-PCA, this can be done in an elegant and principled way, using a technique called Automatic Relevance Determination (ARD). ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features 6. Essentially, we are using a specific prior over the factor loadings \\(\\mathbf{W}\\) that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter \\(\\alpha\\). Here, smaller values of \\(\\alpha\\) correspond to more important components. You can find more details about this in, for example, Bishop (2006) 7.\n\n@model function pPCA_ARD(X)\n # Dimensionality of the problem.\n N, D = size(X)\n\n # latent variable Z\n Z ~ filldist(Normal(), D, N)\n\n # weights/loadings w with Automatic Relevance Determination part\n α ~ filldist(Gamma(1.0, 1.0), D)\n W ~ filldist(MvNormal(zeros(D), 1.0 ./ sqrt.(α)), D)\n\n mu = (W' * Z)'\n\n tau ~ Gamma(1.0, 1.0)\n return X ~ arraydist([MvNormal(m, 1.0 / sqrt(tau)) for m in eachcol(mu)])\nend;\n\nInstead of drawing samples of each entry in \\(\\mathbf{W}\\) from a standard normal, this time we repeatedly draw \\(D\\) samples from the \\(D\\)-dimensional MND, forming a \\(D \\times D\\) matrix \\(\\mathbf{W}\\). This matrix is a function of \\(\\alpha\\) as the samples are drawn from the MND parameterized by \\(\\alpha\\). We also introduce a hyper-parameter \\(\\tau\\) which is the precision in the sampling distribution. We also re-paramterise the sampling distribution, i.e. each dimension across all instances is a 60-dimensional multivariate normal distribution. Re-parameterisation can sometimes accelrate the sampling process.\nWe instantiate the model and ask Turing to sample from it using NUTS sampler. The sample trajectories of \\(\\alpha\\) is plotted using the plot function from the package StatsPlots.\n\nppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model\nchain_ppcaARD = sample(ppca_ARD, NUTS(;adtype=AutoReverseDiff()), 500) # sampling\nplot(group(chain_ppcaARD, :α); margin=6.0mm)\n\n┌ Info: Found initial step size\n└ ϵ = 0.05\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAgain, we do some inference diagnostics. Here we look at the convergence of the chains for the \\(α\\) parameter. This parameter determines the relevance of individual components. We see that the chains have converged and the posterior of the \\(\\alpha\\) parameters is centered around much smaller values in two instances. In the following, we will use the mean of the small values to select the relevant dimensions (remember that, smaller values of \\(\\alpha\\) correspond to more important components.). We can clearly see from the values of \\(\\alpha\\) that there should be two dimensions (corresponding to \\(\\bar{\\alpha}_3=\\bar{\\alpha}_5≈0.05\\)) for this dataset.\n\n# Extract parameter mean estimates of the posterior\nW = permutedims(reshape(mean(group(chain_ppcaARD, :W))[:, 2], (n_genes, n_genes)))\nZ = permutedims(reshape(mean(group(chain_ppcaARD, :Z))[:, 2], (n_genes, n_cells)))'\nα = mean(group(chain_ppcaARD, :α))[:, 2]\nplot(α; label=\"α\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can inspect α to see which elements are small (i.e. high relevance). To do this, we first sort α using sortperm() (in ascending order by default), and record the indices of the first two smallest values (among the \\(D=9\\) \\(\\alpha\\) values). After picking the desired principal directions, we extract the corresponding subset loading vectors from \\(\\mathbf{W}\\), and the corresponding dimensions of \\(\\mathbf{Z}\\). We obtain a posterior predicted matrix \\(\\mathbf{X} \\in \\mathbb{R}^{2 \\times 60}\\) as the product of the two sub-matrices, and compare the recovered info with the original matrix.\n\nα_indices = sortperm(α)[1:2]\nk = size(α_indices)[1]\nX_rec = W[:, α_indices] * Z[α_indices, :]\n\ndf_rec = DataFrame(X_rec', :auto)\nheatmap(\n X_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe observe that, the data in the original space is recovered with key information, the distinct feature values in the first and last three genes for the two cell groups, are preserved. We can also examine the data in the dimension-reduced space, i.e. the selected components (rows) in \\(\\mathbf{Z}\\).\n\ndf_pro = DataFrame(Z[α_indices, :]', :auto)\nrename!(df_pro, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pro[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\nscatter(\n df_pro[:, 1], df_pro[:, 2]; xlabel=\"z1\", ylabel=\"z2\", color=df_pro[:, \"type\"], label=\"\"\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThis plot is very similar to the low-dimensional plot above, with the relevant dimensions chosen based on the values of \\(α\\) via ARD. When you are in doubt about the number of dimensions to project onto, ARD might provide an answer to that question.", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#final-comments.", "href": "tutorials/probabilistic-pca/index.html#final-comments.", "title": "Probabilistic PCA", "section": "Final comments.", "text": "Final comments.\np-PCA is a linear map which linearly transforms the data between the original and projected spaces. It can also thought as a matrix factorisation method, in which \\(\\mathbf{X}=(\\mathbf{W} \\times \\mathbf{Z})^T\\). The projection matrix can be understood as a new basis in the projected space, and \\(\\mathbf{Z}\\) are the new coordinates.", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#footnotes", "href": "tutorials/probabilistic-pca/index.html#footnotes", "title": "Probabilistic PCA", "section": "Footnotes", "text": "Footnotes\n\n\nGilbert Strang, Introduction to Linear Algebra, 5th Ed., Wellesley-Cambridge Press, 2016.↩︎\nGareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎\nGareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎\nDavid Wipf, Srikantan Nagarajan, A New View of Automatic Relevance Determination, NIPS 2007.↩︎\nChristopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.↩︎", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html", "href": "tutorials/multinomial-logistic-regression/index.html", "title": "Multinomial Logistic Regression", "section": "", "text": "Multinomial logistic regression is an extension of logistic regression. Logistic regression is used to model problems in which there are exactly two possible discrete outcomes. Multinomial logistic regression is used to model problems in which there are two or more possible discrete outcomes.\nIn our example, we’ll be using the iris dataset. The iris multiclass problem aims to predict the species of a flower given measurements (in centimeters) of sepal length and width and petal length and width. There are three possible species: Iris setosa, Iris versicolor, and Iris virginica.\nTo start, let’s import all the libraries we’ll need.\n# Load Turing.\nusing Turing\n\n# Load RDatasets.\nusing RDatasets\n\n# Load StatsPlots for visualizations and diagnostics.\nusing StatsPlots\n\n# Functionality for splitting and normalizing the data.\nusing MLDataUtils: shuffleobs, splitobs, rescale!\n\n# We need a softmax function which is provided by NNlib.\nusing NNlib: softmax\n\n# Functionality for constructing arrays with identical elements efficiently.\nusing FillArrays\n\n# Functionality for working with scaled identity matrices.\nusing LinearAlgebra\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(0);", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#data-cleaning-set-up", "href": "tutorials/multinomial-logistic-regression/index.html#data-cleaning-set-up", "title": "Multinomial Logistic Regression", "section": "Data Cleaning & Set Up", "text": "Data Cleaning & Set Up\nNow we’re going to import our dataset. Twenty rows of the dataset are shown below so you can get a good feel for what kind of data we have.\n\n# Import the \"iris\" dataset.\ndata = RDatasets.dataset(\"datasets\", \"iris\");\n\n# Show twenty random rows.\ndata[rand(1:size(data, 1), 20), :]\n\n20×5 DataFrame\n\n\n\nRow\nSepalLength\nSepalWidth\nPetalLength\nPetalWidth\nSpecies\n\n\n\nFloat64\nFloat64\nFloat64\nFloat64\nCat…\n\n\n\n\n1\n5.0\n2.0\n3.5\n1.0\nversicolor\n\n\n2\n5.4\n3.7\n1.5\n0.2\nsetosa\n\n\n3\n7.2\n3.0\n5.8\n1.6\nvirginica\n\n\n4\n4.8\n3.0\n1.4\n0.1\nsetosa\n\n\n5\n5.7\n2.8\n4.1\n1.3\nversicolor\n\n\n6\n5.1\n3.5\n1.4\n0.3\nsetosa\n\n\n7\n5.4\n3.9\n1.3\n0.4\nsetosa\n\n\n8\n7.6\n3.0\n6.6\n2.1\nvirginica\n\n\n9\n5.0\n3.5\n1.6\n0.6\nsetosa\n\n\n10\n5.0\n3.6\n1.4\n0.2\nsetosa\n\n\n11\n5.5\n2.4\n3.8\n1.1\nversicolor\n\n\n12\n6.1\n2.6\n5.6\n1.4\nvirginica\n\n\n13\n4.4\n3.0\n1.3\n0.2\nsetosa\n\n\n14\n7.0\n3.2\n4.7\n1.4\nversicolor\n\n\n15\n6.1\n2.9\n4.7\n1.4\nversicolor\n\n\n16\n7.7\n3.0\n6.1\n2.3\nvirginica\n\n\n17\n6.4\n2.7\n5.3\n1.9\nvirginica\n\n\n18\n5.1\n3.3\n1.7\n0.5\nsetosa\n\n\n19\n6.7\n3.1\n4.7\n1.5\nversicolor\n\n\n20\n6.2\n2.2\n4.5\n1.5\nversicolor\n\n\n\n\n\n\nIn this data set, the outcome Species is currently coded as a string. We convert it to a numerical value by using indices 1, 2, and 3 to indicate species setosa, versicolor, and virginica, respectively.\n\n# Recode the `Species` column.\nspecies = [\"setosa\", \"versicolor\", \"virginica\"]\ndata[!, :Species_index] = indexin(data[!, :Species], species)\n\n# Show twenty random rows of the new species columns\ndata[rand(1:size(data, 1), 20), [:Species, :Species_index]]\n\n20×2 DataFrame\n\n\n\nRow\nSpecies\nSpecies_index\n\n\n\nCat…\nUnion…\n\n\n\n\n1\nsetosa\n1\n\n\n2\nversicolor\n2\n\n\n3\nversicolor\n2\n\n\n4\nsetosa\n1\n\n\n5\nversicolor\n2\n\n\n6\nversicolor\n2\n\n\n7\nversicolor\n2\n\n\n8\nversicolor\n2\n\n\n9\nvirginica\n3\n\n\n10\nversicolor\n2\n\n\n11\nvirginica\n3\n\n\n12\nsetosa\n1\n\n\n13\nsetosa\n1\n\n\n14\nversicolor\n2\n\n\n15\nsetosa\n1\n\n\n16\nsetosa\n1\n\n\n17\nvirginica\n3\n\n\n18\nversicolor\n2\n\n\n19\nvirginica\n3\n\n\n20\nvirginica\n3\n\n\n\n\n\n\nAfter we’ve done that tidying, it’s time to split our dataset into training and testing sets, and separate the features and target from the data. Additionally, we must rescale our feature variables so that they are centered around zero by subtracting each column by the mean and dividing it by the standard deviation. Without this step, Turing’s sampler will have a hard time finding a place to start searching for parameter estimates.\n\n# Split our dataset 50%/50% into training/test sets.\ntrainset, testset = splitobs(shuffleobs(data), 0.5)\n\n# Define features and target.\nfeatures = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]\ntarget = :Species_index\n\n# Turing requires data in matrix and vector form.\ntrain_features = Matrix(trainset[!, features])\ntest_features = Matrix(testset[!, features])\ntrain_target = trainset[!, target]\ntest_target = testset[!, target]\n\n# Standardize the features.\nμ, σ = rescale!(train_features; obsdim=1)\nrescale!(test_features, μ, σ; obsdim=1);", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#model-declaration", "href": "tutorials/multinomial-logistic-regression/index.html#model-declaration", "title": "Multinomial Logistic Regression", "section": "Model Declaration", "text": "Model Declaration\nFinally, we can define our model logistic_regression. It is a function that takes three arguments where\n\nx is our set of independent variables;\ny is the element we want to predict;\nσ is the standard deviation we want to assume for our priors.\n\nWe select the setosa species as the baseline class (the choice does not matter). Then we create the intercepts and vectors of coefficients for the other classes against that baseline. More concretely, we create scalar intercepts intercept_versicolor and intersept_virginica and coefficient vectors coefficients_versicolor and coefficients_virginica with four coefficients each for the features SepalLength, SepalWidth, PetalLength and PetalWidth. We assume a normal distribution with mean zero and standard deviation σ as prior for each scalar parameter. We want to find the posterior distribution of these, in total ten, parameters to be able to predict the species for any given set of features.\n\n# Bayesian multinomial logistic regression\n@model function logistic_regression(x, y, σ)\n n = size(x, 1)\n length(y) == n ||\n throw(DimensionMismatch(\"number of observations in `x` and `y` is not equal\"))\n\n # Priors of intercepts and coefficients.\n intercept_versicolor ~ Normal(0, σ)\n intercept_virginica ~ Normal(0, σ)\n coefficients_versicolor ~ MvNormal(Zeros(4), σ^2 * I)\n coefficients_virginica ~ MvNormal(Zeros(4), σ^2 * I)\n\n # Compute the likelihood of the observations.\n values_versicolor = intercept_versicolor .+ x * coefficients_versicolor\n values_virginica = intercept_virginica .+ x * coefficients_virginica\n for i in 1:n\n # the 0 corresponds to the base category `setosa`\n v = softmax([0, values_versicolor[i], values_virginica[i]])\n y[i] ~ Categorical(v)\n end\nend;", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#sampling", "href": "tutorials/multinomial-logistic-regression/index.html#sampling", "title": "Multinomial Logistic Regression", "section": "Sampling", "text": "Sampling\nNow we can run our sampler. This time we’ll use NUTS to sample from our posterior.\n\nsetprogress!(false)\n\n\nm = logistic_regression(train_features, train_target, 1)\nchain = sample(m, NUTS(), MCMCThreads(), 1_500, 3)\n\n\n\nChains MCMC chain (1500×22×3 Array{Float64, 3}):\n\nIterations = 751:1:2250\nNumber of chains = 3\nSamples per chain = 1500\nWall duration = 16.82 seconds\nCompute duration = 14.34 seconds\nparameters = intercept_versicolor, intercept_virginica, coefficients_versicolor[1], coefficients_versicolor[2], coefficients_versicolor[3], coefficients_versicolor[4], coefficients_virginica[1], coefficients_virginica[2], coefficients_virginica[3], coefficients_virginica[4]\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_ ⋯\n Symbol Float64 Float64 Float64 Float64 Flo ⋯\n\n intercept_versicolor 1.0174 0.5070 0.0079 4114.7619 2929. ⋯\n intercept_virginica -0.4600 0.6471 0.0103 3926.2334 2855. ⋯\n coefficients_versicolor[1] 1.2967 0.6335 0.0103 3817.3071 3354. ⋯\n coefficients_versicolor[2] -1.3349 0.4944 0.0076 4213.9936 3103. ⋯\n coefficients_versicolor[3] 0.8538 0.7323 0.0114 4123.7694 3143. ⋯\n coefficients_versicolor[4] 0.2870 0.7182 0.0116 3820.1912 3097. ⋯\n coefficients_virginica[1] 0.8220 0.6650 0.0110 3669.3404 3394. ⋯\n coefficients_virginica[2] -0.7389 0.6211 0.0097 4062.0067 2922. ⋯\n coefficients_virginica[3] 2.4187 0.8282 0.0129 4116.5185 3416. ⋯\n coefficients_virginica[4] 2.7252 0.7664 0.0115 4418.4639 3314. ⋯\n 3 columns omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5% ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 ⋯\n\n intercept_versicolor 0.0378 0.6608 1.0073 1.3554 2.0253 ⋯\n intercept_virginica -1.7187 -0.9010 -0.4660 -0.0161 0.8035 ⋯\n coefficients_versicolor[1] 0.0760 0.8658 1.2885 1.7306 2.5267 ⋯\n coefficients_versicolor[2] -2.3685 -1.6561 -1.3098 -0.9911 -0.4296 ⋯\n coefficients_versicolor[3] -0.6006 0.3615 0.8520 1.3474 2.3110 ⋯\n coefficients_versicolor[4] -1.0971 -0.2119 0.2841 0.7780 1.7040 ⋯\n coefficients_virginica[1] -0.4892 0.3716 0.8225 1.2711 2.1089 ⋯\n coefficients_virginica[2] -1.9880 -1.1547 -0.7263 -0.3323 0.4751 ⋯\n coefficients_virginica[3] 0.8406 1.8371 2.4058 2.9603 4.0733 ⋯\n coefficients_virginica[4] 1.2458 2.2204 2.7073 3.2338 4.2704 ⋯\n\n\n\n\n\n\n\n\nSampling With Multiple Threads\n\n\n\n\n\nThe sample() call above assumes that you have at least nchains threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. For more information, see the Turing documentation on sampling multiple chains.\n\n\n\nSince we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.\n\nplot(chain)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLooks good!\nWe can also use the corner function from MCMCChains to show the distributions of the various parameters of our multinomial logistic regression. The corner function requires MCMCChains and StatsPlots.\n\n# Only plotting the first 3 coefficients due to a bug in Plots.jl\ncorner(\n chain,\n MCMCChains.namesingroup(chain, :coefficients_versicolor)[1:3];\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n# Only plotting the first 3 coefficients due to a bug in Plots.jl\ncorner(\n chain,\n MCMCChains.namesingroup(chain, :coefficients_virginica)[1:3];\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFortunately the corner plots appear to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter’s sampled values to estimate our model to make predictions.", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#making-predictions", "href": "tutorials/multinomial-logistic-regression/index.html#making-predictions", "title": "Multinomial Logistic Regression", "section": "Making Predictions", "text": "Making Predictions\nHow do we test how well the model actually predicts which of the three classes an iris flower belongs to? We need to build a prediction function that takes the test dataset and runs it through the average parameter calculated during sampling.\nThe prediction function below takes a Matrix and a Chains object. It computes the mean of the sampled parameters and calculates the species with the highest probability for each observation. Note that we do not have to evaluate the softmax function since it does not affect the order of its inputs.\n\nfunction prediction(x::Matrix, chain)\n # Pull the means from each parameter's sampled values in the chain.\n intercept_versicolor = mean(chain, :intercept_versicolor)\n intercept_virginica = mean(chain, :intercept_virginica)\n coefficients_versicolor = [\n mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_versicolor)\n ]\n coefficients_virginica = [\n mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_virginica)\n ]\n\n # Compute the index of the species with the highest probability for each observation.\n values_versicolor = intercept_versicolor .+ x * coefficients_versicolor\n values_virginica = intercept_virginica .+ x * coefficients_virginica\n species_indices = [\n argmax((0, x, y)) for (x, y) in zip(values_versicolor, values_virginica)\n ]\n\n return species_indices\nend;\n\nLet’s see how we did! We run the test matrix through the prediction function, and compute the accuracy for our prediction.\n\n# Make the predictions.\npredictions = prediction(test_features, chain)\n\n# Calculate accuracy for our test set.\nmean(predictions .== testset[!, :Species_index])\n\n0.9066666666666666\n\n\nPerhaps more important is to see the accuracy per class.\n\nfor s in 1:3\n rows = testset[!, :Species_index] .== s\n println(\"Number of `\", species[s], \"`: \", count(rows))\n println(\n \"Percentage of `\",\n species[s],\n \"` predicted correctly: \",\n mean(predictions[rows] .== testset[rows, :Species_index]),\n )\nend\n\nNumber of `setosa`: 26\nPercentage of `setosa` predicted correctly: 1.0\nNumber of `versicolor`: 26\nPercentage of `versicolor` predicted correctly: 0.7692307692307693\nNumber of `virginica`: 23\nPercentage of `virginica` predicted correctly: 0.9565217391304348\n\n\nThis tutorial has demonstrated how to use Turing to perform Bayesian multinomial logistic regression.", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "getting-started/index.html", "href": "getting-started/index.html", "title": "Getting Started", "section": "", "text": "Installation\nTo use Turing, you need to install Julia first and then install Turing.\nYou will need to install Julia 1.10 or greater, which you can get from the official Julia website.\nTuring is officially registered in the Julia General package registry, which means that you can install a stable version of Turing by running the following in the Julia REPL:\n\nusing Pkg\nPkg.add(\"Turing\")\n\n\n\nSupported versions and platforms\nFormally, we only run continuous integration tests on: (1) the minimum supported minor version (typically an LTS release), and (2) the latest minor version of Julia. We test on Linux (x64), macOS (Apple Silicon), and Windows (x64). The Turing developer team will prioritise fixing issues on these platforms and versions.\nIf you run into a problem on a different version (e.g. older patch releases) or platforms (e.g. 32-bit), please do feel free to post an issue! If we are able to help, we will try to fix it, but we cannot guarantee support for untested versions.\n\n\nExample usage\nFirst, we load the Turing and StatsPlots modules. The latter is required for visualising the results.\n\nusing Turing\nusing StatsPlots\n\nWe then specify our model, which is a simple Gaussian model with unknown mean and variance. Models are defined as ordinary Julia functions, prefixed with the @model macro. Each statement inside closely resembles how the model would be defined with mathematical notation. Here, both x and y are observed values, and are therefore passed as function parameters. m and s² are the parameters to be inferred.\n\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n y ~ Normal(m, sqrt(s²))\nend\n\ngdemo (generic function with 2 methods)\n\n\nSuppose we observe x = 1.5 and y = 2, and want to infer the mean and variance. We can pass these data as arguments to the gdemo function, and run a sampler to collect the results. Here, we collect 1000 samples using the No U-Turn Sampler (NUTS) algorithm.\n\nchain = sample(gdemo(1.5, 2), NUTS(), 1000, progress=false)\n\n┌ Info: Found initial step size\n└ ϵ = 1.6\n\n\nChains MCMC chain (1000×14×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 6.81 seconds\nCompute duration = 6.81 seconds\nparameters = s², m\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 2.0845 2.1356 0.0962 471.7849 469.8925 1.0061 ⋯\n m 1.2052 0.8001 0.0370 502.1687 449.9333 1.0061 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5360 1.0426 1.5308 2.4294 6.0762\n m -0.3534 0.7179 1.2089 1.6823 2.9311\n\n\nWe can plot the results:\n\nplot(chain)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nand obtain summary statistics by indexing the chain:\n\nmean(chain[:m]), mean(chain[:s²])\n\n(1.2052356263649302, 2.084499472624055)\n\n\n\n\nWhere to go next\n\n\n\n\n\n\nNote on prerequisites\n\n\n\nFamiliarity with Julia is assumed throughout the Turing documentation. If you are new to Julia, Learning Julia is a good starting point.\nThe underlying theory of Bayesian machine learning is not explained in detail in this documentation. A thorough introduction to the field is Pattern Recognition and Machine Learning (Bishop, 2006); an online version is available here (PDF, 18.1 MB).\n\n\nThe next page on Turing’s core functionality explains the basic features of the Turing language. From there, you can either look at worked examples of how different models are implemented in Turing, or specific tips and tricks that can help you get the most out of Turing.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Getting Started" ] }, { "objectID": "usage/external-samplers/index.html", "href": "usage/external-samplers/index.html", "title": "Using External Samplers", "section": "", "text": "Turing provides several wrapped samplers from external sampling libraries, e.g., HMC samplers from AdvancedHMC. These wrappers allow new users to seamlessly sample statistical models without leaving Turing However, these wrappers might only sometimes be complete, missing some functionality from the wrapped sampling library. Moreover, users might want to use samplers currently not wrapped within Turing.\nFor these reasons, Turing also makes running external samplers on Turing models easy without any necessary modifications or wrapping! Throughout, we will use a 10-dimensional Neal’s funnel as a running example::\n\n# Import libraries.\nusing Turing, Random, LinearAlgebra\n\nd = 10\n@model function funnel()\n θ ~ Truncated(Normal(0, 3), -3, 3)\n z ~ MvNormal(zeros(d - 1), exp(θ) * I)\n return x ~ MvNormal(z, I)\nend\n\nfunnel (generic function with 2 methods)\n\n\nNow we sample the model to generate some observations, which we can then condition on.\n\n(; x) = rand(funnel() | (θ=0,))\nmodel = funnel() | (; x);\n\nUsers can use any sampler algorithm to sample this model if it follows the AbstractMCMC API. Before discussing how this is done in practice, giving a high-level description of the process is interesting. Imagine that we created an instance of an external sampler that we will call spl such that typeof(spl)<:AbstractMCMC.AbstractSampler. In order to avoid type ambiguity within Turing, at the moment it is necessary to declare spl as an external sampler to Turing espl = externalsampler(spl), where externalsampler(s::AbstractMCMC.AbstractSampler) is a Turing function that types our external sampler adequately.\nAn excellent point to start to show how this is done in practice is by looking at the sampling library AdvancedMH (AdvancedMH’s GitHub) for Metropolis-Hastings (MH) methods. Let’s say we want to use a random walk Metropolis-Hastings sampler without specifying the proposal distributions. The code below constructs an MH sampler using a multivariate Gaussian distribution with zero mean and unit variance in d dimensions as a random walk proposal.\n\n# Importing the sampling library\nusing AdvancedMH\nrwmh = AdvancedMH.RWMH(d)\n\nMetropolisHastings{RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}}(RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}(ZeroMeanIsoNormal(\ndim: 10\nμ: Zeros(10)\nΣ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]\n)\n))\n\n\n\nsetprogress!(false)\n\nSampling is then as easy as:\n\nchain = sample(model, externalsampler(rwmh), 10_000)\n\nChains MCMC chain (10000×11×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 3.67 seconds\nCompute duration = 3.67 seconds\nparameters = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n θ 0.2008 0.8827 0.1414 39.8970 43.4393 1.1049 ⋯\n z[1] -1.3516 0.9407 0.1323 53.4236 143.8069 1.0486 ⋯\n z[2] -0.1062 0.7323 0.0585 147.2589 201.1747 1.0051 ⋯\n z[3] -0.1632 0.7114 0.0615 131.1338 217.7375 1.0227 ⋯\n z[4] 0.8878 0.7907 0.0898 79.2287 189.5564 1.0278 ⋯\n z[5] -1.6500 0.8869 0.1031 75.1115 158.8668 1.0477 ⋯\n z[6] -0.5009 0.8031 0.0816 98.4141 164.7801 1.0135 ⋯\n z[7] 0.5748 0.7782 0.0894 74.4541 118.0200 1.0804 ⋯\n z[8] -0.0390 0.7957 0.0860 84.4884 174.9503 1.0106 ⋯\n z[9] -0.1525 0.6810 0.0581 138.0675 162.1670 1.0215 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n θ -1.5609 -0.3348 0.3016 0.7769 1.9365\n z[1] -3.1808 -1.9899 -1.2590 -0.6542 0.2956\n z[2] -1.6322 -0.6240 -0.0413 0.3162 1.3199\n z[3] -1.5354 -0.6625 -0.1875 0.2927 1.2598\n z[4] -0.5404 0.3248 0.8882 1.3839 2.4254\n z[5] -3.6447 -2.1363 -1.5395 -0.9743 -0.2770\n z[6] -2.2857 -0.9555 -0.5535 0.1269 0.9418\n z[7] -0.9438 0.0766 0.5523 1.0848 2.1347\n z[8] -1.8904 -0.5147 -0.0420 0.5012 1.3646\n z[9] -1.3699 -0.7410 -0.1300 0.3582 1.1788", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#using-external-samplers-on-turing-models", "href": "usage/external-samplers/index.html#using-external-samplers-on-turing-models", "title": "Using External Samplers", "section": "", "text": "Turing provides several wrapped samplers from external sampling libraries, e.g., HMC samplers from AdvancedHMC. These wrappers allow new users to seamlessly sample statistical models without leaving Turing However, these wrappers might only sometimes be complete, missing some functionality from the wrapped sampling library. Moreover, users might want to use samplers currently not wrapped within Turing.\nFor these reasons, Turing also makes running external samplers on Turing models easy without any necessary modifications or wrapping! Throughout, we will use a 10-dimensional Neal’s funnel as a running example::\n\n# Import libraries.\nusing Turing, Random, LinearAlgebra\n\nd = 10\n@model function funnel()\n θ ~ Truncated(Normal(0, 3), -3, 3)\n z ~ MvNormal(zeros(d - 1), exp(θ) * I)\n return x ~ MvNormal(z, I)\nend\n\nfunnel (generic function with 2 methods)\n\n\nNow we sample the model to generate some observations, which we can then condition on.\n\n(; x) = rand(funnel() | (θ=0,))\nmodel = funnel() | (; x);\n\nUsers can use any sampler algorithm to sample this model if it follows the AbstractMCMC API. Before discussing how this is done in practice, giving a high-level description of the process is interesting. Imagine that we created an instance of an external sampler that we will call spl such that typeof(spl)<:AbstractMCMC.AbstractSampler. In order to avoid type ambiguity within Turing, at the moment it is necessary to declare spl as an external sampler to Turing espl = externalsampler(spl), where externalsampler(s::AbstractMCMC.AbstractSampler) is a Turing function that types our external sampler adequately.\nAn excellent point to start to show how this is done in practice is by looking at the sampling library AdvancedMH (AdvancedMH’s GitHub) for Metropolis-Hastings (MH) methods. Let’s say we want to use a random walk Metropolis-Hastings sampler without specifying the proposal distributions. The code below constructs an MH sampler using a multivariate Gaussian distribution with zero mean and unit variance in d dimensions as a random walk proposal.\n\n# Importing the sampling library\nusing AdvancedMH\nrwmh = AdvancedMH.RWMH(d)\n\nMetropolisHastings{RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}}(RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}(ZeroMeanIsoNormal(\ndim: 10\nμ: Zeros(10)\nΣ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]\n)\n))\n\n\n\nsetprogress!(false)\n\nSampling is then as easy as:\n\nchain = sample(model, externalsampler(rwmh), 10_000)\n\nChains MCMC chain (10000×11×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 3.67 seconds\nCompute duration = 3.67 seconds\nparameters = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n θ 0.2008 0.8827 0.1414 39.8970 43.4393 1.1049 ⋯\n z[1] -1.3516 0.9407 0.1323 53.4236 143.8069 1.0486 ⋯\n z[2] -0.1062 0.7323 0.0585 147.2589 201.1747 1.0051 ⋯\n z[3] -0.1632 0.7114 0.0615 131.1338 217.7375 1.0227 ⋯\n z[4] 0.8878 0.7907 0.0898 79.2287 189.5564 1.0278 ⋯\n z[5] -1.6500 0.8869 0.1031 75.1115 158.8668 1.0477 ⋯\n z[6] -0.5009 0.8031 0.0816 98.4141 164.7801 1.0135 ⋯\n z[7] 0.5748 0.7782 0.0894 74.4541 118.0200 1.0804 ⋯\n z[8] -0.0390 0.7957 0.0860 84.4884 174.9503 1.0106 ⋯\n z[9] -0.1525 0.6810 0.0581 138.0675 162.1670 1.0215 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n θ -1.5609 -0.3348 0.3016 0.7769 1.9365\n z[1] -3.1808 -1.9899 -1.2590 -0.6542 0.2956\n z[2] -1.6322 -0.6240 -0.0413 0.3162 1.3199\n z[3] -1.5354 -0.6625 -0.1875 0.2927 1.2598\n z[4] -0.5404 0.3248 0.8882 1.3839 2.4254\n z[5] -3.6447 -2.1363 -1.5395 -0.9743 -0.2770\n z[6] -2.2857 -0.9555 -0.5535 0.1269 0.9418\n z[7] -0.9438 0.0766 0.5523 1.0848 2.1347\n z[8] -1.8904 -0.5147 -0.0420 0.5012 1.3646\n z[9] -1.3699 -0.7410 -0.1300 0.3582 1.1788", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#going-beyond-the-turing-api", "href": "usage/external-samplers/index.html#going-beyond-the-turing-api", "title": "Using External Samplers", "section": "Going beyond the Turing API", "text": "Going beyond the Turing API\nAs previously mentioned, the Turing wrappers can often limit the capabilities of the sampling libraries they wrap. AdvancedHMC1 (AdvancedHMC’s GitHub) is a clear example of this. A common practice when performing HMC is to provide an initial guess for the mass matrix. However, the native HMC sampler within Turing only allows the user to specify the type of the mass matrix despite the two options being possible within AdvancedHMC. Thankfully, we can use Turing’s support for external samplers to define an HMC sampler with a custom mass matrix in AdvancedHMC and then use it to sample our Turing model.\nWe can use the library Pathfinder2 (Pathfinder’s GitHub) to construct our estimate of mass matrix. Pathfinder is a variational inference algorithm that first finds the maximum a posteriori (MAP) estimate of a target posterior distribution and then uses the trace of the optimization to construct a sequence of multivariate normal approximations to the target distribution. In this process, Pathfinder computes an estimate of the mass matrix the user can access. You can see an example of how to use Pathfinder with Turing in Pathfinder’s docs.", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#using-new-inference-methods", "href": "usage/external-samplers/index.html#using-new-inference-methods", "title": "Using External Samplers", "section": "Using new inference methods", "text": "Using new inference methods\nSo far we have used Turing’s support for external samplers to go beyond the capabilities of the wrappers. We want to use this support to employ a sampler not supported within Turing’s ecosystem yet. We will use the recently developed Micro-Cannoncial Hamiltonian Monte Carlo (MCHMC) sampler to showcase this. MCHMC[3,4] ((MCHMC’s GitHub)[https://github.com/JaimeRZP/MicroCanonicalHMC.jl]) is HMC sampler that uses one single Hamiltonian energy level to explore the whole parameter space. This is achieved by simulating the dynamics of a microcanonical Hamiltonian with an additional noise term to ensure ergodicity.\nUsing this as well as other inference methods outside the Turing ecosystem is as simple as executing the code shown below:\n\nusing MicroCanonicalHMC\n# Create MCHMC sampler\nn_adapts = 1_000 # adaptation steps\ntev = 0.01 # target energy variance\nmchmc = MCHMC(n_adapts, tev; adaptive=true)\n\n# Sample\nchain = sample(model, externalsampler(mchmc), 10_000)\n\n[ Info: Tuning eps ⏳\n[ Info: Tuning L ⏳\n[ Info: Tuning sigma ⏳\nTuning: 0%|▏ | ETA: 0:05:59\n ϵ: 2.618963339850589\n L: 3.1622776601683795\n dE/d: -0.017945981667290313\n\n\nTuning: 1%|▍ | ETA: 0:04:02\n ϵ: 0.9081669496624556\n L: 4.513736295674778\n dE/d: 0.07490719412588583\n\n\nTuning: 100%|███████████████████████████████████████████| Time: 0:00:02\n ϵ: 0.9451379205728828\n L: 401.2921135930397\n dE/d: 0.013415481897950698\n\n\nChains MCMC chain (10000×11×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 6.38 seconds\nCompute duration = 6.38 seconds\nparameters = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n θ -0.0916 1.1250 0.0343 1106.0760 1015.7208 1.0006 ⋯\n z[1] -1.1814 0.8886 0.0216 1757.9202 2358.4054 0.9999 ⋯\n z[2] 0.0683 0.6380 0.0144 1984.5632 2440.2195 1.0063 ⋯\n z[3] -0.1670 0.7176 0.0139 2704.6786 3369.9208 1.0160 ⋯\n z[4] 0.9150 0.8369 0.0228 1395.7787 1861.2561 1.0006 ⋯\n z[5] -1.4478 0.9730 0.0237 1716.9077 2621.4913 1.0000 ⋯\n z[6] -0.4771 0.6885 0.0166 1791.6034 2103.1318 1.0017 ⋯\n z[7] 0.6151 0.7084 0.0171 1815.6212 2259.6315 1.0001 ⋯\n z[8] -0.0312 0.6634 0.0158 1803.2533 2229.2770 1.0018 ⋯\n z[9] -0.1310 0.7550 0.0156 2379.9398 3019.5837 1.0005 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n θ -2.4832 -0.8408 0.0182 0.7389 1.8549\n z[1] -3.1451 -1.7390 -1.0992 -0.5187 0.2748\n z[2] -1.1836 -0.3365 0.0570 0.4684 1.3673\n z[3] -1.6436 -0.5951 -0.1416 0.2688 1.2740\n z[4] -0.4906 0.2952 0.8074 1.4588 2.7277\n z[5] -3.5282 -2.0937 -1.3586 -0.7006 0.1305\n z[6] -2.0169 -0.8989 -0.4087 -0.0095 0.7281\n z[7] -0.5766 0.1130 0.5405 1.0408 2.2024\n z[8] -1.3929 -0.4268 -0.0251 0.3816 1.3076\n z[9] -1.6889 -0.6143 -0.1106 0.3622 1.3583\n\n\nThe only requirement to work with externalsampler is that the provided sampler must implement the AbstractMCMC.jl-interface [INSERT LINK] for a model of type AbstractMCMC.LogDensityModel [INSERT LINK].\nAs previously stated, in order to use external sampling libraries within Turing they must follow the AbstractMCMC API. In this section, we will briefly dwell on what this entails. First and foremost, the sampler should be a subtype of AbstractMCMC.AbstractSampler. Second, the stepping function of the MCMC algorithm must be made defined using AbstractMCMC.step and follow the structure below:\n\n# First step\nfunction AbstractMCMC.step{T<:AbstractMCMC.AbstractSampler}(\n rng::Random.AbstractRNG,\n model::AbstractMCMC.LogDensityModel,\n spl::T;\n kwargs...,\n)\n [...]\n return transition, sample\nend\n\n# N+1 step\nfunction AbstractMCMC.step{T<:AbstractMCMC.AbstractSampler}(\n rng::Random.AbstractRNG,\n model::AbstractMCMC.LogDensityModel,\n sampler::T,\n state;\n kwargs...,\n) \n [...]\n return transition, sample\nend\n\nThere are several characteristics to note in these functions:\n\nThere must be two step functions:\n\nA function that performs the first step and initializes the sampler.\nA function that performs the following steps and takes an extra input, state, which carries the initialization information.\n\nThe functions must follow the displayed signatures.\nThe output of the functions must be a transition, the current state of the sampler, and a sample, what is saved to the MCMC chain.\n\nThe last requirement is that the transition must be structured with a field θ, which contains the values of the parameters of the model for said transition. This allows Turing to seamlessly extract the parameter values at each step of the chain when bundling the chains. Note that if the external sampler produces transitions that Turing cannot parse, the bundling of the samples will be different or fail.\nFor practical examples of how to adapt a sampling library to the AbstractMCMC interface, the readers can consult the following libraries:\n\nAdvancedMH\nAdvancedHMC\nMicroCanonicalHMC", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#footnotes", "href": "usage/external-samplers/index.html#footnotes", "title": "Using External Samplers", "section": "Footnotes", "text": "Footnotes\n\n\nXu et al., AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms, 2019↩︎\nZhang et al., Pathfinder: Parallel quasi-Newton variational inference, 2021↩︎\nRobnik et al, Microcanonical Hamiltonian Monte Carlo, 2022↩︎\nRobnik and Seljak, Langevine Hamiltonian Monte Carlo, 2023↩︎", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/performance-tips/index.html", "href": "usage/performance-tips/index.html", "title": "Performance Tips", "section": "", "text": "This section briefly summarises a few common techniques to ensure good performance when using Turing. We refer to the Julia documentation for general techniques to ensure good performance of Julia programs.", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/performance-tips/index.html#use-multivariate-distributions", "href": "usage/performance-tips/index.html#use-multivariate-distributions", "title": "Performance Tips", "section": "Use multivariate distributions", "text": "Use multivariate distributions\nIt is generally preferable to use multivariate distributions if possible.\nThe following example:\n\nusing Turing\n@model function gmodel(x)\n m ~ Normal()\n for i in 1:length(x)\n x[i] ~ Normal(m, 0.2)\n end\nend\n\ngmodel (generic function with 2 methods)\n\n\ncan be directly expressed more efficiently using a simple transformation:\n\nusing FillArrays\n\n@model function gmodel(x)\n m ~ Normal()\n return x ~ MvNormal(Fill(m, length(x)), 0.04 * I)\nend\n\ngmodel (generic function with 2 methods)", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/performance-tips/index.html#choose-your-ad-backend", "href": "usage/performance-tips/index.html#choose-your-ad-backend", "title": "Performance Tips", "section": "Choose your AD backend", "text": "Choose your AD backend\nAutomatic differentiation (AD) makes it possible to use modern, efficient gradient-based samplers like NUTS and HMC, and that means a good AD system is incredibly important. Turing currently supports several AD backends, including ForwardDiff (the default), Mooncake, Zygote, and ReverseDiff.\nFor many common types of models, the default ForwardDiff backend performs great, and there is no need to worry about changing it. However, if you need more speed, you can try different backends via the standard ADTypes interface by passing an AbstractADType to the sampler with the optional adtype argument, e.g. NUTS(adtype = AutoZygote()). See Automatic Differentiation for details. Generally, adtype = AutoForwardDiff() is likely to be the fastest and most reliable for models with few parameters (say, less than 20 or so), while reverse-mode backends such as AutoZygote() or AutoReverseDiff() will perform better for models with many parameters or linear algebra operations. If in doubt, it’s easy to try a few different backends to see how they compare.\n\nSpecial care for Zygote\nNote that Zygote will not perform well if your model contains for-loops, due to the way reverse-mode AD is implemented in these packages. Zygote also cannot differentiate code that contains mutating operations. If you can’t implement your model without for-loops or mutation, ReverseDiff will be a better, more performant option. In general, though, vectorized operations are still likely to perform best.\nAvoiding loops can be done using filldist(dist, N) and arraydist(dists). filldist(dist, N) creates a multivariate distribution that is composed of N identical and independent copies of the univariate distribution dist if dist is univariate, or it creates a matrix-variate distribution composed of N identical and independent copies of the multivariate distribution dist if dist is multivariate. filldist(dist, N, M) can also be used to create a matrix-variate distribution from a univariate distribution dist. arraydist(dists) is similar to filldist but it takes an array of distributions dists as input. Writing a custom distribution with a custom adjoint is another option to avoid loops.\n\n\nSpecial care for ReverseDiff with a compiled tape\nFor large models, the fastest option is often ReverseDiff with a compiled tape, specified as adtype=AutoReverseDiff(true). However, it is important to note that if your model contains any branching code, such as if-else statements, the gradients from a compiled tape may be inaccurate, leading to erroneous results. If you use this option for the (considerable) speedup it can provide, make sure to check your code. It’s also a good idea to verify your gradients with another backend.", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/performance-tips/index.html#ensure-that-types-in-your-model-can-be-inferred", "href": "usage/performance-tips/index.html#ensure-that-types-in-your-model-can-be-inferred", "title": "Performance Tips", "section": "Ensure that types in your model can be inferred", "text": "Ensure that types in your model can be inferred\nFor efficient gradient-based inference, e.g. using HMC, NUTS or ADVI, it is important to ensure the types in your model can be inferred.\nThe following example with abstract types\n\n@model function tmodel(x, y)\n p, n = size(x)\n params = Vector{Real}(undef, n)\n for i in 1:n\n params[i] ~ truncated(Normal(); lower=0)\n end\n\n a = x * params\n return y ~ MvNormal(a, I)\nend\n\ntmodel (generic function with 2 methods)\n\n\ncan be transformed into the following representation with concrete types:\n\n@model function tmodel(x, y, ::Type{T}=Float64) where {T}\n p, n = size(x)\n params = Vector{T}(undef, n)\n for i in 1:n\n params[i] ~ truncated(Normal(); lower=0)\n end\n\n a = x * params\n return y ~ MvNormal(a, I)\nend\n\ntmodel (generic function with 4 methods)\n\n\nAlternatively, you could use filldist in this example:\n\n@model function tmodel(x, y)\n params ~ filldist(truncated(Normal(); lower=0), size(x, 2))\n a = x * params\n return y ~ MvNormal(a, I)\nend\n\ntmodel (generic function with 4 methods)\n\n\nNote that you can use @code_warntype to find types in your model definition that the compiler cannot infer. They are marked in red in the Julia REPL.\nFor example, consider the following simple program:\n\n@model function tmodel(x)\n p = Vector{Real}(undef, 1)\n p[1] ~ Normal()\n p = p .+ 1\n return x ~ Normal(p[1])\nend\n\ntmodel (generic function with 6 methods)\n\n\nWe can use\n\nusing Random\n\nmodel = tmodel(1.0)\n\n@code_warntype model.f(\n model,\n Turing.VarInfo(model),\n Turing.SamplingContext(\n Random.default_rng(), Turing.SampleFromPrior(), Turing.DefaultContext()\n ),\n model.args...,\n)\n\nto inspect type inference in the model.", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/sampler-visualisation/index.html", "href": "usage/sampler-visualisation/index.html", "title": "Sampler Visualization", "section": "", "text": "For each sampler, we will use the same code to plot sampler paths. The block below loads the relevant libraries and defines a function for plotting the sampler’s trajectory across the posterior.\nThe Turing model definition used here is not especially practical, but it is designed in such a way as to produce visually interesting posterior surfaces to show how different samplers move along the distribution.\n\nENV[\"GKS_ENCODING\"] = \"utf-8\" # Allows the use of unicode characters in Plots.jl\nusing Plots\nusing StatsPlots\nusing Turing\nusing Random\nusing Bijectors\n\n# Set a seed.\nRandom.seed!(0)\n\n# Define a strange model.\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n bumps = sin(m) + cos(m)\n m = m + 5 * bumps\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\n return s², m\nend\n\n# Define our data points.\nx = [1.5, 2.0, 13.0, 2.1, 0.0]\n\n# Set up the model call, sample from the prior.\nmodel = gdemo(x)\n\n# Evaluate surface at coordinates.\nevaluate(m1, m2) = logjoint(model, (m=m2, s²=invlink.(Ref(InverseGamma(2, 3)), m1)))\n\nfunction plot_sampler(chain; label=\"\")\n # Extract values from chain.\n val = get(chain, [:s², :m, :lp])\n ss = link.(Ref(InverseGamma(2, 3)), val.s²)\n ms = val.m\n lps = val.lp\n\n # How many surface points to sample.\n granularity = 100\n\n # Range start/stop points.\n spread = 0.5\n σ_start = minimum(ss) - spread * std(ss)\n σ_stop = maximum(ss) + spread * std(ss)\n μ_start = minimum(ms) - spread * std(ms)\n μ_stop = maximum(ms) + spread * std(ms)\n σ_rng = collect(range(σ_start; stop=σ_stop, length=granularity))\n μ_rng = collect(range(μ_start; stop=μ_stop, length=granularity))\n\n # Make surface plot.\n p = surface(\n σ_rng,\n μ_rng,\n evaluate;\n camera=(30, 65),\n # ticks=nothing,\n colorbar=false,\n color=:inferno,\n title=label,\n )\n\n line_range = 1:length(ms)\n\n scatter3d!(\n ss[line_range],\n ms[line_range],\n lps[line_range];\n mc=:viridis,\n marker_z=collect(line_range),\n msw=0,\n legend=false,\n colorbar=false,\n alpha=0.5,\n xlabel=\"σ\",\n ylabel=\"μ\",\n zlabel=\"Log probability\",\n title=label,\n )\n\n return p\nend;\n\n\nsetprogress!(false)", "crumbs": [ "Get Started", "User Guide", "Sampler Visualization" ] }, { "objectID": "usage/sampler-visualisation/index.html#introduction", "href": "usage/sampler-visualisation/index.html#introduction", "title": "Sampler Visualization", "section": "", "text": "For each sampler, we will use the same code to plot sampler paths. The block below loads the relevant libraries and defines a function for plotting the sampler’s trajectory across the posterior.\nThe Turing model definition used here is not especially practical, but it is designed in such a way as to produce visually interesting posterior surfaces to show how different samplers move along the distribution.\n\nENV[\"GKS_ENCODING\"] = \"utf-8\" # Allows the use of unicode characters in Plots.jl\nusing Plots\nusing StatsPlots\nusing Turing\nusing Random\nusing Bijectors\n\n# Set a seed.\nRandom.seed!(0)\n\n# Define a strange model.\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n bumps = sin(m) + cos(m)\n m = m + 5 * bumps\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\n return s², m\nend\n\n# Define our data points.\nx = [1.5, 2.0, 13.0, 2.1, 0.0]\n\n# Set up the model call, sample from the prior.\nmodel = gdemo(x)\n\n# Evaluate surface at coordinates.\nevaluate(m1, m2) = logjoint(model, (m=m2, s²=invlink.(Ref(InverseGamma(2, 3)), m1)))\n\nfunction plot_sampler(chain; label=\"\")\n # Extract values from chain.\n val = get(chain, [:s², :m, :lp])\n ss = link.(Ref(InverseGamma(2, 3)), val.s²)\n ms = val.m\n lps = val.lp\n\n # How many surface points to sample.\n granularity = 100\n\n # Range start/stop points.\n spread = 0.5\n σ_start = minimum(ss) - spread * std(ss)\n σ_stop = maximum(ss) + spread * std(ss)\n μ_start = minimum(ms) - spread * std(ms)\n μ_stop = maximum(ms) + spread * std(ms)\n σ_rng = collect(range(σ_start; stop=σ_stop, length=granularity))\n μ_rng = collect(range(μ_start; stop=μ_stop, length=granularity))\n\n # Make surface plot.\n p = surface(\n σ_rng,\n μ_rng,\n evaluate;\n camera=(30, 65),\n # ticks=nothing,\n colorbar=false,\n color=:inferno,\n title=label,\n )\n\n line_range = 1:length(ms)\n\n scatter3d!(\n ss[line_range],\n ms[line_range],\n lps[line_range];\n mc=:viridis,\n marker_z=collect(line_range),\n msw=0,\n legend=false,\n colorbar=false,\n alpha=0.5,\n xlabel=\"σ\",\n ylabel=\"μ\",\n zlabel=\"Log probability\",\n title=label,\n )\n\n return p\nend;\n\n\nsetprogress!(false)", "crumbs": [ "Get Started", "User Guide", "Sampler Visualization" ] }, { "objectID": "usage/sampler-visualisation/index.html#samplers", "href": "usage/sampler-visualisation/index.html#samplers", "title": "Sampler Visualization", "section": "Samplers", "text": "Samplers\n\nGibbs\nGibbs sampling tends to exhibit a “jittery” trajectory. The example below combines HMC and PG sampling to traverse the posterior.\n\nc = sample(model, Gibbs(:s² => HMC(0.01, 5), :m => PG(20)), 1000)\nplot_sampler(c)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHMC\nHamiltonian Monte Carlo (HMC) sampling is a typical sampler to use, as it tends to be fairly good at converging in a efficient manner. It can often be tricky to set the correct parameters for this sampler however, and the NUTS sampler is often easier to run if you don’t want to spend too much time fiddling with step size and and the number of steps to take. Note however that HMC does not explore the positive values μ very well, likely due to the leapfrog and step size parameter settings.\n\nc = sample(model, HMC(0.01, 10), 1000)\nplot_sampler(c)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHMCDA\nThe HMCDA sampler is an implementation of the Hamiltonian Monte Carlo with Dual Averaging algorithm found in the paper “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” by Hoffman and Gelman (2011). The paper can be found on arXiv for the interested reader.\n\nc = sample(model, HMCDA(200, 0.65, 0.3), 1000)\nplot_sampler(c)\n\n┌ Info: Found initial step size\n└ ϵ = 0.05\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nMH\nMetropolis-Hastings (MH) sampling is one of the earliest Markov Chain Monte Carlo methods. MH sampling does not “move” a lot, unlike many of the other samplers implemented in Turing. Typically a much longer chain is required to converge to an appropriate parameter estimate.\nThe plot below only uses 1,000 iterations of Metropolis-Hastings.\n\nc = sample(model, MH(), 1000)\nplot_sampler(c)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAs you can see, the MH sampler doesn’t move parameter estimates very often.\n\n\nNUTS\nThe No U-Turn Sampler (NUTS) is an implementation of the algorithm found in the paper “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” by Hoffman and Gelman (2011). The paper can be found on arXiv for the interested reader.\nNUTS tends to be very good at traversing complex posteriors quickly.\n\nc = sample(model, NUTS(0.65), 1000)\nplot_sampler(c)\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe only parameter that needs to be set other than the number of iterations to run is the target acceptance rate. In the Hoffman and Gelman paper, they note that a target acceptance rate of 0.65 is typical.\nHere is a plot showing a very high acceptance rate. Note that it appears to “stick” to a mode and is not particularly good at exploring the posterior as compared to the 0.65 target acceptance ratio case.\n\nc = sample(model, NUTS(0.95), 1000)\nplot_sampler(c)\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAn exceptionally low acceptance rate will show very few moves on the posterior:\n\nc = sample(model, NUTS(0.2), 1000)\nplot_sampler(c)\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPG\nThe Particle Gibbs (PG) sampler is an implementation of an algorithm from the paper “Particle Markov chain Monte Carlo methods” by Andrieu, Doucet, and Holenstein (2010). The interested reader can learn more here.\nThe two parameters are the number of particles, and the number of iterations. The plot below shows the use of 20 particles.\n\nc = sample(model, PG(20), 1000)\nplot_sampler(c)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNext, we plot using 50 particles.\n\nc = sample(model, PG(50), 1000)\nplot_sampler(c)", "crumbs": [ "Get Started", "User Guide", "Sampler Visualization" ] }, { "objectID": "usage/automatic-differentiation/index.html", "href": "usage/automatic-differentiation/index.html", "title": "Automatic Differentiation", "section": "", "text": "Turing currently supports four automatic differentiation (AD) backends for sampling: ForwardDiff for forward-mode AD; and Mooncake, ReverseDiff, and Zygote for reverse-mode AD. ForwardDiff is automatically imported by Turing. To utilize Mooncake, Zygote, or ReverseDiff for AD, users must explicitly import them with import Mooncake, import Zygote or import ReverseDiff, alongside using Turing.\nAs of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of AdTypes.jl, allowing users to specify the AD backend for individual samplers independently. Users can pass the adtype keyword argument to the sampler constructor to select the desired AD backend, with the default being AutoForwardDiff(; chunksize=0).\nFor ForwardDiff, pass adtype=AutoForwardDiff(; chunksize) to the sampler constructor. A chunksize of nothing permits the chunk size to be automatically determined. For more information regarding the selection of chunksize, please refer to related section of ForwardDiff’s documentation.\nFor ReverseDiff, pass adtype=AutoReverseDiff() to the sampler constructor. An additional keyword argument called compile can be provided to AutoReverseDiff. It specifies whether to pre-record the tape only once and reuse it later (compile is set to false by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.\nPre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.\nThus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and if-statements should consistently execute the same branches. For instance, if-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data. However, if-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect. Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.\nFor Zygote, pass adtype=AutoZygote() to the sampler constructor.\nAnd the previously used interface functions including ADBackend, setadbackend, setsafe, setchunksize, and setrdcache are deprecated and removed.", "crumbs": [ "Get Started", "User Guide", "Automatic Differentiation" ] }, { "objectID": "usage/automatic-differentiation/index.html#switching-ad-modes", "href": "usage/automatic-differentiation/index.html#switching-ad-modes", "title": "Automatic Differentiation", "section": "", "text": "Turing currently supports four automatic differentiation (AD) backends for sampling: ForwardDiff for forward-mode AD; and Mooncake, ReverseDiff, and Zygote for reverse-mode AD. ForwardDiff is automatically imported by Turing. To utilize Mooncake, Zygote, or ReverseDiff for AD, users must explicitly import them with import Mooncake, import Zygote or import ReverseDiff, alongside using Turing.\nAs of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of AdTypes.jl, allowing users to specify the AD backend for individual samplers independently. Users can pass the adtype keyword argument to the sampler constructor to select the desired AD backend, with the default being AutoForwardDiff(; chunksize=0).\nFor ForwardDiff, pass adtype=AutoForwardDiff(; chunksize) to the sampler constructor. A chunksize of nothing permits the chunk size to be automatically determined. For more information regarding the selection of chunksize, please refer to related section of ForwardDiff’s documentation.\nFor ReverseDiff, pass adtype=AutoReverseDiff() to the sampler constructor. An additional keyword argument called compile can be provided to AutoReverseDiff. It specifies whether to pre-record the tape only once and reuse it later (compile is set to false by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.\nPre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.\nThus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and if-statements should consistently execute the same branches. For instance, if-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data. However, if-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect. Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.\nFor Zygote, pass adtype=AutoZygote() to the sampler constructor.\nAnd the previously used interface functions including ADBackend, setadbackend, setsafe, setchunksize, and setrdcache are deprecated and removed.", "crumbs": [ "Get Started", "User Guide", "Automatic Differentiation" ] }, { "objectID": "usage/automatic-differentiation/index.html#compositional-sampling-with-differing-ad-modes", "href": "usage/automatic-differentiation/index.html#compositional-sampling-with-differing-ad-modes", "title": "Automatic Differentiation", "section": "Compositional Sampling with Differing AD Modes", "text": "Compositional Sampling with Differing AD Modes\nTuring supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using ForwardDiff to sample the mean (m) parameter, and using ReverseDiff for the variance (s) parameter:\n\nusing Turing\nusing ReverseDiff\n\n# Define a simple Normal model with unknown mean and variance.\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\n# Sample using Gibbs and varying autodiff backends.\nc = sample(\n gdemo(1.5, 2),\n Gibbs(\n :m => HMC(0.1, 5; adtype=AutoForwardDiff(; chunksize=0)),\n :s² => HMC(0.1, 5; adtype=AutoReverseDiff(false)),\n ),\n 1000,\n progress=false,\n)\n\nChains MCMC chain (1000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 16.16 seconds\nCompute duration = 16.16 seconds\nparameters = s², m\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 2.2807 2.6747 0.3048 125.6046 68.0561 1.0010 ⋯\n m 1.1200 1.0042 0.1621 55.1464 36.0899 1.0008 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5698 1.0398 1.5140 2.3535 9.4020\n m -0.7544 0.5994 1.0680 1.5163 4.0785\n\n\nGenerally, reverse-mode AD, for instance ReverseDiff, is faster when sampling from variables of high dimensionality (greater than 20), while forward-mode AD, for instance ForwardDiff, is more efficient for lower-dimension variables. This functionality allows those who are performance sensitive to fine tune their automatic differentiation for their specific models.\nIf the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is. Currently, this defaults to ForwardDiff.\nThe most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using TuringBenchmarking:\n\nusing TuringBenchmarking\nbenchmark_model(gdemo(1.5, 2), adbackends=[AutoForwardDiff(), AutoReverseDiff()])\n\n2-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"evaluation\" => 2-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"linked\" => Trial(450.000 ns)\n \"standard\" => Trial(350.000 ns)\n \"gradient\" => 2-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"ADTypes.AutoReverseDiff()\" => 2-element BenchmarkTools.BenchmarkGroup:\n tags: [\"ReverseDiff\"]\n \"linked\" => Trial(14.988 μs)\n \"standard\" => Trial(13.545 μs)\n \"ADTypes.AutoForwardDiff()\" => 2-element BenchmarkTools.BenchmarkGroup:\n tags: [\"ForwardDiff\"]\n \"linked\" => Trial(681.000 ns)\n \"standard\" => Trial(570.000 ns)", "crumbs": [ "Get Started", "User Guide", "Automatic Differentiation" ] }, { "objectID": "usage/dynamichmc/index.html", "href": "usage/dynamichmc/index.html", "title": "Using DynamicHMC", "section": "", "text": "Turing supports the use of DynamicHMC as a sampler through the DynamicNUTS function.\nTo use the DynamicNUTS function, you must import the DynamicHMC package as well as Turing. Turing does not formally require DynamicHMC but will include additional functionality if both packages are present.\nHere is a brief example:\n\nHow to apply DynamicNUTS:\n\n# Import Turing and DynamicHMC.\nusing DynamicHMC, Turing\n\n# Model definition.\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\n# Pull 2,000 samples using DynamicNUTS.\ndynamic_nuts = externalsampler(DynamicHMC.NUTS())\nchn = sample(gdemo(1.5, 2.0), dynamic_nuts, 2000, progress=false)\n\nChains MCMC chain (2000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:2000\nNumber of chains = 1\nSamples per chain = 2000\nWall duration = 8.11 seconds\nCompute duration = 8.11 seconds\nparameters = s², m\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 2.0166 2.1315 0.0941 792.9794 819.2629 1.0015 ⋯\n m 1.1886 0.8285 0.0298 986.9462 637.5032 1.0017 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5599 1.0100 1.5329 2.3206 6.0351\n m -0.3816 0.6982 1.1666 1.6573 2.9066\n\n\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "User Guide", "Using DynamicHMC" ] }, { "objectID": "usage/modifying-logprob/index.html", "href": "usage/modifying-logprob/index.html", "title": "Modifying the Log Probability", "section": "", "text": "Turing accumulates log probabilities internally in an internal data structure that is accessible through the internal variable __varinfo__ inside of the model definition. To avoid users having to deal with internal data structures, Turing provides the Turing.@addlogprob! macro which increases the accumulated log probability. For instance, this allows you to include arbitrary terms in the likelihood\n\nusing Turing\n\nmyloglikelihood(x, μ) = loglikelihood(Normal(μ, 1), x)\n\n@model function demo(x)\n μ ~ Normal()\n Turing.@addlogprob! myloglikelihood(x, μ)\nend\n\ndemo (generic function with 2 methods)\n\n\nand to force a sampler to reject a sample:\n\nusing Turing\nusing LinearAlgebra\n\n@model function demo(x)\n m ~ MvNormal(zero(x), I)\n if dot(m, x) < 0\n Turing.@addlogprob! -Inf\n # Exit the model evaluation early\n return nothing\n end\n\n x ~ MvNormal(m, I)\n return nothing\nend\n\ndemo (generic function with 2 methods)\n\n\nNote that @addlogprob! always increases the accumulated log probability, regardless of the provided sampling context. For instance, if you do not want to apply Turing.@addlogprob! when evaluating the prior of your model but only when computing the log likelihood and the log joint probability, then you should check the type of the internal variable __context_, as in the following example:\n\nif DynamicPPL.leafcontext(__context__) !== Turing.PriorContext()\n Turing.@addlogprob! myloglikelihood(x, μ)\nend\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "User Guide", "Modifying the Log Probability" ] }, { "objectID": "usage/generated-quantities/index.html", "href": "usage/generated-quantities/index.html", "title": "Generated Quantities", "section": "", "text": "Often, the most natural parameterization for a model is not the most computationally feasible. Consider the following (efficiently reparametrized) implementation of Neal’s funnel (Neal, 2003):\n\nusing Turing\n\n@model function Neal()\n # Raw draws\n y_raw ~ Normal(0, 1)\n x_raw ~ arraydist([Normal(0, 1) for i in 1:9])\n\n # Transform:\n y = 3 * y_raw\n x = exp.(y ./ 2) .* x_raw\n\n # Return:\n return [x; y]\nend\n\nNeal (generic function with 2 methods)\n\n\nIn this case, the random variables exposed in the chain (x_raw, y_raw) are not in a helpful form — what we’re after are the deterministically transformed variables x and y.\nMore generally, there are often quantities in our models that we might be interested in viewing, but which are not explicitly present in our chain.\nWe can generate draws from these variables — in this case, x and y — by adding them as a return statement to the model, and then calling generated_quantities(model, chain). Calling this function outputs an array of values specified in the return statement of the model.\nFor example, in the above reparametrization, we sample from our model:\n\nchain = sample(Neal(), NUTS(), 1000; progress=false)\n\n┌ Info: Found initial step size\n└ ϵ = 1.6\n\n\nChains MCMC chain (1000×22×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 6.91 seconds\nCompute duration = 6.91 seconds\nparameters = y_raw, x_raw[1], x_raw[2], x_raw[3], x_raw[4], x_raw[5], x_raw[6], x_raw[7], x_raw[8], x_raw[9]\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n y_raw -0.0049 1.0205 0.0311 1042.4091 687.8570 0.9993 ⋯\n x_raw[1] 0.0263 1.0231 0.0253 1623.7469 568.7980 0.9996 ⋯\n x_raw[2] -0.0225 0.9914 0.0244 1584.5444 782.0201 1.0003 ⋯\n x_raw[3] 0.0443 1.0469 0.0266 1563.2235 835.6863 1.0016 ⋯\n x_raw[4] -0.0206 1.0204 0.0309 1088.6301 723.6072 1.0003 ⋯\n x_raw[5] -0.0200 0.9509 0.0235 1639.2688 792.1886 0.9992 ⋯\n x_raw[6] 0.0043 0.9806 0.0207 2304.6087 580.2492 0.9991 ⋯\n x_raw[7] -0.0089 1.0127 0.0302 1137.7063 715.0619 1.0011 ⋯\n x_raw[8] -0.0174 0.9802 0.0238 1732.8518 886.9318 0.9993 ⋯\n x_raw[9] 0.0417 0.9893 0.0334 876.2684 678.3714 1.0023 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n y_raw -1.9821 -0.7042 -0.0204 0.7105 2.1069\n x_raw[1] -2.0599 -0.6731 0.0283 0.7312 2.0333\n x_raw[2] -1.9070 -0.7464 0.0066 0.6892 1.9361\n x_raw[3] -2.0056 -0.6609 0.0397 0.7581 2.0819\n x_raw[4] -1.9721 -0.7412 0.0177 0.6919 1.9462\n x_raw[5] -1.8414 -0.6615 -0.0171 0.6282 1.7642\n x_raw[6] -1.8654 -0.6802 -0.0114 0.7010 1.9378\n x_raw[7] -1.9247 -0.7435 0.0148 0.6878 2.0526\n x_raw[8] -2.0525 -0.7129 0.0047 0.6640 1.9413\n x_raw[9] -1.9507 -0.6139 0.0614 0.6752 2.0411\n\n\nNotice that only x_raw and y_raw are stored in the chain; x and y are not because they do not appear on the left-hand side of a tilde-statement.\nTo get x and y, we can then call:\n\ngenerated_quantities(Neal(), chain)\n\n1000×1 Matrix{Vector{Float64}}:\n [0.48768532465602166, -1.5266760444148317, 3.73488420078738, -1.0909559270000706, -1.9596728712134954, 3.0852175728819105, -2.5195585858461467, 1.2629429506796548, -1.591599530392493, 1.9023302173713397]\n [-0.05734781244683999, 0.3977088828436452, -1.025571477756659, 0.41296877821321076, 0.4942144789609048, -0.795376769683169, 0.4863302604767323, -0.10686760230858111, 1.3952552222790144, -0.8340639327394488]\n [-0.1635258174366138, -1.5794615108489594, -0.2383471379499093, -1.5777235846749658, -2.3796390159908274, 0.32287645940727194, -0.47879968367776493, -0.877545342909656, 1.3158798556224653, 1.6753481693380896]\n [-0.028865811437003782, 0.20902646504082328, 0.06208632174265271, 0.19363887403671945, 0.3382086329259947, -0.027075462223248237, 0.12821889784695306, 0.04614160300839528, -0.07705015516108804, -2.416563685276117]\n [-0.2827169519882472, 0.2126543782525096, -0.6785746454605384, 1.501392110809468, -0.1332109497499928, -0.33632582820033935, -0.803215684460826, 1.4307155816782857, -2.501628249516429, 0.5802884701851463]\n [4.38779914889448, 8.642084465132752, -2.4716328583664406, -11.088880294477509, -0.8523390416852469, 3.722867620612254, 1.7217293246326717, -6.45424074846697, -3.3234625032887144, 3.914258847791232]\n [3.0457743042965664, 1.50741605379873, -6.601359879201403, -5.193563844812964, -4.731900000728814, 0.8182062741637043, 2.5886487198355557, 0.032287249952450706, 7.342152101543887, 3.127047813497603]\n [-0.10397963947085465, -0.028004357257766153, 0.18026973468587382, 0.13048789444719938, 0.15934236180397773, -0.0082207046364272, -0.04576184333996519, 0.016167338519608103, -0.3137589493740973, -3.919988848105371]\n [2.705832342269569, 0.44900580342465024, -6.694077561777419, -4.212950758178549, -7.186923991534962, 0.11408142880750857, 0.7692133473335854, -1.342107319522106, 8.085298823837384, 3.1338567456999407]\n [1.1040185547138073, -0.5414381741938825, 1.0701416241808663, 0.3974841448454017, 1.1517508433285584, -0.24039633292376916, -0.10709443361704662, -0.43228834083074513, -1.2904040098699587, -0.529516931961663]\n ⋮\n [0.05076310057414889, -0.11002431004138186, 0.19208970492820446, -0.003586673262555621, -0.06743085099676832, 0.170663300399116, 0.1457632191847885, -0.0799965768955598, -0.024612945147261082, -4.33367621166452]\n [48.05302876514998, 46.82614749302566, -118.85830356157047, -47.39676944916765, -33.76087219909403, 81.54131720642876, -39.11639780468609, 10.730290798967898, 7.462225699850914, 8.626876208810053]\n [-0.00807942957607871, -0.014138752842613476, 0.03412181922129519, 0.008897243732711394, 0.01953471154677091, -0.02503423070204854, 0.017445364220063042, 0.002103765735407101, 0.01914670770936081, -7.624234506582526]\n [8.019314245630634, 12.281391731441687, -38.68415024650252, -6.943240913041845, -21.763725283283115, 24.11480505643119, -21.426880135481223, -5.55739676756192, -5.607852128789843, 6.366285416700511]\n [0.17759884166978251, -0.041860980345948255, 0.3257136140384577, 0.05624358870156626, 0.007457055249199862, 0.047254067226272435, 0.0747963736224535, 0.0021960430145839087, 0.030417524591437255, -3.8490638314202474]\n [-6.1456783266007315, 2.0639825468474013, -9.100053529126066, -1.7268168057528352, 0.9685175803073007, -2.0852174432067154, -2.4886030180844925, -3.1018750733608877, -1.6852258288655844, 2.9269749845141204]\n [0.47597660057441155, -0.1257610705583851, 0.7319731911248898, 0.01306604268164509, -0.12698249293333957, 0.22107799495236927, 0.3245376288312054, 0.36592155686462746, 0.40064273474742623, -2.0613888778412823]\n [1.6521511230430308, 1.394406345726079, 0.3225233615476245, 1.2156824922831395, -2.145421755354559, -0.2977462128608438, -1.5594861785604417, 0.8410244032127643, 2.330971928778687, 0.31941323357064777]\n [0.725861387588008, 0.31391793517428146, -0.36139239788413025, 0.24479664640930127, -0.1991463684577484, -0.025187867806529728, 1.3111318987164706, 1.1400499938577873, 0.886195673376815, -0.6243011961358138]\n\n\nEach element of this corresponds to an array with the values of x1, x2, ..., x9, y for each posterior sample.\nIn this case, it might be useful to reorganize our output into a matrix for plotting:\n\nreparam_chain = reduce(hcat, generated_quantities(Neal(), chain))'\n\n1000×10 adjoint(::Matrix{Float64}) with eltype Float64:\n 0.487685 -1.52668 3.73488 … 1.26294 -1.5916 1.90233\n -0.0573478 0.397709 -1.02557 -0.106868 1.39526 -0.834064\n -0.163526 -1.57946 -0.238347 -0.877545 1.31588 1.67535\n -0.0288658 0.209026 0.0620863 0.0461416 -0.0770502 -2.41656\n -0.282717 0.212654 -0.678575 1.43072 -2.50163 0.580288\n 4.3878 8.64208 -2.47163 … -6.45424 -3.32346 3.91426\n 3.04577 1.50742 -6.60136 0.0322872 7.34215 3.12705\n -0.10398 -0.0280044 0.18027 0.0161673 -0.313759 -3.91999\n 2.70583 0.449006 -6.69408 -1.34211 8.0853 3.13386\n 1.10402 -0.541438 1.07014 -0.432288 -1.2904 -0.529517\n ⋮ ⋱ \n 0.0507631 -0.110024 0.19209 -0.0799966 -0.0246129 -4.33368\n 48.053 46.8261 -118.858 10.7303 7.46223 8.62688\n -0.00807943 -0.0141388 0.0341218 0.00210377 0.0191467 -7.62423\n 8.01931 12.2814 -38.6842 -5.5574 -5.60785 6.36629\n 0.177599 -0.041861 0.325714 … 0.00219604 0.0304175 -3.84906\n -6.14568 2.06398 -9.10005 -3.10188 -1.68523 2.92697\n 0.475977 -0.125761 0.731973 0.365922 0.400643 -2.06139\n 1.65215 1.39441 0.322523 0.841024 2.33097 0.319413\n 0.725861 0.313918 -0.361392 1.14005 0.886196 -0.624301\n\n\nfrom which we can recover a vector of our samples:\n\nx1_samples = reparam_chain[:, 1]\ny_samples = reparam_chain[:, 10]\n\n1000-element Vector{Float64}:\n 1.9023302173713397\n -0.8340639327394488\n 1.6753481693380896\n -2.416563685276117\n 0.5802884701851463\n 3.914258847791232\n 3.127047813497603\n -3.919988848105371\n 3.1338567456999407\n -0.529516931961663\n ⋮\n -4.33367621166452\n 8.626876208810053\n -7.624234506582526\n 6.366285416700511\n -3.8490638314202474\n 2.9269749845141204\n -2.0613888778412823\n 0.31941323357064777\n -0.6243011961358138\n\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "User Guide", "Generated Quantities" ] }, { "objectID": "usage/mode-estimation/index.html", "href": "usage/mode-estimation/index.html", "title": "Mode Estimation", "section": "", "text": "After defining a statistical model, in addition to sampling from its distributions, one may be interested in finding the parameter values that maximise for instance the posterior distribution density function or the likelihood. This is called mode estimation. Turing provides support for two mode estimation techniques, maximum likelihood estimation (MLE) and maximum a posterior (MAP) estimation.\nTo demonstrate mode estimation, let us load Turing and declare a model:\nusing Turing\n\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\ngdemo (generic function with 2 methods)\nOnce the model is defined, we can construct a model instance as we normally would:\n# Instantiate the gdemo model with our data.\ndata = [1.5, 2.0]\nmodel = gdemo(data)\n\nDynamicPPL.Model{typeof(gdemo), (:x,), (), (), Tuple{Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext}(gdemo, (x = [1.5, 2.0],), NamedTuple(), DynamicPPL.DefaultContext())\nFinding the maximum aposteriori or maximum likelihood parameters is as simple as\n# Generate a MLE estimate.\nmle_estimate = maximum_likelihood(model)\n\n# Generate a MAP estimate.\nmap_estimate = maximum_a_posteriori(model)\n\nModeResult with maximized lp of -4.62\n[0.907407407404569, 1.1666666666651762]\nThe estimates are returned as instances of the ModeResult type. It has the fields values for the parameter values found and lp for the log probability at the optimum, as well as f for the objective function and optim_result for more detailed results of the optimisation procedure.\n@show mle_estimate.values\n@show mle_estimate.lp;\n\nmle_estimate.values = [0.06249999999999837, 1.749999999999999]\nmle_estimate.lp = -0.06528834416956464", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/mode-estimation/index.html#controlling-the-optimisation-process", "href": "usage/mode-estimation/index.html#controlling-the-optimisation-process", "title": "Mode Estimation", "section": "Controlling the optimisation process", "text": "Controlling the optimisation process\nUnder the hood maximum_likelihood and maximum_a_posteriori use the Optimization.jl package, which provides a unified interface to many other optimisation packages. By default Turing typically uses the LBFGS method from Optim.jl to find the mode estimate, but we can easily change that:\n\nusing OptimizationOptimJL: NelderMead\n@show maximum_likelihood(model, NelderMead())\n\nusing OptimizationNLopt: NLopt.LD_TNEWTON_PRECOND_RESTART\n@show maximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART());\n\nmaximum_likelihood(model, NelderMead()) = [0.06249945218953466, 1.7499988709738943]\n┌ Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided. \n│ So a `SecondOrder` with ADTypes.AutoForwardDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so \n│ an explicit `SecondOrder` ADtype is recommended.\n└ @ OptimizationBase ~/.julia/packages/OptimizationBase/UXLhR/src/cache.jl:49\nmaximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART()) = [0.06250000001019859, 1.7499999999477618]\n\n\nThe above are just two examples, Optimization.jl supports many more.\nWe can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method\n\nusing ADTypes: AutoReverseDiff\nimport ReverseDiff\nmaximum_likelihood(\n model, NelderMead(); initial_params=[0.1, 2], adtype=AutoReverseDiff()\n)\n\nModeResult with maximized lp of -0.07\n[0.062494553692639856, 1.7500042095865365]\n\n\nWhen providing values to arguments like initial_params the parameters are typically specified in the order in which they appear in the code of the model, so in this case first s² then m. More precisely it’s the order returned by Turing.Inference.getparams(model, Turing.VarInfo(model)).\nWe can also do constrained optimisation, by providing either intervals within which the parameters must stay, or costraint functions that they need to respect. For instance, here’s how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1.:\n\nmaximum_likelihood(model; lb=[0.0, -1.0], ub=[0.01, 1.0])\n\nModeResult with maximized lp of -59.73\n[0.009999999983223014, 0.9999999993121435]\n\n\nThe arguments for lower (lb) and upper (ub) bounds follow the arguments of Optimization.OptimizationProblem, as do other parameters for providing constraints, such as cons. Any extraneous keyword arguments given to maximum_likelihood or maximum_a_posteriori are passed to Optimization.solve. Some often useful ones are maxiters for controlling the maximum number of iterations and abstol and reltol for the absolute and relative convergence tolerances:\n\nbadly_converged_mle = maximum_likelihood(\n model, NelderMead(); maxiters=10, reltol=1e-9\n)\n\nModeResult with maximized lp of -0.23\n[0.10997791575476254, 1.6875231979124987]\n\n\nWe can check whether the optimisation converged using the optim_result field of the result:\n\n@show badly_converged_mle.optim_result;\n\nbadly_converged_mle.optim_result = retcode: Failure\nu: [-2.20747569921168, 1.6875231979124987]\nFinal objective value: 0.23418941267356108\n\n\n\nFor more details, such as a full list of possible arguments, we encourage the reader to read the docstring of the function Turing.Optimisation.estimate_mode, which is what maximum_likelihood and maximum_a_posteriori call, and the documentation of Optimization.jl.", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/mode-estimation/index.html#analyzing-your-mode-estimate", "href": "usage/mode-estimation/index.html#analyzing-your-mode-estimate", "title": "Mode Estimation", "section": "Analyzing your mode estimate", "text": "Analyzing your mode estimate\nTuring extends several methods from StatsBase that can be used to analyze your mode estimation results. Methods implemented include vcov, informationmatrix, coeftable, params, and coef, among others.\nFor example, let’s examine our ML estimate from above using coeftable:\n\nusing StatsBase: coeftable\ncoeftable(mle_estimate)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCoef.\nStd. Error\nz\nPr(>\nz\n)\n\n\n\n\ns²\n0.0625\n0.0625\n1.0\n0.317311\n-0.0599977\n0.184998\n\n\nm\n1.75\n0.176777\n9.89949\n4.18383e-23\n1.40352\n2.09648\n\n\n\n\n\nStandard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). Note that standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them.", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/mode-estimation/index.html#sampling-with-the-mapmle-as-initial-states", "href": "usage/mode-estimation/index.html#sampling-with-the-mapmle-as-initial-states", "title": "Mode Estimation", "section": "Sampling with the MAP/MLE as initial states", "text": "Sampling with the MAP/MLE as initial states\nYou can begin sampling your chain from an MLE/MAP estimate by extracting the vector of parameter values and providing it to the sample function with the keyword initial_params. For example, here is how to sample from the full posterior using the MAP estimate as the starting point:\n\nmap_estimate = maximum_a_posteriori(model)\nchain = sample(model, NUTS(), 1_000; initial_params=map_estimate.values.array)", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/custom-distribution/index.html", "href": "usage/custom-distribution/index.html", "title": "Custom Distributions", "section": "", "text": "Turing.jl supports the use of distributions from the Distributions.jl package. By extension, it also supports the use of customized distributions by defining them as subtypes of Distribution type of the Distributions.jl package, as well as corresponding functions.\nThis page shows a workflow of how to define a customized distribution, using our own implementation of a simple Uniform distribution as a simple example.\nusing Distributions, Turing, Random, Bijectors", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/custom-distribution/index.html#define-the-distribution-type", "href": "usage/custom-distribution/index.html#define-the-distribution-type", "title": "Custom Distributions", "section": "Define the Distribution Type", "text": "Define the Distribution Type\nFirst, define a type of the distribution, as a subtype of a corresponding distribution type in the Distributions.jl package.\n\nstruct CustomUniform <: ContinuousUnivariateDistribution end", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/custom-distribution/index.html#implement-sampling-and-evaluation-of-the-log-pdf", "href": "usage/custom-distribution/index.html#implement-sampling-and-evaluation-of-the-log-pdf", "title": "Custom Distributions", "section": "Implement Sampling and Evaluation of the log-pdf", "text": "Implement Sampling and Evaluation of the log-pdf\nSecond, implement the rand and logpdf functions for your new distribution, which will be used to run the model.\n\n# sample in [0, 1]\nDistributions.rand(rng::AbstractRNG, d::CustomUniform) = rand(rng)\n\n# p(x) = 1 → log[p(x)] = 0\nDistributions.logpdf(d::CustomUniform, x::Real) = zero(x)", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/custom-distribution/index.html#define-helper-functions", "href": "usage/custom-distribution/index.html#define-helper-functions", "title": "Custom Distributions", "section": "Define Helper Functions", "text": "Define Helper Functions\nIn most cases, it may be required to define some helper functions.\n\nDomain Transformation\nCertain samplers, such as HMC, require the domain of the priors to be unbounded. Therefore, to use our CustomUniform as a prior in a model we also need to define how to transform samples from [0, 1] to ℝ. To do this, we need to define the corresponding Bijector from Bijectors.jl, which is what Turing.jl uses internally to deal with constrained distributions.\nTo transform from [0, 1] to ℝ we can use the Logit bijector:\n\nBijectors.bijector(d::CustomUniform) = Logit(0.0, 1.0)\n\nIn the present example, CustomUniform is a subtype of ContinuousUnivariateDistribution. The procedure for subtypes of ContinuousMultivariateDistribution and ContinuousMatrixDistribution is exactly the same. For example, Wishart defines a distribution over positive-definite matrices and so bijector returns a PDBijector when called with a Wishart distribution as an argument. For discrete distributions, there is no need to define a bijector; the Identity bijector is used by default.\nAs an alternative to the above, for UnivariateDistribution we could define the minimum and maximum of the distribution:\n\nDistributions.minimum(d::CustomUniform) = 0.0\nDistributions.maximum(d::CustomUniform) = 1.0\n\nand Bijectors.jl will return a default Bijector called TruncatedBijector which makes use of minimum and maximum derive the correct transformation.\nInternally, Turing basically does the following when it needs to convert a constrained distribution to an unconstrained distribution, e.g. when sampling using HMC:\n\ndist = Gamma(2,3)\nb = bijector(dist)\ntransformed_dist = transformed(dist, b) # results in distribution with transformed support + correction for logpdf\n\nBijectors.UnivariateTransformed{Distributions.Gamma{Float64}, Base.Fix1{typeof(broadcast), typeof(log)}}(\ndist: Distributions.Gamma{Float64}(α=2.0, θ=3.0)\ntransform: Base.Fix1{typeof(broadcast), typeof(log)}(broadcast, log)\n)\n\n\nand then we can call rand and logpdf as usual, where\n\nrand(transformed_dist) returns a sample in the unconstrained space, and\nlogpdf(transformed_dist, y) returns the log density of the original distribution, but with y living in the unconstrained space.\n\nTo read more about Bijectors.jl, check out its documentation.", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/probability-interface/index.html", "href": "usage/probability-interface/index.html", "title": "Querying Model Probabilities", "section": "", "text": "The easiest way to manipulate and query Turing models is via the DynamicPPL probability interface.\nLet’s use a simple model of normally-distributed data as an example.\nusing Turing\nusing DynamicPPL\nusing Random\n\n@model function gdemo(n)\n μ ~ Normal(0, 1)\n x ~ MvNormal(fill(μ, n), I)\nend\n\ngdemo (generic function with 2 methods)\nWe generate some data using μ = 0:\nRandom.seed!(1776)\ndataset = randn(100)\ndataset[1:5]\n\n5-element Vector{Float64}:\n 0.8488780584442736\n -0.31936138249336765\n -1.3982098801744465\n -0.05198933163879332\n -1.1465116601038348", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#conditioning-and-deconditioning", "href": "usage/probability-interface/index.html#conditioning-and-deconditioning", "title": "Querying Model Probabilities", "section": "Conditioning and Deconditioning", "text": "Conditioning and Deconditioning\nBayesian models can be transformed with two main operations, conditioning and deconditioning (also known as marginalization). Conditioning takes a variable and fixes its value as known. We do this by passing a model and a collection of conditioned variables to |, or its alias, condition:\n\n# (equivalently)\n# conditioned_model = condition(gdemo(length(dataset)), (x=dataset, μ=0))\nconditioned_model = gdemo(length(dataset)) | (x=dataset, μ=0)\n\nModel{typeof(gdemo), (:n,), (), (), Tuple{Int64}, Tuple{}, ConditionContext{@NamedTuple{x::Vector{Float64}, μ::Int64}, DefaultContext}}(gdemo, (n = 100,), NamedTuple(), ConditionContext((x = [0.8488780584442736, -0.31936138249336765, -1.3982098801744465, -0.05198933163879332, -1.1465116601038348, -0.6306168227545849, 0.6862766694322289, -0.5485073478947856, -0.17212004616875684, 1.2883226251958486, -0.13661316034377538, 2.4316115122026973, 0.2251319215717449, -0.5115708179083417, -0.7810712258995324, -1.0191704692490737, 1.1210038448250719, -1.6944509713762377, -0.27314823183454695, 0.25273963222687423, 1.3914215917992434, 0.7525340831125464, 0.847154387311101, -0.7130402796655171, 0.2983575202861233, -0.1785631526879386, 0.08659477535701691, -0.5167265137098563, 2.111309740316035, 0.3957655443124509, -0.0804390853521051, 1.255042471667049, -0.07882822403959532, 1.2261373761992618, 0.43953618247769816, -0.40640013183427787, -0.6868635949523503, 1.7380713294668497, 0.13685965156352295, 0.1485185624825999, -0.7798816720822024, 2.2595105995080846, -0.13609014938597142, 0.22785777205259913, -2.1005250433485725, 0.44205288222935385, -1.238456637875994, -2.3727125492433427, -0.24406624959402184, -0.04488042525902438, 0.27510026183444175, 0.42472846594528796, 1.0337924022589282, 0.9126364433535069, -0.9006583845907805, 0.8665471057463393, 1.4924737539852484, 1.2886591566091432, 1.037264411147446, 1.4731954133339449, -0.31874662373651885, 1.2255399151799211, -1.6642044048811695, -0.5717328092786154, -1.2700237196779645, 0.5748199649058684, 0.16467729820692942, -1.195290550625328, -0.37133526877621703, -0.3018979982049836, -2.0183406292097397, -0.9588803575112745, 0.7177183994733006, -1.0133440177662316, -1.0881357990941283, 1.0487446580734279, 2.627227367991459, -1.59963908284846, -0.3122512299247273, -1.0265333654194488, 0.5557085182114885, -0.3206725445321106, -1.4314746067673778, 1.5740113510560039, -0.6566477752702335, 0.31342313477927125, 0.33135361418686027, -1.0489180508346863, -0.2670759024309527, 0.4683952221006179, 0.04918061587657951, 1.239814741442417, 2.2239462179369296, 1.8507671783064434, 1.756319462015174, -0.6577450354719728, 2.2795431083561626, -0.492273906928334, 0.7045614632761499, 0.11260553216111485], μ = 0), DynamicPPL.DefaultContext()))\n\n\nThis operation can be reversed by applying decondition:\n\noriginal_model = decondition(conditioned_model)\n\nModel{typeof(gdemo), (:n,), (), (), Tuple{Int64}, Tuple{}, DefaultContext}(gdemo, (n = 100,), NamedTuple(), DefaultContext())\n\n\nWe can also decondition only some of the variables:\n\npartially_conditioned = decondition(conditioned_model, :μ)\n\nModel{typeof(gdemo), (:n,), (), (), Tuple{Int64}, Tuple{}, ConditionContext{@NamedTuple{x::Vector{Float64}}, DefaultContext}}(gdemo, (n = 100,), NamedTuple(), ConditionContext((x = [0.8488780584442736, -0.31936138249336765, -1.3982098801744465, -0.05198933163879332, -1.1465116601038348, -0.6306168227545849, 0.6862766694322289, -0.5485073478947856, -0.17212004616875684, 1.2883226251958486, -0.13661316034377538, 2.4316115122026973, 0.2251319215717449, -0.5115708179083417, -0.7810712258995324, -1.0191704692490737, 1.1210038448250719, -1.6944509713762377, -0.27314823183454695, 0.25273963222687423, 1.3914215917992434, 0.7525340831125464, 0.847154387311101, -0.7130402796655171, 0.2983575202861233, -0.1785631526879386, 0.08659477535701691, -0.5167265137098563, 2.111309740316035, 0.3957655443124509, -0.0804390853521051, 1.255042471667049, -0.07882822403959532, 1.2261373761992618, 0.43953618247769816, -0.40640013183427787, -0.6868635949523503, 1.7380713294668497, 0.13685965156352295, 0.1485185624825999, -0.7798816720822024, 2.2595105995080846, -0.13609014938597142, 0.22785777205259913, -2.1005250433485725, 0.44205288222935385, -1.238456637875994, -2.3727125492433427, -0.24406624959402184, -0.04488042525902438, 0.27510026183444175, 0.42472846594528796, 1.0337924022589282, 0.9126364433535069, -0.9006583845907805, 0.8665471057463393, 1.4924737539852484, 1.2886591566091432, 1.037264411147446, 1.4731954133339449, -0.31874662373651885, 1.2255399151799211, -1.6642044048811695, -0.5717328092786154, -1.2700237196779645, 0.5748199649058684, 0.16467729820692942, -1.195290550625328, -0.37133526877621703, -0.3018979982049836, -2.0183406292097397, -0.9588803575112745, 0.7177183994733006, -1.0133440177662316, -1.0881357990941283, 1.0487446580734279, 2.627227367991459, -1.59963908284846, -0.3122512299247273, -1.0265333654194488, 0.5557085182114885, -0.3206725445321106, -1.4314746067673778, 1.5740113510560039, -0.6566477752702335, 0.31342313477927125, 0.33135361418686027, -1.0489180508346863, -0.2670759024309527, 0.4683952221006179, 0.04918061587657951, 1.239814741442417, 2.2239462179369296, 1.8507671783064434, 1.756319462015174, -0.6577450354719728, 2.2795431083561626, -0.492273906928334, 0.7045614632761499, 0.11260553216111485],), DynamicPPL.DefaultContext()))\n\n\nWe can see which of the variables in a model have been conditioned with DynamicPPL.conditioned:\n\nDynamicPPL.conditioned(partially_conditioned)\n\n(x = [0.8488780584442736, -0.31936138249336765, -1.3982098801744465, -0.05198933163879332, -1.1465116601038348, -0.6306168227545849, 0.6862766694322289, -0.5485073478947856, -0.17212004616875684, 1.2883226251958486 … 0.04918061587657951, 1.239814741442417, 2.2239462179369296, 1.8507671783064434, 1.756319462015174, -0.6577450354719728, 2.2795431083561626, -0.492273906928334, 0.7045614632761499, 0.11260553216111485],)\n\n\n\n\n\n\n\n\nNote\n\n\n\nSometimes it is helpful to define convenience functions for conditioning on some variable(s). For instance, in this example we might want to define a version of gdemo that conditions on some observations of x:\ngdemo(x::AbstractVector{<:Real}) = gdemo(length(x)) | (; x)\nFor illustrative purposes, however, we do not use this function in the examples below.", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#probabilities-and-densities", "href": "usage/probability-interface/index.html#probabilities-and-densities", "title": "Querying Model Probabilities", "section": "Probabilities and Densities", "text": "Probabilities and Densities\nWe often want to calculate the (unnormalized) probability density for an event. This probability might be a prior, a likelihood, or a posterior (joint) density. DynamicPPL provides convenient functions for this. To begin, let’s define a model gdemo, condition it on a dataset, and draw a sample. The returned sample only contains μ, since the value of x has already been fixed:\n\nmodel = gdemo(length(dataset)) | (x=dataset,)\n\nRandom.seed!(124)\nsample = rand(model)\n\n(μ = -0.6680014719649068,)\n\n\nWe can then calculate the joint probability of a set of samples (here drawn from the prior) with logjoint.\n\nlogjoint(model, sample)\n\n-181.7247437162069\n\n\nFor models with many variables rand(model) can be prohibitively slow since it returns a NamedTuple of samples from the prior distribution of the unconditioned variables. We recommend working with samples of type DataStructures.OrderedDict in this case (which Turing re-exports, so can be used directly):\n\nRandom.seed!(124)\nsample_dict = rand(OrderedDict, model)\n\nOrderedDict{Any, Any} with 1 entry:\n μ => -0.668001\n\n\nlogjoint can also be used on this sample:\n\nlogjoint(model, sample_dict)\n\n-181.7247437162069\n\n\nThe prior probability and the likelihood of a set of samples can be calculated with the functions logprior and loglikelihood respectively. The log joint probability is the sum of these two quantities:\n\nlogjoint(model, sample) ≈ loglikelihood(model, sample) + logprior(model, sample)\n\ntrue\n\n\n\nlogjoint(model, sample_dict) ≈ loglikelihood(model, sample_dict) + logprior(model, sample_dict)\n\ntrue", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#example-cross-validation", "href": "usage/probability-interface/index.html#example-cross-validation", "title": "Querying Model Probabilities", "section": "Example: Cross-validation", "text": "Example: Cross-validation\nTo give an example of the probability interface in use, we can use it to estimate the performance of our model using cross-validation. In cross-validation, we split the dataset into several equal parts. Then, we choose one of these sets to serve as the validation set. Here, we measure fit using the cross entropy (Bayes loss).1 (For the sake of simplicity, in the following code, we enforce that nfolds must divide the number of data points. For a more competent implementation, see MLUtils.jl.)\n\n# Calculate the train/validation splits across `nfolds` partitions, assume `length(dataset)` divides `nfolds`\nfunction kfolds(dataset::Array{<:Real}, nfolds::Int)\n fold_size, remaining = divrem(length(dataset), nfolds)\n if remaining != 0\n error(\"The number of folds must divide the number of data points.\")\n end\n first_idx = firstindex(dataset)\n last_idx = lastindex(dataset)\n splits = map(0:(nfolds - 1)) do i\n start_idx = first_idx + i * fold_size\n end_idx = start_idx + fold_size\n train_set_indices = [first_idx:(start_idx - 1); end_idx:last_idx]\n return (view(dataset, train_set_indices), view(dataset, start_idx:(end_idx - 1)))\n end\n return splits\nend\n\nfunction cross_val(\n dataset::Vector{<:Real};\n nfolds::Int=5,\n nsamples::Int=1_000,\n rng::Random.AbstractRNG=Random.default_rng(),\n)\n # Initialize `loss` in a way such that the loop below does not change its type\n model = gdemo(1) | (x=[first(dataset)],)\n loss = zero(logjoint(model, rand(rng, model)))\n\n for (train, validation) in kfolds(dataset, nfolds)\n # First, we train the model on the training set, i.e., we obtain samples from the posterior.\n # For normally-distributed data, the posterior can be computed in closed form.\n # For general models, however, typically samples will be generated using MCMC with Turing.\n posterior = Normal(mean(train), 1)\n samples = rand(rng, posterior, nsamples)\n\n # Evaluation on the validation set.\n validation_model = gdemo(length(validation)) | (x=validation,)\n loss += sum(samples) do sample\n logjoint(validation_model, (μ=sample,))\n end\n end\n\n return loss\nend\n\ncross_val(dataset)\n\n-212760.30282411768", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#footnotes", "href": "usage/probability-interface/index.html#footnotes", "title": "Querying Model Probabilities", "section": "Footnotes", "text": "Footnotes\n\n\nSee ParetoSmooth.jl for a faster and more accurate implementation of cross-validation than the one provided here.↩︎", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "changelog.html", "href": "changelog.html", "title": "Changelog", "section": "", "text": "0.37 removes the old Gibbs constructors deprecated in 0.36.\n\n\n\nZygote is no longer officially supported as an automatic differentiation backend, and AutoZygote is no longer exported. You can continue to use Zygote by importing AutoZygote from ADTypes and it may well continue to work, but it is no longer tested and no effort will be expended to fix it if something breaks.\nMooncake is the recommended replacement for Zygote.\n\n\n\nTuring.jl v0.37 uses DynamicPPL v0.35, which brings with it several breaking changes:\n\nThe right hand side of .~ must from now on be a univariate distribution.\nIndexing VarInfo objects by samplers has been removed completely.\nThe order in which nested submodel prefixes are applied has been reversed.\nThe arguments for the constructor of LogDensityFunction have changed. LogDensityFunction also now satisfies the LogDensityProblems interface, without needing a wrapper object.\n\nFor more details about all of the above, see the changelog of DynamicPPL here.\n\n\n\nTuring.jl’s export list has been cleaned up a fair bit. This affects what is imported into your namespace when you do an unqualified using Turing. You may need to import things more explicitly than before.\n\nThe DynamicPPL and AbstractMCMC modules are no longer exported. You will need to import DynamicPPL or using DynamicPPL: DynamicPPL (likewise AbstractMCMC) yourself, which in turn means that they have to be made available in your project environment.\n@logprob_str and @prob_str have been removed following a long deprecation period.\nWe no longer re-export everything from Bijectors and Libtask. To get around this, add using Bijectors or using Libtask at the top of your script (but we recommend using more selective imports).\n\nWe no longer export Bijectors.ordered. If you were using ordered, even Bijectors does not (currently) export this. You will have to manually import it with using Bijectors: ordered.\n\n\nOn the other hand, we have added a few more exports:\n\nDynamicPPL.returned and DynamicPPL.prefix are exported (for use with submodels).\nLinearAlgebra.I is exported for convenience." }, { "objectID": "changelog.html#breaking-changes", "href": "changelog.html#breaking-changes", "title": "Changelog", "section": "", "text": "0.37 removes the old Gibbs constructors deprecated in 0.36.\n\n\n\nZygote is no longer officially supported as an automatic differentiation backend, and AutoZygote is no longer exported. You can continue to use Zygote by importing AutoZygote from ADTypes and it may well continue to work, but it is no longer tested and no effort will be expended to fix it if something breaks.\nMooncake is the recommended replacement for Zygote.\n\n\n\nTuring.jl v0.37 uses DynamicPPL v0.35, which brings with it several breaking changes:\n\nThe right hand side of .~ must from now on be a univariate distribution.\nIndexing VarInfo objects by samplers has been removed completely.\nThe order in which nested submodel prefixes are applied has been reversed.\nThe arguments for the constructor of LogDensityFunction have changed. LogDensityFunction also now satisfies the LogDensityProblems interface, without needing a wrapper object.\n\nFor more details about all of the above, see the changelog of DynamicPPL here.\n\n\n\nTuring.jl’s export list has been cleaned up a fair bit. This affects what is imported into your namespace when you do an unqualified using Turing. You may need to import things more explicitly than before.\n\nThe DynamicPPL and AbstractMCMC modules are no longer exported. You will need to import DynamicPPL or using DynamicPPL: DynamicPPL (likewise AbstractMCMC) yourself, which in turn means that they have to be made available in your project environment.\n@logprob_str and @prob_str have been removed following a long deprecation period.\nWe no longer re-export everything from Bijectors and Libtask. To get around this, add using Bijectors or using Libtask at the top of your script (but we recommend using more selective imports).\n\nWe no longer export Bijectors.ordered. If you were using ordered, even Bijectors does not (currently) export this. You will have to manually import it with using Bijectors: ordered.\n\n\nOn the other hand, we have added a few more exports:\n\nDynamicPPL.returned and DynamicPPL.prefix are exported (for use with submodels).\nLinearAlgebra.I is exported for convenience." }, { "objectID": "changelog.html#breaking-changes-1", "href": "changelog.html#breaking-changes-1", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\n0.36.0 introduces a new Gibbs sampler. It’s been included in several previous releases as Turing.Experimental.Gibbs, but now takes over the old Gibbs sampler, which gets removed completely.\nThe new Gibbs sampler currently supports the same user-facing interface as the old one, but the old constructors have been deprecated, and will be removed in the future. Also, given that the internals have been completely rewritten in a very different manner, there may be accidental breakage that we haven’t anticipated. Please report any you find.\nGibbsConditional has also been removed. It was never very user-facing, but it was exported, so technically this is breaking.\nThe old Gibbs constructor relied on being called with several subsamplers, and each of the constructors of the subsamplers would take as arguments the symbols for the variables that they are to sample, e.g. Gibbs(HMC(:x), MH(:y)). This constructor has been deprecated, and will be removed in the future. The new constructor works by mapping symbols, VarNames, or iterables thereof to samplers, e.g. Gibbs(x=>HMC(), y=>MH()), Gibbs(@varname(x) => HMC(), @varname(y) => MH()), Gibbs((:x, :y) => NUTS(), :z => MH()). This allows more granular specification of which sampler to use for which variable.\nLikewise, the old constructor for calling one subsampler more often than another, Gibbs((HMC(0.01, 4, :x), 2), (MH(:y), 1)) has been deprecated. The new way to do this is to use RepeatSampler, also introduced at this version: Gibbs(@varname(x) => RepeatSampler(HMC(0.01, 4), 2), @varname(y) => MH())." }, { "objectID": "changelog.html#breaking-changes-2", "href": "changelog.html#breaking-changes-2", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\nJulia 1.10 is now the minimum required version for Turing.\nTapir.jl has been removed and replaced with its successor, Mooncake.jl. You can use Mooncake.jl by passing adbackend=AutoMooncake(; config=nothing) to the relevant samplers.\nSupport for Tracker.jl as an AD backend has been removed." }, { "objectID": "changelog.html#breaking-changes-3", "href": "changelog.html#breaking-changes-3", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\nThe following exported functions have been removed:\n\nconstrained_space\nget_parameter_bounds\noptim_objective\noptim_function\noptim_problem\n\nThe same functionality is now offered by the new exported functions\n\nmaximum_likelihood\nmaximum_a_posteriori" }, { "objectID": "tutorials/gaussian-process-latent-variable-models/index.html", "href": "tutorials/gaussian-process-latent-variable-models/index.html", "title": "Gaussian Process Latent Variable Models", "section": "", "text": "In a previous tutorial, we have discussed latent variable models, in particular probabilistic principal component analysis (pPCA). Here, we show how we can extend the mapping provided by pPCA to non-linear mappings between input and output. For more details about the Gaussian Process Latent Variable Model (GPLVM), we refer the reader to the original publication and a further extension.\nIn short, the GPVLM is a dimensionality reduction technique that allows us to embed a high-dimensional dataset in a lower-dimensional embedding. Importantly, it provides the advantage that the linear mappings from the embedded space can be non-linearised through the use of Gaussian Processes.\n\nLet’s start by loading some dependencies.\n\nusing Turing\nusing AbstractGPs\nusing FillArrays\nusing LaTeXStrings\nusing Plots\nusing RDatasets\nusing ReverseDiff\nusing StatsBase\n\nusing LinearAlgebra\nusing Random\n\nRandom.seed!(1789);\n\nWe demonstrate the GPLVM with a very small dataset: Fisher’s Iris data set. This is mostly for reasons of run time, so the tutorial can be run quickly. As you will see, one of the major drawbacks of using GPs is their speed, although this is an active area of research. We will briefly touch on some ways to speed things up at the end of this tutorial. We transform the original data with non-linear operations in order to demonstrate the power of GPs to work on non-linear relationships, while keeping the problem reasonably small.\n\ndata = dataset(\"datasets\", \"iris\")\nspecies = data[!, \"Species\"]\nindex = shuffle(1:150)\n# we extract the four measured quantities,\n# so the dimension of the data is only d=4 for this toy example\ndat = Matrix(data[index, 1:4])\nlabels = data[index, \"Species\"]\n\n# non-linearize data to demonstrate ability of GPs to deal with non-linearity\ndat[:, 1] = 0.5 * dat[:, 1] .^ 2 + 0.1 * dat[:, 1] .^ 3\ndat[:, 2] = dat[:, 2] .^ 3 + 0.2 * dat[:, 2] .^ 4\ndat[:, 3] = 0.1 * exp.(dat[:, 3]) - 0.2 * dat[:, 3] .^ 2\ndat[:, 4] = 0.5 * log.(dat[:, 4]) .^ 2 + 0.01 * dat[:, 3] .^ 5\n\n# normalize data\ndt = fit(ZScoreTransform, dat; dims=1);\nStatsBase.transform!(dt, dat);\n\nWe will start out by demonstrating the basic similarity between pPCA (see the tutorial on this topic) and the GPLVM model. Indeed, pPCA is basically equivalent to running the GPLVM model with an automatic relevance determination (ARD) linear kernel.\nFirst, we re-introduce the pPCA model (see the tutorial on pPCA for details)\n\n@model function pPCA(x)\n # Dimensionality of the problem.\n N, D = size(x)\n # latent variable z\n z ~ filldist(Normal(), D, N)\n # weights/loadings W\n w ~ filldist(Normal(), D, D)\n mu = (w * z)'\n for d in 1:D\n x[:, d] ~ MvNormal(mu[:, d], I)\n end\n return nothing\nend;\n\nWe define two different kernels, a simple linear kernel with an Automatic Relevance Determination transform and a squared exponential kernel.\n\nlinear_kernel(α) = LinearKernel() ∘ ARDTransform(α)\nsekernel(α, σ) = σ * SqExponentialKernel() ∘ ARDTransform(α);\n\nAnd here is the GPLVM model. We create separate models for the two types of kernel.\n\n@model function GPLVM_linear(Y, K)\n # Dimensionality of the problem.\n N, D = size(Y)\n # K is the dimension of the latent space\n @assert K <= D\n noise = 1e-3\n\n # Priors\n α ~ MvLogNormal(MvNormal(Zeros(K), I))\n Z ~ filldist(Normal(), K, N)\n mu ~ filldist(Normal(), N)\n\n gp = GP(linear_kernel(α))\n gpz = gp(ColVecs(Z), noise)\n Y ~ filldist(MvNormal(mu, cov(gpz)), D)\n\n return nothing\nend;\n\n@model function GPLVM(Y, K)\n # Dimensionality of the problem.\n N, D = size(Y)\n # K is the dimension of the latent space\n @assert K <= D\n noise = 1e-3\n\n # Priors\n α ~ MvLogNormal(MvNormal(Zeros(K), I))\n σ ~ LogNormal(0.0, 1.0)\n Z ~ filldist(Normal(), K, N)\n mu ~ filldist(Normal(), N)\n\n gp = GP(sekernel(α, σ))\n gpz = gp(ColVecs(Z), noise)\n Y ~ filldist(MvNormal(mu, cov(gpz)), D)\n\n return nothing\nend;\n\n\n# Standard GPs don't scale very well in n, so we use a small subsample for the purpose of this tutorial\nn_data = 40\n# number of features to use from dataset\nn_features = 4\n# latent dimension for GP case\nndim = 4;\n\n\nppca = pPCA(dat[1:n_data, 1:n_features])\nchain_ppca = sample(ppca, NUTS{Turing.ReverseDiffAD{true}}(), 1000);\n\n\n# we extract the posterior mean estimates of the parameters from the chain\nz_mean = reshape(mean(group(chain_ppca, :z))[:, 2], (n_features, n_data))\nscatter(z_mean[1, :], z_mean[2, :]; group=labels[1:n_data], xlabel=L\"z_1\", ylabel=L\"z_2\")\n\nWe can see that the pPCA fails to distinguish the groups. In particular, the setosa species is not clearly separated from versicolor and virginica. This is due to the non-linearities that we introduced, as without them the two groups can be clearly distinguished using pPCA (see the pPCA tutorial).\nLet’s try the same with our linear kernel GPLVM model.\n\ngplvm_linear = GPLVM_linear(dat[1:n_data, 1:n_features], ndim)\nchain_linear = sample(gplvm_linear, NUTS{Turing.ReverseDiffAD{true}}(), 500);\n\n\n# we extract the posterior mean estimates of the parameters from the chain\nz_mean = reshape(mean(group(chain_linear, :Z))[:, 2], (n_features, n_data))\nalpha_mean = mean(group(chain_linear, :α))[:, 2]\n\nalpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true)\nscatter(\n z_mean[alpha1, :],\n z_mean[alpha2, :];\n group=labels[1:n_data],\n xlabel=L\"z_{\\mathrm{ard}_1}\",\n ylabel=L\"z_{\\mathrm{ard}_2}\",\n)\n\nWe can see that similar to the pPCA case, the linear kernel GPLVM fails to distinguish between the two groups (setosa on the one hand, and virginica and verticolor on the other).\nFinally, we demonstrate that by changing the kernel to a non-linear function, we are able to separate the data again.\n\ngplvm = GPLVM(dat[1:n_data, 1:n_features], ndim)\nchain_gplvm = sample(gplvm, NUTS{Turing.ReverseDiffAD{true}}(), 500);\n\n\n# we extract the posterior mean estimates of the parameters from the chain\nz_mean = reshape(mean(group(chain_gplvm, :Z))[:, 2], (ndim, n_data))\nalpha_mean = mean(group(chain_gplvm, :α))[:, 2]\n\nalpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true)\nscatter(\n z_mean[alpha1, :],\n z_mean[alpha2, :];\n group=labels[1:n_data],\n xlabel=L\"z_{\\mathrm{ard}_1}\",\n ylabel=L\"z_{\\mathrm{ard}_2}\",\n)\n\n\nlet\n @assert abs(\n mean(z_mean[alpha1, labels[1:n_data] .== \"setosa\"]) -\n mean(z_mean[alpha1, labels[1:n_data] .!= \"setosa\"]),\n ) > 1\nend\n\nNow, the split between the two groups is visible again.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Gaussian Process Latent Variable Models" ] }, { "objectID": "tutorials/bayesian-time-series-analysis/index.html", "href": "tutorials/bayesian-time-series-analysis/index.html", "title": "Bayesian Time Series Analysis", "section": "", "text": "In time series analysis we are often interested in understanding how various real-life circumstances impact our quantity of interest. These can be, for instance, season, day of week, or time of day. To analyse this it is useful to decompose time series into simpler components (corresponding to relevant circumstances) and infer their relevance. In this tutorial we are going to use Turing for time series analysis and learn about useful ways to decompose time series.\n\nModelling time series\nBefore we start coding, let us talk about what exactly we mean with time series decomposition. In a nutshell, it is a divide-and-conquer approach where we express a time series as a sum or a product of simpler series. For instance, the time series \\(f(t)\\) can be decomposed into a sum of \\(n\\) components\n\\[f(t) = \\sum_{i=1}^n f_i(t),\\]\nor we can decompose \\(g(t)\\) into a product of \\(m\\) components\n\\[g(t) = \\prod_{i=1}^m g_i(t).\\]\nWe refer to this as additive or multiplicative decomposition respectively. This type of decomposition is great since it lets us reason about individual components, which makes encoding prior information and interpreting model predictions very easy. Two common components are trends, which represent the overall change of the time series (often assumed to be linear), and cyclic effects which contribute oscillating effects around the trend. Let us simulate some data with an additive linear trend and oscillating effects.\n\nusing Turing\nusing FillArrays\nusing StatsPlots\n\nusing LinearAlgebra\nusing Random\nusing Statistics\n\nRandom.seed!(12345)\n\ntrue_sin_freq = 2\ntrue_sin_amp = 5\ntrue_cos_freq = 7\ntrue_cos_amp = 2.5\ntmax = 10\nβ_true = 2\nα_true = -1\ntt = 0:0.05:tmax\nf₁(t) = α_true + β_true * t\nf₂(t) = true_sin_amp * sinpi(2 * t * true_sin_freq / tmax)\nf₃(t) = true_cos_amp * cospi(2 * t * true_cos_freq / tmax)\nf(t) = f₁(t) + f₂(t) + f₃(t)\n\nplot(f, tt; label=\"f(t)\", title=\"Observed time series\", legend=:topleft, linewidth=3)\nplot!(\n [f₁, f₂, f₃],\n tt;\n label=[\"f₁(t)\" \"f₂(t)\" \"f₃(t)\"],\n style=[:dot :dash :dashdot],\n linewidth=1,\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEven though we use simple components, combining them can give rise to fairly complex time series. In this time series, cyclic effects are just added on top of the trend. If we instead multiply the components the cyclic effects cause the series to oscillate between larger and larger values, since they get scaled by the trend.\n\ng(t) = f₁(t) * f₂(t) * f₃(t)\n\nplot(g, tt; label=\"f(t)\", title=\"Observed time series\", legend=:topleft, linewidth=3)\nplot!([f₁, f₂, f₃], tt; label=[\"f₁(t)\" \"f₂(t)\" \"f₃(t)\"], linewidth=1)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nUnlike \\(f\\), \\(g\\) oscillates around \\(0\\) since it is being multiplied with sines and cosines. To let a multiplicative decomposition oscillate around the trend we could define it as \\(\\tilde{g}(t) = f₁(t) * (1 + f₂(t)) * (1 + f₃(t)),\\) but for convenience we will leave it as is. The inference machinery is the same for both cases.\n\n\nModel fitting\nHaving discussed time series decomposition, let us fit a model to the time series above and recover the true parameters. Before building our model, we standardise the time axis to \\([0, 1]\\) and subtract the max of the time series. This helps convergence while maintaining interpretability and the correct scales for the cyclic components.\n\nσ_true = 0.35\nt = collect(tt[begin:3:end])\nt_min, t_max = extrema(t)\nx = (t .- t_min) ./ (t_max - t_min)\nyf = f.(t) .+ σ_true .* randn(size(t))\nyf_max = maximum(yf)\nyf = yf .- yf_max\n\nscatter(x, yf; title=\"Standardised data\", legend=false)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLet us now build our model. We want to assume a linear trend, and cyclic effects. Encoding a linear trend is easy enough, but what about cyclical effects? We will take a scattergun approach, and create multiple cyclical features using both sine and cosine functions and let our inference machinery figure out which to keep. To do this, we define how long a one period should be, and create features in reference to said period. How long a period should be is problem dependent, but as an example let us say it is \\(1\\) year. If we then find evidence for a cyclic effect with a frequency of 2, that would mean a biannual effect. A frequency of 4 would mean quarterly etc. Since we are using synthetic data, we are simply going to let the period be 1, which is the entire length of the time series.\n\nfreqs = 1:10\nnum_freqs = length(freqs)\nperiod = 1\ncyclic_features = [sinpi.(2 .* freqs' .* x ./ period) cospi.(2 .* freqs' .* x ./ period)]\n\nplot_freqs = [1, 3, 5]\nfreq_ptl = plot(\n cyclic_features[:, plot_freqs];\n label=permutedims([\"sin(2π$(f)x)\" for f in plot_freqs]),\n title=\"Cyclical features subset\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHaving constructed the cyclical features, we can finally build our model. The model we will implement looks like this\n\\[\nf(t) = \\alpha + \\beta_t t + \\sum_{i=1}^F \\beta_{\\sin{},i} \\sin{}(2\\pi f_i t) + \\sum_{i=1}^F \\beta_{\\cos{},i} \\cos{}(2\\pi f_i t),\n\\]\nwith a Gaussian likelihood \\(y \\sim \\mathcal{N}(f(t), \\sigma^2)\\). For convenience we are treating the cyclical feature weights \\(\\beta_{\\sin{},i}\\) and \\(\\beta_{\\cos{},i}\\) the same in code and weight them with \\(\\beta_c\\). And just because it is so easy, we parameterise our model with the operation with which to apply the cyclic effects. This lets us use the exact same code for both additive and multiplicative models. Finally, we plot prior predictive samples to make sure our priors make sense.\n\n@model function decomp_model(t, c, op)\n α ~ Normal(0, 10)\n βt ~ Normal(0, 2)\n βc ~ MvNormal(Zeros(size(c, 2)), I)\n σ ~ truncated(Normal(0, 0.1); lower=0)\n\n cyclic = c * βc\n trend = α .+ βt .* t\n μ = op(trend, cyclic)\n y ~ MvNormal(μ, σ^2 * I)\n return (; trend, cyclic)\nend\n\ny_prior_samples = mapreduce(hcat, 1:100) do _\n rand(decomp_model(t, cyclic_features, +)).y\nend\nplot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label=\"\", title=\"Prior samples\")\nscatter!(t, yf; color=2, label=\"Data\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWith the model specified and with a reasonable prior we can now let Turing decompose the time series for us!\n\nfunction mean_ribbon(samples)\n qs = quantile(samples)\n low = qs[:, Symbol(\"2.5%\")]\n up = qs[:, Symbol(\"97.5%\")]\n m = mean(samples)[:, :mean]\n return m, (m - low, up - m)\nend\n\nfunction get_decomposition(model, x, cyclic_features, chain, op)\n chain_params = Turing.MCMCChains.get_sections(chain, :parameters)\n return generated_quantities(model(x, cyclic_features, op), chain_params)\nend\n\nfunction plot_fit(x, y, decomp, ymax)\n trend = mapreduce(x -> x.trend, hcat, decomp)\n cyclic = mapreduce(x -> x.cyclic, hcat, decomp)\n\n trend_plt = plot(\n x,\n trend .+ ymax;\n color=1,\n label=nothing,\n alpha=0.2,\n title=\"Trend\",\n xlabel=\"Time\",\n ylabel=\"f₁(t)\",\n )\n ls = [ones(length(t)) t] \\ y\n α̂, β̂ = ls[1], ls[2:end]\n plot!(\n trend_plt,\n t,\n α̂ .+ t .* β̂ .+ ymax;\n label=\"Least squares trend\",\n color=5,\n linewidth=4,\n )\n\n scatter!(trend_plt, x, y .+ ymax; label=nothing, color=2, legend=:topleft)\n cyclic_plt = plot(\n x,\n cyclic;\n color=1,\n label=nothing,\n alpha=0.2,\n title=\"Cyclic effect\",\n xlabel=\"Time\",\n ylabel=\"f₂(t)\",\n )\n return trend_plt, cyclic_plt\nend\n\nchain = sample(decomp_model(x, cyclic_features, +) | (; y=yf), NUTS(), 2000, progress=false)\nyf_samples = predict(decomp_model(x, cyclic_features, +), chain)\nm, conf = mean_ribbon(yf_samples)\npredictive_plt = plot(\n t,\n m .+ yf_max;\n ribbon=conf,\n label=\"Posterior density\",\n title=\"Posterior decomposition\",\n xlabel=\"Time\",\n ylabel=\"f(t)\",\n)\nscatter!(predictive_plt, t, yf .+ yf_max; color=2, label=\"Data\", legend=:topleft)\n\ndecomp = get_decomposition(decomp_model, x, cyclic_features, chain, +)\ndecomposed_plt = plot_fit(t, yf, decomp, yf_max)\nplot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))\n\n┌ Info: Found initial step size\n└ ϵ = 0.025\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInference is successful and the posterior beautifully captures the data. We see that the least squares linear fit deviates somewhat from the posterior trend. Since our model takes cyclic effects into account separately, we get a better estimate of the true overall trend than if we would have just fitted a line. But what frequency content did the model identify?\n\nfunction plot_cyclic_features(βsin, βcos)\n labels = reshape([\"freq = $i\" for i in freqs], 1, :)\n colors = collect(freqs)'\n style = reshape([i <= 10 ? :solid : :dash for i in 1:length(labels)], 1, :)\n sin_features_plt = density(\n βsin[:, :, 1];\n title=\"Sine features posterior\",\n label=labels,\n ylabel=\"Density\",\n xlabel=\"Weight\",\n color=colors,\n linestyle=style,\n legend=nothing,\n )\n cos_features_plt = density(\n βcos[:, :, 1];\n title=\"Cosine features posterior\",\n ylabel=\"Density\",\n xlabel=\"Weight\",\n label=nothing,\n color=colors,\n linestyle=style,\n )\n\n return seasonal_features_plt = plot(\n sin_features_plt,\n cos_features_plt;\n layout=(2, 1),\n size=(800, 600),\n legend=:outerright,\n )\nend\n\nβc = Array(group(chain, :βc))\nplot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPlotting the posterior over the cyclic features reveals that the model managed to extract the true frequency content.\nSince we wrote our model to accept a combining operator, we can easily run the same analysis for a multiplicative model.\n\nyg = g.(t) .+ σ_true .* randn(size(t))\n\ny_prior_samples = mapreduce(hcat, 1:100) do _\n rand(decomp_model(t, cyclic_features, .*)).y\nend\nplot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label=\"\", title=\"Prior samples\")\nscatter!(t, yf; color=2, label=\"Data\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nchain = sample(decomp_model(x, cyclic_features, .*) | (; y=yg), NUTS(), 2000, progress=false)\nyg_samples = predict(decomp_model(x, cyclic_features, .*), chain)\nm, conf = mean_ribbon(yg_samples)\npredictive_plt = plot(\n t,\n m;\n ribbon=conf,\n label=\"Posterior density\",\n title=\"Posterior decomposition\",\n xlabel=\"Time\",\n ylabel=\"g(t)\",\n)\nscatter!(predictive_plt, t, yg; color=2, label=\"Data\", legend=:topleft)\n\ndecomp = get_decomposition(decomp_model, x, cyclic_features, chain, .*)\ndecomposed_plt = plot_fit(t, yg, decomp, 0)\nplot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))\n\n┌ Info: Found initial step size\n└ ϵ = 0.0015625\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe model fits! What about the infered cyclic components?\n\nβc = Array(group(chain, :βc))\nplot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWhile multiplicative model fits to the data, it does not recover the true parameters for this dataset.\n\n\nWrapping up\nIn this tutorial we have seen how to implement and fit time series models using additive and multiplicative decomposition. We also saw how to visualise the model fit, and how to interpret learned cyclical components.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Bayesian Time Series Analysis" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html", "href": "tutorials/bayesian-logistic-regression/index.html", "title": "Bayesian Logistic Regression", "section": "", "text": "Bayesian logistic regression is the Bayesian counterpart to a common tool in machine learning, logistic regression. The goal of logistic regression is to predict a one or a zero for a given training item. An example might be predicting whether someone is sick or ill given their symptoms and personal information.\nIn our example, we’ll be working to predict whether someone is likely to default with a synthetic dataset found in the RDatasets package. This dataset, Defaults, comes from R’s ISLR package and contains information on borrowers.\nTo start, let’s import all the libraries we’ll need.\n# Import Turing and Distributions.\nusing Turing, Distributions\n\n# Import RDatasets.\nusing RDatasets\n\n# Import MCMCChains, Plots, and StatsPlots for visualizations and diagnostics.\nusing MCMCChains, Plots, StatsPlots\n\n# We need a logistic function, which is provided by StatsFuns.\nusing StatsFuns: logistic\n\n# Functionality for splitting and normalizing the data\nusing MLDataUtils: shuffleobs, stratifiedobs, rescale!\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(0);", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#data-cleaning-set-up", "href": "tutorials/bayesian-logistic-regression/index.html#data-cleaning-set-up", "title": "Bayesian Logistic Regression", "section": "Data Cleaning & Set Up", "text": "Data Cleaning & Set Up\nNow we’re going to import our dataset. The first six rows of the dataset are shown below so you can get a good feel for what kind of data we have.\n\n# Import the \"Default\" dataset.\ndata = RDatasets.dataset(\"ISLR\", \"Default\");\n\n# Show the first six rows of the dataset.\nfirst(data, 6)\n\n6×4 DataFrame\n\n\n\nRow\nDefault\nStudent\nBalance\nIncome\n\n\n\nCat…\nCat…\nFloat64\nFloat64\n\n\n\n\n1\nNo\nNo\n729.526\n44361.6\n\n\n2\nNo\nYes\n817.18\n12106.1\n\n\n3\nNo\nNo\n1073.55\n31767.1\n\n\n4\nNo\nNo\n529.251\n35704.5\n\n\n5\nNo\nNo\n785.656\n38463.5\n\n\n6\nNo\nYes\n919.589\n7491.56\n\n\n\n\n\n\nMost machine learning processes require some effort to tidy up the data, and this is no different. We need to convert the Default and Student columns, which say “Yes” or “No” into 1s and 0s. Afterwards, we’ll get rid of the old words-based columns.\n\n# Convert \"Default\" and \"Student\" to numeric values.\ndata[!, :DefaultNum] = [r.Default == \"Yes\" ? 1.0 : 0.0 for r in eachrow(data)]\ndata[!, :StudentNum] = [r.Student == \"Yes\" ? 1.0 : 0.0 for r in eachrow(data)]\n\n# Delete the old columns which say \"Yes\" and \"No\".\nselect!(data, Not([:Default, :Student]))\n\n# Show the first six rows of our edited dataset.\nfirst(data, 6)\n\n6×4 DataFrame\n\n\n\nRow\nBalance\nIncome\nDefaultNum\nStudentNum\n\n\n\nFloat64\nFloat64\nFloat64\nFloat64\n\n\n\n\n1\n729.526\n44361.6\n0.0\n0.0\n\n\n2\n817.18\n12106.1\n0.0\n1.0\n\n\n3\n1073.55\n31767.1\n0.0\n0.0\n\n\n4\n529.251\n35704.5\n0.0\n0.0\n\n\n5\n785.656\n38463.5\n0.0\n0.0\n\n\n6\n919.589\n7491.56\n0.0\n1.0\n\n\n\n\n\n\nAfter we’ve done that tidying, it’s time to split our dataset into training and testing sets, and separate the labels from the data. We separate our data into two halves, train and test. You can use a higher percentage of splitting (or a lower one) by modifying the at = 0.05 argument. We have highlighted the use of only a 5% sample to show the power of Bayesian inference with small sample sizes.\nWe must rescale our variables so that they are centered around zero by subtracting each column by the mean and dividing it by the standard deviation. Without this step, Turing’s sampler will have a hard time finding a place to start searching for parameter estimates. To do this we will leverage MLDataUtils, which also lets us effortlessly shuffle our observations and perform a stratified split to get a representative test set.\n\nfunction split_data(df, target; at=0.70)\n shuffled = shuffleobs(df)\n return trainset, testset = stratifiedobs(row -> row[target], shuffled; p=at)\nend\n\nfeatures = [:StudentNum, :Balance, :Income]\nnumerics = [:Balance, :Income]\ntarget = :DefaultNum\n\ntrainset, testset = split_data(data, target; at=0.05)\nfor feature in numerics\n μ, σ = rescale!(trainset[!, feature]; obsdim=1)\n rescale!(testset[!, feature], μ, σ; obsdim=1)\nend\n\n# Turing requires data in matrix form, not dataframe\ntrain = Matrix(trainset[:, features])\ntest = Matrix(testset[:, features])\ntrain_label = trainset[:, target]\ntest_label = testset[:, target];", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#model-declaration", "href": "tutorials/bayesian-logistic-regression/index.html#model-declaration", "title": "Bayesian Logistic Regression", "section": "Model Declaration", "text": "Model Declaration\nFinally, we can define our model.\nlogistic_regression takes four arguments:\n\nx is our set of independent variables;\ny is the element we want to predict;\nn is the number of observations we have; and\nσ is the standard deviation we want to assume for our priors.\n\nWithin the model, we create four coefficients (intercept, student, balance, and income) and assign a prior of normally distributed with means of zero and standard deviations of σ. We want to find values of these four coefficients to predict any given y.\nThe for block creates a variable v which is the logistic function. We then observe the likelihood of calculating v given the actual label, y[i].\n\n# Bayesian logistic regression (LR)\n@model function logistic_regression(x, y, n, σ)\n intercept ~ Normal(0, σ)\n\n student ~ Normal(0, σ)\n balance ~ Normal(0, σ)\n income ~ Normal(0, σ)\n\n for i in 1:n\n v = logistic(intercept + student * x[i, 1] + balance * x[i, 2] + income * x[i, 3])\n y[i] ~ Bernoulli(v)\n end\nend;", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#sampling", "href": "tutorials/bayesian-logistic-regression/index.html#sampling", "title": "Bayesian Logistic Regression", "section": "Sampling", "text": "Sampling\nNow we can run our sampler. This time we’ll use NUTS to sample from our posterior.\n\nsetprogress!(false)\n\n\n# Retrieve the number of observations.\nn, _ = size(train)\n\n# Sample using NUTS.\nm = logistic_regression(train, train_label, n, 1)\nchain = sample(m, NUTS(), MCMCThreads(), 1_500, 3)\n\n\n\nChains MCMC chain (1500×16×3 Array{Float64, 3}):\n\nIterations = 751:1:2250\nNumber of chains = 3\nSamples per chain = 1500\nWall duration = 12.15 seconds\nCompute duration = 9.81 seconds\nparameters = intercept, student, balance, income\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n intercept -4.4125 0.4324 0.0099 1925.3272 2581.3685 1.0018 ⋯\n student -0.3758 0.6218 0.0122 2575.8226 2517.5469 1.0013 ⋯\n balance 1.8527 0.2800 0.0061 2080.2853 2362.6946 1.0012 ⋯\n income 0.2971 0.3097 0.0060 2679.7466 2643.8063 1.0002 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n intercept -5.2827 -4.7058 -4.4081 -4.1118 -3.5812\n student -1.6228 -0.7953 -0.3586 0.0397 0.8516\n balance 1.3099 1.6601 1.8536 2.0375 2.4132\n income -0.3160 0.0929 0.2937 0.5060 0.9065\n\n\n\n\n\n\n\n\nSampling With Multiple Threads\n\n\n\n\n\nThe sample() call above assumes that you have at least nchains threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. For more information, see the Turing documentation on sampling multiple chains.\n\n\n\nSince we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.\n\nplot(chain)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLooks good!\nWe can also use the corner function from MCMCChains to show the distributions of the various parameters of our logistic regression.\n\n# The labels to use.\nl = [:student, :balance, :income]\n\n# Use the corner function. Requires StatsPlots and MCMCChains.\ncorner(chain, l)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFortunately the corner plot appears to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter’s sampled values to estimate our model to make predictions.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#making-predictions", "href": "tutorials/bayesian-logistic-regression/index.html#making-predictions", "title": "Bayesian Logistic Regression", "section": "Making Predictions", "text": "Making Predictions\nHow do we test how well the model actually predicts whether someone is likely to default? We need to build a prediction function that takes the test object we made earlier and runs it through the average parameter calculated during sampling.\nThe prediction function below takes a Matrix and a Chain object. It takes the mean of each parameter’s sampled values and re-runs the logistic function using those mean values for every element in the test set.\n\nfunction prediction(x::Matrix, chain, threshold)\n # Pull the means from each parameter's sampled values in the chain.\n intercept = mean(chain[:intercept])\n student = mean(chain[:student])\n balance = mean(chain[:balance])\n income = mean(chain[:income])\n\n # Retrieve the number of rows.\n n, _ = size(x)\n\n # Generate a vector to store our predictions.\n v = Vector{Float64}(undef, n)\n\n # Calculate the logistic function for each element in the test set.\n for i in 1:n\n num = logistic(\n intercept .+ student * x[i, 1] + balance * x[i, 2] + income * x[i, 3]\n )\n if num >= threshold\n v[i] = 1\n else\n v[i] = 0\n end\n end\n return v\nend;\n\nLet’s see how we did! We run the test matrix through the prediction function, and compute the mean squared error (MSE) for our prediction. The threshold variable sets the sensitivity of the predictions. For example, a threshold of 0.07 will predict a defualt value of 1 for any predicted value greater than 0.07 and no default if it is less than 0.07.\n\n# Set the prediction threshold.\nthreshold = 0.07\n\n# Make the predictions.\npredictions = prediction(test, chain, threshold)\n\n# Calculate MSE for our test set.\nloss = sum((predictions - test_label) .^ 2) / length(test_label)\n\n0.12978947368421054\n\n\nPerhaps more important is to see what percentage of defaults we correctly predicted. The code below simply counts defaults and predictions and presents the results.\n\ndefaults = sum(test_label)\nnot_defaults = length(test_label) - defaults\n\npredicted_defaults = sum(test_label .== predictions .== 1)\npredicted_not_defaults = sum(test_label .== predictions .== 0)\n\nprintln(\"Defaults: $defaults\n Predictions: $predicted_defaults\n Percentage defaults correct $(predicted_defaults/defaults)\")\n\nprintln(\"Not defaults: $not_defaults\n Predictions: $predicted_not_defaults\n Percentage non-defaults correct $(predicted_not_defaults/not_defaults)\")\n\nDefaults: 316.0\n Predictions: 273\n Percentage defaults correct 0.8639240506329114\nNot defaults: 9184.0\n Predictions: 7994\n Percentage non-defaults correct 0.8704268292682927\n\n\nThe above shows that with a threshold of 0.07, we correctly predict a respectable portion of the defaults, and correctly identify most non-defaults. This is fairly sensitive to a choice of threshold, and you may wish to experiment with it.\nThis tutorial has demonstrated how to use Turing to perform Bayesian logistic regression.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html", "href": "tutorials/bayesian-differential-equations/index.html", "title": "Bayesian Differential Equations", "section": "", "text": "Most of the scientific community deals with the basic problem of trying to mathematically model the reality around them and this often involves dynamical systems. The general trend to model these complex dynamical systems is through the use of differential equations. Differential equation models often have non-measurable parameters. The popular “forward-problem” of simulation consists of solving the differential equations for a given set of parameters, the “inverse problem” to simulation, known as parameter estimation, is the process of utilizing data to determine these model parameters. Bayesian inference provides a robust approach to parameter estimation with quantified uncertainty.\nusing Turing\nusing DifferentialEquations\n\n# Load StatsPlots for visualizations and diagnostics.\nusing StatsPlots\n\nusing LinearAlgebra\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(14);", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#the-lotka-volterra-model", "href": "tutorials/bayesian-differential-equations/index.html#the-lotka-volterra-model", "title": "Bayesian Differential Equations", "section": "The Lotka-Volterra Model", "text": "The Lotka-Volterra Model\nThe Lotka–Volterra equations, also known as the predator–prey equations, are a pair of first-order nonlinear differential equations. These differential equations are frequently used to describe the dynamics of biological systems in which two species interact, one as a predator and the other as prey. The populations change through time according to the pair of equations\n\\[\n\\begin{aligned}\n\\frac{\\mathrm{d}x}{\\mathrm{d}t} &= (\\alpha - \\beta y(t))x(t), \\\\\n\\frac{\\mathrm{d}y}{\\mathrm{d}t} &= (\\delta x(t) - \\gamma)y(t)\n\\end{aligned}\n\\]\nwhere \\(x(t)\\) and \\(y(t)\\) denote the populations of prey and predator at time \\(t\\), respectively, and \\(\\alpha, \\beta, \\gamma, \\delta\\) are positive parameters.\nWe implement the Lotka-Volterra model and simulate it with parameters \\(\\alpha = 1.5\\), \\(\\beta = 1\\), \\(\\gamma = 3\\), and \\(\\delta = 1\\) and initial conditions \\(x(0) = y(0) = 1\\).\n\n# Define Lotka-Volterra model.\nfunction lotka_volterra(du, u, p, t)\n # Model parameters.\n α, β, γ, δ = p\n # Current state.\n x, y = u\n\n # Evaluate differential equations.\n du[1] = (α - β * y) * x # prey\n du[2] = (δ * x - γ) * y # predator\n\n return nothing\nend\n\n# Define initial-value problem.\nu0 = [1.0, 1.0]\np = [1.5, 1.0, 3.0, 1.0]\ntspan = (0.0, 10.0)\nprob = ODEProblem(lotka_volterra, u0, tspan, p)\n\n# Plot simulation.\nplot(solve(prob, Tsit5()))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe generate noisy observations to use for the parameter estimation tasks in this tutorial. With the saveat argument we specify that the solution is stored only at 0.1 time units. To make the example more realistic we add random normally distributed noise to the simulation.\n\nsol = solve(prob, Tsit5(); saveat=0.1)\nodedata = Array(sol) + 0.8 * randn(size(Array(sol)))\n\n# Plot simulation and noisy observations.\nplot(sol; alpha=0.3)\nscatter!(sol.t, odedata'; color=[1 2], label=\"\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAlternatively, we can use real-world data from Hudson’s Bay Company records (an Stan implementation with slightly different priors can be found here: https://mc-stan.org/users/documentation/case-studies/lotka-volterra-predator-prey.html).", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#direct-handling-of-bayesian-estimation-with-turing", "href": "tutorials/bayesian-differential-equations/index.html#direct-handling-of-bayesian-estimation-with-turing", "title": "Bayesian Differential Equations", "section": "Direct Handling of Bayesian Estimation with Turing", "text": "Direct Handling of Bayesian Estimation with Turing\nPreviously, functions in Turing and DifferentialEquations were not inter-composable, so Bayesian inference of differential equations needed to be handled by another package called DiffEqBayes.jl (note that DiffEqBayes works also with CmdStan.jl, Turing.jl, DynamicHMC.jl and ApproxBayes.jl - see the DiffEqBayes docs for more info).\nNowadays, however, Turing and DifferentialEquations are completely composable and we can just simulate differential equations inside a Turing @model. Therefore, we write the Lotka-Volterra parameter estimation problem using the Turing @model macro as below:\n\n@model function fitlv(data, prob)\n # Prior distributions.\n σ ~ InverseGamma(2, 3)\n α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)\n β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)\n γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)\n δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)\n\n # Simulate Lotka-Volterra model. \n p = [α, β, γ, δ]\n predicted = solve(prob, Tsit5(); p=p, saveat=0.1)\n\n # Observations.\n for i in 1:length(predicted)\n data[:, i] ~ MvNormal(predicted[i], σ^2 * I)\n end\n\n return nothing\nend\n\nmodel = fitlv(odedata, prob)\n\n# Sample 3 independent chains with forward-mode automatic differentiation (the default).\nchain = sample(model, NUTS(), MCMCSerial(), 1000, 3; progress=false)\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n┌ Info: Found initial step size\n└ ϵ = 0.00625\n┌ Info: Found initial step size\n└ ϵ = 0.025\n\n\nChains MCMC chain (1000×17×3 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 3\nSamples per chain = 1000\nWall duration = 46.85 seconds\nCompute duration = 44.25 seconds\nparameters = σ, α, β, γ, δ\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n σ 1.1939 0.5920 0.2404 9.5877 80.3641 1.6629 ⋯\n α 1.3844 0.1382 0.0498 9.7484 79.5412 1.6535 ⋯\n β 0.9535 0.0973 0.0279 15.5082 85.1761 1.3124 ⋯\n γ 2.4582 0.9462 0.3829 9.6632 79.8928 1.6648 ⋯\n δ 0.8860 0.2302 0.0910 9.6781 70.1185 1.6648 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ 0.7115 0.7630 0.8034 1.9549 2.1849\n α 1.0418 1.2857 1.4388 1.4819 1.5520\n β 0.7265 0.9146 0.9770 1.0179 1.0936\n γ 1.0103 1.1924 3.0233 3.1616 3.3819\n δ 0.4813 0.6153 1.0072 1.0585 1.1369\n\n\nThe estimated parameters are close to the parameter values the observations were generated with. We can also check visually that the chains have converged.\n\nplot(chain)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nData retrodiction\nIn Bayesian analysis it is often useful to retrodict the data, i.e. generate simulated data using samples from the posterior distribution, and compare to the original data (see for instance section 3.3.2 - model checking of McElreath’s book “Statistical Rethinking”). Here, we solve the ODE for 300 randomly picked posterior samples in the chain. We plot the ensemble of solutions to check if the solution resembles the data. The 300 retrodicted time courses from the posterior are plotted in gray, the noisy observations are shown as blue and red dots, and the green and purple lines are the ODE solution that was used to generate the data.\n\nplot(; legend=false)\nposterior_samples = sample(chain[[:α, :β, :γ, :δ]], 300; replace=false)\nfor p in eachrow(Array(posterior_samples))\n sol_p = solve(prob, Tsit5(); p=p, saveat=0.1)\n plot!(sol_p; alpha=0.1, color=\"#BBBBBB\")\nend\n\n# Plot simulation and noisy observations.\nplot!(sol; color=[1 2], linewidth=1)\nscatter!(sol.t, odedata'; color=[1 2])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can see that, even though we added quite a bit of noise to the data the posterior distribution reproduces quite accurately the “true” ODE solution.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#lotka-volterra-model-without-data-of-prey", "href": "tutorials/bayesian-differential-equations/index.html#lotka-volterra-model-without-data-of-prey", "title": "Bayesian Differential Equations", "section": "Lotka-Volterra model without data of prey", "text": "Lotka-Volterra model without data of prey\nOne can also perform parameter inference for a Lotka-Volterra model with incomplete data. For instance, let us suppose we have only observations of the predators but not of the prey. I.e., we fit the model only to the \\(y\\) variable of the system without providing any data for \\(x\\):\n\n@model function fitlv2(data::AbstractVector, prob)\n # Prior distributions.\n σ ~ InverseGamma(2, 3)\n α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)\n β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)\n γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)\n δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)\n\n # Simulate Lotka-Volterra model but save only the second state of the system (predators).\n p = [α, β, γ, δ]\n predicted = solve(prob, Tsit5(); p=p, saveat=0.1, save_idxs=2)\n\n # Observations of the predators.\n data ~ MvNormal(predicted.u, σ^2 * I)\n\n return nothing\nend\n\nmodel2 = fitlv2(odedata[2, :], prob)\n\n# Sample 3 independent chains.\nchain2 = sample(model2, NUTS(0.45), MCMCSerial(), 5000, 3; progress=false)\n\n┌ Info: Found initial step size\n└ ϵ = 0.025\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n\n\nChains MCMC chain (5000×17×3 Array{Float64, 3}):\n\nIterations = 1001:1:6000\nNumber of chains = 3\nSamples per chain = 5000\nWall duration = 29.88 seconds\nCompute duration = 29.49 seconds\nparameters = σ, α, β, γ, δ\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n σ 0.8114 0.0569 0.0062 79.3526 101.1938 1.0272 ⋯\n α 1.7041 0.1874 0.0178 107.8966 365.1547 1.0548 ⋯\n β 1.1881 0.1471 0.0138 111.2735 362.0397 1.0549 ⋯\n γ 2.8614 0.2488 0.0224 120.9846 190.0660 1.0474 ⋯\n δ 0.7642 0.2059 0.0190 109.0147 363.6769 1.0535 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ 0.7192 0.7686 0.8056 0.8491 0.9344\n α 1.3638 1.5797 1.6871 1.8186 2.1269\n β 0.9230 1.0872 1.1740 1.2782 1.5226\n γ 2.3866 2.7007 2.8490 3.0185 3.3906\n δ 0.3980 0.6223 0.7503 0.8859 1.2372\n\n\nAgain we inspect the trajectories of 300 randomly selected posterior samples.\n\nplot(; legend=false)\nposterior_samples = sample(chain2[[:α, :β, :γ, :δ]], 300; replace=false)\nfor p in eachrow(Array(posterior_samples))\n sol_p = solve(prob, Tsit5(); p=p, saveat=0.1)\n plot!(sol_p; alpha=0.1, color=\"#BBBBBB\")\nend\n\n# Plot simulation and noisy observations.\nplot!(sol; color=[1 2], linewidth=1)\nscatter!(sol.t, odedata'; color=[1 2])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote that here the observations of the prey (blue dots) were not used in the parameter estimation! Yet, the model can predict the values of \\(x\\) relatively accurately, albeit with a wider distribution of solutions, reflecting the greater uncertainty in the prediction of the \\(x\\) values.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#inference-of-delay-differential-equations", "href": "tutorials/bayesian-differential-equations/index.html#inference-of-delay-differential-equations", "title": "Bayesian Differential Equations", "section": "Inference of Delay Differential Equations", "text": "Inference of Delay Differential Equations\nHere we show an example of inference with another type of differential equation: a Delay Differential Equation (DDE). DDEs are differential equations where derivatives are function of values at an earlier point in time. This is useful to model a delayed effect, like incubation time of a virus for instance.\nHere is a delayed version of the Lokta-Voltera system:\n\\[\n\\begin{aligned}\n\\frac{\\mathrm{d}x}{\\mathrm{d}t} &= \\alpha x(t-\\tau) - \\beta y(t) x(t),\\\\\n\\frac{\\mathrm{d}y}{\\mathrm{d}t} &= - \\gamma y(t) + \\delta x(t) y(t),\n\\end{aligned}\n\\]\nwhere \\(\\tau\\) is a (positive) delay and \\(x(t-\\tau)\\) is the variable \\(x\\) at an earlier time point \\(t - \\tau\\).\nThe initial-value problem of the delayed system can be implemented as a DDEProblem. As described in the DDE example, here the function h is the history function that can be used to obtain a state at an earlier time point. Again we use parameters \\(\\alpha = 1.5\\), \\(\\beta = 1\\), \\(\\gamma = 3\\), and \\(\\delta = 1\\) and initial conditions \\(x(0) = y(0) = 1\\). Moreover, we assume \\(x(t) = 1\\) for \\(t < 0\\).\n\nfunction delay_lotka_volterra(du, u, h, p, t)\n # Model parameters.\n α, β, γ, δ = p\n\n # Current state.\n x, y = u\n # Evaluate differential equations\n du[1] = α * h(p, t - 1; idxs=1) - β * x * y\n du[2] = -γ * y + δ * x * y\n\n return nothing\nend\n\n# Define initial-value problem.\np = (1.5, 1.0, 3.0, 1.0)\nu0 = [1.0; 1.0]\ntspan = (0.0, 10.0)\nh(p, t; idxs::Int) = 1.0\nprob_dde = DDEProblem(delay_lotka_volterra, u0, h, tspan, p);\n\nWe generate observations by adding normally distributed noise to the results of our simulations.\n\nsol_dde = solve(prob_dde; saveat=0.1)\nddedata = Array(sol_dde) + 0.5 * randn(size(sol_dde))\n\n# Plot simulation and noisy observations.\nplot(sol_dde)\nscatter!(sol_dde.t, ddedata'; color=[1 2], label=\"\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNow we define the Turing model for the Lotka-Volterra model with delay and sample 3 independent chains.\n\n@model function fitlv_dde(data, prob)\n # Prior distributions.\n σ ~ InverseGamma(2, 3)\n α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)\n β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)\n γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)\n δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)\n\n # Simulate Lotka-Volterra model.\n p = [α, β, γ, δ]\n predicted = solve(prob, MethodOfSteps(Tsit5()); p=p, saveat=0.1)\n\n # Observations.\n for i in 1:length(predicted)\n data[:, i] ~ MvNormal(predicted[i], σ^2 * I)\n end\nend\n\nmodel_dde = fitlv_dde(ddedata, prob_dde)\n\n# Sample 3 independent chains.\nchain_dde = sample(model_dde, NUTS(), MCMCSerial(), 300, 3; progress=false)\n\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n┌ Info: Found initial step size\n└ ϵ = 0.025\n\n\nChains MCMC chain (300×17×3 Array{Float64, 3}):\n\nIterations = 151:1:450\nNumber of chains = 3\nSamples per chain = 300\nWall duration = 12.41 seconds\nCompute duration = 12.17 seconds\nparameters = σ, α, β, γ, δ\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n σ 0.4743 0.0247 0.0010 629.1926 436.2981 1.0037 ⋯\n α 1.5013 0.0572 0.0039 225.7901 150.4257 1.0060 ⋯\n β 1.0191 0.0454 0.0027 270.1622 261.3597 1.0073 ⋯\n γ 3.0599 0.1440 0.0103 234.2060 143.3927 1.0029 ⋯\n δ 1.0064 0.0495 0.0036 224.7767 148.9292 1.0046 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ 0.4282 0.4579 0.4738 0.4915 0.5216\n α 1.3896 1.4619 1.5020 1.5383 1.6150\n β 0.9349 0.9877 1.0187 1.0480 1.1150\n γ 2.8119 2.9585 3.0530 3.1563 3.3474\n δ 0.9178 0.9721 1.0039 1.0393 1.1089\n\n\n\nplot(chain_dde)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFinally, plot trajectories of 300 randomly selected samples from the posterior. Again, the dots indicate our observations, the colored lines are the “true” simulations without noise, and the gray lines are trajectories from the posterior samples.\n\nplot(; legend=false)\nposterior_samples = sample(chain_dde[[:α, :β, :γ, :δ]], 300; replace=false)\nfor p in eachrow(Array(posterior_samples))\n sol_p = solve(prob_dde, MethodOfSteps(Tsit5()); p=p, saveat=0.1)\n plot!(sol_p; alpha=0.1, color=\"#BBBBBB\")\nend\n\n# Plot simulation and noisy observations.\nplot!(sol_dde; color=[1 2], linewidth=1)\nscatter!(sol_dde.t, ddedata'; color=[1 2])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe fit is pretty good even though the data was quite noisy to start.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#scaling-to-large-models-adjoint-sensitivities", "href": "tutorials/bayesian-differential-equations/index.html#scaling-to-large-models-adjoint-sensitivities", "title": "Bayesian Differential Equations", "section": "Scaling to Large Models: Adjoint Sensitivities", "text": "Scaling to Large Models: Adjoint Sensitivities\nDifferentialEquations.jl’s efficiency for large stiff models has been shown in multiple benchmarks. To learn more about how to optimize solving performance for stiff problems you can take a look at the docs.\nSensitivity analysis, or automatic differentiation (AD) of the solver, is provided by the DiffEq suite. The model sensitivities are the derivatives of the solution with respect to the parameters. Specifically, the local sensitivity of the solution to a parameter is defined by how much the solution would change by changes in the parameter. Sensitivity analysis provides a cheap way to calculate the gradient of the solution which can be used in parameter estimation and other optimization tasks.\nThe AD ecosystem in Julia allows you to switch between forward mode, reverse mode, source to source and other choices of AD and have it work with any Julia code. For a user to make use of this within SciML, high level interactions in solve automatically plug into those AD systems to allow for choosing advanced sensitivity analysis (derivative calculation) methods.\nMore theoretical details on these methods can be found at: https://docs.sciml.ai/latest/extras/sensitivity_math/.\nWhile these sensitivity analysis methods may seem complicated, using them is dead simple. Here is a version of the Lotka-Volterra model using adjoint sensitivities.\nAll we have to do is switch the AD backend to one of the adjoint-compatible backends (ReverseDiff or Zygote)! Notice that on this model adjoints are slower. This is because adjoints have a higher overhead on small parameter models and therefore we suggest using these methods only for models with around 100 parameters or more. For more details, see https://arxiv.org/abs/1812.01892.\n\nusing Zygote, SciMLSensitivity\n\n# Sample a single chain with 1000 samples using Zygote.\nsample(model, NUTS(;adtype=AutoZygote()), 1000; progress=false)\n\n┌ Warning: The AD backend ADTypes.AutoZygote() is not officially supported by DynamicPPL. Gradient calculations may still work, but compatibility is not guaranteed.\n└ @ DynamicPPL ~/.julia/packages/DynamicPPL/DxBhI/src/logdensityfunction.jl:123\n┌ Info: Found initial step size\n└ ϵ = 0.025\n\n\nChains MCMC chain (1000×17×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 430.32 seconds\nCompute duration = 430.32 seconds\nparameters = σ, α, β, γ, δ\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n σ 2.0300 0.1035 0.0048 457.6097 380.2479 1.0048 ⋯\n α 1.2099 0.1085 0.0118 105.4163 53.9933 1.0066 ⋯\n β 0.8661 0.1143 0.0070 266.7880 197.9273 1.0185 ⋯\n γ 1.1465 0.1455 0.0169 108.2718 70.5073 1.0000 ⋯\n δ 0.5764 0.0855 0.0102 113.7977 48.1457 1.0004 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ 1.8405 1.9544 2.0265 2.1018 2.2306\n α 0.9427 1.1438 1.2266 1.2893 1.3840\n β 0.6777 0.7881 0.8522 0.9337 1.1310\n γ 1.0044 1.0382 1.0978 1.2100 1.5385\n δ 0.4650 0.5153 0.5582 0.6149 0.8167\n\n\nIf desired, we can control the sensitivity analysis method that is used by providing the sensealg keyword argument to solve. Here we will not choose a sensealg and let it use the default choice:\n\n@model function fitlv_sensealg(data, prob)\n # Prior distributions.\n σ ~ InverseGamma(2, 3)\n α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)\n β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)\n γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)\n δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)\n\n # Simulate Lotka-Volterra model and use a specific algorithm for computing sensitivities.\n p = [α, β, γ, δ]\n predicted = solve(prob; p=p, saveat=0.1)\n\n # Observations.\n for i in 1:length(predicted)\n data[:, i] ~ MvNormal(predicted[i], σ^2 * I)\n end\n\n return nothing\nend;\n\nmodel_sensealg = fitlv_sensealg(odedata, prob)\n\n# Sample a single chain with 1000 samples using Zygote.\nsample(model_sensealg, NUTS(;adtype=AutoZygote()), 1000; progress=false)\n\n┌ Warning: The AD backend ADTypes.AutoZygote() is not officially supported by DynamicPPL. Gradient calculations may still work, but compatibility is not guaranteed.\n└ @ DynamicPPL ~/.julia/packages/DynamicPPL/DxBhI/src/logdensityfunction.jl:123\n┌ Info: Found initial step size\n└ ϵ = 0.05\n\n\nChains MCMC chain (1000×17×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 726.04 seconds\nCompute duration = 726.04 seconds\nparameters = σ, α, β, γ, δ\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n σ 0.7758 0.0400 0.0018 501.4662 506.3637 1.0095 ⋯\n α 1.4727 0.0437 0.0039 129.0846 268.9586 1.0110 ⋯\n β 1.0024 0.0421 0.0030 195.5532 293.7168 1.0040 ⋯\n γ 3.1105 0.1372 0.0116 146.3238 237.5844 1.0073 ⋯\n δ 1.0388 0.0499 0.0043 132.8099 278.3969 1.0107 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ 0.7037 0.7466 0.7732 0.8039 0.8608\n α 1.3834 1.4462 1.4709 1.4983 1.5592\n β 0.9244 0.9736 1.0015 1.0291 1.0848\n γ 2.8512 3.0230 3.1083 3.1890 3.3983\n δ 0.9456 1.0069 1.0387 1.0679 1.1433\n\n\nFor more examples of adjoint usage on large parameter models, consult the DiffEqFlux documentation.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html", "href": "tutorials/bayesian-linear-regression/index.html", "title": "Bayesian Linear Regression", "section": "", "text": "Turing is powerful when applied to complex hierarchical models, but it can also be put to task at common statistical procedures, like linear regression. This tutorial covers how to implement a linear regression model in Turing.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html#set-up", "href": "tutorials/bayesian-linear-regression/index.html#set-up", "title": "Bayesian Linear Regression", "section": "Set Up", "text": "Set Up\nWe begin by importing all the necessary libraries.\n\n# Import Turing.\nusing Turing\n\n# Package for loading the data set.\nusing RDatasets\n\n# Package for visualization.\nusing StatsPlots\n\n# Functionality for splitting the data.\nusing MLUtils: splitobs\n\n# Functionality for constructing arrays with identical elements efficiently.\nusing FillArrays\n\n# Functionality for normalizing the data and evaluating the model predictions.\nusing StatsBase\n\n# Functionality for working with scaled identity matrices.\nusing LinearAlgebra\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(0);\n\n\nsetprogress!(false)\n\nWe will use the mtcars dataset from the RDatasets package. mtcars contains a variety of statistics on different car models, including their miles per gallon, number of cylinders, and horsepower, among others.\nWe want to know if we can construct a Bayesian linear regression model to predict the miles per gallon of a car, given the other statistics it has. Let us take a look at the data we have.\n\n# Load the dataset.\ndata = RDatasets.dataset(\"datasets\", \"mtcars\")\n\n# Show the first six rows of the dataset.\nfirst(data, 6)\n\n6×12 DataFrame\n\n\n\nRow\nModel\nMPG\nCyl\nDisp\nHP\nDRat\nWT\nQSec\nVS\nAM\nGear\nCarb\n\n\n\nString31\nFloat64\nInt64\nFloat64\nInt64\nFloat64\nFloat64\nFloat64\nInt64\nInt64\nInt64\nInt64\n\n\n\n\n1\nMazda RX4\n21.0\n6\n160.0\n110\n3.9\n2.62\n16.46\n0\n1\n4\n4\n\n\n2\nMazda RX4 Wag\n21.0\n6\n160.0\n110\n3.9\n2.875\n17.02\n0\n1\n4\n4\n\n\n3\nDatsun 710\n22.8\n4\n108.0\n93\n3.85\n2.32\n18.61\n1\n1\n4\n1\n\n\n4\nHornet 4 Drive\n21.4\n6\n258.0\n110\n3.08\n3.215\n19.44\n1\n0\n3\n1\n\n\n5\nHornet Sportabout\n18.7\n8\n360.0\n175\n3.15\n3.44\n17.02\n0\n0\n3\n2\n\n\n6\nValiant\n18.1\n6\n225.0\n105\n2.76\n3.46\n20.22\n1\n0\n3\n1\n\n\n\n\n\n\n\nsize(data)\n\n(32, 12)\n\n\nThe next step is to get our data ready for testing. We’ll split the mtcars dataset into two subsets, one for training our model and one for evaluating our model. Then, we separate the targets we want to learn (MPG, in this case) and standardize the datasets by subtracting each column’s means and dividing by the standard deviation of that column. The resulting data is not very familiar looking, but this standardization process helps the sampler converge far easier.\n\n# Remove the model column.\nselect!(data, Not(:Model))\n\n# Split our dataset 70%/30% into training/test sets.\ntrainset, testset = map(DataFrame, splitobs(data; at=0.7, shuffle=true))\n\n# Turing requires data in matrix form.\ntarget = :MPG\ntrain = Matrix(select(trainset, Not(target)))\ntest = Matrix(select(testset, Not(target)))\ntrain_target = trainset[:, target]\ntest_target = testset[:, target]\n\n# Standardize the features.\ndt_features = fit(ZScoreTransform, train; dims=1)\nStatsBase.transform!(dt_features, train)\nStatsBase.transform!(dt_features, test)\n\n# Standardize the targets.\ndt_targets = fit(ZScoreTransform, train_target)\nStatsBase.transform!(dt_targets, train_target)\nStatsBase.transform!(dt_targets, test_target);", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html#model-specification", "href": "tutorials/bayesian-linear-regression/index.html#model-specification", "title": "Bayesian Linear Regression", "section": "Model Specification", "text": "Model Specification\nIn a traditional frequentist model using OLS, our model might look like:\n\\[\n\\mathrm{MPG}_i = \\alpha + \\boldsymbol{\\beta}^\\mathsf{T}\\boldsymbol{X_i}\n\\]\nwhere \\(\\boldsymbol{\\beta}\\) is a vector of coefficients and \\(\\boldsymbol{X}\\) is a vector of inputs for observation \\(i\\). The Bayesian model we are more concerned with is the following:\n\\[\n\\mathrm{MPG}_i \\sim \\mathcal{N}(\\alpha + \\boldsymbol{\\beta}^\\mathsf{T}\\boldsymbol{X_i}, \\sigma^2)\n\\]\nwhere \\(\\alpha\\) is an intercept term common to all observations, \\(\\boldsymbol{\\beta}\\) is a coefficient vector, \\(\\boldsymbol{X_i}\\) is the observed data for car \\(i\\), and \\(\\sigma^2\\) is a common variance term.\nFor \\(\\sigma^2\\), we assign a prior of truncated(Normal(0, 100); lower=0). This is consistent with Andrew Gelman’s recommendations on noninformative priors for variance. The intercept term (\\(\\alpha\\)) is assumed to be normally distributed with a mean of zero and a variance of three. This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that. Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10. We do not know that our coefficients are different from zero, and we don’t know which ones are likely to be the most important, so the variance term is quite high. Lastly, each observation \\(y_i\\) is distributed according to the calculated mu term given by \\(\\alpha + \\boldsymbol{\\beta}^\\mathsf{T}\\boldsymbol{X_i}\\).\n\n# Bayesian linear regression.\n@model function linear_regression(x, y)\n # Set variance prior.\n σ² ~ truncated(Normal(0, 100); lower=0)\n\n # Set intercept prior.\n intercept ~ Normal(0, sqrt(3))\n\n # Set the priors on our coefficients.\n nfeatures = size(x, 2)\n coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I)\n\n # Calculate all the mu terms.\n mu = intercept .+ x * coefficients\n return y ~ MvNormal(mu, σ² * I)\nend\n\nlinear_regression (generic function with 2 methods)\n\n\nWith our model specified, we can call the sampler. We will use the No U-Turn Sampler (NUTS) here.\n\nmodel = linear_regression(train, train_target)\nchain = sample(model, NUTS(), 5_000)\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\nChains MCMC chain (5000×24×1 Array{Float64, 3}):\n\nIterations = 1001:1:6000\nNumber of chains = 1\nSamples per chain = 5000\nWall duration = 10.13 seconds\nCompute duration = 10.13 seconds\nparameters = σ², intercept, coefficients[1], coefficients[2], coefficients[3], coefficients[4], coefficients[5], coefficients[6], coefficients[7], coefficients[8], coefficients[9], coefficients[10]\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Flo ⋯\n\n σ² 0.4662 0.2829 0.0073 1460.3513 2094.2806 1. ⋯\n intercept 0.0024 0.1432 0.0020 5093.6144 3104.9433 1. ⋯\n coefficients[1] -0.5316 0.5746 0.0099 3424.9405 2993.0319 1. ⋯\n coefficients[2] 0.3356 0.7675 0.0177 1880.0951 2498.0553 1. ⋯\n coefficients[3] -0.4278 0.4996 0.0092 2961.6625 2402.6465 1. ⋯\n coefficients[4] 0.0102 0.2689 0.0044 3904.1255 2940.7091 1. ⋯\n coefficients[5] -0.2000 0.5486 0.0126 1889.6821 2144.1611 1. ⋯\n coefficients[6] -0.0483 0.4492 0.0090 2493.8472 2956.9039 0. ⋯\n coefficients[7] -0.1196 0.4424 0.0086 2672.7257 2819.7686 1. ⋯\n coefficients[8] 0.0970 0.3831 0.0081 2232.0571 2479.1029 0. ⋯\n coefficients[9] 0.2283 0.3678 0.0077 2290.1404 2697.5683 0. ⋯\n coefficients[10] -0.1518 0.4739 0.0119 1600.9388 2062.6620 1. ⋯\n 2 columns omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n σ² 0.1738 0.2905 0.3922 0.5580 1.1437\n intercept -0.2808 -0.0856 0.0014 0.0893 0.2943\n coefficients[1] -1.6414 -0.8921 -0.5381 -0.1692 0.6134\n coefficients[2] -1.2254 -0.1519 0.3427 0.8326 1.8339\n coefficients[3] -1.4291 -0.7389 -0.4283 -0.1269 0.5974\n coefficients[4] -0.5343 -0.1554 0.0111 0.1786 0.5452\n coefficients[5] -1.2764 -0.5586 -0.2077 0.1572 0.8833\n coefficients[6] -0.9542 -0.3386 -0.0462 0.2381 0.8566\n coefficients[7] -1.0148 -0.3921 -0.1244 0.1603 0.7402\n coefficients[8] -0.6788 -0.1393 0.0974 0.3368 0.8563\n coefficients[9] -0.4920 -0.0067 0.2261 0.4647 0.9622\n coefficients[10] -1.0839 -0.4498 -0.1464 0.1535 0.7634\n\n\nWe can also check the densities and traces of the parameters visually using the plot functionality.\n\nplot(chain)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nIt looks like all parameters have converged.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html#comparing-to-ols", "href": "tutorials/bayesian-linear-regression/index.html#comparing-to-ols", "title": "Bayesian Linear Regression", "section": "Comparing to OLS", "text": "Comparing to OLS\nA satisfactory test of our model is to evaluate how well it predicts. Importantly, we want to compare our model to existing tools like OLS. The code below uses the GLM.jl package to generate a traditional OLS multiple regression model on the same data as our probabilistic model.\n\n# Import the GLM package.\nusing GLM\n\n# Perform multiple regression OLS.\ntrain_with_intercept = hcat(ones(size(train, 1)), train)\nols = lm(train_with_intercept, train_target)\n\n# Compute predictions on the training data set and unstandardize them.\ntrain_prediction_ols = GLM.predict(ols)\nStatsBase.reconstruct!(dt_targets, train_prediction_ols)\n\n# Compute predictions on the test data set and unstandardize them.\ntest_with_intercept = hcat(ones(size(test, 1)), test)\ntest_prediction_ols = GLM.predict(ols, test_with_intercept)\nStatsBase.reconstruct!(dt_targets, test_prediction_ols);\n\nThe function below accepts a chain and an input matrix and calculates predictions. We use the samples of the model parameters in the chain starting with sample 200.\n\n# Make a prediction given an input vector.\nfunction prediction(chain, x)\n p = get_params(chain[200:end, :, :])\n targets = p.intercept' .+ x * reduce(hcat, p.coefficients)'\n return vec(mean(targets; dims=2))\nend\n\nprediction (generic function with 1 method)\n\n\nWhen we make predictions, we unstandardize them so they are more understandable.\n\n# Calculate the predictions for the training and testing sets and unstandardize them.\ntrain_prediction_bayes = prediction(chain, train)\nStatsBase.reconstruct!(dt_targets, train_prediction_bayes)\ntest_prediction_bayes = prediction(chain, test)\nStatsBase.reconstruct!(dt_targets, test_prediction_bayes)\n\n# Show the predictions on the test data set.\nDataFrame(; MPG=testset[!, target], Bayes=test_prediction_bayes, OLS=test_prediction_ols)\n\n10×3 DataFrame\n\n\n\nRow\nMPG\nBayes\nOLS\n\n\n\nFloat64\nFloat64\nFloat64\n\n\n\n\n1\n33.9\n26.7932\n26.804\n\n\n2\n21.0\n22.3845\n22.4669\n\n\n3\n21.4\n20.4432\n20.5666\n\n\n4\n26.0\n28.933\n28.9169\n\n\n5\n15.0\n11.6806\n11.584\n\n\n6\n10.4\n13.5769\n13.7006\n\n\n7\n30.4\n27.3246\n27.4661\n\n\n8\n10.4\n14.3243\n14.5346\n\n\n9\n18.7\n17.2586\n17.2897\n\n\n10\n17.3\n14.7579\n14.6084\n\n\n\n\n\n\nNow let’s evaluate the loss for each method, and each prediction set. We will use the mean squared error to evaluate loss, given by \\[\n\\mathrm{MSE} = \\frac{1}{n} \\sum_{i=1}^n {(y_i - \\hat{y_i})^2}\n\\] where \\(y_i\\) is the actual value (true MPG) and \\(\\hat{y_i}\\) is the predicted value using either OLS or Bayesian linear regression. A lower SSE indicates a closer fit to the data.\n\nprintln(\n \"Training set:\",\n \"\\n\\tBayes loss: \",\n msd(train_prediction_bayes, trainset[!, target]),\n \"\\n\\tOLS loss: \",\n msd(train_prediction_ols, trainset[!, target]),\n)\n\nprintln(\n \"Test set:\",\n \"\\n\\tBayes loss: \",\n msd(test_prediction_bayes, testset[!, target]),\n \"\\n\\tOLS loss: \",\n msd(test_prediction_ols, testset[!, target]),\n)\n\nTraining set:\n Bayes loss: 4.269843318379247\n OLS loss: 4.264328587912453\nTest set:\n Bayes loss: 11.645119457361819\n OLS loss: 11.920700014054287\n\n\nAs we can see above, OLS and our Bayesian model fit our training and test data set about the same.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/gaussian-processes-introduction/index.html", "href": "tutorials/gaussian-processes-introduction/index.html", "title": "Gaussian Processes: Introduction", "section": "", "text": "JuliaGPs packages integrate well with Turing.jl because they implement the Distributions.jl interface. You should be able to understand what is going on in this tutorial if you know what a GP is. For a more in-depth understanding of the JuliaGPs functionality used here, please consult the JuliaGPs docs.\nIn this tutorial, we will model the putting dataset discussed in Chapter 21 of Bayesian Data Analysis. The dataset comprises the result of measuring how often a golfer successfully gets the ball in the hole, depending on how far away from it they are. The goal of inference is to estimate the probability of any given shot being successful at a given distance.\n\nLet’s download the data and take a look at it:\n\nusing CSV, DataFrames\n\ndf = CSV.read(\"golf.dat\", DataFrame; delim=' ', ignorerepeated=true)\ndf[1:5, :]\n\n5×3 DataFrame\n\n\n\nRow\ndistance\nn\ny\n\n\n\nInt64\nInt64\nInt64\n\n\n\n\n1\n2\n1443\n1346\n\n\n2\n3\n694\n577\n\n\n3\n4\n455\n337\n\n\n4\n5\n353\n208\n\n\n5\n6\n272\n149\n\n\n\n\n\n\nWe’ve printed the first 5 rows of the dataset (which comprises only 19 rows in total). Observe it has three columns:\n\ndistance – how far away from the hole. I’ll refer to distance as d throughout the rest of this tutorial\nn – how many shots were taken from a given distance\ny – how many shots were successful from a given distance\n\nWe will use a Binomial model for the data, whose success probability is parametrised by a transformation of a GP. Something along the lines of: \\[\n\\begin{aligned}\nf & \\sim \\operatorname{GP}(0, k) \\\\\ny_j \\mid f(d_j) & \\sim \\operatorname{Binomial}(n_j, g(f(d_j))) \\\\\ng(x) & := \\frac{1}{1 + e^{-x}}\n\\end{aligned}\n\\]\nTo do this, let’s define our Turing.jl model:\n\nusing AbstractGPs, LogExpFunctions, Turing\n\n@model function putting_model(d, n; jitter=1e-4)\n v ~ Gamma(2, 1)\n l ~ Gamma(4, 1)\n f = GP(v * with_lengthscale(SEKernel(), l))\n f_latent ~ f(d, jitter)\n y ~ product_distribution(Binomial.(n, logistic.(f_latent)))\n return (fx=f(d, jitter), f_latent=f_latent, y=y)\nend\n\nputting_model (generic function with 2 methods)\n\n\nWe first define an AbstractGPs.GP, which represents a distribution over functions, and is entirely separate from Turing.jl. We place a prior over its variance v and length-scale l. f(d, jitter) constructs the multivariate Gaussian comprising the random variables in f whose indices are in d (plus a bit of independent Gaussian noise with variance jitter – see the docs for more details). f(d, jitter) has the type AbstractMvNormal, and is the bit of AbstractGPs.jl that implements the Distributions.jl interface, so it’s legal to put it on the right-hand side of a ~. From this you should deduce that f_latent is distributed according to a multivariate Gaussian. The remaining lines comprise standard Turing.jl code that is encountered in other tutorials and Turing documentation.\nBefore performing inference, we might want to inspect the prior that our model places over the data, to see whether there is anything obviously wrong. These kinds of prior predictive checks are straightforward to perform using Turing.jl, since it is possible to sample from the prior easily by just calling the model:\n\nm = putting_model(Float64.(df.distance), df.n)\nm().y\n\n19-element Vector{Int64}:\n 594\n 270\n 154\n 103\n 70\n 49\n 39\n 21\n 14\n 20\n 24\n 31\n 50\n 54\n 103\n 112\n 106\n 74\n 70\n\n\nWe make use of this to see what kinds of datasets we simulate from the prior:\n\nusing Plots\n\nfunction plot_data(d, n, y, xticks, yticks)\n ylims = (0, round(maximum(n), RoundUp; sigdigits=2))\n margin = -0.5 * Plots.mm\n plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false)\n bar!(plt, d, n; color=:red, label=\"\", alpha=0.5)\n bar!(plt, d, y; label=\"\", color=:blue, alpha=0.7)\n return plt\nend\n\n# Construct model and run some prior predictive checks.\nm = putting_model(Float64.(df.distance), df.n)\nhists = map(1:20) do j\n xticks = j > 15 ? :auto : nothing\n yticks = rem(j, 5) == 1 ? :auto : nothing\n return plot_data(df.distance, df.n, m().y, xticks, yticks)\nend\nplot(hists...; layout=(4, 5))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nIn this case, the only prior knowledge I have is that the proportion of successful shots ought to decrease monotonically as the distance from the hole increases, which should show up in the data as the blue lines generally go down as we move from left to right on each graph. Unfortunately, there is not a simple way to enforce monotonicity in the samples from a GP, and we can see this in some of the plots above, so we must hope that we have enough data to ensure that this relationship holds approximately under the posterior. In any case, you can judge for yourself whether you think this is the most useful visualisation that we can perform – if you think there is something better to look at, please let us know!\nMoving on, we generate samples from the posterior using the default NUTS sampler. We’ll make use of ReverseDiff.jl, as it has better performance than ForwardDiff.jl on this example. See Turing.jl’s docs on Automatic Differentiation for more info.\n\nusing Random, ReverseDiff\n\nm_post = m | (y=df.y,)\nchn = sample(Xoshiro(123456), m_post, NUTS(; adtype=AutoReverseDiff()), 1_000, progress=false)\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\nChains MCMC chain (1000×33×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 147.92 seconds\nCompute duration = 147.92 seconds\nparameters = v, l, f_latent[1], f_latent[2], f_latent[3], f_latent[4], f_latent[5], f_latent[6], f_latent[7], f_latent[8], f_latent[9], f_latent[10], f_latent[11], f_latent[12], f_latent[13], f_latent[14], f_latent[15], f_latent[16], f_latent[17], f_latent[18], f_latent[19]\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n v 2.8019 1.2781 0.0604 447.1359 695.8370 0.9999 ⋯\n l 3.5776 0.9968 0.0975 99.0035 104.5329 1.0025 ⋯\n f_latent[1] 2.5443 0.1044 0.0038 759.1492 580.5865 0.9994 ⋯\n f_latent[2] 1.7056 0.0746 0.0024 978.9541 584.5681 0.9996 ⋯\n f_latent[3] 0.9769 0.0866 0.0050 296.1646 356.9508 1.0025 ⋯\n f_latent[4] 0.4787 0.0801 0.0043 348.8566 647.5708 1.0003 ⋯\n f_latent[5] 0.1949 0.0769 0.0026 886.3665 726.8519 1.0003 ⋯\n f_latent[6] -0.0085 0.0995 0.0067 226.0623 424.0574 1.0005 ⋯\n f_latent[7] -0.2471 0.0840 0.0035 562.4772 570.8933 1.0078 ⋯\n f_latent[8] -0.5136 0.0921 0.0046 470.3300 336.7384 1.0073 ⋯\n f_latent[9] -0.7265 0.1052 0.0056 386.3241 403.0347 1.0054 ⋯\n f_latent[10] -0.8624 0.0989 0.0040 621.3075 598.6442 1.0053 ⋯\n f_latent[11] -0.9501 0.0949 0.0031 991.0415 582.7107 1.0004 ⋯\n f_latent[12] -1.0407 0.1094 0.0048 516.4460 326.2720 0.9998 ⋯\n f_latent[13] -1.1915 0.1211 0.0060 412.8273 382.3954 1.0004 ⋯\n f_latent[14] -1.4158 0.1179 0.0043 749.2159 602.7765 1.0005 ⋯\n f_latent[15] -1.6236 0.1210 0.0067 361.9174 367.8756 1.0010 ⋯\n ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱\n 1 column and 4 rows omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n v 1.1006 1.8502 2.5868 3.5028 5.9022\n l 1.9488 2.9584 3.3903 4.0312 6.1431\n f_latent[1] 2.3442 2.4750 2.5412 2.6113 2.7543\n f_latent[2] 1.5614 1.6566 1.7076 1.7549 1.8505\n f_latent[3] 0.8022 0.9188 0.9770 1.0375 1.1456\n f_latent[4] 0.3165 0.4253 0.4790 0.5323 0.6334\n f_latent[5] 0.0505 0.1442 0.1939 0.2438 0.3491\n f_latent[6] -0.1944 -0.0762 -0.0138 0.0593 0.1860\n f_latent[7] -0.4010 -0.3060 -0.2498 -0.1930 -0.0765\n f_latent[8] -0.7105 -0.5716 -0.5058 -0.4509 -0.3537\n f_latent[9] -0.9519 -0.7975 -0.7173 -0.6467 -0.5409\n f_latent[10] -1.0627 -0.9294 -0.8638 -0.7926 -0.6789\n f_latent[11] -1.1261 -1.0213 -0.9486 -0.8875 -0.7594\n f_latent[12] -1.2554 -1.1134 -1.0396 -0.9683 -0.8256\n f_latent[13] -1.4260 -1.2797 -1.1950 -1.1095 -0.9385\n f_latent[14] -1.6462 -1.4917 -1.4186 -1.3336 -1.1888\n f_latent[15] -1.8718 -1.7031 -1.6169 -1.5314 -1.4072\n ⋮ ⋮ ⋮ ⋮ ⋮ ⋮\n 4 rows omitted\n\n\nWe can use these samples and the posterior function from AbstractGPs to sample from the posterior probability of success at any distance we choose:\n\nd_pred = 1:0.2:21\nsamples = map(generated_quantities(m_post, chn)[1:10:end]) do x\n return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))\nend\np = plot()\nplot!(d_pred, reduce(hcat, samples); label=\"\", color=:blue, alpha=0.2)\nscatter!(df.distance, df.y ./ df.n; label=\"\", color=:red)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can see that the general trend is indeed down as the distance from the hole increases, and that if we move away from the data, the posterior uncertainty quickly inflates. This suggests that the model is probably going to do a reasonable job of interpolating between observed data, but less good a job at extrapolating to larger distances.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Gaussian Processes: Introduction" ] }, { "objectID": "tutorials/variational-inference/index.html", "href": "tutorials/variational-inference/index.html", "title": "Variational Inference", "section": "", "text": "In this post we’ll have a look at what’s know as variational inference (VI), a family of approximate Bayesian inference methods, and how to use it in Turing.jl as an alternative to other approaches such as MCMC. In particular, we will focus on one of the more standard VI methods called Automatic Differentation Variational Inference (ADVI).\nHere we will focus on how to use VI in Turing and not much on the theory underlying VI. If you are interested in understanding the mathematics you can checkout our write-up or any other resource online (there a lot of great ones).\nUsing VI in Turing.jl is very straight forward. If model denotes a definition of a Turing.Model, performing VI is as simple as\nm = model(data...) # instantiate model on the data\nq = vi(m, vi_alg) # perform VI on `m` using the VI method `vi_alg`, which returns a `VariationalPosterior`\nThus it’s no more work than standard MCMC sampling in Turing.\nTo get a bit more into what we can do with vi, we’ll first have a look at a simple example and then we’ll reproduce the tutorial on Bayesian linear regression using VI instead of MCMC. Finally we’ll look at some of the different parameters of vi and how you for example can use your own custom variational family.\nWe first import the packages to be used:\nusing Random\nusing Turing\nusing Turing: Variational\nusing Bijectors: bijector\nusing StatsPlots, Measures\n\nRandom.seed!(42);", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#simple-example-normal-gamma-conjugate-model", "href": "tutorials/variational-inference/index.html#simple-example-normal-gamma-conjugate-model", "title": "Variational Inference", "section": "Simple example: Normal-Gamma conjugate model", "text": "Simple example: Normal-Gamma conjugate model\nThe Normal-(Inverse)Gamma conjugate model is defined by the following generative process\n\\[\\begin{align}\ns &\\sim \\mathrm{InverseGamma}(2, 3) \\\\\nm &\\sim \\mathcal{N}(0, s) \\\\\nx_i &\\overset{\\text{i.i.d.}}{=} \\mathcal{N}(m, s), \\quad i = 1, \\dots, n\n\\end{align}\\]\nRecall that conjugate refers to the fact that we can obtain a closed-form expression for the posterior. Of course one wouldn’t use something like variational inference for a conjugate model, but it’s useful as a simple demonstration as we can compare the result to the true posterior.\nFirst we generate some synthetic data, define the Turing.Model and instantiate the model on the data:\n\n# generate data\nx = randn(2000);\n\n\n@model function model(x)\n s ~ InverseGamma(2, 3)\n m ~ Normal(0.0, sqrt(s))\n for i in 1:length(x)\n x[i] ~ Normal(m, sqrt(s))\n end\nend;\n\n\n# Instantiate model\nm = model(x);\n\nNow we’ll produce some samples from the posterior using a MCMC method, which in constrast to VI is guaranteed to converge to the exact posterior (as the number of samples go to infinity).\nWe’ll produce 10 000 samples with 200 steps used for adaptation and a target acceptance rate of 0.65\nIf you don’t understand what “adaptation” or “target acceptance rate” refers to, all you really need to know is that NUTS is known to be one of the most accurate and efficient samplers (when applicable) while requiring little to no hand-tuning to work well.\n\nsetprogress!(false)\n\n\nsamples_nuts = sample(m, NUTS(), 10_000);\n\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n\n\nNow let’s try VI. The most important function you need to now about to do VI in Turing is vi:\n\n@doc(Variational.vi)\n\nvi(model, alg::VariationalInference)\nvi(model, alg::VariationalInference, q::VariationalPosterior)\nvi(model, alg::VariationalInference, getq::Function, θ::AbstractArray)\nConstructs the variational posterior from the model and performs the optimization following the configuration of the given VariationalInference instance.\nArguments\n\nmodel: Turing.Model or Function z ↦ log p(x, z) where x denotes the observations\n\nalg: the VI algorithm used\n\nq: a VariationalPosterior for which it is assumed a specialized implementation of the variational objective used exists.\n\ngetq: function taking parameters θ as input and returns a VariationalPosterior\n\nθ: only required if getq is used, in which case it is the initial parameters for the variational posterior\n\n\n\n\n\n\n\nAdditionally, you can pass\n\nan initial variational posterior q, for which we assume there exists a implementation of update(::typeof(q), θ::AbstractVector) returning an updated posterior q with parameters θ.\na function mapping \\(\\theta \\mapsto q_{\\theta}\\) (denoted above getq) together with initial parameters θ. This provides more flexibility in the types of variational families that we can use, and can sometimes be slightly more convenient for quick and rough work.\n\nBy default, i.e. when calling vi(m, advi), Turing use a mean-field approximation with a multivariate normal as the base-distribution. Mean-field refers to the fact that we assume all the latent variables to be independent. This the “standard” ADVI approach; see Automatic Differentiation Variational Inference (2016) for more. In Turing, one can obtain such a mean-field approximation by calling Variational.meanfield(model) for which there exists an internal implementation for update:\n\n@doc(Variational.meanfield)\n\nmeanfield([rng, ]model::Model)\nCreates a mean-field approximation with multivariate normal as underlying distribution.\n\n\n\n\n\nCurrently the only implementation of VariationalInference available is ADVI, which is very convenient and applicable as long as your Model is differentiable with respect to the variational parameters, that is, the parameters of your variational distribution, e.g. mean and variance in the mean-field approximation.\n\n@doc(Variational.ADVI)\n\nstruct ADVI{AD} <: AdvancedVI.VariationalInference{AD}\nAutomatic Differentiation Variational Inference (ADVI) with automatic differentiation backend AD.\nFields\n\nsamples_per_step::Int64: Number of samples used to estimate the ELBO in each optimization step.\n\nmax_iters::Int64: Maximum number of gradient steps.\n\nadtype::Any: AD backend used for automatic differentiation.\n\n\n\n\n\n\n\nTo perform VI on the model m using 10 samples for gradient estimation and taking 1000 gradient steps is then as simple as:\n\n# ADVI\nadvi = ADVI(10, 1000)\nq = vi(m, advi);\n\n┌ Info: [ADVI] Should only be seen once: optimizer created for θ\n└ objectid(θ) = 0x1ae73fcb0b544130\n\n\nUnfortunately, for such a small problem Turing’s new NUTS sampler is so efficient now that it’s not that much more efficient to use ADVI. So, so very unfortunate…\nWith that being said, this is not the case in general. For very complex models we’ll later find that ADVI produces very reasonable results in a much shorter time than NUTS.\nAnd one significant advantage of using vi is that we can sample from the resulting q with ease. In fact, the result of the vi call is a TransformedDistribution from Bijectors.jl, and it implements the Distributions.jl interface for a Distribution:\n\nq isa MultivariateDistribution\n\ntrue\n\n\nThis means that we can call rand to sample from the variational posterior q\n\nhistogram(rand(q, 1_000)[1, :])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nand logpdf to compute the log-probability\n\nlogpdf(q, rand(q))\n\n5.138524009192519\n\n\nLet’s check the first and second moments of the data to see how our approximation compares to the point-estimates form the data:\n\nvar(x), mean(x)\n\n(1.0149828942385808, -0.03802639687322374)\n\n\n\n(mean(rand(q, 1000); dims=2)...,)\n\n(1.0132825041280484, -0.02915414669811955)\n\n\nThat’s pretty close! But we’re Bayesian so we’re not interested in just matching the mean. Let’s instead look the actual density q.\nFor that we need samples:\n\nsamples = rand(q, 10000);\nsize(samples)\n\n(2, 10000)\n\n\n\np1 = histogram(\n samples[1, :]; bins=100, normed=true, alpha=0.2, color=:blue, label=\"\", ylabel=\"density\"\n)\ndensity!(samples[1, :]; label=\"s (ADVI)\", color=:blue, linewidth=2)\ndensity!(samples_nuts, :s; label=\"s (NUTS)\", color=:green, linewidth=2)\nvline!([var(x)]; label=\"s (data)\", color=:black)\nvline!([mean(samples[1, :])]; color=:blue, label=\"\")\n\np2 = histogram(\n samples[2, :]; bins=100, normed=true, alpha=0.2, color=:blue, label=\"\", ylabel=\"density\"\n)\ndensity!(samples[2, :]; label=\"m (ADVI)\", color=:blue, linewidth=2)\ndensity!(samples_nuts, :m; label=\"m (NUTS)\", color=:green, linewidth=2)\nvline!([mean(x)]; color=:black, label=\"m (data)\")\nvline!([mean(samples[2, :])]; color=:blue, label=\"\")\n\nplot(p1, p2; layout=(2, 1), size=(900, 500), legend=true)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFor this particular Model, we can in fact obtain the posterior of the latent variables in closed form. This allows us to compare both NUTS and ADVI to the true posterior \\(p(s, m \\mid x_1, \\ldots, x_n)\\).\nThe code below is just work to get the marginals \\(p(s \\mid x_1, \\ldots, x_n)\\) and \\(p(m \\mid x_1, \\ldots, x_n)\\). Feel free to skip it.\n\n# closed form computation of the Normal-inverse-gamma posterior\n# based on \"Conjugate Bayesian analysis of the Gaussian distribution\" by Murphy\nfunction posterior(μ₀::Real, κ₀::Real, α₀::Real, β₀::Real, x::AbstractVector{<:Real})\n # Compute summary statistics\n n = length(x)\n x̄ = mean(x)\n sum_of_squares = sum(xi -> (xi - x̄)^2, x)\n\n # Compute parameters of the posterior\n κₙ = κ₀ + n\n μₙ = (κ₀ * μ₀ + n * x̄) / κₙ\n αₙ = α₀ + n / 2\n βₙ = β₀ + (sum_of_squares + n * κ₀ / κₙ * (x̄ - μ₀)^2) / 2\n\n return μₙ, κₙ, αₙ, βₙ\nend\nμₙ, κₙ, αₙ, βₙ = posterior(0.0, 1.0, 2.0, 3.0, x)\n\n# marginal distribution of σ²\n# cf. Eq. (90) in \"Conjugate Bayesian analysis of the Gaussian distribution\" by Murphy\np_σ² = InverseGamma(αₙ, βₙ)\np_σ²_pdf = z -> pdf(p_σ², z)\n\n# marginal of μ\n# Eq. (91) in \"Conjugate Bayesian analysis of the Gaussian distribution\" by Murphy\np_μ = μₙ + sqrt(βₙ / (αₙ * κₙ)) * TDist(2 * αₙ)\np_μ_pdf = z -> pdf(p_μ, z)\n\n# posterior plots\np1 = plot()\nhistogram!(samples[1, :]; bins=100, normed=true, alpha=0.2, color=:blue, label=\"\")\ndensity!(samples[1, :]; label=\"s (ADVI)\", color=:blue)\ndensity!(samples_nuts, :s; label=\"s (NUTS)\", color=:green)\nvline!([mean(samples[1, :])]; linewidth=1.5, color=:blue, label=\"\")\nplot!(range(0.75, 1.35; length=1_001), p_σ²_pdf; label=\"s (posterior)\", color=:red)\nvline!([var(x)]; label=\"s (data)\", linewidth=1.5, color=:black, alpha=0.7)\nxlims!(0.75, 1.35)\n\np2 = plot()\nhistogram!(samples[2, :]; bins=100, normed=true, alpha=0.2, color=:blue, label=\"\")\ndensity!(samples[2, :]; label=\"m (ADVI)\", color=:blue)\ndensity!(samples_nuts, :m; label=\"m (NUTS)\", color=:green)\nvline!([mean(samples[2, :])]; linewidth=1.5, color=:blue, label=\"\")\nplot!(range(-0.25, 0.25; length=1_001), p_μ_pdf; label=\"m (posterior)\", color=:red)\nvline!([mean(x)]; label=\"m (data)\", linewidth=1.5, color=:black, alpha=0.7)\nxlims!(-0.25, 0.25)\n\nplot(p1, p2; layout=(2, 1), size=(900, 500))", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#bayesian-linear-regression-example-using-advi", "href": "tutorials/variational-inference/index.html#bayesian-linear-regression-example-using-advi", "title": "Variational Inference", "section": "Bayesian linear regression example using ADVI", "text": "Bayesian linear regression example using ADVI\nThis is simply a duplication of the tutorial on Bayesian linear regression (much of the code is directly lifted), but now with the addition of an approximate posterior obtained using ADVI.\nAs we’ll see, there is really no additional work required to apply variational inference to a more complex Model.\n\nRandom.seed!(1);\n\n\nusing FillArrays\nusing RDatasets\n\nusing LinearAlgebra\n\n\n# Import the \"Default\" dataset.\ndata = RDatasets.dataset(\"datasets\", \"mtcars\");\n\n# Show the first six rows of the dataset.\nfirst(data, 6)\n\n6×12 DataFrame\n\n\n\nRow\nModel\nMPG\nCyl\nDisp\nHP\nDRat\nWT\nQSec\nVS\nAM\nGear\nCarb\n\n\n\nString31\nFloat64\nInt64\nFloat64\nInt64\nFloat64\nFloat64\nFloat64\nInt64\nInt64\nInt64\nInt64\n\n\n\n\n1\nMazda RX4\n21.0\n6\n160.0\n110\n3.9\n2.62\n16.46\n0\n1\n4\n4\n\n\n2\nMazda RX4 Wag\n21.0\n6\n160.0\n110\n3.9\n2.875\n17.02\n0\n1\n4\n4\n\n\n3\nDatsun 710\n22.8\n4\n108.0\n93\n3.85\n2.32\n18.61\n1\n1\n4\n1\n\n\n4\nHornet 4 Drive\n21.4\n6\n258.0\n110\n3.08\n3.215\n19.44\n1\n0\n3\n1\n\n\n5\nHornet Sportabout\n18.7\n8\n360.0\n175\n3.15\n3.44\n17.02\n0\n0\n3\n2\n\n\n6\nValiant\n18.1\n6\n225.0\n105\n2.76\n3.46\n20.22\n1\n0\n3\n1\n\n\n\n\n\n\n\n# Function to split samples.\nfunction split_data(df, at=0.70)\n r = size(df, 1)\n index = Int(round(r * at))\n train = df[1:index, :]\n test = df[(index + 1):end, :]\n return train, test\nend\n\n# A handy helper function to rescale our dataset.\nfunction standardize(x)\n return (x .- mean(x; dims=1)) ./ std(x; dims=1)\nend\n\nfunction standardize(x, orig)\n return (x .- mean(orig; dims=1)) ./ std(orig; dims=1)\nend\n\n# Another helper function to unstandardize our datasets.\nfunction unstandardize(x, orig)\n return x .* std(orig; dims=1) .+ mean(orig; dims=1)\nend\n\nfunction unstandardize(x, mean_train, std_train)\n return x .* std_train .+ mean_train\nend\n\nunstandardize (generic function with 2 methods)\n\n\n\n# Remove the model column.\nselect!(data, Not(:Model))\n\n# Split our dataset 70%/30% into training/test sets.\ntrain, test = split_data(data, 0.7)\ntrain_unstandardized = copy(train)\n\n# Standardize both datasets.\nstd_train = standardize(Matrix(train))\nstd_test = standardize(Matrix(test), Matrix(train))\n\n# Save dataframe versions of our dataset.\ntrain_cut = DataFrame(std_train, names(data))\ntest_cut = DataFrame(std_test, names(data))\n\n# Create our labels. These are the values we are trying to predict.\ntrain_label = train_cut[:, :MPG]\ntest_label = test_cut[:, :MPG]\n\n# Get the list of columns to keep.\nremove_names = filter(x -> !in(x, [\"MPG\"]), names(data))\n\n# Filter the test and train sets.\ntrain = Matrix(train_cut[:, remove_names]);\ntest = Matrix(test_cut[:, remove_names]);\n\n\n# Bayesian linear regression.\n@model function linear_regression(x, y, n_obs, n_vars, ::Type{T}=Vector{Float64}) where {T}\n # Set variance prior.\n σ² ~ truncated(Normal(0, 100); lower=0)\n\n # Set intercept prior.\n intercept ~ Normal(0, 3)\n\n # Set the priors on our coefficients.\n coefficients ~ MvNormal(Zeros(n_vars), 10.0 * I)\n\n # Calculate all the mu terms.\n mu = intercept .+ x * coefficients\n return y ~ MvNormal(mu, σ² * I)\nend;\n\n\nn_obs, n_vars = size(train)\nm = linear_regression(train, train_label, n_obs, n_vars);", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#performing-vi", "href": "tutorials/variational-inference/index.html#performing-vi", "title": "Variational Inference", "section": "Performing VI", "text": "Performing VI\nFirst we define the initial variational distribution, or, equivalently, the family of distributions to consider. We’re going to use the same mean-field approximation as Turing will use by default when we call vi(m, advi), which we obtain by calling Variational.meanfield. This returns a TransformedDistribution with a TuringDiagMvNormal as the underlying distribution and the transformation mapping from the reals to the domain of the latent variables.\n\nq0 = Variational.meanfield(m)\ntypeof(q0)\n\nMultivariateTransformed{TuringDiagMvNormal{Vector{Float64}, Vector{Float64}}, Stacked{Vector{Any}, Vector{UnitRange{Int64}}}} (alias for Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64, 1}, Array{Float64, 1}}, Bijectors.Stacked{Array{Any, 1}, Array{UnitRange{Int64}, 1}}, ArrayLikeVariate{1}})\n\n\n\nadvi = ADVI(10, 10_000)\n\nADVI{AutoForwardDiff{nothing, Nothing}}(10, 10000, AutoForwardDiff())\n\n\nTuring also provides a couple of different optimizers:\n\nTruncatedADAGrad (default)\nDecayedADAGrad as these are well-suited for problems with high-variance stochastic objectives, which is usually what the ELBO ends up being at different times in our optimization process.\n\nWith that being said, thanks to Requires.jl, if we add a using Flux prior to using Turing we can also make use of all the optimizers in Flux, e.g. ADAM, without any additional changes to your code! For example:\n\nusing Flux, Turing\nusing Turing.Variational\n\nvi(m, advi; optimizer=Flux.ADAM())\n\njust works.\nFor this problem we’ll use the DecayedADAGrad from Turing:\n\nopt = Variational.DecayedADAGrad(1e-2, 1.1, 0.9)\n\nAdvancedVI.DecayedADAGrad(0.01, 1.1, 0.9, IdDict{Any, Any}())\n\n\n\nq = vi(m, advi, q0; optimizer=opt)\ntypeof(q)\n\nMultivariateTransformed{TuringDiagMvNormal{Vector{Float64}, Vector{Float64}}, Stacked{Vector{Any}, Vector{UnitRange{Int64}}}} (alias for Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64, 1}, Array{Float64, 1}}, Bijectors.Stacked{Array{Any, 1}, Array{UnitRange{Int64}, 1}}, ArrayLikeVariate{1}})\n\n\nNote: as mentioned before, we internally define a update(q::TransformedDistribution{<:TuringDiagMvNormal}, θ::AbstractVector) method which takes in the current variational approximation q together with new parameters z and returns the new variational approximation. This is required so that we can actually update the Distribution object after each optimization step.\nAlternatively, we can instead provide the mapping \\(\\theta \\mapsto q_{\\theta}\\) directly together with initial parameters using the signature vi(m, advi, getq, θ_init) as mentioned earlier. We’ll see an explicit example of this later on!\nTo compute statistics for our approximation we need samples:\n\nz = rand(q, 10_000);\n\nNow we can for example look at the average\n\navg = vec(mean(z; dims=2))\n\n12-element Vector{Float64}:\n 0.21679879777882793\n -0.0019900869604398967\n 0.3741409817644457\n -0.10952738792415241\n -0.07725729815729414\n 0.6160368962339698\n 0.005810919495507477\n 0.0802319135468359\n -0.06893439782898787\n 0.11827476354229562\n 0.1902688030072634\n -0.6104172868796531\n\n\nThe vector has the same ordering as the model, e.g. in this case σ² has index 1, intercept has index 2 and coefficients has indices 3:12. If you forget or you might want to do something programmatically with the result, you can obtain the sym → indices mapping as follows:\n\n_, sym2range = bijector(m, Val(true));\nsym2range\n\n(intercept = UnitRange{Int64}[2:2], σ² = UnitRange{Int64}[1:1], coefficients = UnitRange{Int64}[3:12])\n\n\nFor example, we can check the sample distribution and mean value of σ²:\n\nhistogram(z[1, :])\navg[union(sym2range[:σ²]...)]\n\n1-element Vector{Float64}:\n 0.21679879777882793\n\n\n\navg[union(sym2range[:intercept]...)]\n\n1-element Vector{Float64}:\n -0.0019900869604398967\n\n\n\navg[union(sym2range[:coefficients]...)]\n\n10-element Vector{Float64}:\n 0.3741409817644457\n -0.10952738792415241\n -0.07725729815729414\n 0.6160368962339698\n 0.005810919495507477\n 0.0802319135468359\n -0.06893439782898787\n 0.11827476354229562\n 0.1902688030072634\n -0.6104172868796531\n\n\nNote: as you can see, this is slightly awkward to work with at the moment. We’ll soon add a better way of dealing with this.\nWith a bit of work (this will be much easier in the future), we can also visualize the approximate marginals of the different variables, similar to plot(chain):\n\nfunction plot_variational_marginals(z, sym2range)\n ps = []\n\n for (i, sym) in enumerate(keys(sym2range))\n indices = union(sym2range[sym]...) # <= array of ranges\n if sum(length.(indices)) > 1\n offset = 1\n for r in indices\n p = density(\n z[r, :];\n title=\"$(sym)[$offset]\",\n titlefontsize=10,\n label=\"\",\n ylabel=\"Density\",\n margin=1.5mm,\n )\n push!(ps, p)\n offset += 1\n end\n else\n p = density(\n z[first(indices), :];\n title=\"$(sym)\",\n titlefontsize=10,\n label=\"\",\n ylabel=\"Density\",\n margin=1.5mm,\n )\n push!(ps, p)\n end\n end\n\n return plot(ps...; layout=(length(ps), 1), size=(500, 2000), margin=4.0mm)\nend\n\nplot_variational_marginals (generic function with 1 method)\n\n\n\nplot_variational_marginals(z, sym2range)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAnd let’s compare this to using the NUTS sampler:\n\nchain = sample(m, NUTS(), 10_000);\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\nplot(chain; margin=12.00mm)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nvi_mean = vec(mean(z; dims=2))[[\n union(sym2range[:coefficients]...)...,\n union(sym2range[:intercept]...)...,\n union(sym2range[:σ²]...)...,\n]]\n\n12-element Vector{Float64}:\n 0.3741409817644457\n -0.10952738792415241\n -0.07725729815729414\n 0.6160368962339698\n 0.005810919495507477\n 0.0802319135468359\n -0.06893439782898787\n 0.11827476354229562\n 0.1902688030072634\n -0.6104172868796531\n -0.0019900869604398967\n 0.21679879777882793\n\n\n\nmcmc_mean = mean(chain, names(chain, :parameters))[:, 2]\n\n12-element Vector{Float64}:\n 0.24418264307597065\n 0.0002645052150999318\n 0.3735236697132521\n -0.10595725783949556\n -0.08067868939791037\n 0.6090348281790653\n 0.0012670225588758636\n 0.07791221355377874\n -0.07053269307786576\n 0.125170536004104\n 0.18960893606979076\n -0.6157292272006891\n\n\n\nplot(mcmc_mean; xticks=1:1:length(mcmc_mean), linestyle=:dot, label=\"NUTS\")\nplot!(vi_mean; linestyle=:dot, label=\"VI\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nOne thing we can look at is simply the squared error between the means:\n\nsum(abs2, mcmc_mean .- vi_mean)\n\n2.386012782478003\n\n\nThat looks pretty good! But let’s see how the predictive distributions looks for the two.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#prediction", "href": "tutorials/variational-inference/index.html#prediction", "title": "Variational Inference", "section": "Prediction", "text": "Prediction\nSimilarily to the linear regression tutorial, we’re going to compare to multivariate ordinary linear regression using the GLM package:\n\n# Import the GLM package.\nusing GLM\n\n# Perform multivariate OLS.\nols = lm(\n @formula(MPG ~ Cyl + Disp + HP + DRat + WT + QSec + VS + AM + Gear + Carb), train_cut\n)\n\n# Store our predictions in the original dataframe.\ntrain_cut.OLSPrediction = unstandardize(GLM.predict(ols), train_unstandardized.MPG)\ntest_cut.OLSPrediction = unstandardize(GLM.predict(ols, test_cut), train_unstandardized.MPG);\n\n\n# Make a prediction given an input vector, using mean parameter values from a chain.\nfunction prediction_chain(chain, x)\n p = get_params(chain)\n α = mean(p.intercept)\n β = collect(mean.(p.coefficients))\n return α .+ x * β\nend\n\nprediction_chain (generic function with 1 method)\n\n\n\n# Make a prediction using samples from the variational posterior given an input vector.\nfunction prediction(samples::AbstractVector, sym2ranges, x)\n α = mean(samples[union(sym2ranges[:intercept]...)])\n β = vec(mean(samples[union(sym2ranges[:coefficients]...)]; dims=2))\n return α .+ x * β\nend\n\nfunction prediction(samples::AbstractMatrix, sym2ranges, x)\n α = mean(samples[union(sym2ranges[:intercept]...), :])\n β = vec(mean(samples[union(sym2ranges[:coefficients]...), :]; dims=2))\n return α .+ x * β\nend\n\nprediction (generic function with 2 methods)\n\n\n\n# Unstandardize the dependent variable.\ntrain_cut.MPG = unstandardize(train_cut.MPG, train_unstandardized.MPG)\ntest_cut.MPG = unstandardize(test_cut.MPG, train_unstandardized.MPG);\n\n\n# Show the first side rows of the modified dataframe.\nfirst(test_cut, 6)\n\n6×12 DataFrame\n\n\n\nRow\nMPG\nCyl\nDisp\nHP\nDRat\nWT\nQSec\nVS\nAM\nGear\nCarb\nOLSPrediction\n\n\n\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\n\n\n\n\n1\n15.2\n1.04746\n0.565102\n0.258882\n-0.652405\n0.0714991\n-0.716725\n-0.977008\n-0.598293\n-0.891883\n-0.469126\n19.8583\n\n\n2\n13.3\n1.04746\n0.929057\n1.90345\n0.380435\n0.465717\n-1.90403\n-0.977008\n-0.598293\n-0.891883\n1.11869\n16.0462\n\n\n3\n19.2\n1.04746\n1.32466\n0.691663\n-0.777058\n0.470584\n-0.873777\n-0.977008\n-0.598293\n-0.891883\n-0.469126\n18.5746\n\n\n4\n27.3\n-1.25696\n-1.21511\n-1.19526\n1.0037\n-1.38857\n0.288403\n0.977008\n1.59545\n1.07026\n-1.26303\n29.3233\n\n\n5\n26.0\n-1.25696\n-0.888346\n-0.762482\n1.62697\n-1.18903\n-1.09365\n-0.977008\n1.59545\n3.0324\n-0.469126\n30.7731\n\n\n6\n30.4\n-1.25696\n-1.08773\n-0.381634\n0.451665\n-1.79933\n-0.968007\n0.977008\n1.59545\n3.0324\n-0.469126\n25.2892\n\n\n\n\n\n\n\nz = rand(q, 10_000);\n\n\n# Calculate the predictions for the training and testing sets using the samples `z` from variational posterior\ntrain_cut.VIPredictions = unstandardize(\n prediction(z, sym2range, train), train_unstandardized.MPG\n)\ntest_cut.VIPredictions = unstandardize(\n prediction(z, sym2range, test), train_unstandardized.MPG\n)\n\ntrain_cut.BayesPredictions = unstandardize(\n prediction_chain(chain, train), train_unstandardized.MPG\n)\ntest_cut.BayesPredictions = unstandardize(\n prediction_chain(chain, test), train_unstandardized.MPG\n);\n\n\nvi_loss1 = mean((train_cut.VIPredictions - train_cut.MPG) .^ 2)\nbayes_loss1 = mean((train_cut.BayesPredictions - train_cut.MPG) .^ 2)\nols_loss1 = mean((train_cut.OLSPrediction - train_cut.MPG) .^ 2)\n\nvi_loss2 = mean((test_cut.VIPredictions - test_cut.MPG) .^ 2)\nbayes_loss2 = mean((test_cut.BayesPredictions - test_cut.MPG) .^ 2)\nols_loss2 = mean((test_cut.OLSPrediction - test_cut.MPG) .^ 2)\n\nprintln(\"Training set:\n VI loss: $vi_loss1\n Bayes loss: $bayes_loss1\n OLS loss: $ols_loss1\nTest set:\n VI loss: $vi_loss2\n Bayes loss: $bayes_loss2\n OLS loss: $ols_loss2\")\n\nTraining set:\n VI loss: 3.0741620727506263\n Bayes loss: 3.0719371505086626\n OLS loss: 3.0709261248930093\nTest set:\n VI loss: 26.113432547456\n Bayes loss: 26.253308626144253\n OLS loss: 27.09481307076057\n\n\nInterestingly the squared difference between true- and mean-prediction on the test-set is actually better for the mean-field variational posterior than for the “true” posterior obtained by MCMC sampling using NUTS. But, as Bayesians, we know that the mean doesn’t tell the entire story. One quick check is to look at the mean predictions ± standard deviation of the two different approaches:\n\nz = rand(q, 1000);\npreds = mapreduce(hcat, eachcol(z)) do zi\n return unstandardize(prediction(zi, sym2range, test), train_unstandardized.MPG)\nend\n\nscatter(\n 1:size(test, 1),\n mean(preds; dims=2);\n yerr=std(preds; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"Mean-field ADVI (Normal)\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\npreds = mapreduce(hcat, 1:5:size(chain, 1)) do i\n return unstandardize(prediction_chain(chain[i], test), train_unstandardized.MPG)\nend\n\nscatter(\n 1:size(test, 1),\n mean(preds; dims=2);\n yerr=std(preds; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"MCMC (NUTS)\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nIndeed we see that the MCMC approach generally provides better uncertainty estimates than the mean-field ADVI approach! Good. So all the work we’ve done to make MCMC fast isn’t for nothing.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#alternative-provide-parameter-to-distribution-instead-of-q-with-update-implemented", "href": "tutorials/variational-inference/index.html#alternative-provide-parameter-to-distribution-instead-of-q-with-update-implemented", "title": "Variational Inference", "section": "Alternative: provide parameter-to-distribution instead of \\(q\\) with update implemented", "text": "Alternative: provide parameter-to-distribution instead of \\(q\\) with update implemented\nAs mentioned earlier, it’s also possible to just provide the mapping \\(\\theta \\mapsto q_{\\theta}\\) rather than the variational family / initial variational posterior q, i.e. use the interface vi(m, advi, getq, θ_init) where getq is the mapping \\(\\theta \\mapsto q_{\\theta}\\)\nIn this section we’re going to construct a mean-field approximation to the model by hand using a composition ofShift and Scale from Bijectors.jl togheter with a standard multivariate Gaussian as the base distribution.\n\nusing Bijectors\n\n\nusing Bijectors: Scale, Shift\n\n\nd = length(q)\nbase_dist = Turing.DistributionsAD.TuringDiagMvNormal(zeros(d), ones(d))\n\nDistributionsAD.TuringDiagMvNormal{Vector{Float64}, Vector{Float64}}(\nm: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\nσ: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]\n)\n\n\nbijector(model::Turing.Model) is defined by Turing, and will return a bijector which takes you from the space of the latent variables to the real space. In this particular case, this is a mapping ((0, ∞) × ℝ × ℝ¹⁰) → ℝ¹². We’re interested in using a normal distribution as a base-distribution and transform samples to the latent space, thus we need the inverse mapping from the reals to the latent space:\n\nto_constrained = inverse(bijector(m));\n\n\nfunction getq(θ)\n d = length(θ) ÷ 2\n A = @inbounds θ[1:d]\n b = @inbounds θ[(d + 1):(2 * d)]\n\n b = to_constrained ∘ Shift(b) ∘ Scale(exp.(A))\n\n return transformed(base_dist, b)\nend\n\ngetq (generic function with 1 method)\n\n\n\nq_mf_normal = vi(m, advi, getq, randn(2 * d));\n\n┌ Info: [ADVI] Should only be seen once: optimizer created for θ\n└ objectid(θ) = 0x6cb16dab34583b97\n\n\n\np1 = plot_variational_marginals(rand(q_mf_normal, 10_000), sym2range) # MvDiagNormal + Affine transformation + to_constrained\np2 = plot_variational_marginals(rand(q, 10_000), sym2range) # Turing.meanfield(m)\n\nplot(p1, p2; layout=(1, 2), size=(800, 2000))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAs expected, the fits look pretty much identical.\nBut using this interface it becomes trivial to go beyond the mean-field assumption we made for the variational posterior, as we’ll see in the next section.\n\nRelaxing the mean-field assumption\nHere we’ll instead consider the variational family to be a full non-diagonal multivariate Gaussian. As in the previous section we’ll implement this by transforming a standard multivariate Gaussian using Scale and Shift, but now Scale will instead be using a lower-triangular matrix (representing the Cholesky of the covariance matrix of a multivariate normal) in contrast to the diagonal matrix we used in for the mean-field approximate posterior.\n\n# Using `ComponentArrays.jl` together with `UnPack.jl` makes our lives much easier.\nusing ComponentArrays, UnPack\n\n\nproto_arr = ComponentArray(; L=zeros(d, d), b=zeros(d))\nproto_axes = getaxes(proto_arr)\nnum_params = length(proto_arr)\n\nfunction getq(θ)\n L, b = begin\n @unpack L, b = ComponentArray(θ, proto_axes)\n LowerTriangular(L), b\n end\n # For this to represent a covariance matrix we need to ensure that the diagonal is positive.\n # We can enforce this by zeroing out the diagonal and then adding back the diagonal exponentiated.\n D = Diagonal(diag(L))\n A = L - D + exp(D) # exp for Diagonal is the same as exponentiating only the diagonal entries\n\n b = to_constrained ∘ Shift(b) ∘ Scale(A)\n\n return transformed(base_dist, b)\nend\n\ngetq (generic function with 1 method)\n\n\n\nadvi = ADVI(10, 20_000)\n\nADVI{AutoForwardDiff{nothing, Nothing}}(10, 20000, AutoForwardDiff())\n\n\n\nq_full_normal = vi(\n m, advi, getq, randn(num_params); optimizer=Variational.DecayedADAGrad(1e-2)\n);\n\nLet’s have a look at the learned covariance matrix:\n\nA = q_full_normal.transform.inner.a\n\n12×12 LowerTriangular{Float64, Matrix{Float64}}:\n 0.327556 ⋅ ⋅ … ⋅ ⋅ ⋅ \n -0.00442516 0.10072 ⋅ ⋅ ⋅ ⋅ \n -0.00791923 0.00688464 0.426388 ⋅ ⋅ ⋅ \n 0.00284573 0.00407961 -0.044823 ⋅ ⋅ ⋅ \n -0.00624689 -0.00138634 -0.0548034 ⋅ ⋅ ⋅ \n -0.00153762 0.00334011 0.0956288 … ⋅ ⋅ ⋅ \n -0.00709976 0.00976431 0.0208722 ⋅ ⋅ ⋅ \n -0.00614631 -0.00743286 0.0959543 ⋅ ⋅ ⋅ \n 0.00147981 0.00175452 0.0853957 ⋅ ⋅ ⋅ \n -0.00502135 -0.0185893 0.0625881 0.143211 ⋅ ⋅ \n 0.00667361 -0.000441113 0.0402142 … -0.105319 0.10063 ⋅ \n -0.00571477 0.0179101 -0.0725691 0.0200098 0.00477431 0.102392\n\n\n\nheatmap(cov(A * A'))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\nzs = rand(q_full_normal, 10_000);\n\n\np1 = plot_variational_marginals(rand(q_mf_normal, 10_000), sym2range)\np2 = plot_variational_marginals(rand(q_full_normal, 10_000), sym2range)\n\nplot(p1, p2; layout=(1, 2), size=(800, 2000))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSo it seems like the “full” ADVI approach, i.e. no mean-field assumption, obtain the same modes as the mean-field approach but with greater uncertainty for some of the coefficients. This\n\n# Unfortunately, it seems like this has quite a high variance which is likely to be due to numerical instability,\n# so we consider a larger number of samples. If we get a couple of outliers due to numerical issues,\n# these kind affect the mean prediction greatly.\nz = rand(q_full_normal, 10_000);\n\n\ntrain_cut.VIFullPredictions = unstandardize(\n prediction(z, sym2range, train), train_unstandardized.MPG\n)\ntest_cut.VIFullPredictions = unstandardize(\n prediction(z, sym2range, test), train_unstandardized.MPG\n);\n\n\nvi_loss1 = mean((train_cut.VIPredictions - train_cut.MPG) .^ 2)\nvifull_loss1 = mean((train_cut.VIFullPredictions - train_cut.MPG) .^ 2)\nbayes_loss1 = mean((train_cut.BayesPredictions - train_cut.MPG) .^ 2)\nols_loss1 = mean((train_cut.OLSPrediction - train_cut.MPG) .^ 2)\n\nvi_loss2 = mean((test_cut.VIPredictions - test_cut.MPG) .^ 2)\nvifull_loss2 = mean((test_cut.VIFullPredictions - test_cut.MPG) .^ 2)\nbayes_loss2 = mean((test_cut.BayesPredictions - test_cut.MPG) .^ 2)\nols_loss2 = mean((test_cut.OLSPrediction - test_cut.MPG) .^ 2)\n\nprintln(\"Training set:\n VI loss: $vi_loss1\n Bayes loss: $bayes_loss1\n OLS loss: $ols_loss1\nTest set:\n VI loss: $vi_loss2\n Bayes loss: $bayes_loss2\n OLS loss: $ols_loss2\")\n\nTraining set:\n VI loss: 3.0741620727506263\n Bayes loss: 3.0719371505086626\n OLS loss: 3.0709261248930093\nTest set:\n VI loss: 26.113432547456\n Bayes loss: 26.253308626144253\n OLS loss: 27.09481307076057\n\n\n\nz = rand(q_mf_normal, 1000);\npreds = mapreduce(hcat, eachcol(z)) do zi\n return unstandardize(prediction(zi, sym2range, test), train_unstandardized.MPG)\nend\n\np1 = scatter(\n 1:size(test, 1),\n mean(preds; dims=2);\n yerr=std(preds; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"Mean-field ADVI (Normal)\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nz = rand(q_full_normal, 1000);\npreds = mapreduce(hcat, eachcol(z)) do zi\n return unstandardize(prediction(zi, sym2range, test), train_unstandardized.MPG)\nend\n\np2 = scatter(\n 1:size(test, 1),\n mean(preds; dims=2);\n yerr=std(preds; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"Full ADVI (Normal)\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\npreds = mapreduce(hcat, 1:5:size(chain, 1)) do i\n return unstandardize(prediction_chain(chain[i], test), train_unstandardized.MPG)\nend\n\np3 = scatter(\n 1:size(test, 1),\n mean(preds; dims=2);\n yerr=std(preds; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"MCMC (NUTS)\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nplot(p1, p2, p3; layout=(1, 3), size=(900, 250), label=\"\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHere we actually see that indeed both the full ADVI and the MCMC approaches does a much better job of quantifying the uncertainty of predictions for never-before-seen samples, with full ADVI seemingly underestimating the variance slightly compared to MCMC.\nSo now you know how to do perform VI on your Turing.jl model! Great isn’t it?", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/hidden-markov-models/index.html", "href": "tutorials/hidden-markov-models/index.html", "title": "Hidden Markov Models", "section": "", "text": "This tutorial illustrates training Bayesian Hidden Markov Models (HMM) using Turing. The main goals are learning the transition matrix, emission parameter, and hidden states. For a more rigorous academic overview on Hidden Markov Models, see An introduction to Hidden Markov Models and Bayesian Networks (Ghahramani, 2001).\nIn this tutorial, we assume there are \\(k\\) discrete hidden states; the observations are continuous and normally distributed - centered around the hidden states. This assumption reduces the number of parameters to be estimated in the emission matrix.\nLet’s load the libraries we’ll need. We also set a random seed (for reproducibility) and the automatic differentiation backend to forward mode (more here on why this is useful).\n# Load libraries.\nusing Turing, StatsPlots, Random\n\n# Set a random seed and use the forward_diff AD mode.\nRandom.seed!(12345678);", "crumbs": [ "Get Started", "Tutorials", "Hidden Markov Models" ] }, { "objectID": "tutorials/hidden-markov-models/index.html#simple-state-detection", "href": "tutorials/hidden-markov-models/index.html#simple-state-detection", "title": "Hidden Markov Models", "section": "Simple State Detection", "text": "Simple State Detection\nIn this example, we’ll use something where the states and emission parameters are straightforward.\n\n# Define the emission parameter.\ny = [\n 1.0,\n 1.0,\n 1.0,\n 1.0,\n 1.0,\n 1.0,\n 2.0,\n 2.0,\n 2.0,\n 2.0,\n 2.0,\n 2.0,\n 3.0,\n 3.0,\n 3.0,\n 3.0,\n 3.0,\n 3.0,\n 3.0,\n 2.0,\n 2.0,\n 2.0,\n 2.0,\n 1.0,\n 1.0,\n 1.0,\n 1.0,\n 1.0,\n 1.0,\n 1.0,\n];\nN = length(y);\nK = 3;\n\n# Plot the data we just made.\nplot(y; xlim=(0, 30), ylim=(-1, 5), size=(500, 250))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can see that we have three states, one for each height of the plot (1, 2, 3). This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on.\nUltimately, we would like to understand three major parameters:\n\nThe transition matrix. This is a matrix that assigns a probability of switching from one state to any other state, including the state that we are already in.\nThe emission matrix, which describes a typical value emitted by some state. In the plot above, the emission parameter for state one is simply one.\nThe state sequence is our understanding of what state we were actually in when we observed some data. This is very important in more sophisticated HMM models, where the emission value does not equal our state.\n\nWith this in mind, let’s set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter.\n\\[\nm_i \\sim \\mathrm{Normal}(i, 0.5) \\quad \\text{where} \\quad m = \\{1,2,3\\}\n\\]\nSimply put, this says that we expect state one to emit values in a Normally distributed manner, where the mean of each state’s emissions is that state’s value. The variance of 0.5 helps the model converge more quickly — consider the case where we have a variance of 1 or 2. In this case, the likelihood of observing a 2 when we are in state 1 is actually quite high, as it is within a standard deviation of the true emission value. Applying the prior that we are likely to be tightly centered around the mean prevents our model from being too confused about the state that is generating our observations.\nThe priors on our transition matrix are noninformative, using T[i] ~ Dirichlet(ones(K)/K). The Dirichlet prior used in this way assumes that the state is likely to change to any other state with equal probability. As we’ll see, this transition matrix prior will be overwritten as we observe data.\n\n# Turing model definition.\n@model function BayesHmm(y, K)\n # Get observation length.\n N = length(y)\n\n # State sequence.\n s = zeros(Int, N)\n\n # Emission matrix.\n m = Vector(undef, K)\n\n # Transition matrix.\n T = Vector{Vector}(undef, K)\n\n # Assign distributions to each element\n # of the transition matrix and the\n # emission matrix.\n for i in 1:K\n T[i] ~ Dirichlet(ones(K) / K)\n m[i] ~ Normal(i, 0.5)\n end\n\n # Observe each point of the input.\n s[1] ~ Categorical(K)\n y[1] ~ Normal(m[s[1]], 0.1)\n\n for i in 2:N\n s[i] ~ Categorical(vec(T[s[i - 1]]))\n y[i] ~ Normal(m[s[i]], 0.1)\n end\nend;\n\nWe will use a combination of two samplers (HMC and Particle Gibbs) by passing them to the Gibbs sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters.\nIn this case, we use HMC for m and T, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for s, the state sequence. You may wonder why it is that we are not assigning s to the HMC sampler, and why it is that we need compositional Gibbs sampling at all.\nThe parameter s is not a continuous variable. It is a vector of integers, and thus Hamiltonian methods like HMC and NUTS won’t work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use different automatic differentiation backends for each parameter space.\nTime to run our sampler.\n\nsetprogress!(false)\n\n\ng = Gibbs((:m, :T) => HMC(0.01, 50), :s => PG(120))\nchn = sample(BayesHmm(y, 3), g, 1000);\n\nLet’s see how well our chain performed. Ordinarily, using display(chn) would be a good first step, but we have generated a lot of parameters here (s[1], s[2], m[1], and so on). It’s a bit easier to show how our model performed graphically.\nThe code below generates an animation showing the graph of the data above, and the data our model generates in each sample.\n\n# Extract our m and s parameters from the chain.\nm_set = MCMCChains.group(chn, :m).value\ns_set = MCMCChains.group(chn, :s).value\n\n# Iterate through the MCMC samples.\nNs = 1:length(chn)\n\n# Make an animation.\nanimation = @gif for i in Ns\n m = m_set[i, :]\n s = Int.(s_set[i, :])\n emissions = m[s]\n\n p = plot(\n y;\n chn=:red,\n size=(500, 250),\n xlabel=\"Time\",\n ylabel=\"State\",\n legend=:topright,\n label=\"True data\",\n xlim=(0, 30),\n ylim=(-1, 5),\n )\n plot!(emissions; color=:blue, label=\"Sample $i\")\nend every 3\n\n[ Info: Saved animation to /tmp/jl_8vg3iBSpUl.gif\n\n\n\n\n\nLooks like our model did a pretty good job, but we should also check to make sure our chain converges. A quick check is to examine whether the diagonal (representing the probability of remaining in the current state) of the transition matrix appears to be stationary. The code below extracts the diagonal and shows a traceplot of each persistence probability.\n\n# Index the chain with the persistence probabilities.\nsubchain = chn[[\"T[1][1]\", \"T[2][2]\", \"T[3][3]\"]]\n\nplot(subchain; seriestype=:traceplot, title=\"Persistence Probability\", legend=false)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nA cursory examination of the traceplot above indicates that all three chains converged to something resembling stationary. We can use the diagnostic functions provided by MCMCChains to engage in some more formal tests, like the Heidelberg and Welch diagnostic:\n\nheideldiag(MCMCChains.group(chn, :T))[1]\n\nHeidelberger and Welch diagnostic - Chain 1\n parameters burnin stationarity pvalue mean halfwidth tes ⋯\n Symbol Int64 Bool Float64 Float64 Float64 Boo ⋯\n\n T[1][1] 100.0000 1.0000 0.2667 0.7583 0.0223 1.000 ⋯\n T[1][2] 0.0000 1.0000 0.7908 0.1030 0.0152 0.000 ⋯\n T[1][3] 100.0000 1.0000 0.1364 0.1398 0.0245 0.000 ⋯\n T[2][1] 0.0000 1.0000 0.6433 0.1182 0.0204 0.000 ⋯\n T[2][2] 0.0000 1.0000 0.5017 0.8527 0.0251 1.000 ⋯\n T[2][3] 0.0000 1.0000 0.7079 0.0291 0.0122 0.000 ⋯\n T[3][1] 0.0000 1.0000 0.6959 0.1457 0.0309 0.000 ⋯\n T[3][2] 400.0000 1.0000 0.2112 0.0354 0.0248 0.000 ⋯\n T[3][3] 0.0000 1.0000 0.2364 0.8034 0.0333 1.000 ⋯\n 1 column omitted\n\n\nThe p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be reasonably confident that our transition matrix has converged to something reasonable.", "crumbs": [ "Get Started", "Tutorials", "Hidden Markov Models" ] }, { "objectID": "developers/contributing/index.html", "href": "developers/contributing/index.html", "title": "Contributing", "section": "", "text": "Turing is an open-source project and is hosted on GitHub. We welcome contributions from the community in all forms large or small: bug reports, feature implementations, code contributions, or improvements to documentation or infrastructure are all extremely valuable. We would also very much appreciate examples of models written using Turing.\n\nHow to get involved\nOur outstanding issues are tabulated on our issue tracker. Closing one of these may involve implementing new features, fixing bugs, or writing example models.\nYou can also join the #turing channel on the Julia Slack and say hello!\nIf you are new to open-source software, please see GitHub’s introduction or Julia’s contribution guide on using version control for collaboration.\n\n\nDocumentation\nEach of the packages in the Turing ecosystem (see Libraries) has its own documentation, which is typically found in the docs folder of the corresponding package. For example, the source code for DynamicPPL’s documentation can be found in its repository.\nThe documentation for Turing.jl itself consists of the tutorials that you see on this website, and is built from the separate docs repository. None of the documentation is generated from the main Turing.jl repository; in particular, the API that Turing exports does not currently form part of the documentation.\nOther sections of the website (anything that isn’t a package, or a tutorial) – for example, the list of libraries – is built from the turinglang.github.io repository.\n\n\nTests\nTuring, like most software libraries, has a test suite. You can run the whole suite by running julia --project=. from the root of the Turing repository, and then running\nimport Pkg; Pkg.test(\"Turing\")\nThe test suite subdivides into files in the test folder, and you can run only some of them using commands like\nimport Pkg; Pkg.test(\"Turing\"; test_args=[\"optim\", \"hmc\", \"--skip\", \"ext\"])\nThis one would run all files with “optim” or “hmc” in their path, such as test/optimisation/Optimisation.jl, but not files with “ext” in their path. Alternatively, you can set these arguments as command line arguments when you run Julia\njulia --project=. -e 'import Pkg; Pkg.test(; test_args=ARGS)' -- optim hmc --skip ext\nOr otherwise, set the global ARGS variable, and call include(\"test/runtests.jl\").\n\n\nStyle Guide\nTuring has a style guide, described below. Reviewing it before making a pull request is not strictly necessary, but you may be asked to change portions of your code to conform with the style guide before it is merged.\nMost Turing code follows Blue: a Style Guide for Julia. These conventions were created from a variety of sources including Python’s PEP8, Julia’s Notes for Contributors, and Julia’s Style Guide.\n\nSynopsis\n\nUse 4 spaces per indentation level, no tabs.\nTry to adhere to a 92 character line length limit.\nUse upper camel case convention for modules and types.\nUse lower case with underscores for method names (note: Julia code likes to use lower case without underscores).\nComments are good, try to explain the intentions of the code.\nUse whitespace to make the code more readable.\nNo whitespace at the end of a line (trailing whitespace).\nAvoid padding brackets with spaces. ex. Int64(value) preferred over Int64( value ).\n\n\n\nA Word on Consistency\nWhen adhering to the Blue style, it’s important to realize that these are guidelines, not rules. This is stated best in the PEP8:\n\nA style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important.\n\n\nBut most importantly: know when to be inconsistent – sometimes the style guide just doesn’t apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!\n\n\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Developers", "Contributing" ] }, { "objectID": "developers/compiler/model-manual/index.html", "href": "developers/compiler/model-manual/index.html", "title": "Manually Defining a Model", "section": "", "text": "Traditionally, models in Turing are defined using the @model macro:\n\nusing Turing\n\n@model function gdemo(x)\n # Set priors.\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n\n # Observe each value of x.\n x .~ Normal(m, sqrt(s²))\n\n return nothing\nend\n\nmodel = gdemo([1.5, 2.0])\n\nDynamicPPL.Model{typeof(gdemo), (:x,), (), (), Tuple{Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext}(gdemo, (x = [1.5, 2.0],), NamedTuple(), DynamicPPL.DefaultContext())\n\n\nThe @model macro accepts a function definition and rewrites it such that call of the function generates a Model struct for use by the sampler.\nHowever, models can be constructed by hand without the use of a macro. Taking the gdemo model above as an example, the macro-based definition can be implemented also (a bit less generally) with the macro-free version\n\nusing DynamicPPL\n\n# Create the model function.\nfunction gdemo2(model, varinfo, context, x)\n # Assume s² has an InverseGamma distribution.\n s², varinfo = DynamicPPL.tilde_assume!!(\n context, InverseGamma(2, 3), Turing.@varname(s²), varinfo\n )\n\n # Assume m has a Normal distribution.\n m, varinfo = DynamicPPL.tilde_assume!!(\n context, Normal(0, sqrt(s²)), Turing.@varname(m), varinfo\n )\n\n # Observe each value of x[i] according to a Normal distribution.\n for i in eachindex(x)\n _retval, varinfo = DynamicPPL.tilde_observe!!(\n context, Normal(m, sqrt(s²)), x[i], Turing.@varname(x[i]), varinfo\n )\n end\n\n # The final return statement should comprise both the original return\n # value and the updated varinfo.\n return nothing, varinfo\nend\ngdemo2(x) = Turing.Model(gdemo2, (; x))\n\n# Instantiate a Model object with our data variables.\nmodel2 = gdemo2([1.5, 2.0])\n\nModel{typeof(gdemo2), (:x,), (), (), Tuple{Vector{Float64}}, Tuple{}, DefaultContext}(gdemo2, (x = [1.5, 2.0],), NamedTuple(), DefaultContext())\n\n\nWe can sample from this model in the same way:\n\nchain = sample(model2, NUTS(), 1000; progress=false)\n\n┌ Info: Found initial step size\n└ ϵ = 0.80625\n\n\nChains MCMC chain (1000×14×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 6.82 seconds\nCompute duration = 6.82 seconds\nparameters = s², m\ninternals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 1.9300 1.5977 0.0809 484.2915 498.0668 1.0039 ⋯\n m 1.1994 0.7943 0.0415 408.7467 254.5830 1.0069 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.6071 1.0266 1.4945 2.2515 6.3826\n m -0.4355 0.7312 1.1992 1.6738 2.8152\n\n\nThe subsequent pages in this section will show how the @model macro does this behind-the-scenes.\n\n\n\n Back to top", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Manually Defining a Model" ] }, { "objectID": "developers/compiler/design-overview/index.html", "href": "developers/compiler/design-overview/index.html", "title": "Turing Compiler Design (Outdated)", "section": "", "text": "In this section, the current design of Turing’s model “compiler” is described which enables Turing to perform various types of Bayesian inference without changing the model definition. The “compiler” is essentially just a macro that rewrites the user’s model definition to a function that generates a Model struct that Julia’s dispatch can operate on and that Julia’s compiler can successfully do type inference on for efficient machine code generation.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Turing Compiler Design (Outdated)" ] }, { "objectID": "developers/compiler/design-overview/index.html#the-model", "href": "developers/compiler/design-overview/index.html#the-model", "title": "Turing Compiler Design (Outdated)", "section": "The model", "text": "The model\nA model::Model is a callable struct that one can sample from by calling\n\n(model::Model)([rng, varinfo, sampler, context])\n\nwhere rng is a random number generator (default: Random.default_rng()), varinfo is a data structure that stores information about the random variables (default: DynamicPPL.VarInfo()), sampler is a sampling algorithm (default: DynamicPPL.SampleFromPrior()), and context is a sampling context that can, e.g., modify how the log probability is accumulated (default: DynamicPPL.DefaultContext()).\nSampling resets the log joint probability of varinfo and increases the evaluation counter of sampler. If context is a LikelihoodContext, only the log likelihood of D will be accumulated, whereas with PriorContext only the log prior probability of P is. With the DefaultContext the log joint probability of both P and D is accumulated.\nThe Model struct contains the four internal fields f, args, defaults, and context. When model::Model is called, then the internal function model.f is called as model.f(rng, varinfo, sampler, context, model.args...) (for multithreaded sampling, instead of varinfo a threadsafe wrapper is passed to model.f). The positional and keyword arguments that were passed to the user-defined model function when the model was created are saved as a NamedTuple in model.args. The default values of the positional and keyword arguments of the user-defined model functions, if any, are saved as a NamedTuple in model.defaults. They are used for constructing model instances with different arguments by the logprob and prob string macros. The context variable sets an evaluation context that can be used to control for instance whether log probabilities should be evaluated for the prior, likelihood, or joint probability. By default it is set to evaluate the log joint.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Turing Compiler Design (Outdated)" ] }, { "objectID": "developers/compiler/design-overview/index.html#step-1-break-up-the-model-definition", "href": "developers/compiler/design-overview/index.html#step-1-break-up-the-model-definition", "title": "Turing Compiler Design (Outdated)", "section": "Step 1: Break up the model definition", "text": "Step 1: Break up the model definition\nFirst, the @model macro breaks up the user-provided function definition using DynamicPPL.build_model_info. This function returns a dictionary consisting of:\n\nallargs_exprs: The expressions of the positional and keyword arguments, without default values.\nallargs_syms: The names of the positional and keyword arguments, e.g., [:x, :y, :TV] above.\nallargs_namedtuple: An expression that constructs a NamedTuple of the positional and keyword arguments, e.g., :((x = x, y = y, TV = TV)) above.\ndefaults_namedtuple: An expression that constructs a NamedTuple of the default positional and keyword arguments, if any, e.g., :((x = missing, y = 1, TV = Vector{Float64})) above.\nmodeldef: A dictionary with the name, arguments, and function body of the model definition, as returned by MacroTools.splitdef.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Turing Compiler Design (Outdated)" ] }, { "objectID": "developers/compiler/design-overview/index.html#step-2-generate-the-body-of-the-internal-model-function", "href": "developers/compiler/design-overview/index.html#step-2-generate-the-body-of-the-internal-model-function", "title": "Turing Compiler Design (Outdated)", "section": "Step 2: Generate the body of the internal model function", "text": "Step 2: Generate the body of the internal model function\nIn a second step, DynamicPPL.generate_mainbody generates the main part of the transformed function body using the user-provided function body and the provided function arguments, without default values, for figuring out if a variable denotes an observation or a random variable. Hereby the function DynamicPPL.generate_tilde replaces the L ~ R lines in the model and the function DynamicPPL.generate_dot_tilde replaces the @. L ~ R and L .~ R lines in the model.\nIn the above example, p[1] ~ InverseGamma(2, 3) is replaced with something similar to\n\n#= REPL[25]:6 =#\nbegin\n var\"##tmpright#323\" = InverseGamma(2, 3)\n var\"##tmpright#323\" isa Union{Distribution,AbstractVector{<:Distribution}} || throw(\n ArgumentError(\n \"Right-hand side of a ~ must be subtype of Distribution or a vector of Distributions.\",\n ),\n )\n var\"##vn#325\" = (DynamicPPL.VarName)(:p, ((1,),))\n var\"##inds#326\" = ((1,),)\n p[1] = (DynamicPPL.tilde_assume)(\n _rng,\n _context,\n _sampler,\n var\"##tmpright#323\",\n var\"##vn#325\",\n var\"##inds#326\",\n _varinfo,\n )\nend\n\nHere the first line is a so-called line number node that enables more helpful error messages by providing users with the exact location of the error in their model definition. Then the right hand side (RHS) of the ~ is assigned to a variable (with an automatically generated name). We check that the RHS is a distribution or an array of distributions, otherwise an error is thrown. Next we extract a compact representation of the variable with its name and index (or indices). Finally, the ~ expression is replaced with a call to DynamicPPL.tilde_assume since the compiler figured out that p[1] is a random variable using the following heuristic:\n\nIf the symbol on the LHS of ~, :p in this case, is not among the arguments to the model, (:x, :y, :T) in this case, it is a random variable.\nIf the symbol on the LHS of ~, :p in this case, is among the arguments to the model but has a value of missing, it is a random variable.\nIf the value of the LHS of ~, p[1] in this case, is missing, then it is a random variable.\nOtherwise, it is treated as an observation.\n\nThe DynamicPPL.tilde_assume function takes care of sampling the random variable, if needed, and updating its value and the accumulated log joint probability in the _varinfo object. If L ~ R is an observation, DynamicPPL.tilde_observe is called with the same arguments except the random number generator _rng (since observations are never sampled).\nA similar transformation is performed for expressions of the form @. L ~ R and L .~ R. For instance, @. x[1:2] ~ Normal(p[2], sqrt(p[1])) is replaced with\n\n#= REPL[25]:8 =#\nbegin\n var\"##tmpright#331\" = Normal.(p[2], sqrt.(p[1]))\n var\"##tmpright#331\" isa Union{Distribution,AbstractVector{<:Distribution}} || throw(\n ArgumentError(\n \"Right-hand side of a ~ must be subtype of Distribution or a vector of Distributions.\",\n ),\n )\n var\"##vn#333\" = (DynamicPPL.VarName)(:x, ((1:2,),))\n var\"##inds#334\" = ((1:2,),)\n var\"##isassumption#335\" = begin\n let var\"##vn#336\" = (DynamicPPL.VarName)(:x, ((1:2,),))\n if !((DynamicPPL.inargnames)(var\"##vn#336\", _model)) ||\n (DynamicPPL.inmissings)(var\"##vn#336\", _model)\n true\n else\n x[1:2] === missing\n end\n end\n end\n if var\"##isassumption#335\"\n x[1:2] .= (DynamicPPL.dot_tilde_assume)(\n _rng,\n _context,\n _sampler,\n var\"##tmpright#331\",\n x[1:2],\n var\"##vn#333\",\n var\"##inds#334\",\n _varinfo,\n )\n else\n (DynamicPPL.dot_tilde_observe)(\n _context,\n _sampler,\n var\"##tmpright#331\",\n x[1:2],\n var\"##vn#333\",\n var\"##inds#334\",\n _varinfo,\n )\n end\nend\n\nThe main difference in the expanded code between L ~ R and @. L ~ R is that the former doesn’t assume L to be defined, it can be a new Julia variable in the scope, while the latter assumes L already exists. Moreover, DynamicPPL.dot_tilde_assume and DynamicPPL.dot_tilde_observe are called instead of DynamicPPL.tilde_assume and DynamicPPL.tilde_observe.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Turing Compiler Design (Outdated)" ] }, { "objectID": "developers/compiler/design-overview/index.html#step-3-replace-the-user-provided-function-body", "href": "developers/compiler/design-overview/index.html#step-3-replace-the-user-provided-function-body", "title": "Turing Compiler Design (Outdated)", "section": "Step 3: Replace the user-provided function body", "text": "Step 3: Replace the user-provided function body\nFinally, we replace the user-provided function body using DynamicPPL.build_output. This function uses MacroTools.combinedef to reassemble the user-provided function with a new function body. In the modified function body an anonymous function is created whose function body was generated in step 2 above and whose arguments are\n\na random number generator _rng,\na model _model,\na datastructure _varinfo,\na sampler _sampler,\na sampling context _context,\nand all positional and keyword arguments of the user-provided model function as positional arguments without any default values. Finally, in the new function body a model::Model with this anonymous function as internal function is returned.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Turing Compiler Design (Outdated)" ] }, { "objectID": "developers/compiler/design-overview/index.html#overview-1", "href": "developers/compiler/design-overview/index.html#overview-1", "title": "Turing Compiler Design (Outdated)", "section": "Overview", "text": "Overview\nVarInfo is the data structure in Turing that facilitates tracking random variables and certain metadata about them that are required for sampling. For instance, the distribution of every random variable is stored in VarInfo because we need to know the support of every random variable when sampling using HMC for example. Random variables whose distributions have a constrained support are transformed using a bijector from Bijectors.jl so that the sampling happens in the unconstrained space. Different samplers require different metadata about the random variables.\nThe definition of VarInfo in Turing is:\n\nstruct VarInfo{Tmeta, Tlogp} <: AbstractVarInfo\n metadata::Tmeta\n logp::Base.RefValue{Tlogp}\n num_produce::Base.RefValue{Int}\nend\n\nBased on the type of metadata, the VarInfo is either aliased UntypedVarInfo or TypedVarInfo. metadata can be either a subtype of the union type Metadata or a NamedTuple of multiple such subtypes. Let vi be an instance of VarInfo. If vi isa VarInfo{<:Metadata}, then it is called an UntypedVarInfo. If vi isa VarInfo{<:NamedTuple}, then vi.metadata would be a NamedTuple mapping each symbol in P to an instance of Metadata. vi would then be called a TypedVarInfo. The other fields of VarInfo include logp which is used to accumulate the log probability or log probability density of the variables in P and D. num_produce keeps track of how many observations have been made in the model so far. This is incremented when running a ~ statement when the symbol on the LHS is in D.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Turing Compiler Design (Outdated)" ] }, { "objectID": "developers/compiler/design-overview/index.html#metadata", "href": "developers/compiler/design-overview/index.html#metadata", "title": "Turing Compiler Design (Outdated)", "section": "Metadata", "text": "Metadata\nThe Metadata struct stores some metadata about the random variables sampled. This helps query certain information about a variable such as: its distribution, which samplers sample this variable, its value and whether this value is transformed to real space or not. Let md be an instance of Metadata:\n\nmd.vns is the vector of all VarName instances. Let vn be an arbitrary element of md.vns\nmd.idcs is the dictionary that maps each VarName instance to its index in md.vns, md.ranges, md.dists, md.orders and md.flags.\nmd.vns[md.idcs[vn]] == vn.\nmd.dists[md.idcs[vn]] is the distribution of vn.\nmd.gids[md.idcs[vn]] is the set of algorithms used to sample vn. This was used by the Gibbs sampler. Since Turing v0.36 it is unused and will eventually be deleted.\nmd.orders[md.idcs[vn]] is the number of observe statements before vn is sampled.\nmd.ranges[md.idcs[vn]] is the index range of vn in md.vals.\nmd.vals[md.ranges[md.idcs[vn]]] is the linearized vector of values of corresponding to vn.\nmd.flags is a dictionary of true/false flags. md.flags[flag][md.idcs[vn]] is the value of flag corresponding to vn.\n\nNote that in order to make md::Metadata type stable, all the md.vns must have the same symbol and distribution type. However, one can have a single Julia variable, e.g. x, that is a matrix or a hierarchical array sampled in partitions, e.g. x[1][:] ~ MvNormal(zeros(2), I); x[2][:] ~ MvNormal(ones(2), I). The symbol x can still be managed by a single md::Metadata without hurting the type stability since all the distributions on the RHS of ~ are of the same type.\nHowever, in Turing models one cannot have this restriction, so we must use a type unstable Metadata if we want to use one Metadata instance for the whole model. This is what UntypedVarInfo does. A type unstable Metadata will still work but will have inferior performance.\nTo strike a balance between flexibility and performance when constructing the spl::Sampler instance, the model is first run by sampling the parameters in P from their priors using an UntypedVarInfo, i.e. a type unstable Metadata is used for all the variables. Then once all the symbols and distribution types have been identified, a vi::TypedVarInfo is constructed where vi.metadata is a NamedTuple mapping each symbol in P to a specialized instance of Metadata. So as long as each symbol in P is sampled from only one type of distributions, vi::TypedVarInfo will have fully concretely typed fields which brings out the peak performance of Julia.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Turing Compiler Design (Outdated)" ] }, { "objectID": "developers/inference/variational-inference/index.html", "href": "developers/inference/variational-inference/index.html", "title": "Variational Inference", "section": "", "text": "In this post, we’ll examine variational inference (VI), a family of approximate Bayesian inference methods. We will focus on one of the more standard VI methods, Automatic Differentiation Variational Inference (ADVI).\nHere, we’ll examine the theory behind VI, but if you’re interested in using ADVI in Turing, check out this tutorial.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Variational Inference" ] }, { "objectID": "developers/inference/variational-inference/index.html#computing-kl-divergence-without-knowing-the-posterior", "href": "developers/inference/variational-inference/index.html#computing-kl-divergence-without-knowing-the-posterior", "title": "Variational Inference", "section": "Computing KL-divergence without knowing the posterior", "text": "Computing KL-divergence without knowing the posterior\nFirst off, recall that\n\n\\[\np(z \\mid x\\_i) = \\frac{p(x\\_i, z)}{p(x\\_i)}\n\\]\n\nso we can write\n\n\\[\n\\begin{align*}\n\\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) &= \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] - \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) - \\log p(x\\_i) \\right] \\\\\n &= \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] - \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] + \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x_i) \\right] \\\\\n &= \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] - \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] + \\sum\\_{i = 1}^n \\log p(x\\_i),\n\\end{align*}\n\\]\n\nwhere in the last equality we used the fact that \\(p(x_i)\\) is independent of \\(z\\).\nNow you’re probably thinking “Oh great! Now you’ve introduced \\(p(x_i)\\) which we also can’t compute (in general)!”. Woah. Calm down human. Let’s do some more algebra. The above expression can be rearranged to\n\n\\[\n\\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) + \\underbrace{\\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] - \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right]}\\_{=: \\mathrm{ELBO}(q)} = \\underbrace{\\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i) \\right]}\\_{\\text{constant}}.\n\\]\n\nSee? The left-hand side is constant and, as we mentioned before, \\(\\mathrm{D_{KL}} \\ge 0\\). What happens if we try to maximize the term we just gave the completely arbitrary name \\(\\mathrm{ELBO}\\)? Well, if \\(\\mathrm{ELBO}\\) goes up while \\(p(x_i)\\) stays constant then \\(\\mathrm{D_{KL}}\\) has to go down! That is, the \\(q(z)\\) which minimizes the KL-divergence is the same \\(q(z)\\) which maximizes \\(\\mathrm{ELBO}(q)\\):\n\n\\[\n\\underset{q}{\\mathrm{argmin}} \\ \\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) = \\underset{q}{\\mathrm{argmax}} \\ \\mathrm{ELBO}(q)\n\\]\n\nwhere\n\n\\[\n\\begin{align*}\n\\mathrm{ELBO}(q) &:= \\left( \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] \\right) - \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] \\\\\n &= \\left( \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] \\right) + \\mathbb{H}\\left( q(z) \\right)\n\\end{align*}\n\\]\n\nand \\(\\mathbb{H} \\left(q(z) \\right)\\) denotes the (differential) entropy of \\(q(z)\\).\nAssuming joint \\(p(x_i, z)\\) and the entropy \\(\\mathbb{H}\\left(q(z)\\right)\\) are both tractable, we can use a Monte-Carlo for the remaining expectation. This leaves us with the following tractable expression\n\n\\[\n\\underset{q}{\\mathrm{argmin}} \\ \\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) \\approx \\underset{q}{\\mathrm{argmax}} \\ \\widehat{\\mathrm{ELBO}}(q)\n\\]\n\nwhere\n\n\\[\n\\widehat{\\mathrm{ELBO}}(q) = \\frac{1}{m} \\left( \\sum\\_{k = 1}^m \\sum\\_{i = 1}^n \\log p(x\\_i, z\\_k) \\right) + \\mathbb{H} \\left(q(z)\\right) \\quad \\text{where} \\quad z\\_k \\sim q(z) \\quad \\forall k = 1, \\dots, m.\n\\]\n\nHence, as long as we can sample from \\(q(z)\\) somewhat efficiently, we can indeed minimize the KL-divergence! Neat, eh?\nSidenote: in the case where \\(q(z)\\) is tractable but \\(\\mathbb{H} \\left(q(z) \\right)\\) is not , we can use an Monte-Carlo estimate for this term too but this generally results in a higher-variance estimate.\nAlso, I fooled you real good: the ELBO isn’t an arbitrary name, hah! In fact it’s an abbreviation for the expected lower bound (ELBO) because it, uhmm, well, it’s the expected lower bound (remember \\(\\mathrm{D_{KL}} \\ge 0\\)). Yup.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Variational Inference" ] }, { "objectID": "developers/inference/variational-inference/index.html#maximizing-the-elbo", "href": "developers/inference/variational-inference/index.html#maximizing-the-elbo", "title": "Variational Inference", "section": "Maximizing the ELBO", "text": "Maximizing the ELBO\nFinding the optimal \\(q\\) over all possible densities of course isn’t feasible. Instead we consider a family of parameterized densities \\(\\mathscr{D}\\_{\\Theta}\\) where \\(\\Theta\\) denotes the space of possible parameters. Each density in this family \\(q\\_{\\theta} \\in \\mathscr{D}\\_{\\Theta}\\) is parameterized by a unique \\(\\theta \\in \\Theta\\). Moreover, we’ll assume\n\n\\(q\\_{\\theta}(z)\\), i.e. evaluating the probability density \\(q\\) at any point \\(z\\), is differentiable\n\\(z \\sim q\\_{\\theta}(z)\\), i.e. the process of sampling from \\(q\\_{\\theta}(z)\\), is differentiable\n\n\nis fairly straight-forward, but (2) is a bit tricky. What does it even mean for a sampling process to be differentiable? This is quite an interesting problem in its own right and would require something like a 50-page paper to properly review the different approaches (highly recommended read).\n\nWe’re going to make use of a particular such approach which goes under a bunch of different names: reparametrization trick, path derivative, etc. This refers to making the assumption that all elements \\(q\\_{\\theta} \\in \\mathscr{Q}\\_{\\Theta}\\) can be considered as reparameterizations of some base density, say \\(\\bar{q}(z)\\). That is, if \\(q\\_{\\theta} \\in \\mathscr{Q}\\_{\\Theta}\\) then\n\n\\[\nz \\sim q\\_{\\theta}(z) \\quad \\iff \\quad z := g\\_{\\theta}(\\tilde{z}) \\quad \\text{where} \\quad \\bar{z} \\sim \\bar{q}(z)\n\\]\n\nfor some function \\(g\\_{\\theta}\\) differentiable wrt. \\(\\theta\\). So all \\(q_{\\theta} \\in \\mathscr{Q}\\_{\\Theta}\\) are using the same reparameterization-function \\(g\\) but each \\(q\\_{\\theta}\\) correspond to different choices of \\(\\theta\\) for \\(f\\_{\\theta}\\).\nUnder this assumption we can differentiate the sampling process by taking the derivative of \\(g\\_{\\theta}\\) wrt. \\(\\theta\\), and thus we can differentiate the entire \\(\\widehat{\\mathrm{ELBO}}(q\\_{\\theta})\\) wrt. \\(\\theta\\)! With the gradient available we can either try to solve for optimality either by setting the gradient equal to zero or maximize \\(\\widehat{\\mathrm{ELBO}}(q\\_{\\theta})\\) stepwise by traversing \\(\\mathscr{Q}\\_{\\Theta}\\) in the direction of steepest ascent. For the sake of generality, we’re going to go with the stepwise approach.\nWith all this nailed down, we eventually reach the section on Automatic Differentiation Variational Inference (ADVI).", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Variational Inference" ] }, { "objectID": "developers/inference/variational-inference/index.html#automatic-differentiation-variational-inference-advi", "href": "developers/inference/variational-inference/index.html#automatic-differentiation-variational-inference-advi", "title": "Variational Inference", "section": "Automatic Differentiation Variational Inference (ADVI)", "text": "Automatic Differentiation Variational Inference (ADVI)\nSo let’s revisit the assumptions we’ve made at this point:\n\nThe variational posterior \\(q\\_{\\theta}\\) is in a parameterized family of densities denoted \\(\\mathscr{Q}\\_{\\Theta}\\), with \\(\\theta \\in \\Theta\\).\n\\(\\mathscr{Q}\\_{\\Theta}\\) is a space of reparameterizable densities with \\(\\bar{q}(z)\\) as the base-density.\nThe parameterization function \\(g\\_{\\theta}\\) is differentiable wrt. \\(\\theta\\).\nEvaluation of the probability density \\(q\\_{\\theta}(z)\\) is differentiable wrt. \\(\\theta\\).\n\\(\\mathbb{H}\\left(q\\_{\\theta}(z)\\right)\\) is tractable.\nEvaluation of the joint density \\(p(x, z)\\) is tractable and differentiable wrt. \\(z\\)\nThe support of \\(q(z)\\) is a subspace of the support of \\(p(z \\mid x)\\) : \\(\\mathrm{supp}\\left(q(z)\\right) \\subseteq \\mathrm{supp}\\left(p(z \\mid x)\\right)\\).\n\nAll of these are not necessary to do VI, but they are very convenient and results in a fairly flexible approach. One distribution which has a density satisfying all of the above assumptions except (7) (we’ll get back to this in second) for any tractable and differentiable \\(p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n)\\) is the good ole’ Gaussian/normal distribution:\n\n\\[\nz \\sim \\mathcal{N}(\\mu, \\Sigma) \\quad \\iff \\quad z = g\\_{\\mu, L}(\\bar{z}) := \\mu + L^T \\tilde{z} \\quad \\text{where} \\quad \\bar{z} \\sim \\bar{q}(z) := \\mathcal{N}(1\\_d, I\\_{d \\times d})\n\\]\n\nwhere \\(\\Sigma = L L^T,\\) with \\(L\\) obtained from the Cholesky-decomposition. Abusing notation a bit, we’re going to write\n\n\\[\n\\theta = (\\mu, \\Sigma) := (\\mu\\_1, \\dots, \\mu\\_d, L\\_{11}, \\dots, L\\_{1, d}, L\\_{2, 1}, \\dots, L\\_{2, d}, \\dots, L\\_{d, 1}, \\dots, L\\_{d, d}).\n\\]\n\nWith this assumption we finally have a tractable expression for \\(\\widehat{\\mathrm{ELBO}}(q_{\\mu, \\Sigma})\\)! Well, assuming (7) is holds. Since a Gaussian has non-zero probability on the entirety of \\(\\mathbb{R}^d\\), we also require \\(p(z \\mid \\\\{ x_i \\\\}_{i = 1}^n)\\) to have non-zero probability on all of \\(\\mathbb{R}^d\\).\nThough not necessary, we’ll often make a mean-field assumption for the variational posterior \\(q(z)\\), i.e. assume independence between the latent variables. In this case, we’ll write\n\n\\[\n\\theta = (\\mu, \\sigma^2) := (\\mu\\_1, \\dots, \\mu\\_d, \\sigma\\_1^2, \\dots, \\sigma\\_d^2).\n\\]\n\n\nExamples\nAs a (trivial) example we could apply the approach described above to is the following generative model for \\(p(z \\mid \\\\{ x_i \\\\}\\_{i = 1}^n)\\):\n\n\\[\n\\begin{align*}\n m &\\sim \\mathcal{N}(0, 1) \\\\\n x\\_i &\\overset{\\text{i.i.d.}}{=} \\mathcal{N}(m, 1), \\quad i = 1, \\dots, n.\n\\end{align*}\n\\]\n\nIn this case \\(z = m\\) and we have the posterior defined \\(p(m \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) = p(m) \\prod\\_{i = 1}^n p(x\\_i \\mid m)\\). Then the variational posterior would be\n\n\\[\nq\\_{\\mu, \\sigma} = \\mathcal{N}(\\mu, \\sigma^2), \\quad \\text{where} \\quad \\mu \\in \\mathbb{R}, \\ \\sigma^2 \\in \\mathbb{R}^{ + }.\n\\]\n\nAnd since prior of \\(m\\), \\(\\mathcal{N}(0, 1)\\), has non-zero probability on the entirety of \\(\\mathbb{R}\\), same as \\(q(m)\\), i.e. assumption (7) above holds, everything is fine and life is good.\nBut what about this generative model for \\(p(z \\mid \\\\{ x_i \\\\}_{i = 1}^n)\\):\n\n\\[\n\\begin{align*}\n s &\\sim \\mathrm{InverseGamma}(2, 3), \\\\\n m &\\sim \\mathcal{N}(0, s), \\\\\n x\\_i &\\overset{\\text{i.i.d.}}{=} \\mathcal{N}(m, s), \\quad i = 1, \\dots, n,\n\\end{align*}\n\\]\n\nwith posterior \\(p(s, m \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) = p(s) p(m \\mid s) \\prod\\_{i = 1}^n p(x\\_i \\mid s, m)\\) and the mean-field variational posterior \\(q(s, m)\\) will be\n\n\\[\nq\\_{\\mu\\_1, \\mu\\_2, \\sigma\\_1^2, \\sigma\\_2^2}(s, m) = p\\_{\\mathcal{N}(\\mu\\_1, \\sigma\\_1^2)}(s)\\ p\\_{\\mathcal{N}(\\mu\\_2, \\sigma\\_2^2)}(m),\n\\]\n\nwhere we’ve denoted the evaluation of the probability density of a Gaussian as \\(p_{\\mathcal{N}(\\mu, \\sigma^2)}(x)\\).\nObserve that \\(\\mathrm{InverseGamma}(2, 3)\\) has non-zero probability only on \\(\\mathbb{R}^{ + } := (0, \\infty)\\) which is clearly not all of \\(\\mathbb{R}\\) like \\(q(s, m)\\) has, i.e.\n\n\\[\n\\mathrm{supp} \\left( q(s, m) \\right) \\not\\subseteq \\mathrm{supp} \\left( p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right).\n\\]\n\nRecall from the definition of the KL-divergence that when this is the case, the KL-divergence isn’t well defined. This gets us to the automatic part of ADVI.\n\n\n“Automatic”? How?\nFor a lot of the standard (continuous) densities \\(p\\) we can actually construct a probability density \\(\\tilde{p}\\) with non-zero probability on all of \\(\\mathbb{R}\\) by transforming the “constrained” probability density \\(p\\) to \\(\\tilde{p}\\). In fact, in these cases this is a one-to-one relationship. As we’ll see, this helps solve the support-issue we’ve been going on and on about.\n\nTransforming densities using change of variables\nIf we want to compute the probability of \\(x\\) taking a value in some set \\(A \\subseteq \\mathrm{supp} \\left( p(x) \\right)\\), we have to integrate \\(p(x)\\) over \\(A\\), i.e.\n\n\\[\n\\mathbb{P}_p(x \\in A) = \\int_A p(x) \\mathrm{d}x.\n\\]\n\nThis means that if we have a differentiable bijection \\(f: \\mathrm{supp} \\left( q(x) \\right) \\to \\mathbb{R}^d\\) with differentiable inverse \\(f^{-1}: \\mathbb{R}^d \\to \\mathrm{supp} \\left( p(x) \\right)\\), we can perform a change of variables\n\n\\[\n\\mathbb{P}\\_p(x \\in A) = \\int\\_{f^{-1}(A)} p \\left(f^{-1}(y) \\right) \\ \\left| \\det \\mathcal{J}\\_{f^{-1}}(y) \\right| \\mathrm{d}y,\n\\]\n\nwhere \\(\\mathcal{J}_{f^{-1}}(x)\\) denotes the jacobian of \\(f^{-1}\\) evaluated at \\(x\\). Observe that this defines a probability distribution\n\n\\[\n\\mathbb{P}\\_{\\tilde{p}}\\left(y \\in f^{-1}(A) \\right) = \\int\\_{f^{-1}(A)} \\tilde{p}(y) \\mathrm{d}y,\n\\]\n\nsince \\(f^{-1}\\left(\\mathrm{supp} (p(x)) \\right) = \\mathbb{R}^d\\) which has probability 1. This probability distribution has density \\(\\tilde{p}(y)\\) with \\(\\mathrm{supp} \\left( \\tilde{p}(y) \\right) = \\mathbb{R}^d\\), defined\n\n\\[\n\\tilde{p}(y) = p \\left( f^{-1}(y) \\right) \\ \\left| \\det \\mathcal{J}\\_{f^{-1}}(y) \\right|\n\\]\n\nor equivalently\n\n\\[\n\\tilde{p} \\left( f(x) \\right) = \\frac{p(x)}{\\big| \\det \\mathcal{J}\\_{f}(x) \\big|}\n\\]\n\ndue to the fact that\n\n\\[\n\\big| \\det \\mathcal{J}\\_{f^{-1}}(y) \\big| = \\big| \\det \\mathcal{J}\\_{f}(x) \\big|^{-1}\n\\]\n\nNote: it’s also necessary that the log-abs-det-jacobian term is non-vanishing. This can for example be accomplished by assuming \\(f\\) to also be elementwise monotonic.\n\n\nBack to VI\nSo why is this is useful? Well, we’re looking to generalize our approach using a normal distribution to cases where the supports don’t match up. How about defining \\(q(z)\\) by\n\n\\[\n\\begin{align*}\n \\eta &\\sim \\mathcal{N}(\\mu, \\Sigma), \\\\\\\\\n z &= f^{-1}(\\eta),\n\\end{align*}\n\\]\n\nwhere \\(f^{-1}: \\mathbb{R}^d \\to \\mathrm{supp} \\left( p(z \\mid x) \\right)\\) is a differentiable bijection with differentiable inverse. Then \\(z \\sim q_{\\mu, \\Sigma}(z) \\implies z \\in \\mathrm{supp} \\left( p(z \\mid x) \\right)\\) as we wanted. The resulting variational density is\n\n\\[\nq\\_{\\mu, \\Sigma}(z) = p\\_{\\mathcal{N}(\\mu, \\Sigma)}\\left( f(z) \\right) \\ \\big| \\det \\mathcal{J}\\_{f}(z) \\big|.\n\\]\n\nNote that the way we’ve constructed \\(q(z)\\) here is basically a reverse of the approach we described above. Here we sample from a distribution with support on \\(\\mathbb{R}\\) and transform to \\(\\mathrm{supp} \\left( p(z \\mid x) \\right)\\).\nIf we want to write the ELBO explicitly in terms of \\(\\eta\\) rather than \\(z\\), the first term in the ELBO becomes\n\n\\[\n\\begin{align*}\n \\mathbb{E}\\_{z \\sim q_{\\mu, \\Sigma}(z)} \\left[ \\log p(x\\_i, z) \\right] &= \\mathbb{E}\\_{\\eta \\sim \\mathcal{N}(\\mu, \\Sigma)} \\Bigg[ \\log \\frac{p\\left(x\\_i, f^{-1}(\\eta) \\right)}{\\big| \\det \\mathcal{J}_{f^{-1}}(\\eta) \\big|} \\Bigg] \\\\\n &= \\mathbb{E}\\_{\\eta \\sim \\mathcal{N}(\\mu, \\Sigma)} \\left[ \\log p\\left(x\\_i, f^{-1}(\\eta) \\right) \\right] - \\mathbb{E}\\_{\\eta \\sim \\mathcal{N}(\\mu, \\Sigma)} \\left[ \\left| \\det \\mathcal{J}\\_{f^{-1}}(\\eta) \\right| \\right].\n\\end{align*}\n\\]\n\nThe entropy is invariant under change of variables, thus \\(\\mathbb{H} \\left(q\\_{\\mu, \\Sigma}(z)\\right)\\) is simply the entropy of the normal distribution which is known analytically.\nHence, the resulting empirical estimate of the ELBO is\n\n\\[\n\\begin{align*}\n\\widehat{\\mathrm{ELBO}}(q\\_{\\mu, \\Sigma}) &= \\frac{1}{m} \\left( \\sum\\_{k = 1}^m \\sum\\_{i = 1}^n \\left(\\log p\\left(x\\_i, f^{-1}(\\eta_k)\\right) - \\log \\big| \\det \\mathcal{J}\\_{f^{-1}}(\\eta\\_k) \\big| \\right) \\right) + \\mathbb{H} \\left(p\\_{\\mathcal{N}(\\mu, \\Sigma)}(z)\\right) \\\\\n& \\text{where} \\quad z\\_k \\sim \\mathcal{N}(\\mu, \\Sigma) \\quad \\forall k = 1, \\dots, m\n\\end{align*}.\n\\]\n\nAnd maximizing this wrt. \\(\\mu\\) and \\(\\Sigma\\) is what’s referred to as Automatic Differentiation Variational Inference (ADVI)!\nNow if you want to try it out, check out the tutorial on how to use ADVI in Turing.jl!", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "Variational Inference" ] }, { "objectID": "developers/inference/abstractmcmc-turing/index.html", "href": "developers/inference/abstractmcmc-turing/index.html", "title": "How Turing Implements AbstractMCMC", "section": "", "text": "Prerequisite: Interface guide.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "How Turing Implements AbstractMCMC" ] }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#introduction", "href": "developers/inference/abstractmcmc-turing/index.html#introduction", "title": "How Turing Implements AbstractMCMC", "section": "Introduction", "text": "Introduction\nConsider the following Turing, code block:\n\nusing Turing\n\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\nmod = gdemo(1.5, 2)\nalg = IS()\nn_samples = 1000\n\nchn = sample(mod, alg, n_samples, progress=false)\n\nChains MCMC chain (1000×3×1 Array{Float64, 3}):\n\nLog evidence = -3.7234848633413686\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 2.31 seconds\nCompute duration = 2.31 seconds\nparameters = s², m\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 2.9641 4.7036 0.1648 769.7047 649.7451 0.9996 ⋯\n m 0.0293 1.5942 0.0576 775.4565 907.0437 1.0030 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5600 1.1201 1.7686 3.0974 12.5427\n m -3.2566 -0.8768 -0.0375 0.9703 3.0222\n\n\nThe function sample is part of the AbstractMCMC interface. As explained in the interface guide, building a sampling method that can be used by sample consists in overloading the structs and functions in AbstractMCMC. The interface guide also gives a standalone example of their implementation, AdvancedMH.jl.\nTuring sampling methods (most of which are written here) also implement AbstractMCMC. Turing defines a particular architecture for AbstractMCMC implementations, that enables working with models defined by the @model macro, and uses DynamicPPL as a backend. The goal of this page is to describe this architecture, and how you would go about implementing your own sampling method in Turing, using Importance Sampling as an example. I don’t go into all the details: for instance, I don’t address selectors or parallelism.\nFirst, we explain how Importance Sampling works in the abstract. Consider the model defined in the first code block. Mathematically, it can be written:\n\\[\n\\begin{align*}\ns &\\sim \\text{InverseGamma}(2, 3), \\\\\nm &\\sim \\text{Normal}(0, \\sqrt{s}), \\\\\nx &\\sim \\text{Normal}(m, \\sqrt{s}), \\\\\ny &\\sim \\text{Normal}(m, \\sqrt{s}).\n\\end{align*}\n\\]\nThe latent variables are \\(s\\) and \\(m\\), the observed variables are \\(x\\) and \\(y\\). The model joint distribution \\(p(s,m,x,y)\\) decomposes into the prior \\(p(s,m)\\) and the likelihood \\(p(x,y \\mid s,m).\\) Since \\(x = 1.5\\) and \\(y = 2\\) are observed, the goal is to infer the posterior distribution \\(p(s,m \\mid x,y).\\)\nImportance Sampling produces independent samples \\((s_i, m_i)\\) from the prior distribution. It also outputs unnormalized weights\n\\[\nw_i = \\frac {p(x,y,s_i,m_i)} {p(s_i, m_i)} = p(x,y \\mid s_i, m_i)\n\\]\nsuch that the empirical distribution\n\\[\n\\frac{1}{N} \\sum_{i =1}^N \\frac {w_i} {\\sum_{j=1}^N w_j} \\delta_{(s_i, m_i)}\n\\]\nis a good approximation of the posterior.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "How Turing Implements AbstractMCMC" ] }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#define-a-sampler", "href": "developers/inference/abstractmcmc-turing/index.html#define-a-sampler", "title": "How Turing Implements AbstractMCMC", "section": "1. Define a Sampler", "text": "1. Define a Sampler\nRecall the last line of the above code block:\n\nchn = sample(mod, alg, n_samples, progress=false)\n\nChains MCMC chain (1000×3×1 Array{Float64, 3}):\n\nLog evidence = -3.7514858678906693\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 0.05 seconds\nCompute duration = 0.05 seconds\nparameters = s², m\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 4.5301 44.2434 1.3914 1065.9308 1016.4024 1.0008 ⋯\n m -0.1032 1.7947 0.0579 1019.0850 952.0448 0.9996 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5717 1.1219 1.7160 3.1358 15.4342\n m -3.5391 -0.9427 -0.0167 0.8150 3.3023\n\n\nHere sample takes as arguments a model mod, an algorithm alg, and a number of samples n_samples, and returns an instance chn of Chains which can be analysed using the functions in MCMCChains.\n\nModels\nTo define a model, you declare a joint distribution on variables in the @model macro, and specify which variables are observed and which should be inferred, as well as the value of the observed variables. Thus, when implementing Importance Sampling,\n\nmod = gdemo(1.5, 2)\n\nDynamicPPL.Model{typeof(gdemo), (:x, :y), (), (), Tuple{Float64, Int64}, Tuple{}, DynamicPPL.DefaultContext}(gdemo, (x = 1.5, y = 2), NamedTuple(), DynamicPPL.DefaultContext())\n\n\ncreates an instance mod of the struct Model, which corresponds to the observations of a value of 1.5 for x, and a value of 2 for y.\nThis is all handled by DynamicPPL, more specifically here. I will return to how models are used to inform sampling algorithms below.\n\n\nAlgorithms\nAn algorithm is just a sampling method: in Turing, it is a subtype of the abstract type InferenceAlgorithm. Defining an algorithm may require specifying a few high-level parameters. For example, “Hamiltonian Monte-Carlo” may be too vague, but “Hamiltonian Monte Carlo with 10 leapfrog steps per proposal and a stepsize of 0.01” is an algorithm. “Metropolis-Hastings” may be too vague, but “Metropolis-Hastings with proposal distribution p” is an algorithm. Thus\n\nstepsize = 0.01\nL = 10\nalg = HMC(stepsize, L)\n\nHMC{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.UnitEuclideanMetric}(0.01, 10, AutoForwardDiff())\n\n\ndefines a Hamiltonian Monte-Carlo algorithm, an instance of HMC, which is a subtype of InferenceAlgorithm.\nIn the case of Importance Sampling, there is no need to specify additional parameters:\n\nalg = IS()\n\nIS()\n\n\ndefines an Importance Sampling algorithm, an instance of IS, a subtype of InferenceAlgorithm.\nWhen creating your own Turing sampling method, you must, therefore, build a subtype of InferenceAlgorithm corresponding to your method.\n\n\nSamplers\nSamplers are not the same as algorithms. An algorithm is a generic sampling method, a sampler is an object that stores information about how algorithm and model interact during sampling, and is modified as sampling progresses. The Sampler struct is defined in DynamicPPL.\nTuring implements AbstractMCMC’s AbstractSampler with the Sampler struct defined in DynamicPPL. The most important attributes of an instance spl of Sampler are:\n\nspl.alg: the sampling method used, an instance of a subtype of InferenceAlgorithm\nspl.state: information about the sampling process, see below\n\nWhen you call sample(mod, alg, n_samples), Turing first uses model and alg to build an instance spl of Sampler , then calls the native AbstractMCMC function sample(mod, spl, n_samples).\nWhen you define your own Turing sampling method, you must therefore build:\n\na sampler constructor that uses a model and an algorithm to initialize an instance of Sampler. For Importance Sampling:\n\n\nfunction Sampler(alg::IS, model::Model, s::Selector)\n info = Dict{Symbol,Any}()\n state = ISState(model)\n return Sampler(alg, info, s, state)\nend\n\n\na state struct implementing AbstractSamplerState corresponding to your method: we cover this in the following paragraph.\n\n\n\nStates\nThe vi field contains all the important information about sampling: first and foremost, the values of all the samples, but also the distributions from which they are sampled, the names of model parameters, and other metadata. As we will see below, many important steps during sampling correspond to queries or updates to spl.state.vi.\nBy default, you can use SamplerState, a concrete type defined in inference/Inference.jl, which extends AbstractSamplerState and has no field except for vi:\n\nmutable struct SamplerState{VIType<:VarInfo} <: AbstractSamplerState\n vi::VIType\nend\n\nWhen doing Importance Sampling, we care not only about the values of the samples but also their weights. We will see below that the weight of each sample is also added to spl.state.vi. Moreover, the average\n\\[\n\\frac 1 N \\sum_{j=1}^N w_i = \\frac 1 N \\sum_{j=1}^N p(x,y \\mid s_i, m_i)\n\\]\nof the sample weights is a particularly important quantity:\n\nit is used to normalize the empirical approximation of the posterior distribution\nits logarithm is the importance sampling estimate of the log evidence \\(\\log p(x, y)\\)\n\nTo avoid having to compute it over and over again, is.jldefines an IS-specific concrete type ISState for sampler states, with an additional field final_logevidence containing\n\\[\n\\log \\frac 1 N \\sum_{j=1}^N w_i.\n\\]\n\nmutable struct ISState{V<:VarInfo,F<:AbstractFloat} <: AbstractSamplerState\n vi::V\n final_logevidence::F\nend\n\n# additional constructor\nISState(model::Model) = ISState(VarInfo(model), 0.0)\n\nThe following diagram summarizes the hierarchy presented above.\n\n\n\n\n\n\n\nG\n\n\n\nspl\n\nspl\nSampler\n<:AbstractSampler\n\n\n\nstate\n\nspl.state\nState\n<:AbstractSamplerState\n\n\n\nspl->state\n\n\n\n\n\nalg\n\nspl.alg\nAlgorithm\n<:InferenceAlgorithm\n\n\n\nspl->alg\n\n\n\n\n\nplaceholder1\n\n...\n\n\n\nspl->placeholder1\n\n\n\n\n\nvi\n\nspl.state.vi\nVarInfo\n<:AbstractVarInfo\n\n\n\nstate->vi\n\n\n\n\n\nplaceholder2\n\n...\n\n\n\nstate->placeholder2\n\n\n\n\n\nplaceholder3\n\n...\n\n\n\nalg->placeholder3\n\n\n\n\n\nplaceholder4\n\n...\n\n\n\nplaceholder1->placeholder4", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "How Turing Implements AbstractMCMC" ] }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#overload-the-functions-used-inside-mcmcsample", "href": "developers/inference/abstractmcmc-turing/index.html#overload-the-functions-used-inside-mcmcsample", "title": "How Turing Implements AbstractMCMC", "section": "2. Overload the functions used inside mcmcsample", "text": "2. Overload the functions used inside mcmcsample\nA lot of the things here are method-specific. However, Turing also has some functions that make it easier for you to implement these functions, for example.\n\nTransitions\nAbstractMCMC stores information corresponding to each individual sample in objects called transition, but does not specify what the structure of these objects could be. You could decide to implement a type MyTransition for transitions corresponding to the specifics of your methods. However, there are many situations in which the only information you need for each sample is:\n\nits value: \\(\\theta\\)\nlog of the joint probability of the observed data and this sample: lp\n\nInference.jl defines a struct Transition, which corresponds to this default situation\n\nstruct Transition{T,F<:AbstractFloat}\n θ::T\n lp::F\nend\n\nIt also contains a constructor that builds an instance of Transition from an instance spl of Sampler: \\(\\theta\\) is spl.state.vi converted to a namedtuple, and lp is getlogp(spl.state.vi). is.jl uses this default constructor at the end of the step! function here.\n\n\nHow sample works\nA crude summary, which ignores things like parallelism, is the following:\nsample calls mcmcsample, which calls\n\nsample_init! to set things up\nstep! repeatedly to produce multiple new transitions\nsample_end! to perform operations once all samples have been obtained\nbundle_samples to convert a vector of transitions into a more palatable type, for instance a Chain.\n\nYou can, of course, implement all of these functions, but AbstractMCMC as well as Turing, also provide default implementations for simple cases. For instance, importance sampling uses the default implementations of sample_init! and bundle_samples, which is why you don’t see code for them inside is.jl.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "How Turing Implements AbstractMCMC" ] }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#overload-assume-and-observe", "href": "developers/inference/abstractmcmc-turing/index.html#overload-assume-and-observe", "title": "How Turing Implements AbstractMCMC", "section": "3. Overload assume and observe", "text": "3. Overload assume and observe\nThe functions mentioned above, such as sample_init!, step!, etc., must, of course, use information about the model in order to generate samples! In particular, these functions may need samples from distributions defined in the model or to evaluate the density of these distributions at some values of the corresponding parameters or observations.\nFor an example of the former, consider Importance Sampling as defined in is.jl. This implementation of Importance Sampling uses the model prior distribution as a proposal distribution, and therefore requires samples from the prior distribution of the model. Another example is Approximate Bayesian Computation, which requires multiple samples from the model prior and likelihood distributions in order to generate a single sample.\nAn example of the latter is the Metropolis-Hastings algorithm. At every step of sampling from a target posterior\n\\[\np(\\theta \\mid x_{\\text{obs}}),\n\\]\nin order to compute the acceptance ratio, you need to evaluate the model joint density\n\\[\np\\left(\\theta_{\\text{prop}}, x_{\\text{obs}}\\right)\n\\]\nwith \\(\\theta_{\\text{prop}}\\) a sample from the proposal and \\(x_{\\text{obs}}\\) the observed data.\nThis begs the question: how can these functions access model information during sampling? Recall that the model is stored as an instance m of Model. One of the attributes of m is the model evaluation function m.f, which is built by compiling the @model macro. Executing f runs the tilde statements of the model in order, and adds model information to the sampler (the instance of Sampler that stores information about the ongoing sampling process) at each step (see here for more information about how the @model macro is compiled). The DynamicPPL functions assume and observe determine what kind of information to add to the sampler for every tilde statement.\nConsider an instance m of Model and a sampler spl, with associated VarInfo vi = spl.state.vi. At some point during the sampling process, an AbstractMCMC function such as step! calls m(vi, ...), which calls the model evaluation function m.f(vi, ...).\n\nfor every tilde statement in the @model macro, m.f(vi, ...) returns model-related information (samples, value of the model density, etc.), and adds it to vi. How does it do that?\n\nrecall that the code for m.f(vi, ...) is automatically generated by compilation of the @model macro\nfor every tilde statement in the @model declaration, this code contains a call to assume(vi, ...) if the variable on the LHS of the tilde is a model parameter to infer, and observe(vi, ...) if the variable on the LHS of the tilde is an observation\nin the file corresponding to your sampling method (ie in Turing.jl/src/inference/<your_method>.jl), you have overloaded assume and observe, so that they can modify vi to include the information and samples that you care about!\nat a minimum, assume and observe return the log density lp of the sample or observation. the model evaluation function then immediately calls acclogp!!(vi, lp), which adds lp to the value of the log joint density stored in vi.\n\n\nHere’s what assume looks like for Importance Sampling:\n\nfunction DynamicPPL.assume(rng, spl::Sampler{<:IS}, dist::Distribution, vn::VarName, vi)\n r = rand(rng, dist)\n push!(vi, vn, r, dist, spl)\n return r, 0\nend\n\nThe function first generates a sample r from the distribution dist (the right hand side of the tilde statement). It then adds r to vi, and returns r and 0.\nThe observe function is even simpler:\n\nfunction DynamicPPL.observe(spl::Sampler{<:IS}, dist::Distribution, value, vi)\n return logpdf(dist, value)\nend\n\nIt simply returns the density (in the discrete case, the probability) of the observed value under the distribution dist.", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "How Turing Implements AbstractMCMC" ] }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#summary-importance-sampling-step-by-step", "href": "developers/inference/abstractmcmc-turing/index.html#summary-importance-sampling-step-by-step", "title": "How Turing Implements AbstractMCMC", "section": "4. Summary: Importance Sampling step by step", "text": "4. Summary: Importance Sampling step by step\nWe focus on the AbstractMCMC functions that are overridden in is.jl and executed inside mcmcsample: step!, which is called n_samples times, and sample_end!, which is executed once after those n_samples iterations.\n\nDuring the \\(i\\)-th iteration, step! does 3 things:\n\nempty!!(spl.state.vi): remove information about the previous sample from the sampler’s VarInfo\nmodel(rng, spl.state.vi, spl): call the model evaluation function\n\ncalls to assume add the samples from the prior \\(s_i\\) and \\(m_i\\) to spl.state.vi\ncalls to assume or observe are followed by the line acclogp!!(vi, lp), where lp is an output of assume and observe\nlp is set to 0 after assume, and to the value of the density at the observation after observe\nWhen all the tilde statements have been covered, spl.state.vi.logp[] is the sum of the lp, i.e., the likelihood \\(\\log p(x, y \\mid s_i, m_i) = \\log p(x \\mid s_i, m_i) + \\log p(y \\mid s_i, m_i)\\) of the observations given the latent variable samples \\(s_i\\) and \\(m_i\\).\n\nreturn Transition(spl): build a transition from the sampler, and return that transition\n\nthe transition’s vi field is simply spl.state.vi\nthe lp field contains the likelihood spl.state.vi.logp[]\n\n\nWhen the n_samples iterations are completed, sample_end! fills the final_logevidence field of spl.state\n\nIt simply takes the logarithm of the average of the sample weights, using the log weights for numerical stability", "crumbs": [ "Get Started", "Developers", "Inference (note: outdated)", "How Turing Implements AbstractMCMC" ] }, { "objectID": "developers/transforms/distributions/index.html", "href": "developers/transforms/distributions/index.html", "title": "Distributions and the Jacobian", "section": "", "text": "This series of articles will seek to motivate the Bijectors.jl package, which provides the tools for transforming distributions in the Turing.jl probabilistic programming language.\nIt assumes:\nimport Random\nRandom.seed!(468);\n\nusing Distributions: Normal, LogNormal, logpdf, Distributions\nusing Plots: histogram", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#sampling-from-a-distribution", "href": "developers/transforms/distributions/index.html#sampling-from-a-distribution", "title": "Distributions and the Jacobian", "section": "Sampling from a distribution", "text": "Sampling from a distribution\nTo sample from a distribution (as defined in Distributions.jl), we can use the rand function. Let’s sample from a normal distribution and then plot a histogram of the samples.\n\nsamples = rand(Normal(), 5000)\nhistogram(samples, bins=50)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n(Calling Normal() without any arguments, as we do here, gives us a normal distribution with mean 0 and standard deviation 1.) If you want to know the log probability density of observing any of the samples, you can use logpdf:\n\nprintln(\"sample: $(samples[1])\")\nprintln(\"logpdf: $(logpdf(Normal(), samples[1]))\")\n\nsample: 0.04374853981619864\nlogpdf: -0.9198955005726975\n\n\nThe probability density function for the normal distribution with mean 0 and standard deviation 1 is\n\\[p(x) = \\frac{1}{\\sqrt{2\\pi}} \\exp{\\left(-\\frac{x^2}{2}\\right)},\\]\nso we could also have calculated this manually using:\n\nlog(1 / sqrt(2π) * exp(-samples[1]^2 / 2))\n\n-0.9198955005726974\n\n\n(or more efficiently, -(samples[1]^2 + log2π) / 2, where log2π is from the IrrationalConstants.jl package).", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#sampling-from-a-transformed-distribution", "href": "developers/transforms/distributions/index.html#sampling-from-a-transformed-distribution", "title": "Distributions and the Jacobian", "section": "Sampling from a transformed distribution", "text": "Sampling from a transformed distribution\nSay that \\(x\\) is distributed according to Normal(), and we want to draw samples of \\(y = \\exp(x)\\). Now, \\(y\\) is itself a random variable, and like any other random variable, will have a probability distribution, which we’ll call \\(q(y)\\).\nIn this specific case, the distribution of \\(y\\) is known as a log-normal distribution. For the purposes of this tutorial, let’s implement our own MyLogNormal distribution that we can sample from. (Distributions.jl already defines its own LogNormal, so we have to use a different name.) To do this, we need to overload Base.rand for our new distribution.\n\nstruct MyLogNormal <: Distributions.ContinuousUnivariateDistribution\n μ::Float64\n σ::Float64\nend\nMyLogNormal() = MyLogNormal(0.0, 1.0)\n\nfunction Base.rand(rng::Random.AbstractRNG, d::MyLogNormal)\n exp(rand(rng, Normal(d.μ, d.σ)))\nend\n\nNow we can do the same as above:\n\nsamples_lognormal = rand(MyLogNormal(), 5000)\n# Cut off the tail for clearer visualisation\nhistogram(samples_lognormal, bins=0:0.1:5; xlims=(0, 5))\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHow do we implement logpdf for our new distribution, though? Or in other words, if we observe a sample \\(y\\), how do we know what the probability of drawing that sample was?\nNaively, we might think to just un-transform the variable y by reversing the exponential, i.e. taking the logarithm. We could then use the logpdf of the original distribution of x.\n\nnaive_logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y))\n\nnaive_logpdf (generic function with 1 method)\n\n\nWe can compare this function against the logpdf implemented in Distributions.jl:\n\nprintln(\"Sample : $(samples_lognormal[1])\")\nprintln(\"Expected : $(logpdf(LogNormal(), samples_lognormal[1]))\")\nprintln(\"Actual : $(naive_logpdf(MyLogNormal(), samples_lognormal[1]))\")\n\nSample : 2.2331001636281114\nExpected : -2.0450477723405234\nActual : -1.2416569444078478\n\n\nClearly this approach is not quite correct!", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#the-derivative", "href": "developers/transforms/distributions/index.html#the-derivative", "title": "Distributions and the Jacobian", "section": "The derivative", "text": "The derivative\nThe reason why this doesn’t work is because transforming a (continuous) distribution causes probability density to be stretched and otherwise moved around. For example, in the normal distribution, half of the probability density is between \\(-\\infty\\) and \\(0\\), and half is between \\(0\\) and \\(\\infty\\). When exponentiated (i.e. in the log-normal distribution), the first half of the density is mapped to the interval \\((0, 1)\\), and the second half to \\((1, \\infty)\\).\nThis ‘explanation’ on its own does not really mean much, though. A perhaps more useful approach is to not talk about probability densities, but instead to make it more concrete by relating them to actual probabilities. If we think about the normal distribution as a continuous curve, what the probability density function \\(p(x)\\) really tells us is that: for any two points \\(a\\) and \\(b\\) (where \\(a \\leq b\\)), the probability of drawing a sample between \\(a\\) and \\(b\\) is the corresponding area under the curve, i.e.\n\\[\\int_a^b p(x) \\, \\mathrm{d}x.\\]\nFor example, if \\((a, b) = (-\\infty, \\infty)\\), then the probability of drawing a sample between \\(a\\) and \\(b\\) is 1.\nLet’s say that the probability density function of the log-normal distribution is \\(q(y)\\). Then, the area under the curve between the two points \\(\\exp(a)\\) and \\(\\exp(b)\\) is:\n\\[\\int_{\\exp(a)}^{\\exp(b)} q(y) \\, \\mathrm{d}y.\\]\nThis integral should be equal to the one above, because the probability of drawing from \\([a, b]\\) in the original distribution should be the same as the probability of drawing from \\([\\exp(a), \\exp(b)]\\) in the transformed distribution. The question we have to solve here is: how do we find a function \\(q(y)\\) such that this equality holds?\nWe can approach this by making the substitution \\(y = \\exp(x)\\) in the first integral (see Wikipedia for a refresher on substitutions in integrals, if needed). We have that:\n\\[\\frac{\\mathrm{d}y}{\\mathrm{d}x} = \\exp(x) = y \\implies \\mathrm{d}x = \\frac{1}{y}\\,\\mathrm{d}y\\]\nand so\n\\[\\int_{x=a}^{x=b} p(x) \\, \\mathrm{d}x\n = \\int_{y=\\exp(a)}^{y=\\exp(b)} p(\\log(y)) \\frac{1}{y} \\,\\mathrm{d}y\n = \\int_{\\exp(a)}^{\\exp(b)} q(y) \\, \\mathrm{d}y,\n\\]\nfrom which we can read off \\(q(y) = p(\\log(y)) / y\\).\nIn contrast, when we implemented naive_logpdf\n\nnaive_logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y))\n\nnaive_logpdf (generic function with 1 method)\n\n\nthat was the equivalent of saying that \\(q(y) = p(\\log(y))\\). We left out a factor of \\(1/y\\)!\nIndeed, now we can define the correct logpdf function. Since everything is a logarithm here, instead of multiplying by \\(1/y\\) we subtract \\(\\log(y)\\):\n\nDistributions.logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y)) - log(y)\n\nand check that it works:\n\nprintln(\"Sample : $(samples_lognormal[1])\")\nprintln(\"Expected : $(logpdf(LogNormal(), samples_lognormal[1]))\")\nprintln(\"Actual : $(logpdf(MyLogNormal(), samples_lognormal[1]))\")\n\nSample : 2.2331001636281114\nExpected : -2.0450477723405234\nActual : -2.0450477723405234\n\n\nThe same process can be applied to any kind of (invertible) transformation. If we have some transformation from \\(x\\) to \\(y\\), and the probability density functions of \\(x\\) and \\(y\\) are \\(p(x)\\) and \\(q(y)\\) respectively, then we have a general formula that:\n\\[q(y) = p(x) \\left| \\frac{\\mathrm{d}x}{\\mathrm{d}y} \\right|.\\]\nIn this case, we had \\(y = \\exp(x)\\), so \\(\\mathrm{d}x/\\mathrm{d}y = 1/y\\). (This equation is (11.5) in Bishop’s textbook.)\n\n\n\n\n\n\nNote\n\n\n\nThe absolute value here takes care of the case where \\(f\\) is a decreasing function, i.e., \\(f(x) > f(y)\\) when \\(x < y\\). You can try this out with the transformation \\(y = -\\exp(x)\\). If \\(a < b\\), then \\(-\\exp(a) > -\\exp(b)\\), and so you will have to swap the integration limits to ensure that the integral comes out positive.\n\n\nNote that \\(\\mathrm{d}y/\\mathrm{d}x\\) is equal to \\((\\mathrm{d}x/\\mathrm{d}y)^{-1}\\), so the formula above can also be written as:\n\\[q(y) \\left| \\frac{\\mathrm{d}y}{\\mathrm{d}x} \\right| = p(x).\\]", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#the-jacobian", "href": "developers/transforms/distributions/index.html#the-jacobian", "title": "Distributions and the Jacobian", "section": "The Jacobian", "text": "The Jacobian\nIn general, we may have transforms that act on multivariate distributions: for example, something mapping \\(p(x_1, x_2)\\) to \\(q(y_1, y_2)\\). In this case, we need to extend the rule above by introducing what is known as the Jacobian matrix:\nIn this case, the rule above has to be extended by replacing the derivative \\(\\mathrm{d}x/\\mathrm{d}y\\) with the determinant of the inverse Jacobian matrix:\n\\[\\mathbf{J} = \\begin{pmatrix}\n\\partial y_1/\\partial x_1 & \\partial y_1/\\partial x_2 \\\\\n\\partial y_2/\\partial x_1 & \\partial y_2/\\partial x_2\n\\end{pmatrix}.\\]\nThis allows us to write the direct generalisation as:\n\\[q(y_1, y_2) \\left| \\det(\\mathbf{J}) \\right| = p(x_1, x_2),\\]\nor equivalently,\n\\[q(y_1, y_2) = p(x_1, x_2) \\left| \\det(\\mathbf{J}^{-1}) \\right|.\\]\nwhere \\(\\mathbf{J}^{-1}\\) is the inverse of the Jacobian matrix. This is the same as equation (11.9) in Bishop.\n\n\n\n\n\n\nNote\n\n\n\nInstead of inverting the original Jacobian matrix to get \\(\\mathbf{J}^{-1}\\), we could also use the Jacobian of the inverse function:\n\\[\\mathbf{J}_\\text{inv} = \\begin{pmatrix}\n\\partial x_1/\\partial y_1 & \\partial x_1/\\partial y_2 \\\\\n\\partial x_2/\\partial y_1 & \\partial x_2/\\partial y_2\n\\end{pmatrix}.\\]\nAs it turns out, these are entirely equivalent: the Jacobian of the inverse function is the inverse of the original Jacobian matrix.\n\n\nThe rest of this section will be devoted to an example to show that this works, and contains some slightly less pretty mathematics. If you are already suitably convinced by this stage, then you can skip the rest of this section. (Or if you prefer something more formal, the Wikipedia article on integration by substitution discusses the multivariate case as well.)\n\nAn example: the Box–Muller transform\nA motivating example where one might like to use a Jacobian is the Box–Muller transform, which is a technique for sampling from a normal distribution.\nThe Box–Muller transform works by first sampling two random variables from the uniform distribution between 0 and 1:\n\\[\\begin{align}\nx_1 &\\sim U(0, 1) \\\\\nx_2 &\\sim U(0, 1).\n\\end{align}\\]\nBoth of these have a probability density function of \\(p(x) = 1\\) for \\(0 < x \\leq 1\\), and 0 otherwise. Because they are independent, we can write that\n\\[p(x_1, x_2) = p(x_1) p(x_2) = \\begin{cases}\n1 & \\text{if } 0 < x_1 \\leq 1 \\text{ and } 0 < x_2 \\leq 1, \\\\\n0 & \\text{otherwise}.\n\\end{cases}\\]\nThe next step is to perform the transforms\n\\[\\begin{align}\ny_1 &= \\sqrt{-2 \\log(x_1)} \\cos(2\\pi x_2); \\\\\ny_2 &= \\sqrt{-2 \\log(x_1)} \\sin(2\\pi x_2),\n\\end{align}\\]\nand it turns out that with these transforms, both \\(y_1\\) and \\(y_2\\) are independent and normally distributed with mean 0 and standard deviation 1, i.e.\n\\[q(y_1, y_2) = \\frac{1}{2\\pi} \\exp{\\left(-\\frac{y_1^2}{2}\\right)} \\exp{\\left(-\\frac{y_2^2}{2}\\right)}.\\]\nHow can we show that this is the case?\nThere are many ways to work out the required calculus. Some are more elegant and some rather less so! One of the less headache-inducing ways is to define the intermediate variables:\n\\[r = \\sqrt{-2 \\log(x_1)}; \\quad \\theta = 2\\pi x_2,\\]\nfrom which we can see that \\(y_1 = r\\cos\\theta\\) and \\(y_2 = r\\sin\\theta\\), and hence\n\\[\\begin{align}\nx_1 &= \\exp{\\left(-\\frac{r^2}{2}\\right)} = \\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)}; \\\\\nx_2 &= \\frac{\\theta}{2\\pi} = \\frac{1}{2\\pi} \\, \\arctan\\left(\\frac{y_2}{y_1}\\right).\n\\end{align}\\]\nThis lets us obtain the requisite partial derivatives in a way that doesn’t involve too much algebra. As an example, we have\n\\[\\frac{\\partial x_1}{\\partial y_1} = -y_1 \\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)} = -y_1 x_1,\\]\n(where we used the product rule), and\n\\[\\frac{\\partial x_2}{\\partial y_1} = \\frac{1}{2\\pi} \\left(\\frac{1}{1 + (y_2/y_1)^2}\\right) \\left(-\\frac{y_2}{y_1^2}\\right),\\]\n(where we used the chain rule, and the derivative \\(\\mathrm{d}(\\arctan(a))/\\mathrm{d}a = 1/(1 + a^2)\\)).\nPutting together the Jacobian matrix, we have:\n\\[\\mathbf{J} = \\begin{pmatrix}\n-y_1 x_1 & -y_2 x_1 \\\\\n-cy_2/y_1^2 & c/y_1 \\\\\n\\end{pmatrix},\\]\nwhere \\(c = [2\\pi(1 + (y_2/y_1)^2)]^{-1}\\). The determinant of this matrix is\n\\[\\begin{align}\n\\det(\\mathbf{J}) &= -cx_1 - cx_1(y_2/y_1)^2 \\\\\n&= -cx_1\\left[1 + \\left(\\frac{y_2}{y_1}\\right)^2\\right] \\\\\n&= -\\frac{1}{2\\pi} x_1 \\\\\n&= -\\frac{1}{2\\pi}\\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)},\n\\end{align}\\]\nComing right back to our probability density, we have that\n\\[\\begin{align}\nq(y_1, y_2) &= p(x_1, x_2) \\cdot |\\det(\\mathbf{J})| \\\\\n&= \\frac{1}{2\\pi}\\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)},\n\\end{align}\\]\nas desired.\n\n\n\n\n\n\nNote\n\n\n\nWe haven’t yet explicitly accounted for the fact that \\(p(x_1, x_2)\\) is 0 if either \\(x_1\\) or \\(x_2\\) are outside the range \\((0, 1]\\). For example, if this constraint on \\(x_1\\) and \\(x_2\\) were to result in inaccessible values of \\(y_1\\) or \\(y_2\\), then \\(q(y_1, y_2)\\) should be 0 for those values. Formally, for the transformation \\(f: X \\to Y\\) where \\(X\\) is the unit square (i.e. \\(0 < x_1, x_2 \\leq 1\\)), \\(q(y_1, y_2)\\) should only take the above value for the image of \\(f\\), and anywhere outside of the image it should be 0.\nIn our case, the \\(\\log(x_1)\\) term in the transform varies between 0 and \\(\\infty\\), and the \\(\\cos(2\\pi x_2)\\) term ranges from \\(-1\\) to \\(1\\). Hence \\(y_1\\), which is the product of these two terms, ranges from \\(-\\infty\\) to \\(\\infty\\), and likewise for \\(y_2\\). So the image of \\(f\\) is the entire real plane, and we don’t have to worry about this.\n\n\nHaving seen the theory that underpins how distributions can be transformed, let’s now turn to how this is implemented in the Turing ecosystem.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] } ]