[ { "objectID": "changelog.html", "href": "changelog.html", "title": "Changelog", "section": "", "text": "Add support for TensorBoardLogger.jl via AbstractMCMC.mcmc_callback. See the AbstractMCMC documentation for more details." }, { "objectID": "changelog.html#dynamicppl-0.39", "href": "changelog.html#dynamicppl-0.39", "title": "Changelog", "section": "DynamicPPL 0.39", "text": "DynamicPPL 0.39\nTuring.jl v0.42 brings with it all the underlying changes in DynamicPPL 0.39. Please see the DynamicPPL changelog for full details; in here we summarise only the changes that are most pertinent to end-users of Turing.jl.\n\nThread safety opt-in\nTuring.jl has supported threaded tilde-statements for a while now, as long as said tilde-statements are observations (i.e., likelihood terms). For example:\n@model function f(y)\n x ~ Normal()\n Threads.@threads for i in eachindex(y)\n y[i] ~ Normal(x)\n end\nend\nModels where tilde-statements or @addlogprob! are used in parallel require what we call ‘threadsafe evaluation’. In previous releases of Turing.jl, threadsafe evaluation was enabled whenever Julia was launched with more than one thread. However, this is an imprecise way of determining whether threadsafe evaluation is really needed. It causes performance degradation for models that do not actually need threadsafe evaluation, and generally led to ill-defined behaviour in various parts of the Turing codebase.\nIn Turing.jl v0.42, threadsafe evaluation is now opt-in. To enable threadsafe evaluation, after defining a model, you now need to call setthreadsafe(model, true) (note that this is not a mutating function, it returns a new model):\ny = randn(100)\nmodel = f(y)\nmodel = setthreadsafe(model, true)\nYou only need to do this if your model uses tilde-statements or @addlogprob! in parallel. You do not need to do this if:\n\nyour model has other kinds of parallelism but does not include tilde-statements inside;\nor you are using MCMCThreads() or MCMCDistributed() to sample multiple chains in parallel, but your model itself does not use parallelism.\n\nIf your model does include parallelised tilde-statements or @addlogprob! calls, and you evaluate it/sample from it without setting setthreadsafe(model, true), then you may get statistically incorrect results without any warnings or errors.\n\n\nFaster performance\nMany operations in DynamicPPL have been substantially sped up. You should find that anything that uses LogDensityFunction (i.e., HMC/NUTS samplers, optimisation) is faster in this release. Prior sampling should also be much faster than before.\n\n\npredict improvements\nIf you have a model that requires threadsafe evaluation (i.e., parallel observations), you can now use this with predict. Carrying on from the previous example, you can do:\nmodel = setthreadsafe(f(y), true)\nchain = sample(model, NUTS(), 1000)\n\npdn_model = f(fill(missing, length(y)))\npdn_model = setthreadsafe(pdn_model, true) # set threadsafe\npredictions = predict(pdn_model, chain) # generate new predictions in parallel\n\n\nLog-density names in chains\nWhen sampling from a Turing model, the resulting MCMCChains.Chains object now contains the log-joint, log-prior, and log-likelihood under the names :logjoint, :logprior, and :loglikelihood respectively. Previously, :logjoint would be stored under the name :lp.\n\n\nLog-evidence in chains\nWhen sampling using MCMCChains, the chain object will no longer have its chain.logevidence field set. Instead, you can calculate this yourself from the log-likelihoods stored in the chain. 
For SMC samplers, the log-evidence of the entire trajectory is stored in chain[:logevidence] (which is the same for every particle in the ‘chain’).\n\n\nTuring.Inference.Transition\nTuring.Inference.Transition(model, vi[, stats]) has been removed; you can directly replace this with DynamicPPL.ParamsWithStats(vi, model[, stats])." }, { "objectID": "changelog.html#advancedvi-0.6", "href": "changelog.html#advancedvi-0.6", "title": "Changelog", "section": "AdvancedVI 0.6", "text": "AdvancedVI 0.6\nTuring.jl v0.42 updates AdvancedVI.jl compatibility to 0.6 (we skipped the breaking 0.5 update as it does not introduce new features). AdvancedVI.jl@0.6 introduces major structural changes, including breaking changes to the interface and multiple new features. The changes summarised below are those that affect end-users of Turing. For a more comprehensive list of changes, please refer to the changelogs in AdvancedVI.\n\nBreaking changes\nA new level of interface for defining different variational algorithms has been introduced in AdvancedVI v0.5. As a result, the function Turing.vi now receives a keyword argument algorithm. The object algorithm <: AdvancedVI.AbstractVariationalAlgorithm should now contain all the algorithm-specific configurations. Therefore, keyword arguments of vi that were algorithm-specific, such as objective, operator, averager and so on, have been moved into fields of the relevant <: AdvancedVI.AbstractVariationalAlgorithm structs.\nIn addition, the outputs have also changed. Previously, vi returned both the last iterate of the algorithm q and the iterate average q_avg. Now, for the algorithms running parameter averaging, only q_avg is returned. As a result, the number of returned values has been reduced from 4 to 3.\nFor example,\nq, q_avg, info, state = vi(\n model, q, n_iters; objective=RepGradELBO(10), operator=AdvancedVI.ClipScale()\n)\nis now\nq_avg, info, state = vi(\n model,\n q,\n n_iters;\n algorithm=KLMinRepGradDescent(adtype; n_samples=10, operator=AdvancedVI.ClipScale()),\n)\nSimilarly,\nvi(\n model,\n q,\n n_iters;\n objective=RepGradELBO(10; entropy=AdvancedVI.ClosedFormEntropyZeroGradient()),\n operator=AdvancedVI.ProximalLocationScaleEntropy(),\n)\nis now\nvi(model, q, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10))\nLastly, to obtain the last iterate q of KLMinRepGradDescent, which is not returned in the new interface, simply select the averaging strategy to be AdvancedVI.NoAveraging(). That is,\nq, info, state = vi(\n model,\n q,\n n_iters;\n algorithm=KLMinRepGradDescent(\n adtype;\n n_samples=10,\n operator=AdvancedVI.ClipScale(),\n averager=AdvancedVI.NoAveraging(),\n ),\n)\nAdditionally,\n\nThe default hyperparameters of DoG and DoWG have been altered.\nThe deprecated AdvancedVI@0.2-era interface is now removed.\nestimate_objective now always returns the value to be minimized by the optimization algorithm. For example, for ELBO maximization algorithms, estimate_objective will return the negative ELBO. This is a breaking change from the previous behaviour, where the ELBO was returned.\nThe initial values for q_meanfield_gaussian, q_fullrank_gaussian, and q_locationscale have changed. Specifically, the default initial value for the scale matrix has been changed from I to 0.6*I.\nWhen using algorithms that expect to operate in unconstrained spaces, the user is now explicitly expected to provide a Bijectors.TransformedDistribution wrapping an unconstrained distribution. (Refer to the docstring of vi.)
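For instance, a minimal sketch of constructing such a wrapped distribution; this assumes the Bijectors.bijector(model) method that DynamicPPL provides for Turing models, and a hypothetical parameter count d:\nusing Bijectors, LinearAlgebra\nb = Bijectors.bijector(model) # maps constrained variables to unconstrained space\nq0 = MvNormal(zeros(d), I) # Gaussian living in unconstrained space\nq = Bijectors.transformed(q0, Bijectors.inverse(b)) # TransformedDistribution expected by vi\n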
\n\n\nNew Features\nAdvancedVI@0.6 adds numerous new features including the following new VI algorithms:\n\nKLMinWassFwdBwd: Also known as “Wasserstein variational inference,” this algorithm minimizes the KL divergence under the Wasserstein-2 metric.\nKLMinNaturalGradDescent: This algorithm, also known as “online variational Newton,” is the canonical “black-box” natural gradient variational inference algorithm, which minimizes the KL divergence via mirror descent, using the KL divergence as the Bregman divergence.\nKLMinSqrtNaturalGradDescent: This is a recent variant of KLMinNaturalGradDescent that operates in the Cholesky-factor parameterization of Gaussians instead of precision matrices.\nFisherMinBatchMatch: This algorithm, called “batch-and-match,” minimizes a variation of the 2nd-order Fisher divergence via a proximal point-type algorithm.\n\nAny of the new algorithms above can readily be used by simply swapping the algorithm keyword argument of vi. For example, to use batch-and-match:\nvi(model, q, n_iters; algorithm=FisherMinBatchMatch())" }, { "objectID": "changelog.html#external-sampler-interface", "href": "changelog.html#external-sampler-interface", "title": "Changelog", "section": "External sampler interface", "text": "External sampler interface\nThe interface for defining an external sampler has been reworked. In general, implementations of external samplers should now no longer need to depend on Turing. This is because the required interface functions have been shifted upstream to AbstractMCMC.jl.\nIn particular, you now only need to define the following functions:\n\nAbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.LogDensityModel, ::MySampler; kwargs...) (and also a method with state, and the corresponding step_warmup methods if needed)\nAbstractMCMC.getparams(::MySamplerState) -> Vector{<:Real}\nAbstractMCMC.getstats(::MySamplerState) -> NamedTuple\nAbstractMCMC.requires_unconstrained_space(::MySampler) -> Bool (default true)\n\nThis means that you only need to depend on AbstractMCMC.jl. As long as the above functions are defined correctly, Turing will be able to use your external sampler.\nThe Turing.Inference.isgibbscomponent(::MySampler) interface function still exists, but in this version the default has been changed to true, so you should not need to overload this." }, { "objectID": "changelog.html#optimisation-interface", "href": "changelog.html#optimisation-interface", "title": "Changelog", "section": "Optimisation interface", "text": "Optimisation interface\nThe Optim.jl interface has been removed (so you cannot call Optim.optimize directly on Turing models). You can use the maximum_likelihood or maximum_a_posteriori functions with an Optim.jl solver instead (via Optimization.jl: see https://docs.sciml.ai/Optimization/stable/optimization_packages/optim/ for documentation of the available solvers)." }, { "objectID": "changelog.html#internal-changes", "href": "changelog.html#internal-changes", "title": "Changelog", "section": "Internal changes", "text": "Internal changes\nThe constructors of OptimLogDensity have been replaced with a single constructor, OptimLogDensity(::DynamicPPL.LogDensityFunction)." }, { "objectID": "changelog.html#dynamicppl-0.38", "href": "changelog.html#dynamicppl-0.38", "title": "Changelog", "section": "DynamicPPL 0.38", "text": "DynamicPPL 0.38\nTuring.jl v0.41 brings with it all the underlying changes in DynamicPPL 0.38. 
Please see the DynamicPPL changelog for full details: in this section we only describe the changes that will directly affect end-users of Turing.jl.\n\nPerformance\nA number of functions such as returned and predict will have substantially better performance in this release.\n\n\nProductNamedTupleDistribution\nDistributions.ProductNamedTupleDistribution can now be used on the right-hand side of ~ in Turing models.\n\n\nInitial parameters\nInitial parameters for MCMC sampling must now be specified in a different form. You still need to use the initial_params keyword argument to sample, but the allowed values are different. For almost all samplers in Turing.jl (except Emcee) this should now be a DynamicPPL.AbstractInitStrategy.\nThere are three kinds of initialisation strategies provided out of the box with Turing.jl (they are exported so you can use these directly with using Turing):\n\nInitFromPrior(): Sample from the prior distribution. This is the default for most samplers in Turing.jl (if you don’t specify initial_params).\nInitFromUniform(a, b): Sample uniformly from [a, b] in linked space. This is the default for Hamiltonian samplers. If a and b are not specified it defaults to [-2, 2], which preserves the behaviour in previous versions (and mimics that of Stan).\nInitFromParams(p): Explicitly provide a set of initial parameters. Note: p must be either a NamedTuple or an AbstractDict{<:VarName}; it can no longer be a Vector. Parameters must be provided in unlinked space, even if the sampler later performs linking.\n\nFor this release of Turing.jl, you can also provide a NamedTuple or AbstractDict{<:VarName} and this will be automatically wrapped in InitFromParams for you. This is an intermediate measure for backwards compatibility, and will eventually be removed.\n\n\nThis change is made because Vectors are semantically ambiguous. It is not clear which element of the vector corresponds to which variable in the model, nor is it clear whether the parameters are in linked or unlinked space. Previously, both of these would depend on the internal structure of the VarInfo, which is an implementation detail. In contrast, the behaviour of AbstractDicts and NamedTuples is invariant to the ordering of variables and it is also easier for readers to understand which variable is being set to which value.\nIf you were previously using varinfo[:] to extract a vector of initial parameters, you can now use Dict(k => varinfo[k] for k in keys(varinfo)) to extract a Dict of initial parameters.\nFor more details about initialisation you can also refer to the main TuringLang docs, and/or the DynamicPPL API docs.\n\n\nresume_from and loadstate\nThe resume_from keyword argument to sample is now removed. Instead of sample(...; resume_from=chain) you can use sample(...; initial_state=loadstate(chain)) which is entirely equivalent. loadstate is now exported from Turing instead of DynamicPPL.\nNote that loadstate only works for MCMCChains.Chains. For FlexiChains users please consult the FlexiChains docs directly where this functionality is described in detail.\n\n\npointwise_logdensities\npointwise_logdensities(model, chn), pointwise_loglikelihoods(...), and pointwise_prior_logdensities(...) now return an MCMCChains.Chains object if chn is itself an MCMCChains.Chains object. The old behaviour of returning an OrderedDict is still available: you just need to pass OrderedDict as the third argument, i.e., pointwise_logdensities(model, chn, OrderedDict)." 
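For example, a small sketch of the two return types, assuming model and chn come from an earlier sampling run:\nusing OrderedCollections: OrderedDict\nptw_chain = pointwise_logdensities(model, chn) # MCMCChains.Chains\nptw_dict = pointwise_logdensities(model, chn, OrderedDict) # old OrderedDict behaviour\n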
}, { "objectID": "changelog.html#initial-step-in-mcmc-sampling", "href": "changelog.html#initial-step-in-mcmc-sampling", "title": "Changelog", "section": "Initial step in MCMC sampling", "text": "Initial step in MCMC sampling\nHMC and NUTS samplers no longer take an extra single step before starting the chain. This means that if you do not discard any samples at the start, the first sample will be the initial parameters (which may be user-provided).\nNote that if the initial sample is included, the corresponding sampler statistics will be missing. Due to a technical limitation of MCMCChains.jl, this causes all indexing into MCMCChains to return Union{Float64, Missing} or similar. If you want the old behaviour, you can discard the first sample (e.g. using discard_initial=1)." }, { "objectID": "changelog.html#breaking-changes-1", "href": "changelog.html#breaking-changes-1", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\nDynamicPPL 0.37\nTuring.jl v0.40 updates DynamicPPL compatibility to 0.37. The summary of the changes provided here is intended for end-users of Turing. If you are a package developer, or would otherwise like to understand these changes in-depth, please see the DynamicPPL changelog.\n\n@submodel is now completely removed; please use to_submodel.\nPrior and likelihood calculations are now completely separated in Turing. Previously, the log-density used to be accumulated in a single field and thus there was no clear way to separate prior and likelihood components.\n\n@addlogprob! f, where f is a float, now adds to the likelihood by default.\nYou can instead use @addlogprob! (; logprior=x, loglikelihood=y) to control which log-density component to add to.\nThis means that usage of PriorContext and LikelihoodContext is no longer needed, and these have now been removed.\n\nThe special __context__ variable has been removed. If you still need to access the evaluation context, it is now available as __model__.context.\n\nLog-density in chains\nWhen sampling from a Turing model, the resulting MCMCChains.Chains object now contains not only the log-joint (accessible via chain[:lp]) but also the log-prior and log-likelihood (chain[:logprior] and chain[:loglikelihood] respectively).\nThese values now correspond to the log density of the sampled variables exactly as per the model definition / user parameterisation and thus will ignore any linking (transformation to unconstrained space). For example, if the model is @model f() = x ~ LogNormal(), chain[:lp] would always contain the value of logpdf(LogNormal(), x) for each sampled value of x. Previously these values could be incorrect if linking had occurred: some samplers would return logpdf(Normal(), log(x)) i.e. the log-density with respect to the transformed distribution.\nGibbs sampler\nWhen using Turing’s Gibbs sampler, e.g. Gibbs(:x => MH(), :y => HMC(0.1, 20)), the conditioned variables (for example y during the MH step, or x during the HMC step) are treated as true observations. Thus the log-density associated with them is added to the likelihood. Previously these would effectively be added to the prior (in the sense that if LikelihoodContext was used they would be ignored). This is unlikely to affect users but we mention it here to be explicit. 
This change only affects the log probabilities as the Gibbs component samplers see them; the resulting chain will include the usual log prior, likelihood, and joint, as described above.\nParticle Gibbs\nPreviously, only ‘true’ observations (i.e., x ~ dist where x is a model argument or conditioned upon) would trigger resampling of particles. Specifically, there were two cases where resampling would not be triggered:\n\nCalls to @addlogprob!\nGibbs-conditioned variables: e.g. y in Gibbs(:x => PG(20), :y => MH())\n\nTuring 0.40 changes this such that both of the above cause resampling. (The second case follows from the changes to the Gibbs sampler, see above.)\nThis release also fixes a bug where, if the model ended with one of these statements, its contribution to the particle weight would be ignored, leading to incorrect results.\nThe changes above also mean that certain models that previously worked with PG-within-Gibbs may now error. Specifically, this is likely to happen when the dimension of the model is variable. For example:\n@model function f()\n x ~ Bernoulli()\n if x\n y1 ~ Normal()\n else\n y1 ~ Normal()\n y2 ~ Normal()\n end\n # (some likelihood term...)\nend\nsample(f(), Gibbs(:x => PG(20), (:y1, :y2) => MH()), 100)\nThis sampler can no longer be used for this model because, depending on which branch is taken, the number of observations will be different. To use PG-within-Gibbs, the number of observations that the PG component sampler sees must be constant. Thus, for example, this will still work if x, y1, and y2 are grouped together under the PG component sampler.\nIf you absolutely require the old behaviour, we recommend using Turing.jl v0.39, but also thinking very carefully about what the expected behaviour of the model is, and checking that Turing is sampling from it correctly (note that the behaviour on v0.39 may in general be incorrect because Gibbs-conditioned variables did not trigger resampling). We would also welcome any GitHub issues highlighting such problems. Our support for dynamic models is incomplete and is liable to undergo further changes." }, { "objectID": "changelog.html#other-changes", "href": "changelog.html#other-changes", "title": "Changelog", "section": "Other changes", "text": "Other changes\n\nSampling using Prior() should now be about twice as fast because we now avoid evaluating the model twice on every iteration.\nTuring.Inference.Transition now has different fields. If t isa Turing.Inference.Transition, t.stat is always a NamedTuple, not nothing (if it genuinely has no information then it’s an empty NamedTuple). Furthermore, t.lp has now been split up into t.logprior and t.loglikelihood (see also ‘Log-density in chains’ section above)." }, { "objectID": "changelog.html#update-to-the-advancedvi-interface", "href": "changelog.html#update-to-the-advancedvi-interface", "title": "Changelog", "section": "Update to the AdvancedVI interface", "text": "Update to the AdvancedVI interface\nTuring’s variational inference interface was updated to match version 0.4 of AdvancedVI.jl.\nAdvancedVI v0.4 introduces various new features:\n\nlocation-scale families with dense scale matrices,\nparameter-free stochastic optimization algorithms like DoG and DoWG,\nproximal operators for stable optimization,\nthe sticking-the-landing control variate for faster convergence, and\nthe score gradient estimator for non-differentiable targets.\n\nPlease see the Turing API documentation, and AdvancedVI’s documentation, for more details." 
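For example, a hedged sketch of a call under this 0.4-era interface (q_meanfield_gaussian constructs an initial variational distribution, and StickingTheLandingEntropy is the sticking-the-landing control variate mentioned above; both as documented in Turing/AdvancedVI):\nq_init = q_meanfield_gaussian(model)\nq, q_avg, info, state = vi(\n model, q_init, 1_000; objective=RepGradELBO(10; entropy=StickingTheLandingEntropy())\n)\n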
}, { "objectID": "changelog.html#removal-of-turing.essential", "href": "changelog.html#removal-of-turing.essential", "title": "Changelog", "section": "Removal of Turing.Essential", "text": "Removal of Turing.Essential\nThe Turing.Essential module has been removed. Anything exported from there can be imported from either Turing or DynamicPPL." }, { "objectID": "changelog.html#addlogprob", "href": "changelog.html#addlogprob", "title": "Changelog", "section": "@addlogprob!", "text": "@addlogprob!\nThe @addlogprob! macro is now exported from Turing, making it officially part of the public interface." }, { "objectID": "changelog.html#dynamicppl-version", "href": "changelog.html#dynamicppl-version", "title": "Changelog", "section": "DynamicPPL version", "text": "DynamicPPL version\nDynamicPPL compatibility has been bumped to 0.36. This brings with it a number of changes: the ones most likely to affect you are submodel prefixing and conditioning. Variables in submodels are now represented correctly with field accessors. For example:\nusing Turing\n@model inner() = x ~ Normal()\n@model outer() = a ~ to_submodel(inner())\nkeys(VarInfo(outer())) now returns [@varname(a.x)] instead of [@varname(var\"a.x\")]\nFurthermore, you can now either condition on the outer model like outer() | (@varname(a.x) => 1.0), or the inner model like inner() | (@varname(x) => 1.0). If you use the conditioned inner model as a submodel, the conditioning will still apply correctly.\nPlease see the DynamicPPL release notes for fuller details." }, { "objectID": "changelog.html#gibbs-sampler", "href": "changelog.html#gibbs-sampler", "title": "Changelog", "section": "Gibbs sampler", "text": "Gibbs sampler\nTuring’s Gibbs sampler now allows for more complex VarNames, such as x[1] or x.a, to be used. For example, you can now do this:\n@model function f()\n x = Vector{Float64}(undef, 2)\n x[1] ~ Normal()\n return x[2] ~ Normal()\nend\nsample(f(), Gibbs(@varname(x[1]) => MH(), @varname(x[2]) => MH()), 100)\nPerformance for the cases which used to previously work (i.e. VarNames like x which only consist of a single symbol) is unaffected, and VarNames with only field accessors (e.g. x.a) should be equally fast. It is possible that VarNames with indexing (e.g. x[1]) may be slower (although this is still an improvement over not working at all!). If you find any cases where you think the performance is worse than it should be, please do file an issue." }, { "objectID": "changelog.html#breaking-changes-2", "href": "changelog.html#breaking-changes-2", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\n\nGibbs constructors\n0.37 removes the old Gibbs constructors deprecated in 0.36.\n\n\nRemove Zygote support\nZygote is no longer officially supported as an automatic differentiation backend, and AutoZygote is no longer exported. You can continue to use Zygote by importing AutoZygote from ADTypes and it may well continue to work, but it is no longer tested and no effort will be expended to fix it if something breaks.\nMooncake is the recommended replacement for Zygote.\n\n\nDynamicPPL 0.35\nTuring.jl v0.37 uses DynamicPPL v0.35, which brings with it several breaking changes:\n\nThe right hand side of .~ must from now on be a univariate distribution.\nIndexing VarInfo objects by samplers has been removed completely.\nThe order in which nested submodel prefixes are applied has been reversed.\nThe arguments for the constructor of LogDensityFunction have changed. 
LogDensityFunction also now satisfies the LogDensityProblems interface, without needing a wrapper object.\n\nFor more details about all of the above, see the changelog of DynamicPPL here.\n\n\nExport list\nTuring.jl’s export list has been cleaned up a fair bit. This affects what is imported into your namespace when you do an unqualified using Turing. You may need to import things more explicitly than before.\n\nThe DynamicPPL and AbstractMCMC modules are no longer exported. You will need to import DynamicPPL or using DynamicPPL: DynamicPPL (likewise AbstractMCMC) yourself, which in turn means that they have to be made available in your project environment.\n@logprob_str and @prob_str have been removed following a long deprecation period.\nWe no longer re-export everything from Bijectors and Libtask. To get around this, add using Bijectors or using Libtask at the top of your script (but we recommend using more selective imports).\n\nWe no longer export Bijectors.ordered. If you were using ordered, even Bijectors does not (currently) export this. You will have to manually import it with using Bijectors: ordered.\n\n\nOn the other hand, we have added a few more exports:\n\nDynamicPPL.returned and DynamicPPL.prefix are exported (for use with submodels).\nLinearAlgebra.I is exported for convenience." }, { "objectID": "changelog.html#breaking-changes-3", "href": "changelog.html#breaking-changes-3", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\n0.36.0 introduces a new Gibbs sampler. It’s been included in several previous releases as Turing.Experimental.Gibbs, but now takes over the old Gibbs sampler, which gets removed completely.\nThe new Gibbs sampler currently supports the same user-facing interface as the old one, but the old constructors have been deprecated, and will be removed in the future. Also, given that the internals have been completely rewritten in a very different manner, there may be accidental breakage that we haven’t anticipated. Please report any you find.\nGibbsConditional has also been removed. It was never very user-facing, but it was exported, so technically this is breaking.\nThe old Gibbs constructor relied on being called with several subsamplers, and each of the constructors of the subsamplers would take as arguments the symbols for the variables that they are to sample, e.g. Gibbs(HMC(:x), MH(:y)). This constructor has been deprecated, and will be removed in the future. The new constructor works by mapping symbols, VarNames, or iterables thereof to samplers, e.g. Gibbs(:x => HMC(), :y => MH()), Gibbs(@varname(x) => HMC(), @varname(y) => MH()), Gibbs((:x, :y) => NUTS(), :z => MH()). This allows more granular specification of which sampler to use for which variable.\nLikewise, the old constructor for calling one subsampler more often than another, Gibbs((HMC(0.01, 4, :x), 2), (MH(:y), 1)) has been deprecated. The new way to do this is to use RepeatSampler, also introduced in this version: Gibbs(@varname(x) => RepeatSampler(HMC(0.01, 4), 2), @varname(y) => MH())." }, { "objectID": "changelog.html#breaking-changes-4", "href": "changelog.html#breaking-changes-4", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\nJulia 1.10 is now the minimum required version for Turing.\nTapir.jl has been removed and replaced with its successor, Mooncake.jl. You can use Mooncake.jl by passing adbackend=AutoMooncake(; config=nothing) to the relevant samplers.\nSupport for Tracker.jl as an AD backend has been removed." 
}, { "objectID": "changelog.html#breaking-changes-5", "href": "changelog.html#breaking-changes-5", "title": "Changelog", "section": "Breaking changes", "text": "Breaking changes\nThe following exported functions have been removed:\n\nconstrained_space\nget_parameter_bounds\noptim_objective\noptim_function\noptim_problem\n\nThe same functionality is now offered by the new exported functions\n\nmaximum_likelihood\nmaximum_a_posteriori" }, { "objectID": "tutorials/bayesian-neural-networks/index.html", "href": "tutorials/bayesian-neural-networks/index.html", "title": "Bayesian Neural Networks", "section": "", "text": "In this tutorial, we demonstrate how one can implement a Bayesian Neural Network using a combination of Turing and Lux, a suite of machine learning tools. We will use Lux to specify the neural network’s layers and Turing to implement the probabilistic inference, with the goal of implementing a classification algorithm.\nWe will begin with importing the relevant libraries.\nusing Turing\nusing FillArrays\nusing Lux\nusing Plots\nimport Mooncake\nusing Functors\n\nusing LinearAlgebra\nusing Random\nOur goal here is to use a Bayesian neural network to classify points in an artificial dataset. The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with.\n# Number of points to generate\nN = 80\nM = round(Int, N / 4)\nrng = Random.default_rng()\nRandom.seed!(rng, 1234)\n\n# Generate artificial data\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nxt1s = Array([[x1s[i] + 0.5f0; x2s[i] + 0.5f0] for i in 1:M])\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nappend!(xt1s, Array([[x1s[i] - 5.0f0; x2s[i] - 5.0f0] for i in 1:M]))\n\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nxt0s = Array([[x1s[i] + 0.5f0; x2s[i] - 5.0f0] for i in 1:M])\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nappend!(xt0s, Array([[x1s[i] - 5.0f0; x2s[i] + 0.5f0] for i in 1:M]))\n\n# Store all the data for later\nxs = [xt1s; xt0s]\nts = [ones(2 * M); zeros(2 * M)]\n\n# Plot data points.\nfunction plot_data()\n x1 = map(e -> e[1], xt1s)\n y1 = map(e -> e[2], xt1s)\n x2 = map(e -> e[1], xt0s)\n y2 = map(e -> e[2], xt0s)\n\n Plots.scatter(x1, y1; color=\"red\", clim=(0, 1))\n return Plots.scatter!(x2, y2; color=\"blue\", clim=(0, 1))\nend\n\nplot_data()", "crumbs": [ "Get Started", "Tutorials", "Bayesian Neural Networks" ] }, { "objectID": "tutorials/bayesian-neural-networks/index.html#building-a-neural-network", "href": "tutorials/bayesian-neural-networks/index.html#building-a-neural-network", "title": "Bayesian Neural Networks", "section": "Building a Neural Network", "text": "Building a Neural Network\nThe next step is to define a feedforward neural network where we express our parameters as distributions, and not single points as with traditional neural networks. For this we will use Dense to define linear layers and compose them via Chain, both are neural network primitives from Lux. 
The network nn_initial we created has two hidden layers with tanh activations and one output layer with sigmoid (σ) activation, as shown below.\n\n[Network diagram: an input layer with 2 units, a first hidden layer with 3 tanh units, a second hidden layer with 2 tanh units, and an output layer with a single σ unit; consecutive layers are fully connected.]\n\nnn_initial is an instance that acts as a function: it takes data as input and outputs predictions. We will define distributions on the neural network parameters.\n\n# Construct a neural network using Lux\nnn_initial = Chain(Dense(2 => 3, tanh), Dense(3 => 2, tanh), Dense(2 => 1, σ))\n\n# Initialize the model weights and state\nps, st = Lux.setup(rng, nn_initial)\n\nLux.parameterlength(nn_initial) # number of parameters in NN\n\n20\n\n\nThe probabilistic model specification below creates a parameters variable whose entries are IID normal variables. The parameters vector represents all parameters of our neural net (weights and biases).\n\n# Create a regularization term and a Gaussian prior variance term.\nalpha = 0.09\nsigma = sqrt(1.0 / alpha)\n\n3.3333333333333335\n\n\nWe also define a function to construct a named tuple from a vector of sampled parameters. (We could use ComponentArrays here and broadcast to avoid doing this, but this way avoids introducing an extra dependency.)\n\nfunction vector_to_parameters(ps_new::AbstractVector, ps::NamedTuple)\n @assert length(ps_new) == Lux.parameterlength(ps)\n i = 1\n function get_ps(x)\n z = reshape(view(ps_new, i:(i + length(x) - 1)), size(x))\n i += length(x)\n return z\n end\n return fmap(get_ps, ps)\nend\n\nvector_to_parameters (generic function with 1 method)\n\n\nTo interface with external libraries it is often desirable to use the StatefulLuxLayer to automatically handle the neural network states.\n\nconst nn = StatefulLuxLayer{true}(nn_initial, nothing, st)\n\n# Specify the probabilistic model.\n@model function bayes_nn(xs, ts; sigma = sigma, ps = ps, nn = nn)\n # Sample the parameters\n nparameters = Lux.parameterlength(nn_initial)\n parameters ~ MvNormal(zeros(nparameters), Diagonal(abs2.(sigma .* ones(nparameters))))\n\n # Forward NN to make predictions\n preds = Lux.apply(nn, xs, f32(vector_to_parameters(parameters, ps)))\n\n # Observe each prediction.\n for i in eachindex(ts)\n ts[i] ~ Bernoulli(preds[i])\n end\nend\n\nbayes_nn (generic function with 2 methods)\n\n\nInference can now be performed by calling sample. We use the NUTS Hamiltonian Monte Carlo sampler here.\n\nsetprogress!(false)\n\n\n# Perform inference.\nn_iters = 2_000\nch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoMooncake()), n_iters);\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.8\n\n\n\n\nNow we extract the parameter samples from the sampled chain as θ (this is of size 2_000 x 20, where 2_000 is the number of iterations and 20 is the number of parameters). 
We’ll use these primarily to determine how good our model’s classifier is.\n\n# Extract all weight and bias parameters.\nθ = MCMCChains.group(ch, :parameters).value;", "crumbs": [ "Get Started", "Tutorials", "Bayesian Neural Networks" ] }, { "objectID": "tutorials/bayesian-neural-networks/index.html#prediction-visualization", "href": "tutorials/bayesian-neural-networks/index.html#prediction-visualization", "title": "Bayesian Neural Networks", "section": "Prediction Visualization", "text": "Prediction Visualization\nWe can use MAP estimation to classify our population by using the set of weights that provided the highest log posterior.\n\n# A helper to run the nn through data `x` using parameters `θ`\nnn_forward(x, θ) = nn(x, vector_to_parameters(θ, ps))\n\n# Plot the data we have.\nfig = plot_data()\n\n# Find the index that provided the highest log posterior in the chain.\n_, i = findmax(ch[:logjoint])\n\n# Extract the max row value from i.\ni = i.I[1]\n\n# Plot the posterior distribution with a contour plot\nx1_range = collect(range(-6; stop=6, length=25))\nx2_range = collect(range(-6; stop=6, length=25))\nZ = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)\nfig\n\n\n\n\nThe contour plot above shows that the MAP method is not too bad at classifying our data.\nNow we can visualise our predictions.\n\[\np(\tilde{x} | X, \alpha) = \int_{\theta} p(\tilde{x} | \theta) p(\theta | X, \alpha) \, d\theta \approx \sum_{\theta \sim p(\theta | X, \alpha)} f_{\theta}(\tilde{x})\n\]\nThe nn_predict function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain.\n\n# Return the average predicted value across\n# multiple weights.\nfunction nn_predict(x, θ, num)\n num = min(num, size(θ, 1)) # make sure num does not exceed the number of samples\n return mean([first(nn_forward(x, view(θ, i, :))) for i in 1:10:num])\nend\n\nnn_predict (generic function with 1 method)\n\n\nNext, we use the nn_predict function to predict the value at a sample of points where the x1 and x2 coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data, and more importantly, we can see much more easily where the neural network is uncertain about its predictions: the regions between cluster boundaries.\n\n# Plot the average prediction.\nfig = plot_data()\n\nn_end = 1500\nx1_range = collect(range(-6; stop=6, length=25))\nx2_range = collect(range(-6; stop=6, length=25))\nZ = [nn_predict([x1, x2], θ, n_end)[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)\nfig\n\n\n\n\nSuppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. 
In that case, the following graph displays an animation of the contour plot generated from the network weights in samples 1 to 500.\n\n# Number of iterations to plot.\nn_end = 500\n\nanim = @gif for i in 1:n_end\n plot_data()\n Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]\n contour!(x1_range, x2_range, Z; title=\"Iteration $i\", clim=(0, 1))\nend every 5\n\n\n[ Info: Saved animation to /tmp/jl_Sh8bWvEXaP.gif\n\n\n\n\n\n\n\nThis has been an introduction to the applications of Turing and Lux in defining Bayesian neural networks.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Neural Networks" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html", "href": "tutorials/multinomial-logistic-regression/index.html", "title": "Multinomial Logistic Regression", "section": "", "text": "Multinomial logistic regression is an extension of logistic regression. Logistic regression is used to model problems in which there are exactly two possible discrete outcomes. Multinomial logistic regression is used to model problems in which there are two or more possible discrete outcomes.\nIn our example, we’ll be using the iris dataset. The iris multiclass problem aims to predict the species of a flower given measurements (in centimetres) of sepal length and width and petal length and width. There are three possible species: Iris setosa, Iris versicolor, and Iris virginica.\nTo start, let’s import all the libraries we’ll need.\n# Load Turing.\nusing Turing\n\n# Load RDatasets.\nusing RDatasets\n\n# Load StatsPlots for visualisations and diagnostics.\nusing StatsPlots\n\n# Functionality for splitting and normalising the data.\nusing MLDataUtils: shuffleobs, splitobs, rescale!\n\n# We need a softmax function which is provided by NNlib.\nusing NNlib: softmax\n\n# Functionality for constructing arrays with identical elements efficiently.\nusing FillArrays\n\n# Functionality for working with scaled identity matrices.\nusing LinearAlgebra\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(0);", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#data-cleaning-set-up", "href": "tutorials/multinomial-logistic-regression/index.html#data-cleaning-set-up", "title": "Multinomial Logistic Regression", "section": "Data Cleaning & Set Up", "text": "Data Cleaning & Set Up\nNow we’re going to import our dataset. 
Twenty rows of the dataset are shown below so you can get a good feel for what kind of data we have.\n\n# Import the \"iris\" dataset.\ndata = RDatasets.dataset(\"datasets\", \"iris\");\n\n# Show twenty random rows.\ndata[rand(1:size(data, 1), 20), :]\n\n20×5 DataFrame\n\n\n\nRow\nSepalLength\nSepalWidth\nPetalLength\nPetalWidth\nSpecies\n\n\n\nFloat64\nFloat64\nFloat64\nFloat64\nCat…\n\n\n\n\n1\n5.0\n2.0\n3.5\n1.0\nversicolor\n\n\n2\n5.4\n3.7\n1.5\n0.2\nsetosa\n\n\n3\n7.2\n3.0\n5.8\n1.6\nvirginica\n\n\n4\n4.8\n3.0\n1.4\n0.1\nsetosa\n\n\n5\n5.7\n2.8\n4.1\n1.3\nversicolor\n\n\n6\n5.1\n3.5\n1.4\n0.3\nsetosa\n\n\n7\n5.4\n3.9\n1.3\n0.4\nsetosa\n\n\n8\n7.6\n3.0\n6.6\n2.1\nvirginica\n\n\n9\n5.0\n3.5\n1.6\n0.6\nsetosa\n\n\n10\n5.0\n3.6\n1.4\n0.2\nsetosa\n\n\n11\n5.5\n2.4\n3.8\n1.1\nversicolor\n\n\n12\n6.1\n2.6\n5.6\n1.4\nvirginica\n\n\n13\n4.4\n3.0\n1.3\n0.2\nsetosa\n\n\n14\n7.0\n3.2\n4.7\n1.4\nversicolor\n\n\n15\n6.1\n2.9\n4.7\n1.4\nversicolor\n\n\n16\n7.7\n3.0\n6.1\n2.3\nvirginica\n\n\n17\n6.4\n2.7\n5.3\n1.9\nvirginica\n\n\n18\n5.1\n3.3\n1.7\n0.5\nsetosa\n\n\n19\n6.7\n3.1\n4.7\n1.5\nversicolor\n\n\n20\n6.2\n2.2\n4.5\n1.5\nversicolor\n\n\n\n\n\n\nIn this data set, the outcome Species is currently coded as a string. We convert it to a numerical value by using indices 1, 2, and 3 to indicate species setosa, versicolor, and virginica, respectively.\n\n# Recode the `Species` column.\nspecies = [\"setosa\", \"versicolor\", \"virginica\"]\ndata[!, :Species_index] = indexin(data[!, :Species], species)\n\n# Show twenty random rows of the new species columns\ndata[rand(1:size(data, 1), 20), [:Species, :Species_index]]\n\n20×2 DataFrame\n\n\n\nRow\nSpecies\nSpecies_index\n\n\n\nCat…\nUnion…\n\n\n\n\n1\nsetosa\n1\n\n\n2\nversicolor\n2\n\n\n3\nversicolor\n2\n\n\n4\nsetosa\n1\n\n\n5\nversicolor\n2\n\n\n6\nversicolor\n2\n\n\n7\nversicolor\n2\n\n\n8\nversicolor\n2\n\n\n9\nvirginica\n3\n\n\n10\nversicolor\n2\n\n\n11\nvirginica\n3\n\n\n12\nsetosa\n1\n\n\n13\nsetosa\n1\n\n\n14\nversicolor\n2\n\n\n15\nsetosa\n1\n\n\n16\nsetosa\n1\n\n\n17\nvirginica\n3\n\n\n18\nversicolor\n2\n\n\n19\nvirginica\n3\n\n\n20\nvirginica\n3\n\n\n\n\n\n\nAfter we’ve done that tidying, it’s time to split our dataset into training and testing sets, and separate the features and target from the data. Additionally, we must rescale our feature variables so that they are centred around zero by subtracting the mean from each column and dividing by the standard deviation. This standardisation improves sampler efficiency by ensuring all features are on comparable scales.\n\n# Split our dataset 50%/50% into training/test sets.\ntrainset, testset = splitobs(shuffleobs(data), 0.5)\n\n# Define features and target.\nfeatures = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]\ntarget = :Species_index\n\n# Turing requires data in matrix and vector form.\ntrain_features = Matrix(trainset[!, features])\ntest_features = Matrix(testset[!, features])\ntrain_target = trainset[!, target]\ntest_target = testset[!, target]\n\n# Standardise the features.\nμ, σ = rescale!(train_features; obsdim=1)\nrescale!(test_features, μ, σ; obsdim=1);", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#model-declaration", "href": "tutorials/multinomial-logistic-regression/index.html#model-declaration", "title": "Multinomial Logistic Regression", "section": "Model Declaration", "text": "Model Declaration\nFinally, we can define our model logistic_regression. 
It is a function that takes three arguments where\n\nx is our set of independent variables;\ny is the element we want to predict;\nσ is the standard deviation we want to assume for our priors.\n\nWe select the setosa species as the baseline class (the choice does not matter). Then we create the intercepts and vectors of coefficients for the other classes against that baseline. More concretely, we create scalar intercepts intercept_versicolor and intercept_virginica and coefficient vectors coefficients_versicolor and coefficients_virginica with four coefficients each for the features SepalLength, SepalWidth, PetalLength and PetalWidth. We assume a normal distribution with mean zero and standard deviation σ as prior for each scalar parameter. We want to find the posterior distribution of these ten parameters to be able to predict the species for any given set of features.\n\n# Bayesian multinomial logistic regression\n@model function logistic_regression(x, y, σ)\n n = size(x, 1)\n length(y) == n ||\n throw(DimensionMismatch(\"number of observations in `x` and `y` is not equal\"))\n\n # Priors of intercepts and coefficients.\n intercept_versicolor ~ Normal(0, σ)\n intercept_virginica ~ Normal(0, σ)\n coefficients_versicolor ~ MvNormal(Zeros(4), σ^2 * I)\n coefficients_virginica ~ MvNormal(Zeros(4), σ^2 * I)\n\n # Compute the likelihood of the observations.\n values_versicolor = intercept_versicolor .+ x * coefficients_versicolor\n values_virginica = intercept_virginica .+ x * coefficients_virginica\n for i in 1:n\n # the 0 corresponds to the base category `setosa`\n v = softmax([0, values_versicolor[i], values_virginica[i]])\n y[i] ~ Categorical(v)\n end\nend;", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#sampling", "href": "tutorials/multinomial-logistic-regression/index.html#sampling", "title": "Multinomial Logistic Regression", "section": "Sampling", "text": "Sampling\nNow we can run our sampler. This time we’ll use NUTS to sample from our posterior.\n\nsetprogress!(false)\n\n\nm = logistic_regression(train_features, train_target, 1)\nchain = sample(m, NUTS(), MCMCThreads(), 1_500, 3)\n\n\n\nChains MCMC chain (1500×24×3 Array{Float64, 3}):\n\nIterations = 751:1:2250\nNumber of chains = 3\nSamples per chain = 1500\nWall duration = 18.14 seconds\nCompute duration = 13.29 seconds\nparameters = intercept_versicolor, intercept_virginica, coefficients_versicolor[1], coefficients_versicolor[2], coefficients_versicolor[3], coefficients_versicolor[4], coefficients_virginica[1], coefficients_virginica[2], coefficients_virginica[3], coefficients_virginica[4]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\n\n\n\n\nWarningSampling With Multiple Threads\n\n\n\n\n\nThe sample() call above assumes that you have at least nchains threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. 
For more information, see the Turing documentation on sampling multiple chains.\n\n\n\nSince we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.\n\nplot(chain)\n\n\n\n\nLooks good!\nWe can also use the corner function from MCMCChains to show the distributions of the various parameters of our multinomial logistic regression. The corner function requires MCMCChains and StatsPlots.\n\n# Only plotting the first 3 coefficients due to a bug in Plots.jl\ncorner(\n chain,\n MCMCChains.namesingroup(chain, :coefficients_versicolor)[1:3];\n)\n\n\n\n\n\n# Only plotting the first 3 coefficients due to a bug in Plots.jl\ncorner(\n chain,\n MCMCChains.namesingroup(chain, :coefficients_virginica)[1:3];\n)\n\n\n\n\nFortunately the corner plots appear to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter’s sampled values to estimate our model to make predictions.", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/multinomial-logistic-regression/index.html#making-predictions", "href": "tutorials/multinomial-logistic-regression/index.html#making-predictions", "title": "Multinomial Logistic Regression", "section": "Making Predictions", "text": "Making Predictions\nHow do we test how well the model actually predicts which of the three classes an iris flower belongs to? We need to build a prediction function that takes the test dataset and runs it through the average parameter calculated during sampling.\nThe prediction function below takes a Matrix and a Chains object. It computes the mean of the sampled parameters and calculates the species with the highest probability for each observation. Note that we do not have to evaluate the softmax function since it does not affect the order of its inputs.\n\nfunction prediction(x::Matrix, chain)\n # Pull the means from each parameter's sampled values in the chain.\n intercept_versicolor = mean(chain, :intercept_versicolor)\n intercept_virginica = mean(chain, :intercept_virginica)\n coefficients_versicolor = [\n mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_versicolor)\n ]\n coefficients_virginica = [\n mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_virginica)\n ]\n\n # Compute the index of the species with the highest probability for each observation.\n values_versicolor = intercept_versicolor .+ x * coefficients_versicolor\n values_virginica = intercept_virginica .+ x * coefficients_virginica\n species_indices = [\n argmax((0, x, y)) for (x, y) in zip(values_versicolor, values_virginica)\n ]\n\n return species_indices\nend;\n\nLet’s see how we did! 
We run the test matrix through the prediction function, and compute the accuracy for our prediction.\n\n# Make the predictions.\npredictions = prediction(test_features, chain)\n\n# Calculate accuracy for our test set.\nmean(predictions .== testset[!, :Species_index])\n\n0.9066666666666666\n\n\nPerhaps more important is to see the accuracy per class.\n\nfor s in 1:3\n rows = testset[!, :Species_index] .== s\n println(\"Number of `\", species[s], \"`: \", count(rows))\n println(\n \"Percentage of `\",\n species[s],\n \"` predicted correctly: \",\n mean(predictions[rows] .== testset[rows, :Species_index]),\n )\nend\n\nNumber of `setosa`: 26\nPercentage of `setosa` predicted correctly: 1.0\nNumber of `versicolor`: 26\nPercentage of `versicolor` predicted correctly: 0.7692307692307693\nNumber of `virginica`: 23\nPercentage of `virginica` predicted correctly: 0.9565217391304348\n\n\nThis tutorial has demonstrated how to use Turing to perform Bayesian multinomial logistic regression.", "crumbs": [ "Get Started", "Tutorials", "Multinomial Logistic Regression" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html", "href": "tutorials/gaussian-mixture-models/index.html", "title": "Gaussian Mixture Models", "section": "", "text": "The following tutorial illustrates the use of Turing for an unsupervised task, namely, clustering data using a Bayesian mixture model. The aim of this task is to infer a latent grouping (hidden structure) from unlabelled data.", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#synthetic-data", "href": "tutorials/gaussian-mixture-models/index.html#synthetic-data", "title": "Gaussian Mixture Models", "section": "Synthetic Data", "text": "Synthetic Data\nWe generate a synthetic dataset of \(N = 30\) two-dimensional points \(x_i \in \mathbb{R}^2\) drawn from a Gaussian mixture model. For simplicity, we use \(K = 2\) clusters with\n\nequal weights, i.e., we use mixture weights \(w = [0.5, 0.5]\), and\nisotropic Gaussian distributions of the points in each cluster.\n\nMore concretely, we use the Gaussian distributions \(\mathcal{N}([\mu_k, \mu_k]^\mathsf{T}, 0.2 I)\) with parameters \(\mu_1 = -2\) and \(\mu_2 = 2\).\n\nusing Distributions\nusing FillArrays\nusing StatsPlots\n\nusing LinearAlgebra\nusing Random\n\n# Set a random seed.\nRandom.seed!(3)\n\n# Define Gaussian mixture model.\nw = [0.5, 0.5]\nμ = [-2.0, 2.0]\nmixturemodel = MixtureModel([MvNormal(Fill(μₖ, 2), 0.2 * I) for μₖ in μ], w)\n\n# We draw the data points.\nN = 30\nx = rand(mixturemodel, N);\n\nThe following plot shows the dataset.\n\nscatter(x[1, :], x[2, :]; legend=false, title=\"Synthetic Dataset\")", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#gaussian-mixture-model-in-turing", "href": "tutorials/gaussian-mixture-models/index.html#gaussian-mixture-model-in-turing", "title": "Gaussian Mixture Models", "section": "Gaussian Mixture Model in Turing", "text": "Gaussian Mixture Model in Turing\nWe are interested in recovering the grouping from the dataset. More precisely, we want to infer the mixture weights, the parameters \(\mu_1\) and \(\mu_2\), and the assignment of each datum to a cluster for the generative Gaussian mixture model.\nIn a Bayesian Gaussian mixture model with \(K\) components, each data point \(x_i\) (\(i = 1,\ldots,N\)) is generated according to the following generative process. 
First we draw the model parameters: the cluster means \(\mu_k\) and the mixture weights \(w\) that determine the probability of each cluster. We use standard normal distributions as priors for \(\mu_k\) and a Dirichlet distribution with parameters \(\alpha_1 = \cdots = \alpha_K = 1\) as prior for \(w\): \[\n\begin{aligned}\n\mu_k &\sim \mathcal{N}(0, 1) \qquad (k = 1,\ldots,K)\\\nw &\sim \operatorname{Dirichlet}(\alpha_1, \ldots, \alpha_K)\n\end{aligned}\n\] After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters \[\nz_i \sim \operatorname{Categorical}(w) \qquad (i = 1,\ldots,N),\n\] and then drawing the datum accordingly, i.e., in our example drawing \[\nx_i \sim \mathcal{N}([\mu_{z_i}, \mu_{z_i}]^\mathsf{T}, I) \qquad (i=1,\ldots,N).\n\] For more details on Gaussian mixture models, refer to Chapter 9 of Christopher M. Bishop, Pattern Recognition and Machine Learning.\nWe specify the model in Turing:\n\nusing Turing\n\n@model function gaussian_mixture_model(x)\n # Draw the parameters for each of the K=2 clusters from a standard normal distribution.\n K = 2\n μ ~ MvNormal(Zeros(K), I)\n\n # Draw the weights for the K clusters from a Dirichlet distribution with parameters αₖ = 1.\n w ~ Dirichlet(K, 1.0)\n # Alternatively, one could use a fixed set of weights.\n # w = fill(1/K, K)\n\n # Construct categorical distribution of assignments.\n distribution_assignments = Categorical(w)\n\n # Construct multivariate normal distributions of each cluster.\n D, N = size(x)\n distribution_clusters = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n\n # Draw assignments for each datum and generate it from the multivariate normal distribution.\n k = Vector{Int}(undef, N)\n for i in 1:N\n k[i] ~ distribution_assignments\n x[:, i] ~ distribution_clusters[k[i]]\n end\n\n return k\nend\n\nmodel = gaussian_mixture_model(x);\n\nWe run an MCMC simulation to obtain an approximation of the posterior distribution of the parameters \(\mu\) and \(w\) and assignments \(k\). We use a Gibbs sampler that combines a particle Gibbs sampler for the discrete parameters (assignments \(k\)) and a Hamiltonian Monte Carlo sampler for the continuous parameters (\(\mu\) and \(w\)). We generate multiple chains in parallel using multi-threading.\n\nsampler = Gibbs(:k => PG(100), (:μ, :w) => HMC(0.05, 10))\nnsamples = 150\nnchains = 4\nburn = 10\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);\n\n\n\n\n\n\n\nWarningSampling With Multiple Threads\n\n\n\nThe sample() call above assumes that you have at least two threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. 
For more information, see the Turing documentation on sampling multiple chains.", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#inferred-mixture-model", "href": "tutorials/gaussian-mixture-models/index.html#inferred-mixture-model", "title": "Gaussian Mixture Models", "section": "Inferred Mixture Model", "text": "Inferred Mixture Model\nAfter sampling we can visualise the trace and density of the parameters of interest.\nWe consider the samples of the location parameters \(\mu_1\) and \(\mu_2\) for the two clusters.\n\nplot(chains[[\"μ[1]\", \"μ[2]\"]]; legend=true)\n\n\n\n\nFrom the plots above, we can see that the chains have converged to seemingly different values for the parameters \(\mu_1\) and \(\mu_2\). However, these actually represent the same solution: it does not matter whether we assign \(\mu_1\) to the first cluster and \(\mu_2\) to the second, or vice versa, since the resulting mixture distribution is the same. (In principle it is also possible for the parameters to swap places within a single chain, although this does not happen in this example.) For more information see the Stan documentation, or Bishop’s book, where the concept of identifiability is discussed.\nHaving \(\mu_1\) and \(\mu_2\) swap can complicate the interpretation of the results, especially when different chains converge to different assignments. One solution here is to enforce an ordering on our \(\mu\) vector, requiring \(\mu_k \geq \mu_{k-1}\) for all \(k\). Bijectors.jl provides a convenient function, ordered(), which can be applied to a (continuous multivariate) distribution to enforce this:\n\nusing Bijectors: ordered\n\n@model function gaussian_mixture_model_ordered(x)\n # Draw the parameters for each of the K=2 clusters from a standard normal distribution.\n K = 2\n μ ~ ordered(MvNormal(Zeros(K), I))\n # Draw the weights for the K clusters from a Dirichlet distribution with parameters αₖ = 1.\n w ~ Dirichlet(K, 1.0)\n # Alternatively, one could use a fixed set of weights.\n # w = fill(1/K, K)\n # Construct categorical distribution of assignments.\n distribution_assignments = Categorical(w)\n # Construct multivariate normal distributions of each cluster.\n D, N = size(x)\n distribution_clusters = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n # Draw assignments for each datum and generate it from the multivariate normal distribution.\n k = Vector{Int}(undef, N)\n for i in 1:N\n k[i] ~ distribution_assignments\n x[:, i] ~ distribution_clusters[k[i]]\n end\n return k\nend\n\nmodel = gaussian_mixture_model_ordered(x);\n\nNow, re-running our model, we can see that the assigned means are consistent between chains:\n\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);\n\n\nplot(chains[[\"μ[1]\", \"μ[2]\"]]; legend=true)\n\n\n\n\nWe also inspect the samples of the mixture weights \(w\).\n\nplot(chains[[\"w[1]\", \"w[2]\"]]; legend=true)\n\n\n\n\nAs the distributions of the samples for the parameters \(\mu_1\), \(\mu_2\), \(w_1\), and \(w_2\) are unimodal, we can safely visualise the density region of our model using the average values.\n\n# Model with mean of samples as parameters.\nμ_mean = [mean(chains, \"μ[$i]\") for i in 1:2]\nw_mean = [mean(chains, \"w[$i]\") for i in 1:2]\nmixturemodel_mean = MixtureModel([MvNormal(Fill(μₖ, 2), I) for μₖ in μ_mean], w_mean)\ncontour(\n range(-7.5, 3; length=1_000),\n range(-6.5, 3; length=1_000),\n (x, y) -> 
logpdf(mixturemodel_mean, [x, y]);\n widen=false,\n)\nscatter!(x[1, :], x[2, :]; legend=false, title=\"Synthetic Dataset\")", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#inferred-assignments", "href": "tutorials/gaussian-mixture-models/index.html#inferred-assignments", "title": "Gaussian Mixture Models", "section": "Inferred Assignments", "text": "Inferred Assignments\nFinally, we can inspect the assignments of the data points inferred using Turing. As we can see, the dataset is partitioned into two distinct groups.\n\nassignments = [mean(chains, \"k[$i]\") for i in 1:N]\nscatter(\n x[1, :],\n x[2, :];\n legend=false,\n title=\"Assignments on Synthetic Dataset\",\n zcolor=assignments,\n)", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#marginalizing-out-the-assignments", "href": "tutorials/gaussian-mixture-models/index.html#marginalizing-out-the-assignments", "title": "Gaussian Mixture Models", "section": "Marginalizing Out The Assignments", "text": "Marginalizing Out The Assignments\nWe can write out the marginal posterior of (continuous) \\(w, \\mu\\) by summing out the influence of our (discrete) assignments \\(z_i\\) from our likelihood:\n\\[p(y \\mid w, \\mu ) = \\sum_{k=1}^K w_k p_k(y \\mid \\mu_k)\\]\nIn our case, this gives us:\n\\[p(y \\mid w, \\mu) = \\sum_{k=1}^K w_k \\cdot \\operatorname{MvNormal}(y \\mid \\mu_k, I)\\]\n\nMarginalizing By Hand\nWe could implement the above version of the Gaussian mixture model in Turing as follows.\nFirst, Turing uses log-probabilities, so the likelihood above must be converted into log-space:\n\\[\\log \\left( p(y \\mid w, \\mu) \\right) = \\operatorname{logsumexp}_{k=1}^{K} \\left[\\log (w_k) + \\log(\\operatorname{MvNormal}(y \\mid \\mu_k, I)) \\right]\\]\nwhere the sum over components is computed with logsumexp from the LogExpFunctions.jl package. The manually incremented likelihood can be added to the log-probability with @addlogprob!, giving us the following model:\n\nusing LogExpFunctions\n\n@model function gmm_marginalized(x)\n K = 2\n D, N = size(x)\n μ ~ ordered(MvNormal(Zeros(K), I))\n w ~ Dirichlet(K, 1.0)\n dists = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n for i in 1:N\n lvec = Vector(undef, K)\n for k in 1:K\n lvec[k] = log(w[k]) + logpdf(dists[k], x[:, i])\n end\n @addlogprob! logsumexp(lvec)\n end\nend\n\n\n\n\n\n\n\nWarning: Manually Incrementing Probability\n\n\n\nWhen possible, use of @addlogprob! should be avoided, as it exists outside the usual structure of a Turing model. In most cases, a custom distribution should be used instead.\nThe next section demonstrates the preferred method: using the MixtureModel distribution we have seen already to perform the marginalization automatically.\n\n\n\n\nMarginalizing For Free With Distributions.jl’s MixtureModel Implementation\nWe can use Turing’s ~ syntax with anything that Distributions.jl provides logpdf and rand methods for. 
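For instance (a standalone sketch with arbitrary weights and components, unrelated to the model above), we can verify that the density of a univariate MixtureModel matches the logsumexp marginalization written out by hand:\n\nusing Distributions, LogExpFunctions\n\nw_demo = [0.3, 0.7]\ncomponents = [Normal(-1.0, 1.0), Normal(2.0, 1.0)]\nmm = MixtureModel(components, w_demo)\n\n# The mixture log-density is logsumexp over log-weights plus component log-densities.\nlogpdf(mm, 0.5) ≈ logsumexp(log.(w_demo) .+ [logpdf(c, 0.5) for c in components])   # true\n\n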
It turns out that the MixtureModel distribution it provides has, as its logpdf method, logpdf(MixtureModel([Component_Distributions], weight_vector), Y), where Y can be either a single observation or a vector of observations.\nIn fact, Distributions.jl provides many convenient constructors for mixture models, allowing further simplification in common special cases.\nFor example, when the mixture components are of the same type, one can write: ~ MixtureModel(Normal, [(μ1, σ1), (μ2, σ2)], w), or when the weight vector is known to allocate probability equally, it can be omitted.\nThe logpdf implementation for a MixtureModel distribution is exactly the marginalization defined above, and so our model can be simplified to:\n\n@model function gmm_marginalized(x)\n K = 2\n D, _ = size(x)\n μ ~ ordered(MvNormal(Zeros(K), I))\n w ~ Dirichlet(K, 1.0)\n x ~ MixtureModel([MvNormal(Fill(μₖ, D), I) for μₖ in μ], w)\nend\nmodel = gmm_marginalized(x);\n\nAs we have summed out the discrete components, we can perform inference using NUTS() alone.\n\nsampler = NUTS()\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains; discard_initial = burn);\n\nNUTS() significantly outperforms our compositional Gibbs sampler, in large part because our model is now Rao-Blackwellized thanks to the marginalization of our assignment parameter.\n\nplot(chains[[\"μ[1]\", \"μ[2]\"]], legend=true)", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/gaussian-mixture-models/index.html#inferred-assignments-with-the-marginalized-model", "href": "tutorials/gaussian-mixture-models/index.html#inferred-assignments-with-the-marginalized-model", "title": "Gaussian Mixture Models", "section": "Inferred Assignments With The Marginalized Model", "text": "Inferred Assignments With The Marginalized Model\nAs we have summed over possible assignments, the latent parameter representing the assignments is no longer available in our chain. This is not a problem, however, as given any fixed sample \\((\\mu, w)\\), the assignment probability \\(p(z_i \\mid y_i)\\) can be recovered using Bayes’s theorem:\n\\[p(z_i \\mid y_i) = \\frac{p(y_i \\mid z_i) p(z_i)}{\\sum_{k = 1}^K p(y_i \\mid z_i = k) \\, p(z_i = k)}\\]\nThis quantity can be computed for every component \\(k\\), resulting in a probability vector, which is then used to sample posterior predictive assignments from a categorical distribution. 
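In other words, the normalised probability vector is just a softmax of the per-component terms \\(\\log(w_k) + \\log p(y_i \\mid z_i = k)\\). A standalone sketch with made-up numbers:\n\nusing LogExpFunctions\n\nlogjoint = [-3.2, -1.1]     # log(wₖ) + logpdf of component k at yᵢ, for k = 1, 2\nprobs = softmax(logjoint)   # normalised assignment probabilities; sum(probs) == 1\n\nThis is exactly what the helper function below does before drawing from a Categorical distribution. 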
For details on the mathematics here, see the Stan documentation on latent discrete parameters.\n\nfunction sample_class(xi, dists, w)\n lvec = [(logpdf(d, xi) + log(w[i])) for (i, d) in enumerate(dists)]\n rand(Categorical(softmax(lvec)))\nend\n\n@model function gmm_recover(x)\n K = 2\n D, N = size(x)\n μ ~ ordered(MvNormal(Zeros(K), I))\n w ~ Dirichlet(K, 1.0)\n dists = [MvNormal(Fill(μₖ, D), I) for μₖ in μ]\n x ~ MixtureModel(dists, w)\n # Return assignment draws for each datapoint.\n return [sample_class(x[:, i], dists, w) for i in 1:N]\nend\n\nWe sample from this model as before:\n\nmodel = gmm_recover(x)\nchains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = burn);\n\nGiven a sample from the marginalized posterior, these assignments can be recovered with:\n\nassignments = mean(returned(gmm_recover(x), chains));\n\n\nscatter(\n x[1, :],\n x[2, :];\n legend=false,\n title=\"Assignments on Synthetic Dataset - Recovered\",\n zcolor=assignments,\n)", "crumbs": [ "Get Started", "Tutorials", "Gaussian Mixture Models" ] }, { "objectID": "tutorials/bayesian-time-series-analysis/index.html", "href": "tutorials/bayesian-time-series-analysis/index.html", "title": "Bayesian Time Series Analysis", "section": "", "text": "In time series analysis we are often interested in understanding how various real-life circumstances impact our quantity of interest. These can be, for instance, season, day of week, or time of day. To analyse this it is useful to decompose time series into simpler components (corresponding to relevant circumstances) and infer their relevance. In this tutorial we are going to use Turing for time series analysis and learn about useful ways to decompose time series.\n\nModelling time series\nBefore we start coding, let us talk about what exactly we mean with time series decomposition. In a nutshell, it is a divide-and-conquer approach where we express a time series as a sum or a product of simpler series. For instance, the time series \\(f(t)\\) can be decomposed into a sum of \\(n\\) components\n\\[f(t) = \\sum_{i=1}^n f_i(t),\\]\nor we can decompose \\(g(t)\\) into a product of \\(m\\) components\n\\[g(t) = \\prod_{i=1}^m g_i(t).\\]\nWe refer to this as additive or multiplicative decomposition respectively. This type of decomposition allows us to reason about individual components separately, which simplifies encoding prior information and interpreting model predictions. Two common components are trends, which represent the overall change of the time series (often assumed to be linear), and cyclic effects, which contribute periodic oscillations around the trend. Let us simulate some data with an additive linear trend and oscillating effects.\n\nusing Turing\nusing FillArrays\nusing StatsPlots\n\nusing LinearAlgebra\nusing Random\nusing Statistics\n\nRandom.seed!(12345)\n\ntrue_sin_freq = 2\ntrue_sin_amp = 5\ntrue_cos_freq = 7\ntrue_cos_amp = 2.5\ntmax = 10\nβ_true = 2\nα_true = -1\ntt = 0:0.05:tmax\nf₁(t) = α_true + β_true * t\nf₂(t) = true_sin_amp * sinpi(2 * t * true_sin_freq / tmax)\nf₃(t) = true_cos_amp * cospi(2 * t * true_cos_freq / tmax)\nf(t) = f₁(t) + f₂(t) + f₃(t)\n\nplot(f, tt; label=\"f(t)\", title=\"Observed time series\", legend=:topleft, linewidth=3)\nplot!(\n [f₁, f₂, f₃],\n tt;\n label=[\"f₁(t)\" \"f₂(t)\" \"f₃(t)\"],\n style=[:dot :dash :dashdot],\n linewidth=1,\n)\n\n\n\n\nEven though we use simple components, combining them can give rise to fairly complex time series. 
In this time series, cyclic effects are just added on top of the trend. If we instead multiply the components, the cyclic effects cause the series to oscillate between larger and larger values, since they get scaled by the trend.\n\ng(t) = f₁(t) * f₂(t) * f₃(t)\n\nplot(g, tt; label=\"g(t)\", title=\"Observed time series\", legend=:topleft, linewidth=3)\nplot!([f₁, f₂, f₃], tt; label=[\"f₁(t)\" \"f₂(t)\" \"f₃(t)\"], linewidth=1)\n\n\n\n\nUnlike \\(f\\), \\(g\\) oscillates around \\(0\\) since it is being multiplied with sines and cosines. To let a multiplicative decomposition oscillate around the trend we could define it as \\(\\tilde{g}(t) = f₁(t) * (1 + f₂(t)) * (1 + f₃(t)),\\) but for convenience we will leave it as is. The inference machinery is the same for both cases.\n\n\nModel fitting\nHaving discussed time series decomposition, let us fit a model to the time series above and recover the true parameters. Before building our model, we standardise the time axis to \\([0, 1]\\) and subtract the max of the time series. This helps convergence while maintaining interpretability and the correct scales for the cyclic components.\n\nσ_true = 0.35\nt = collect(tt[begin:3:end])\nt_min, t_max = extrema(t)\nx = (t .- t_min) ./ (t_max - t_min)\nyf = f.(t) .+ σ_true .* randn(size(t))\nyf_max = maximum(yf)\nyf = yf .- yf_max\n\nscatter(x, yf; title=\"Standardised data\", legend=false)\n\n\n\n\nLet us now build our model. We want to assume a linear trend and cyclic effects. Encoding a linear trend is easy enough, but what about cyclical effects? We will take a scattergun approach and create multiple cyclical features using both sine and cosine functions, letting our inference machinery figure out which to keep. To do this, we define how long one period should be and create features in reference to said period. How long a period should be is problem dependent, but as an example let us say it is \\(1\\) year. If we then find evidence for a cyclic effect with a frequency of 2, that would mean a biannual effect. A frequency of 4 would mean a quarterly effect, etc. Since we are using synthetic data, we are simply going to let the period be 1, which is the entire length of the time series.\n\nfreqs = 1:10\nnum_freqs = length(freqs)\nperiod = 1\ncyclic_features = [sinpi.(2 .* freqs' .* x ./ period) cospi.(2 .* freqs' .* x ./ period)]\n\nplot_freqs = [1, 3, 5]\nfreq_ptl = plot(\n cyclic_features[:, plot_freqs];\n label=permutedims([\"sin(2π$(f)x)\" for f in plot_freqs]),\n title=\"Cyclical features subset\",\n)\n\n\n\n\nHaving constructed the cyclical features, we can finally build our model. The model we will implement looks like this:\n\\[\nf(t) = \\alpha + \\beta_t t + \\sum_{i=1}^F \\beta_{\\sin{},i} \\sin{}(2\\pi f_i t) + \\sum_{i=1}^F \\beta_{\\cos{},i} \\cos{}(2\\pi f_i t),\n\\]\nwith a Gaussian likelihood \\(y \\sim \\mathcal{N}(f(t), \\sigma^2)\\). For convenience, we treat the cyclical feature weights \\(\\beta_{\\sin{},i}\\) and \\(\\beta_{\\cos{},i}\\) the same in code and collect them in a single vector \\(\\beta_c\\). And just because it is so easy, we parameterise our model with the operation used to apply the cyclic effects. This lets us use the exact same code for both additive and multiplicative models. 
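(A quick aside on the operator trick: from Julia 1.6 onward, a dotted operator such as .* is itself a function that can be passed around just like +. A toy REPL check with made-up vectors:)\n\nop_add = (+)\nop_mul = (.*)                     # Base.Broadcast.BroadcastFunction(*)\nop_add([1.0, 2.0], [3.0, 4.0])    # returns [4.0, 6.0], the additive combination\nop_mul([1.0, 2.0], [3.0, 4.0])    # returns [3.0, 8.0], the multiplicative combination\n\n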
Finally, we plot prior predictive samples to make sure our priors make sense.\n\n@model function decomp_model(t, c, op)\n α ~ Normal(0, 10)\n βt ~ Normal(0, 2)\n βc ~ MvNormal(Zeros(size(c, 2)), I)\n σ ~ truncated(Normal(0, 0.1); lower=0)\n\n cyclic = c * βc\n trend = α .+ βt .* t\n μ = op(trend, cyclic)\n y ~ MvNormal(μ, σ^2 * I)\n return (; trend, cyclic)\nend\n\ny_prior_samples = mapreduce(hcat, 1:100) do _\n rand(decomp_model(t, cyclic_features, +)).y\nend\nplot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label=\"\", title=\"Prior samples\")\nscatter!(t, yf; color=2, label=\"Data\")\n\n\n\n\nWith the model specified and with a reasonable prior we can now let Turing decompose the time series for us!\n\nusing MCMCChains: get_sections\n\nfunction mean_ribbon(samples)\n qs = quantile(samples)\n low = qs[:, Symbol(\"2.5%\")]\n up = qs[:, Symbol(\"97.5%\")]\n m = mean(samples)[:, :mean]\n return m, (m - low, up - m)\nend\n\nfunction plot_fit(x, y, decomp, ymax)\n trend = mapreduce(x -> x.trend, hcat, decomp)\n cyclic = mapreduce(x -> x.cyclic, hcat, decomp)\n\n trend_plt = plot(\n x,\n trend .+ ymax;\n color=1,\n label=nothing,\n alpha=0.2,\n title=\"Trend\",\n xlabel=\"Time\",\n ylabel=\"f₁(t)\",\n )\n ls = [ones(length(t)) t] \\ y\n α̂, β̂ = ls[1], ls[2:end]\n plot!(\n trend_plt,\n t,\n α̂ .+ t .* β̂ .+ ymax;\n label=\"Least squares trend\",\n color=5,\n linewidth=4,\n )\n\n scatter!(trend_plt, x, y .+ ymax; label=nothing, color=2, legend=:topleft)\n cyclic_plt = plot(\n x,\n cyclic;\n color=1,\n label=nothing,\n alpha=0.2,\n title=\"Cyclic effect\",\n xlabel=\"Time\",\n ylabel=\"f₂(t)\",\n )\n return trend_plt, cyclic_plt\nend\n\nmodel = decomp_model(x, cyclic_features, +) | (; y = yf)\nchain = sample(model, NUTS(), 2000, progress=false)\nyf_samples = predict(decondition(model), chain)\nm, conf = mean_ribbon(yf_samples)\npredictive_plt = plot(\n t,\n m .+ yf_max;\n ribbon=conf,\n label=\"Posterior density\",\n title=\"Posterior decomposition\",\n xlabel=\"Time\",\n ylabel=\"f(t)\",\n)\nscatter!(predictive_plt, t, yf .+ yf_max; color=2, label=\"Data\", legend=:topleft)\n\ndecomp = returned(model, chain)\ndecomposed_plt = plot_fit(t, yf, decomp, yf_max)\nplot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n\n\n\n\n\n\n\nInference is successful and the posterior beautifully captures the data. We see that the least squares linear fit deviates somewhat from the posterior trend. Since our model takes cyclic effects into account separately, we get a better estimate of the true overall trend than if we had just fitted a line. But what frequency content did the model identify?\n\nfunction plot_cyclic_features(βsin, βcos)\n labels = reshape([\"freq = $i\" for i in freqs], 1, :)\n colors = collect(freqs)'\n style = reshape([i <= 10 ? 
:solid : :dash for i in 1:length(labels)], 1, :)\n sin_features_plt = density(\n βsin[:, :, 1];\n title=\"Sine features posterior\",\n label=labels,\n ylabel=\"Density\",\n xlabel=\"Weight\",\n color=colors,\n linestyle=style,\n legend=nothing,\n )\n cos_features_plt = density(\n βcos[:, :, 1];\n title=\"Cosine features posterior\",\n ylabel=\"Density\",\n xlabel=\"Weight\",\n label=nothing,\n color=colors,\n linestyle=style,\n )\n\n return seasonal_features_plt = plot(\n sin_features_plt,\n cos_features_plt;\n layout=(2, 1),\n size=(800, 600),\n legend=:outerright,\n )\nend\n\nβc = Array(group(chain, :βc))\nplot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])\n\n\n\n\nPlotting the posterior over the cyclic features reveals that the model managed to extract the true frequency content.\nSince we wrote our model to accept a combining operator, we can easily run the same analysis for a multiplicative model.\n\nyg = g.(t) .+ σ_true .* randn(size(t))\n\ny_prior_samples = mapreduce(hcat, 1:100) do _\n rand(decomp_model(t, cyclic_features, .*)).y\nend\nplot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label=\"\", title=\"Prior samples\")\nscatter!(t, yg; color=2, label=\"Data\")\n\n\n\n\n\nmodel = decomp_model(x, cyclic_features, .*) | (; y = yg)\nchain = sample(model, NUTS(), 2000, progress=false)\nyg_samples = predict(decondition(model), chain)\nm, conf = mean_ribbon(yg_samples)\npredictive_plt = plot(\n t,\n m;\n ribbon=conf,\n label=\"Posterior density\",\n title=\"Posterior decomposition\",\n xlabel=\"Time\",\n ylabel=\"g(t)\",\n)\nscatter!(predictive_plt, t, yg; color=2, label=\"Data\", legend=:topleft)\n\ndecomp = returned(model, chain)\ndecomposed_plt = plot_fit(t, yg, decomp, 0)\nplot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.00625\n\n\n\n\n\n\n\nThe model fits! What about the inferred cyclic components?\n\nβc = Array(group(chain, :βc))\nplot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])\n\n\n\n\nWhile a multiplicative model does manage to fit the data, it does not recover the true parameters for this dataset.\n\n\nWrapping up\nIn this tutorial we have seen how to implement and fit time series models using additive and multiplicative decomposition. We also saw how to visualise the model fit, and how to interpret learned cyclical components.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Bayesian Time Series Analysis" ] }, { "objectID": "tutorials/variational-inference/index.html", "href": "tutorials/variational-inference/index.html", "title": "Variational Inference", "section": "", "text": "This post will look at variational inference (VI), an optimisation approach to approximate Bayesian inference, and how to use it in Turing.jl as an alternative to other approaches such as MCMC. This post will focus on the usage of VI in Turing rather than the principles and theory underlying VI. If you are interested in understanding the mathematics you can check out our write-up or any other resource online (there are a lot of great ones).\nLet’s start with a minimal example. Consider a Turing.Model, which we denote as model. Approximating the posterior associated with model via VI is as simple as:
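\nq_init = q_meanfield_gaussian(model)   # a default mean-field Gaussian variational family\nq_avg, info, state = vi(model, q_init, 1000)   # run 1,000 optimisation steps\n\n(This is a minimal sketch; both functions are documented in detail in the Basic Usage section below.) Thus, it’s no more work than standard MCMC sampling in Turing. The default algorithm uses stochastic gradient descent to minimise the (exclusive) KL divergence. 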
This approach is commonly referred to as automatic differentiation variational inference (ADVI)1, stochastic gradient VI2, and black-box variational inference3 with the reparameterization gradient456.\nTo get a bit more into what we can do with VI, let’s look at a more concrete example. We will reproduce the tutorial on Bayesian linear regression using VI instead of MCMC. After that, we will discuss how to customise the behaviour of vi for more advanced usage.\nLet’s first import the relevant packages:\nusing Random\nusing Turing\nusing Turing: Variational\nusing AdvancedVI\nusing Plots\n\nRandom.seed!(42);", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#bayesian-linear-regression-example", "href": "tutorials/variational-inference/index.html#bayesian-linear-regression-example", "title": "Variational Inference", "section": "Bayesian Linear Regression Example", "text": "Bayesian Linear Regression Example\nLet’s start by setting up our example. We will re-use the Bayesian linear regression example. As we’ll see, there is really no additional work required to apply variational inference to a more complex Model.\n\nusing FillArrays\nusing RDatasets\n\nusing LinearAlgebra\n\n# Import the \"mtcars\" dataset.\ndata = RDatasets.dataset(\"datasets\", \"mtcars\");\n\n# Show the first six rows of the dataset.\nfirst(data, 6)\n\n6×12 DataFrame\n\n\n\nRow\nModel\nMPG\nCyl\nDisp\nHP\nDRat\nWT\nQSec\nVS\nAM\nGear\nCarb\n\n\n\nString31\nFloat64\nInt64\nFloat64\nInt64\nFloat64\nFloat64\nFloat64\nInt64\nInt64\nInt64\nInt64\n\n\n\n\n1\nMazda RX4\n21.0\n6\n160.0\n110\n3.9\n2.62\n16.46\n0\n1\n4\n4\n\n\n2\nMazda RX4 Wag\n21.0\n6\n160.0\n110\n3.9\n2.875\n17.02\n0\n1\n4\n4\n\n\n3\nDatsun 710\n22.8\n4\n108.0\n93\n3.85\n2.32\n18.61\n1\n1\n4\n1\n\n\n4\nHornet 4 Drive\n21.4\n6\n258.0\n110\n3.08\n3.215\n19.44\n1\n0\n3\n1\n\n\n5\nHornet Sportabout\n18.7\n8\n360.0\n175\n3.15\n3.44\n17.02\n0\n0\n3\n2\n\n\n6\nValiant\n18.1\n6\n225.0\n105\n2.76\n3.46\n20.22\n1\n0\n3\n1\n\n\n\n\n\n\n\n# Function to split samples.\nfunction split_data(df, at=0.70)\n r = size(df, 1)\n index = Int(round(r * at))\n train = df[1:index, :]\n test = df[(index + 1):end, :]\n return train, test\nend\n\n# A handy helper function to rescale our dataset.\nfunction standardise(x)\n return (x .- mean(x; dims=1)) ./ std(x; dims=1)\nend\n\nfunction standardise(x, orig)\n return (x .- mean(orig; dims=1)) ./ std(orig; dims=1)\nend\n\n# Another helper function to unstandardize our datasets.\nfunction unstandardize(x, orig)\n return x .* std(orig; dims=1) .+ mean(orig; dims=1)\nend\n\nfunction unstandardize(x, mean_train, std_train)\n return x .* std_train .+ mean_train\nend\n\nunstandardize (generic function with 2 methods)\n\n\n\n# Remove the model column.\nselect!(data, Not(:Model))\n\n# Split our dataset 70%/30% into training/test sets.\ntrain, test = split_data(data, 0.7)\ntrain_unstandardized = copy(train)\n\n# Standardise both datasets.\nstd_train = standardise(Matrix(train))\nstd_test = standardise(Matrix(test), Matrix(train))\n\n# Save dataframe versions of our dataset.\ntrain_cut = DataFrame(std_train, names(data))\ntest_cut = DataFrame(std_test, names(data))\n\n# Create our labels. 
These are the values we are trying to predict.\ntrain_label = train_cut[:, :MPG]\ntest_label = test_cut[:, :MPG]\n\n# Get the list of columns to keep.\nremove_names = filter(x -> !in(x, [\"MPG\"]), names(data))\n\n# Filter the test and train sets.\ntrain = Matrix(train_cut[:, remove_names]);\ntest = Matrix(test_cut[:, remove_names]);\n\n\n# Bayesian linear regression.\n@model function linear_regression(x, y, n_obs, n_vars, ::Type{T}=Vector{Float64}) where {T}\n # Set variance prior.\n σ² ~ truncated(Normal(0, 100); lower=0)\n\n # Set intercept prior.\n intercept ~ Normal(0, 3)\n\n # Set the priors on our coefficients.\n coefficients ~ MvNormal(Zeros(n_vars), 10.0 * I)\n\n # Calculate all the mu terms.\n mu = intercept .+ x * coefficients\n return y ~ MvNormal(mu, σ² * I)\nend;\n\n\nn_obs, n_vars = size(train)\nm = linear_regression(train, train_label, n_obs, n_vars);", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#basic-usage", "href": "tutorials/variational-inference/index.html#basic-usage", "title": "Variational Inference", "section": "Basic Usage", "text": "Basic Usage\nTo run VI, we must first set a variational family. For instance, the most commonly used family is the mean-field Gaussian family. For this, Turing provides functions that automatically construct the initialisation corresponding to the model m:\n\nq_init = q_meanfield_gaussian(m);\n\nvi will automatically recognise the variational family through the type of q_init. Here is a detailed documentation for the constructor:\n\n@doc(Variational.q_meanfield_gaussian)\n\nq_meanfield_gaussian(\n [rng::Random.AbstractRNG,]\n model::DynamicPPL.Model;\n location::Union{Nothing,<:AbstractVector} = nothing,\n scale::Union{Nothing,<:Diagonal} = nothing,\n kwargs...\n)\nFind a numerically non-degenerate mean-field Gaussian q for approximating the target model.\nIf the scale set as nothing, the default value will be a zero-mean Gaussian with a Diagonal scale matrix (the \"mean-field\" approximation) no larger than 0.6*I (covariance of 0.6^2*I). This guarantees that the samples from the initial variational approximation will fall in the range of (-2, 2) with 99.9% probability, which mimics the behavior of the Turing.InitFromUniform() strategy. Whether the default choice is used or not, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over the samples from q.\nArguments\n\nmodel: The target DynamicPPL.Model.\n\n\nKeyword Arguments\n\nlocation: The location parameter of the initialization. If nothing, a vector of zeros is used.\n\nscale: The scale parameter of the initialization. If nothing, an identity matrix is used.\n\n\nThe remaining keyword arguments are passed to q_locationscale.\nReturns\n\nq::Bijectors.TransformedDistribution: A AdvancedVI.LocationScale distribution matching the support of model.\n\n\n\n\n\n\n\nAs we can see, the precise initialisation can be customized through the keyword arguments.\nLet’s run VI with the default setting:\n\nn_iters = 1000\nq_avg, info, state = vi(m, q_init, n_iters; show_progress=false);\n\n\n[ Info: The capability of the supplied target `LogDensityProblem` LogDensityProblems.LogDensityOrder{1}() is >= `LogDensityProblems.LogDensityOrder{1}()`. 
To make use of this, the `adtype` argument for AdvancedVI must be one of `AutoReverseDiff`, `AutoZygote`, `AutoMooncake`, or `AutoEnzyme` in reverse mode.\n\n\n\n\nThe default setting uses the AdvancedVI.RepGradELBO objective, which corresponds to a variant of what is known as automatic differentiation VI7 or stochastic gradient VI8 or black-box VI9 with the reparameterization gradient101112. The default optimiser we use is AdvancedVI.DoWG13 combined with a proximal operator. (The use of proximal operators with VI on a location-scale family is discussed in detail by J. Domke1415 and others16.) We will take a deeper look into the returned values and the keyword arguments in the following subsections. First, here is the full documentation for vi:\n\n@doc(Variational.vi)\n\nvi(\n [rng::Random.AbstractRNG,]\n model::DynamicPPL.Model,\n q,\n max_iter::Int;\n adtype::ADTypes.AbstractADType=DEFAULT_ADTYPE,\n algorithm::AdvancedVI.AbstractVariationalAlgorithm = KLMinRepGradProxDescent(\n adtype; n_samples=10\n ),\n show_progress::Bool = Turing.PROGRESS[],\n kwargs...\n)\nApproximate the target model via the variational inference algorithm algorithm by starting from the initial variational approximation q. This is a thin wrapper around AdvancedVI.optimize.\nIf the chosen variational inference algorithm operates in an unconstrained space, then the provided initial variational approximation q must be a Bijectors.TransformedDistribution of an unconstrained distribution. For example, the initialization supplied by q_meanfield_gaussian,q_fullrank_gaussian, q_locationscale.\nThe default algorithm, KLMinRepGradProxDescent (relevant docs), assumes q uses AdvancedVI.MvLocationScale, which can be constructed by invoking q_fullrank_gaussian or q_meanfield_gaussian. For other variational families, refer to the documentation of AdvancedVI to determine the best algorithm and other options.\nArguments\n\nmodel: The target DynamicPPL.Model.\n\nq: The initial variational approximation.\n\nmax_iter: Maximum number of steps.\n\nAny additional arguments are passed on to AdvancedVI.optimize.\n\n\nKeyword Arguments\n\nadtype: Automatic differentiation backend to be applied to the log-density. The default value for algorithm also uses this backend for differentiating the variational objective.\n\nalgorithm: Variational inference algorithm. The default is KLMinRepGradProxDescent, please refer to AdvancedVI docs for all the options.\n\nshow_progress: Whether to show the progress bar.\n\nunconstrained: Whether to transform the posterior to be unconstrained for running the variational inference algorithm. If true, then the output q will be wrapped into a Bijectors.TransformedDistribution with the transformation matching the support of the posterior. The default value depends on the chosen algorithm.\n\nAny additional keyword arguments are passed on to AdvancedVI.optimize.\n\n\nSee the docs of AdvancedVI.optimize for additional keyword arguments.\nReturns\n\nq: Output variational distribution of algorithm.\n\nstate: Collection of states used by algorithm. 
This can be used to resume from a past call to vi.\n\ninfo: Information generated while executing algorithm.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#values-returned-by-vi", "href": "tutorials/variational-inference/index.html#values-returned-by-vi", "title": "Variational Inference", "section": "Values Returned by vi", "text": "Values Returned by vi\nThe main output of the algorithm is q_avg, the average of the parameters generated by the optimisation algorithm. For computing q_avg, the default setting uses what is known as polynomial averaging17. Usually, q_avg will perform better than the last-iterate q_last, which can be obtained by disabling averaging:\n\nq_last, _, _ = vi(\n m,\n q_init,\n n_iters;\n show_progress=false,\n algorithm=KLMinRepGradDescent(\n AutoForwardDiff();\n operator=AdvancedVI.ClipScale(),\n averager=AdvancedVI.NoAveraging()\n ),\n);\n\n\n[ Info: The capability of the supplied target `LogDensityProblem` LogDensityProblems.LogDensityOrder{1}() is >= `LogDensityProblems.LogDensityOrder{1}()`. To make use of this, the `adtype` argument for AdvancedVI must be one of `AutoReverseDiff`, `AutoZygote`, `AutoMooncake`, or `AutoEnzyme` in reverse mode.\n\n\n\n\nFor instance, we can compare the ELBO of the two:\n\n@info(\"Objective of q_avg and q_last\",\n ELBO_q_avg = estimate_objective(AdvancedVI.RepGradELBO(32), q_avg, LogDensityFunction(m)),\n ELBO_q_last = estimate_objective(AdvancedVI.RepGradELBO(32), q_last, LogDensityFunction(m))\n)\n\n\n┌ Info: Objective of q_avg and q_last\n│ ELBO_q_avg = 52.798415041106054\n└ ELBO_q_last = 61.02384427285732\n\n\n\n\nWe can see that ELBO_q_avg is slightly more optimal.\nNow, info contains information generated during optimisation that could be useful for diagnostics. For the default setting, which is RepGradELBO, it contains the ELBO estimated at each step, which can be plotted as follows:\n\nPlots.plot([i.elbo for i in info], xlabel=\"Iterations\", ylabel=\"ELBO\", label=\"info\")\n\n\n\n\nSince the ELBO is estimated by a small number of samples, it appears noisy. Furthermore, at each step, the ELBO is evaluated on q_last, not q_avg, which is the actual output that we care about. To obtain more accurate ELBO estimates evaluated on q_avg, we have to define a custom callback function.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#custom-callback-functions", "href": "tutorials/variational-inference/index.html#custom-callback-functions", "title": "Variational Inference", "section": "Custom Callback Functions", "text": "Custom Callback Functions\nTo inspect the progress of optimisation in more detail, one can define a custom callback function. 
For example, the following callback function estimates the ELBO on q_avg every 10 steps with a larger number of samples:\n\nusing DynamicPPL: DynamicPPL\nlinked_vi = DynamicPPL.link!!(DynamicPPL.VarInfo(m), m);\n\nfunction callback(; iteration, averaged_params, restructure, kwargs...)\n if mod(iteration, 10) == 1\n q_avg = restructure(averaged_params)\n obj = AdvancedVI.RepGradELBO(128) # 128 samples for ELBO estimation\n elbo_avg = -estimate_objective(obj, q_avg, LogDensityFunction(m, DynamicPPL.getlogjoint_internal, linked_vi))\n (elbo_avg = elbo_avg,)\n else\n nothing\n end\nend;\n\nThe NamedTuple returned by callback will be appended to the corresponding entry of info, and it will also be displayed on the progress meter if show_progress is set as true.\nThe custom callback can be supplied to vi as a keyword argument:\n\nq_mf, info_mf, _ = vi(m, q_init, n_iters; show_progress=false, callback=callback);\n\n\n[ Info: The capability of the supplied target `LogDensityProblem` LogDensityProblems.LogDensityOrder{1}() is >= `LogDensityProblems.LogDensityOrder{1}()`. To make use of this, the `adtype` argument for AdvancedVI must be one of `AutoReverseDiff`, `AutoZygote`, `AutoMooncake`, or `AutoEnzyme` in reverse mode.\n\n\n\n\nLet’s plot the result:\n\niters = 1:10:length(info_mf)\nelbo_mf = [i.elbo_avg for i in info_mf[iters]]\nPlots.plot([i.elbo for i in info], xlabel=\"Iterations\", ylabel=\"ELBO\", label=\"info\", linewidth=0.4)\nPlots.plot!(iters, elbo_mf, xlabel=\"Iterations\", ylabel=\"ELBO\", label=\"callback\", ylims=(-200,Inf), linewidth=2)\n\n\n\n\nWe can see that the ELBO values are less noisy and progress more smoothly due to averaging.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#using-different-optimisers", "href": "tutorials/variational-inference/index.html#using-different-optimisers", "title": "Variational Inference", "section": "Using Different Optimisers", "text": "Using Different Optimisers\nThe default optimiser we use is a proximal variant of DoWG18. For Gaussian variational families, this works well as a default option. Sometimes, the step size of AdvancedVI.DoWG could be too large, resulting in unstable behaviour. (In this case, we recommend trying AdvancedVI.DoG19) Or, for whatever reason, it might be desirable to use a different optimiser. Our implementation supports any optimiser that implements the Optimisers.jl interface.\nFor instance, let’s try using Optimisers.Adam20, which is a popular choice. Since AdvancedVI does not implement a proximal operator for Optimisers.Adam, we must use the AdvancedVI.ClipScale() projection operator, which ensures that the scale matrix of the variational approximation is positive definite. (See the paper by J. Domke 202021 for more detail about the use of a projection operator.)\n\nusing Optimisers\n\n_, info_adam, _ = vi(\n m, q_init, n_iters;\n show_progress=false,\n callback=callback,\n algorithm=KLMinRepGradDescent(AutoForwardDiff(); optimizer=Optimisers.Adam(3e-3), operator=ClipScale())\n);\n\n\n[ Info: The capability of the supplied target `LogDensityProblem` LogDensityProblems.LogDensityOrder{1}() is >= `LogDensityProblems.LogDensityOrder{1}()`. 
To make use of this, the `adtype` argument for AdvancedVI must be one of `AutoReverseDiff`, `AutoZygote`, `AutoMooncake`, or `AutoEnzyme` in reverse mode.\n\n\n\n\n\niters = 1:10:length(info_mf)\nelbo_adam = [i.elbo_avg for i in info_adam[iters]]\nPlots.plot(iters, elbo_mf, xlabel=\"Iterations\", ylabel=\"ELBO\", label=\"DoWG\")\nPlots.plot!(iters, elbo_adam, xlabel=\"Iterations\", ylabel=\"ELBO\", label=\"Adam\")\n\n\n\n\nCompared to the default option AdvancedVI.DoWG(), we can see that Optimisers.Adam(3e-3) is converging more slowly. With more step size tuning, it is possible that Optimisers.Adam could perform equally well or better. That is, most common optimisers require some degree of tuning to perform better or comparably to AdvancedVI.DoWG() or AdvancedVI.DoG(), which do not require much tuning at all. Due to this fact, they are referred to as parameter-free optimisers.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#using-full-rank-variational-families", "href": "tutorials/variational-inference/index.html#using-full-rank-variational-families", "title": "Variational Inference", "section": "Using Full-Rank Variational Families", "text": "Using Full-Rank Variational Families\nSo far, we have only used the mean-field Gaussian family. This, however, approximates the posterior covariance with a diagonal matrix. To model the full covariance matrix, we can use the full-rank Gaussian family2223:\n\nq_init_fr = q_fullrank_gaussian(m);\n\n\n@doc(Variational.q_fullrank_gaussian)\n\nq_fullrank_gaussian(\n [rng::Random.AbstractRNG,]\n model::DynamicPPL.Model;\n location::Union{Nothing,<:AbstractVector} = nothing,\n scale::Union{Nothing,<:LowerTriangular} = nothing,\n kwargs...\n)\nFind a numerically non-degenerate Gaussian q with a scale with full-rank factors (traditionally referred to as a \"full-rank family\") for approximating the target model.\nIf the scale set as nothing, the default value will be a zero-mean Gaussian with a LowerTriangular scale matrix (resulting in a covariance with \"full-rank\" factors) no larger than 0.6*I (covariance of 0.6^2*I). This guarantees that the samples from the initial variational approximation will fall in the range of (-2, 2) with 99.9% probability, which mimics the behavior of the Turing.InitFromUniform() strategy. Whether the default choice is used or not, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over the samples from q.\nArguments\n\nmodel: The target DynamicPPL.Model.\n\n\nKeyword Arguments\n\nlocation: The location parameter of the initialization. If nothing, a vector of zeros is used.\n\nscale: The scale parameter of the initialization. If nothing, an identity matrix is used.\n\n\nThe remaining keyword arguments are passed to q_locationscale.\nReturns\n\nq::Bijectors.TransformedDistribution: A AdvancedVI.LocationScale distribution matching the support of model.\n\n\n\n\n\n\n\nThe term full-rank might seem a bit peculiar since covariance matrices are always full-rank. 
This term, however, traditionally comes from the fact that full-rank families use full-rank factors in addition to the diagonal of the covariance.\nIn contrast to the mean-field family, the full-rank family will often result in more computation per optimisation step and slower convergence, especially in high dimensions:\n\nq_fr, info_fr, _ = vi(m, q_init_fr, n_iters; show_progress=false, callback)\n\nPlots.plot(elbo_mf, xlabel=\"Iterations\", ylabel=\"ELBO\", label=\"Mean-Field\", ylims=(-200, Inf))\n\nelbo_fr = [i.elbo_avg for i in info_fr[iters]]\nPlots.plot!(elbo_fr, xlabel=\"Iterations\", ylabel=\"ELBO\", label=\"Full-Rank\", ylims=(-200, Inf))\n\n\n[ Info: The capability of the supplied target `LogDensityProblem` LogDensityProblems.LogDensityOrder{1}() is >= `LogDensityProblems.LogDensityOrder{1}()`. To make use of this, the `adtype` argument for AdvancedVI must be one of `AutoReverseDiff`, `AutoZygote`, `AutoMooncake`, or `AutoEnzyme` in reverse mode.\n\n\n\n\n\n\n\nHowever, we can see that the full-rank families achieve a higher ELBO in the end. Due to the relationship between the ELBO and the Kullback–Leibler divergence, this indicates that the full-rank covariance is much more accurate. This trade-off between statistical accuracy and optimisation speed is often referred to as the statistical-computational trade-off. The fact that we can control this trade-off through the choice of variational family is a strength, rather than a limitation, of variational inference.\nWe can also visualise the covariance matrix.\n\nheatmap(cov(rand(q_fr, 100_000), dims=2))", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#obtaining-summary-statistics", "href": "tutorials/variational-inference/index.html#obtaining-summary-statistics", "title": "Variational Inference", "section": "Obtaining Summary Statistics", "text": "Obtaining Summary Statistics\nLet’s inspect the resulting variational approximation in more detail and compare it against MCMC. To obtain summary statistics from VI, we can draw samples from the resulting variational approximation:\n\nz = rand(q_fr, 100_000);\n\nNow, we can, for example, look at expectations:\n\navg = vec(mean(z; dims=2))\n\n12-element Vector{Float64}:\n 0.29665249970140406\n -0.002014959072651458\n 0.37257594756248874\n -0.08706379452567733\n -0.09650423460261785\n 0.6029528091635894\n -0.01742810733672524\n 0.08540506979801704\n -0.06345151623877471\n 0.1334939457294853\n 0.1734491462355629\n -0.5947240224190139\n\n\nThe vector has the same ordering as the parameters in the model, e.g. in this case σ² has index 1, intercept has index 2 and coefficients has indices 3:12. 
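So, using the ordering just described, one can index into avg directly:\n\navg[1]      # posterior mean of σ²\navg[2]      # posterior mean of the intercept\navg[3:12]   # posterior means of the ten coefficients\n\n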
If you forget, or you want to do something programmatically with the result, you can obtain the sym → indices mapping as follows:\n\nusing Bijectors: bijector\n\n_, sym2range = bijector(m, Val(true));\nsym2range\n\n(intercept = UnitRange{Int64}[2:2], σ² = UnitRange{Int64}[1:1], coefficients = UnitRange{Int64}[3:12])\n\n\nFor example, we can check the sample distribution and mean value of σ²:\n\nhistogram(z[1, :])\navg[union(sym2range[:σ²]...)]\n\n1-element Vector{Float64}:\n 0.29665249970140406\n\n\n\navg[union(sym2range[:intercept]...)]\n\n1-element Vector{Float64}:\n -0.002014959072651458\n\n\n\navg[union(sym2range[:coefficients]...)]\n\n10-element Vector{Float64}:\n 0.37257594756248874\n -0.08706379452567733\n -0.09650423460261785\n 0.6029528091635894\n -0.01742810733672524\n 0.08540506979801704\n -0.06345151623877471\n 0.1334939457294853\n 0.1734491462355629\n -0.5947240224190139\n\n\nFor further convenience, we can wrap the samples into a Chains object to summarise the results.\n\nvarinf = Turing.DynamicPPL.VarInfo(m)\nvns_and_values = Turing.DynamicPPL.varname_and_value_leaves(Turing.DynamicPPL.values_as(varinf, OrderedDict))\nvarnames = map(first, vns_and_values)\nvi_chain = Chains(reshape(z', (size(z,2), size(z,1), 1)), varnames)\n\nChains MCMC chain (100000×12×1 reshape(adjoint(::Matrix{Float64}), 100000, 12, 1) with eltype Float64):\n\nIterations = 1:1:100000\nNumber of chains = 1\nSamples per chain = 100000\nparameters = σ², intercept, coefficients[1], coefficients[2], coefficients[3], coefficients[4], coefficients[5], coefficients[6], coefficients[7], coefficients[8], coefficients[9], coefficients[10]\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n(Since we’re drawing independent samples, we can simply ignore the ESS and Rhat metrics.) Unfortunately, extracting varnames is a bit verbose at the moment, but hopefully will become simpler in the near future.\nLet’s compare this against samples from NUTS:\n\nmcmc_chain = sample(m, NUTS(), 10_000; progress=false);\n\nvi_mean = mean(vi_chain)[:, 2]\nmcmc_mean = mean(mcmc_chain, names(mcmc_chain, :parameters))[:, 2]\n\nplot(mcmc_mean; xticks=1:1:length(mcmc_mean), label=\"mean of NUTS\")\nplot!(vi_mean; label=\"mean of VI\")\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\n\n\n\n\n\nThat looks pretty good! 
But let’s see how the predictive distributions look for the two.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#making-predictions", "href": "tutorials/variational-inference/index.html#making-predictions", "title": "Variational Inference", "section": "Making Predictions", "text": "Making Predictions\nSimilarly to the linear regression tutorial, we’re going to compare to multivariate ordinary linear regression using the GLM package:\n\n# Import the GLM package.\nusing GLM\n\n# Perform multivariate OLS.\nols = lm(\n @formula(MPG ~ Cyl + Disp + HP + DRat + WT + QSec + VS + AM + Gear + Carb), train_cut\n)\n\n# Store our predictions in the original dataframe.\ntrain_cut.OLSPrediction = unstandardize(GLM.predict(ols), train_unstandardized.MPG)\ntest_cut.OLSPrediction = unstandardize(GLM.predict(ols, test_cut), train_unstandardized.MPG);\n\n\n# Make a prediction given an input vector, using mean parameter values from a chain.\nfunction prediction(chain, x)\n p = get_params(chain)\n α = mean(p.intercept)\n β = collect(mean.(p.coefficients))\n return α .+ x * β\nend\n\nprediction (generic function with 1 method)\n\n\n\n# Unstandardize the dependent variable.\ntrain_cut.MPG = unstandardize(train_cut.MPG, train_unstandardized.MPG)\ntest_cut.MPG = unstandardize(test_cut.MPG, train_unstandardized.MPG);\n\n\n# Show the first six rows of the modified dataframe.\nfirst(test_cut, 6)\n\n6×12 DataFrame\n\n\n\nRow\nMPG\nCyl\nDisp\nHP\nDRat\nWT\nQSec\nVS\nAM\nGear\nCarb\nOLSPrediction\n\n\n\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\nFloat64\n\n\n\n\n1\n15.2\n1.04746\n0.565102\n0.258882\n-0.652405\n0.0714991\n-0.716725\n-0.977008\n-0.598293\n-0.891883\n-0.469126\n19.8583\n\n\n2\n13.3\n1.04746\n0.929057\n1.90345\n0.380435\n0.465717\n-1.90403\n-0.977008\n-0.598293\n-0.891883\n1.11869\n16.0462\n\n\n3\n19.2\n1.04746\n1.32466\n0.691663\n-0.777058\n0.470584\n-0.873777\n-0.977008\n-0.598293\n-0.891883\n-0.469126\n18.5746\n\n\n4\n27.3\n-1.25696\n-1.21511\n-1.19526\n1.0037\n-1.38857\n0.288403\n0.977008\n1.59545\n1.07026\n-1.26303\n29.3233\n\n\n5\n26.0\n-1.25696\n-0.888346\n-0.762482\n1.62697\n-1.18903\n-1.09365\n-0.977008\n1.59545\n3.0324\n-0.469126\n30.7731\n\n\n6\n30.4\n-1.25696\n-1.08773\n-0.381634\n0.451665\n-1.79933\n-0.968007\n0.977008\n1.59545\n3.0324\n-0.469126\n25.2892\n\n\n\n\n\n\n\n# Construct the Chains from the Variational Approximations\nz_mf = rand(q_mf, 10_000);\nz_fr = rand(q_fr, 10_000);\n\nvi_mf_chain = Chains(reshape(z_mf', (size(z_mf,2), size(z_mf,1), 1)), varnames);\nvi_fr_chain = Chains(reshape(z_fr', (size(z_fr,2), size(z_fr,1), 1)), varnames);\n\n\n# Calculate the predictions for the training and testing sets using the samples `z` from variational posterior\ntrain_cut.VIMFPredictions = unstandardize(\n prediction(vi_mf_chain, train), train_unstandardized.MPG\n)\ntest_cut.VIMFPredictions = unstandardize(\n prediction(vi_mf_chain, test), train_unstandardized.MPG\n)\n\ntrain_cut.VIFRPredictions = unstandardize(\n prediction(vi_fr_chain, train), train_unstandardized.MPG\n)\ntest_cut.VIFRPredictions = unstandardize(\n prediction(vi_fr_chain, test), train_unstandardized.MPG\n)\n\ntrain_cut.BayesPredictions = unstandardize(\n prediction(mcmc_chain, train), train_unstandardized.MPG\n)\ntest_cut.BayesPredictions = unstandardize(\n prediction(mcmc_chain, test), train_unstandardized.MPG\n);\n\n\nvi_mf_loss1 = mean((train_cut.VIMFPredictions - train_cut.MPG) .^ 
2)\nvi_fr_loss1 = mean((train_cut.VIFRPredictions - train_cut.MPG) .^ 2)\nbayes_loss1 = mean((train_cut.BayesPredictions - train_cut.MPG) .^ 2)\nols_loss1 = mean((train_cut.OLSPrediction - train_cut.MPG) .^ 2)\n\nvi_mf_loss2 = mean((test_cut.VIMFPredictions - test_cut.MPG) .^ 2)\nvi_fr_loss2 = mean((test_cut.VIFRPredictions - test_cut.MPG) .^ 2)\nbayes_loss2 = mean((test_cut.BayesPredictions - test_cut.MPG) .^ 2)\nols_loss2 = mean((test_cut.OLSPrediction - test_cut.MPG) .^ 2)\n\nprintln(\"Training set:\n VI Mean-Field loss: $vi_mf_loss1\n VI Full-Rank loss: $vi_fr_loss1\n Bayes loss: $bayes_loss1\n OLS loss: $ols_loss1\nTest set:\n VI Mean-Field loss: $vi_mf_loss2\n VI Full-Rank loss: $vi_fr_loss2\n Bayes loss: $bayes_loss2\n OLS loss: $ols_loss2\")\n\nTraining set:\n VI Mean-Field loss: 3.077618615638055\n VI Full-Rank loss: 3.0782554512668967\n Bayes loss: 3.071989269007453\n OLS loss: 3.0709261248930093\nTest set:\n VI Mean-Field loss: 25.957924714205177\n VI Full-Rank loss: 25.15654990831634\n Bayes loss: 26.451387399228828\n OLS loss: 27.09481307076057\n\n\nInterestingly, the squared difference between true- and mean-prediction on the test-set is actually better for the full-rank variational posterior than for the “true” posterior obtained by MCMC sampling using NUTS. But, as Bayesians, we know that the mean doesn’t tell the entire story. One quick check is to look at the mean predictions ± standard deviation of the two different approaches:\n\npreds_vi_mf = mapreduce(hcat, 1:5:size(vi_mf_chain, 1)) do i\n return unstandardize(prediction(vi_mf_chain[i], test), train_unstandardized.MPG)\nend\n\np1 = scatter(\n 1:size(test, 1),\n mean(preds_vi_mf; dims=2);\n yerr=std(preds_vi_mf; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"VI Mean-Field\")\n\npreds_vi_fr = mapreduce(hcat, 1:5:size(vi_fr_chain, 1)) do i\n return unstandardize(prediction(vi_fr_chain[i], test), train_unstandardized.MPG)\nend\n\np2 = scatter(\n 1:size(test, 1),\n mean(preds_vi_fr; dims=2);\n yerr=std(preds_vi_fr; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"VI Full-Rank\")\n\npreds_mcmc = mapreduce(hcat, 1:5:size(mcmc_chain, 1)) do i\n return unstandardize(prediction(mcmc_chain[i], test), train_unstandardized.MPG)\nend\n\np3 = scatter(\n 1:size(test, 1),\n mean(preds_mcmc; dims=2);\n yerr=std(preds_mcmc; dims=2),\n label=\"prediction (mean ± std)\",\n size=(900, 500),\n markersize=8,\n)\nscatter!(1:size(test, 1), unstandardize(test_label, train_unstandardized.MPG); label=\"true\")\nxaxis!(1:size(test, 1))\nylims!(10, 40)\ntitle!(\"MCMC (NUTS)\")\n\nplot(p1, p2, p3; layout=(1, 3), size=(900, 250), label=\"\")\n\n\n\n\nWe can see that the full-rank VI approximation is very close to the predictions from MCMC samples. Also, the coverage of full-rank VI and MCMC is much better than that of the crude mean-field approximation.", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/variational-inference/index.html#footnotes", "href": "tutorials/variational-inference/index.html#footnotes", "title": "Variational Inference", "section": "Footnotes", "text": "Footnotes\n\n\nKucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. 
(2017). Automatic differentiation variational inference. Journal of Machine Learning Research, 18(14).↩︎\nTitsias, M., & Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nRanganath, R., Gerrish, S., & Blei, D. (2014). Black box variational inference. In Proceedings of the International Conference on Artificial intelligence and statistics. PMLR.↩︎\nKingma, D. P., & Welling, M. (2014). Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations.↩︎\nRezende, D. J., Mohamed, S., & Wierstra, D (2014). Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nTitsias, M., & Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nKucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic differentiation variational inference. Journal of Machine Learning Research, 18(14).↩︎\nTitsias, M., & Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nRanganath, R., Gerrish, S., & Blei, D. (2014). Black box variational inference. In Proceedings of the International Conference on Artificial intelligence and statistics. PMLR.↩︎\nKingma, D. P., & Welling, M. (2014). Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations.↩︎\nRezende, D. J., Mohamed, S., & Wierstra, D (2014). Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nTitsias, M., & Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nKhaled, A., Mishchenko, K., & Jin, C. (2023). DoWG unleashed: An efficient universal parameter-free gradient descent method. In Advances in Neural Information Processing Systems, 36.↩︎\nDomke, J. (2020). Provable smoothness guarantees for black-box variational inference. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nDomke, J., Gower, R., & Garrigos, G. (2023). Provable convergence guarantees for black-box variational inference. In Advances in Neural Information Processing Systems, 36.↩︎\nKim, K., Oh, J., Wu, K., Ma, Y., & Gardner, J. (2023). On the convergence of black-box variational inference. In Advances in Neural Information Processing Systems, 36.↩︎\nShamir, O., & Zhang, T. (2013). Stochastic gradient descent for non-smooth optimisation: Convergence results and optimal averaging schemes. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nKhaled, A., Mishchenko, K., & Jin, C. (2023). DoWG unleashed: An efficient universal parameter-free gradient descent method. In Advances in Neural Information Processing Systems, 36.↩︎\nIvgi, M., Hinder, O., & Carmon, Y. (2023). DoG is SGD’s best friend: A parameter-free dynamic step size schedule. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nKingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimisation. In Proceedings of the International Conference on Learning Representations.↩︎\nDomke, J. (2020). 
Provable smoothness guarantees for black-box variational inference. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nTitsias, M., & Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the International Conference on Machine Learning. PMLR.↩︎\nKucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic differentiation variational inference. Journal of Machine Learning Research, 18(14).↩︎", "crumbs": [ "Get Started", "Tutorials", "Variational Inference" ] }, { "objectID": "tutorials/infinite-mixture-models/index.html", "href": "tutorials/infinite-mixture-models/index.html", "title": "Infinite Mixture Models", "section": "", "text": "In many applications it is desirable to allow the model to adjust its complexity to the amount of data. Consider, for example, the task of assigning objects into clusters or groups. This task often involves the specification of the number of groups. However, it is often not known beforehand how many groups exist. Moreover, in some applications, e.g. modelling topics in text documents or grouping species, the number of examples per group is heavy-tailed. This makes it impossible to predefine the number of groups and requires the model to form new groups when data points from previously unseen groups are observed.\nA natural approach for such applications is the use of non-parametric models, which can grow in complexity as more data are observed. This tutorial demonstrates how to use the Dirichlet process in a mixture of infinitely many Gaussians using Turing. For further information on Bayesian nonparametrics and the Dirichlet process, see the introduction by Zoubin Ghahramani and the book “Fundamentals of Nonparametric Bayesian Inference” by Subhashis Ghosal and Aad van der Vaart.\nusing Turing", "crumbs": [ "Get Started", "Tutorials", "Infinite Mixture Models" ] }, { "objectID": "tutorials/infinite-mixture-models/index.html#mixture-model", "href": "tutorials/infinite-mixture-models/index.html#mixture-model", "title": "Infinite Mixture Models", "section": "Mixture Model", "text": "Mixture Model\nBefore introducing infinite mixture models in Turing, we will briefly review the construction of finite mixture models. Subsequently, we will define how to use the Chinese restaurant process construction of a Dirichlet process for non-parametric clustering.\n\nTwo-Component Model\nFirst, consider the simple case of a mixture model with two Gaussian components with fixed covariance. The generative process of such a model can be written as:\n\\[\\begin{equation*}\n\\begin{aligned}\n\\pi_1 &\\sim \\mathrm{Beta}(a, b) \\\\\n\\pi_2 &= 1-\\pi_1 \\\\\n\\mu_1 &\\sim \\mathrm{Normal}(\\mu_0, \\Sigma_0) \\\\\n\\mu_2 &\\sim \\mathrm{Normal}(\\mu_0, \\Sigma_0) \\\\\nz_i &\\sim \\mathrm{Categorical}(\\pi_1, \\pi_2) \\\\\nx_i &\\sim \\mathrm{Normal}(\\mu_{z_i}, \\Sigma)\n\\end{aligned}\n\\end{equation*}\\]\nwhere \\(\\pi_1, \\pi_2\\) are the mixing weights of the mixture model, i.e. 
\\(\\pi_1 + \\pi_2 = 1\\), and \\(z_i\\) is a latent assignment of the observation \\(x_i\\) to a component (Gaussian).\nWe can implement this model in Turing for 1D data as follows:\n\n@model function two_model(x)\n # Hyper-parameters\n μ0 = 0.0\n σ0 = 1.0\n\n # Draw weights.\n π1 ~ Beta(1, 1)\n π2 = 1 - π1\n\n # Draw locations of the components.\n μ1 ~ Normal(μ0, σ0)\n μ2 ~ Normal(μ0, σ0)\n\n # Draw latent assignment.\n z ~ Categorical([π1, π2])\n\n # Draw observation from selected component.\n if z == 1\n x ~ Normal(μ1, 1.0)\n else\n x ~ Normal(μ2, 1.0)\n end\nend\n\ntwo_model (generic function with 2 methods)\n\n\n\n\nFinite Mixture Model\nIf we have more than two components, this model can elegantly be extended using a Dirichlet distribution as prior for the mixing weights \\(\\pi_1, \\dots, \\pi_K\\). Note that the Dirichlet distribution is the multivariate generalization of the beta distribution. The resulting model can be written as:\n\\[\n\\begin{align}\n(\\pi_1, \\dots, \\pi_K) &\\sim \\mathrm{Dirichlet}(K, \\alpha) \\\\\n\\mu_k &\\sim \\mathrm{Normal}(\\mu_0, \\Sigma_0), \\;\\; \\forall k \\\\\nz &\\sim \\mathrm{Categorical}(\\pi_1, \\dots, \\pi_K) \\\\\nx &\\sim \\mathrm{Normal}(\\mu_z, \\Sigma)\n\\end{align}\n\\]\nwhich resembles the model in the Gaussian mixture model tutorial with a slightly different notation.", "crumbs": [ "Get Started", "Tutorials", "Infinite Mixture Models" ] }, { "objectID": "tutorials/infinite-mixture-models/index.html#infinite-mixture-model", "href": "tutorials/infinite-mixture-models/index.html#infinite-mixture-model", "title": "Infinite Mixture Models", "section": "Infinite Mixture Model", "text": "Infinite Mixture Model\nThe question now arises: is there a generalization of a Dirichlet distribution for which the dimensionality \\(K\\) is infinite, i.e. \\(K = \\infty\\)?\nBut first, to implement an infinite Gaussian mixture model in Turing, we need to load the Turing.RandomMeasures module. RandomMeasures contains a variety of tools useful in nonparametrics.\n\nusing Turing.RandomMeasures\n\nWe will now utilize the fact that one can integrate out the mixing weights in a Gaussian mixture model allowing us to arrive at the Chinese restaurant process construction. See Carl E. Rasmussen: The Infinite Gaussian Mixture Model, NIPS (2000) for details.\nIn fact, if the mixing weights are integrated out, the conditional prior for the latent variable \\(z\\) is given by:\n\\[\np(z_i = k \\mid z_{\\not i}, \\alpha) = \\frac{n_k + \\alpha/K}{N - 1 + \\alpha}\n\\]\nwhere \\(z_{\\not i}\\) are the latent assignments of all observations except observation \\(i\\). Note that we use \\(n_k\\) to denote the number of observations at component \\(k\\) excluding observation \\(i\\). 
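As a toy numerical example (made-up counts, not the tutorial’s data): with \\(K = 2\\), \\(\\alpha = 1\\), and \\(N - 1 = 4\\) previous observations split as \\(n_1 = 3\\) and \\(n_2 = 1\\), the conditional prior evaluates to\n\n# (n_k + α/K) / (N - 1 + α) for k = 1, 2\n((3 + 0.5) / 5, (1 + 0.5) / 5)   # (0.7, 0.3), which sums to 1\n\n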
\n\nChinese Restaurant Process\nTo obtain the Chinese restaurant process construction, we can now derive the conditional prior if \\(K \\rightarrow \\infty\\).\nFor \\(n_k > 0\\) we obtain:\n\\[\np(z_i = k \\mid z_{\\not i}, \\alpha) = \\frac{n_k}{N - 1 + \\alpha}\n\\]\nand for all infinitely many clusters that are empty (combined) we get:\n\\[\np(z_i = k \\mid z_{\\not i}, \\alpha) = \\frac{\\alpha}{N - 1 + \\alpha}\n\\]\nThese equations show that the conditional prior probability of assigning an observation to an existing component is proportional to the number of observations already assigned to it, meaning that the Chinese restaurant process has a “rich-get-richer” property.\nTo get a better understanding of this property, we can plot the cluster chosen for each new observation drawn from the conditional prior.\n\n# Concentration parameter.\nα = 10.0\n\n# Random measure, e.g. Dirichlet process.\nrpm = DirichletProcess(α)\n\n# Cluster assignments for each observation.\nz = Vector{Int}()\n\n# Maximum number of observations we observe.\nNmax = 500\n\nfor i in 1:Nmax\n # Number of observations per cluster.\n K = isempty(z) ? 0 : maximum(z)\n nk = Vector{Int}(map(k -> sum(z .== k), 1:K))\n\n # Draw new assignment.\n push!(z, rand(ChineseRestaurantProcess(rpm, nk)))\nend\n\n\nusing Plots\n\n# Plot the cluster assignments over time\n@gif for i in 1:Nmax\n scatter(\n collect(1:i),\n z[1:i];\n markersize=2,\n xlabel=\"observation (i)\",\n ylabel=\"cluster (k)\",\n legend=false,\n )\nend\n\n\n\n\n\nFurther, we can see that the number of clusters grows logarithmically with the number of observations. This is a side-effect of the “rich-get-richer” phenomenon, i.e. we expect large clusters and thus the number of clusters has to be smaller than the number of observations.\n\\[\n\\mathbb{E}[K \\mid N] \\approx \\alpha \\cdot \\log \\big(1 + \\frac{N}{\\alpha}\\big)\n\\]\nWe can see from the equation that the concentration parameter \\(\\alpha\\) allows us to control the number of clusters formed a priori.
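\nUsing the α, Nmax, and z variables from the simulation above, we can check this approximation directly (a quick sketch of ours; the realised count will vary with the random draw):\n\nexpected_K = α * log(1 + Nmax / α)  # ≈ 39.3 for α = 10, Nmax = 500\nrealised_K = length(unique(z))\nprintln(\"expected ≈ $(round(expected_K; digits=1)), realised = $realised_K\")\n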
\nIn Turing we can implement an infinite Gaussian mixture model using the Chinese restaurant process construction of a Dirichlet process as follows:\n\n@model function infiniteGMM(x)\n # Hyper-parameters, i.e. concentration parameter and parameters of H.\n α = 1.0\n μ0 = 0.0\n σ0 = 1.0\n\n # Define random measure, e.g. Dirichlet process.\n rpm = DirichletProcess(α)\n\n # Define the base distribution, i.e. expected value of the Dirichlet process.\n H = Normal(μ0, σ0)\n\n # Latent assignment.\n z = zeros(Int, length(x))\n\n # Locations of the infinitely many clusters.\n μ = zeros(Float64, 0)\n\n for i in 1:length(x)\n\n # Number of clusters.\n K = maximum(z)\n nk = Vector{Int}(map(k -> sum(z .== k), 1:K))\n\n # Draw the latent assignment.\n z[i] ~ ChineseRestaurantProcess(rpm, nk)\n\n # Create a new cluster?\n if z[i] > K\n push!(μ, 0.0)\n\n # Draw location of new cluster.\n μ[z[i]] ~ H\n end\n\n # Draw observation.\n x[i] ~ Normal(μ[z[i]], 1.0)\n end\nend\n\ninfiniteGMM (generic function with 2 methods)\n\n\nWe can now use Turing to infer the assignments of some data points. First, we will create some random data that comes from three clusters, with means of 0, -5, and 10.\n\nusing Plots, Random\n\n# Generate some test data.\nRandom.seed!(1)\ndata = vcat(randn(10), randn(10) .- 5, randn(10) .+ 10)\ndata .-= mean(data)\ndata /= std(data);\n\nNext, we’ll sample from our posterior using SMC.\n\nsetprogress!(false)\n\n\n# MCMC sampling\nRandom.seed!(2)\niterations = 1000\nmodel_fun = infiniteGMM(data);\nchain = sample(model_fun, SMC(), iterations);\n\nFinally, we can plot the number of clusters in each sample.\n\n# Extract the number of clusters for each sample of the Markov chain.\nk = map(\n t -> length(unique(vec(chain[t, MCMCChains.namesingroup(chain, :z), :].value))),\n 1:iterations,\n);\n\n# Visualize the number of clusters.\nplot(k; xlabel=\"Iteration\", ylabel=\"Number of clusters\", label=\"Chain 1\")\n\n\n\n\nIf we visualise the histogram of the number of clusters sampled from our posterior, we observe that the model seems to prefer 3 clusters, which is the true number of clusters. Note that the number of clusters in a Dirichlet process mixture model is not limited a priori and will grow to infinity with probability one. However, conditioned on data, the posterior concentrates on a finite number of clusters, effectively giving the resulting model a finite number of clusters. It is, however, not guaranteed that the posterior of a Dirichlet process Gaussian mixture model converges to the true number of clusters, even when the data come from a finite mixture model. See Jeffrey Miller and Matthew Harrison: A simple example of Dirichlet process mixture inconsistency for the number of components for details.\n\nhistogram(k; xlabel=\"Number of clusters\", legend=false)\n\n\n\n\nOne issue with the Chinese restaurant process construction is that the number of latent parameters we need to sample scales with the number of observations. It may be desirable to use alternative constructions in certain cases. Alternative methods of constructing a Dirichlet process can be employed via the following representations:\nSize-Biased Sampling Process\n\\[\nj_k = \\beta_k \\Big(1 - \\sum_{l=1}^{k-1} j_l\\Big), \\quad \\beta_k \\sim \\mathrm{Beta}(1, \\alpha)\n\\]\nStick-Breaking Process \\[\nv_k \\sim \\mathrm{Beta}(1, \\alpha), \\quad \\pi_k = v_k \\prod_{l=1}^{k-1} (1 - v_l)\n\\]\nChinese Restaurant Process \\[\np(z_n = k | z_{1:n-1}) \\propto \\begin{cases}\n\\frac{m_k}{n-1+\\alpha} & \\text{if } m_k > 0\\\\\n\\frac{\\alpha}{n-1+\\alpha} & \\text{otherwise}\n\\end{cases}\n\\]
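\nTo make the stick-breaking representation concrete, here is a small sketch of ours (truncated at a finite number of sticks purely for display; the process itself is infinite):\n\n# Break a unit-length stick: each v_k removes a Beta(1, α) fraction of\n# whatever is left, giving the mixing weight of cluster k.\nα = 1.0\nK_trunc = 10\nv = rand(Beta(1, α), K_trunc)\nπk = v .* vcat(1.0, cumprod(1 .- v[1:(end - 1)]))\nsum(πk)  # < 1; the leftover mass belongs to the infinitely many remaining sticks\n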
\nFor more details see this article.", "crumbs": [ "Get Started", "Tutorials", "Infinite Mixture Models" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html", "href": "tutorials/bayesian-logistic-regression/index.html", "title": "Bayesian Logistic Regression", "section": "", "text": "Bayesian logistic regression is the Bayesian counterpart to a common tool in machine learning, logistic regression. The goal of logistic regression is to predict a one or a zero for a given training item. An example might be predicting whether someone is sick or healthy given their symptoms and personal information.\nIn our example, we’ll be working to predict whether someone is likely to default with a synthetic dataset found in the RDatasets package. This dataset, Default, comes from R’s ISLR package and contains information on borrowers.\nTo start, let’s import all the libraries we’ll need.\n# Import Turing and Distributions.\nusing Turing, Distributions\n\n# Import RDatasets.\nusing RDatasets\n\n# Import MCMCChains, Plots, and StatsPlots for visualisations and diagnostics.\nusing MCMCChains, Plots, StatsPlots\n\n# We need a logistic function, which is provided by StatsFuns.\nusing StatsFuns: logistic\n\n# Functionality for splitting and normalising the data\nusing MLDataUtils: shuffleobs, stratifiedobs, rescale!\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(0);", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#data-cleaning-set-up", "href": "tutorials/bayesian-logistic-regression/index.html#data-cleaning-set-up", "title": "Bayesian Logistic Regression", "section": "Data Cleaning & Set Up", "text": "Data Cleaning & Set Up\nNow we’re going to import our dataset. The first six rows of the dataset are shown below so you can get a good feel for what kind of data we have.\n\n# Import the \"Default\" dataset.\ndata = RDatasets.dataset(\"ISLR\", \"Default\");\n\n# Show the first six rows of the dataset.\nfirst(data, 6)\n\n6×4 DataFrame\n Row │ Default  Student  Balance   Income\n     │ Cat…     Cat…     Float64   Float64\n─────┼─────────────────────────────────────\n   1 │ No       No        729.526  44361.6\n   2 │ No       Yes       817.18   12106.1\n   3 │ No       No       1073.55   31767.1\n   4 │ No       No        529.251  35704.5\n   5 │ No       No        785.656  38463.5\n   6 │ No       Yes       919.589   7491.56\n\nMost machine learning processes require some effort to tidy up the data, and this is no different. We need to convert the Default and Student columns, which say “Yes” or “No”, into 1s and 0s. Afterwards, we’ll get rid of the old words-based columns.\n\n# Convert \"Default\" and \"Student\" to numeric values.\ndata[!, :DefaultNum] = [r.Default == \"Yes\" ? 1.0 : 0.0 for r in eachrow(data)]\ndata[!, :StudentNum] = [r.Student == \"Yes\" ? 1.0 : 0.0 for r in eachrow(data)]\n\n# Delete the old columns which say \"Yes\" and \"No\".\nselect!(data, Not([:Default, :Student]))\n\n# Show the first six rows of our edited dataset.\nfirst(data, 6)\n\n6×4 DataFrame\n Row │ Balance   Income   DefaultNum  StudentNum\n     │ Float64   Float64  Float64     Float64\n─────┼───────────────────────────────────────────\n   1 │  729.526  44361.6         0.0         0.0\n   2 │  817.18   12106.1         0.0         1.0\n   3 │ 1073.55   31767.1         0.0         0.0\n   4 │  529.251  35704.5         0.0         0.0\n   5 │  785.656  38463.5         0.0         0.0\n   6 │  919.589   7491.56        0.0         1.0\n\nAfter we’ve done that tidying, it’s time to split our dataset into training and testing sets, and separate the labels from the data. We separate our data into two subsets, train and test. You can use a higher (or lower) training fraction by modifying the at = 0.05 argument below. We have highlighted the use of only a 5% training sample to show the power of Bayesian inference with small sample sizes.\nWe must rescale our variables so that they are centred around zero by subtracting the mean from each column and dividing by the standard deviation. This rescaling ensures features are on comparable scales, which improves sampler initialisation and convergence. 
To do this we will leverage MLDataUtils, which also lets us effortlessly shuffle our observations and perform a stratified split to get a representative test set.\n\nfunction split_data(df, target; at=0.70)\n shuffled = shuffleobs(df)\n return stratifiedobs(row -> row[target], shuffled; p=at)\nend\n\nfeatures = [:StudentNum, :Balance, :Income]\nnumerics = [:Balance, :Income]\ntarget = :DefaultNum\n\ntrainset, testset = split_data(data, target; at=0.05)\nfor feature in numerics\n μ, σ = rescale!(trainset[!, feature]; obsdim=1)\n rescale!(testset[!, feature], μ, σ; obsdim=1)\nend\n\n# Turing requires data in matrix form, not dataframe\ntrain = Matrix(trainset[:, features])\ntest = Matrix(testset[:, features])\ntrain_label = trainset[:, target]\ntest_label = testset[:, target];", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#model-declaration", "href": "tutorials/bayesian-logistic-regression/index.html#model-declaration", "title": "Bayesian Logistic Regression", "section": "Model Declaration", "text": "Model Declaration\nFinally, we can define our model.\nlogistic_regression takes four arguments:\n\nx is our set of independent variables;\ny is the element we want to predict;\nn is the number of observations we have; and\nσ is the standard deviation we want to assume for our priors.\n\nWithin the model, we create four coefficients (intercept, student, balance, and income), and assign each a normal prior with mean zero and standard deviation σ. We want to find values of these four coefficients to predict any given y.\nThe for block computes v, the result of applying the logistic function to the linear predictor for observation i, and then observes the actual label y[i] as a Bernoulli draw with success probability v.\n\n# Bayesian logistic regression (LR)\n@model function logistic_regression(x, y, n, σ)\n intercept ~ Normal(0, σ)\n\n student ~ Normal(0, σ)\n balance ~ Normal(0, σ)\n income ~ Normal(0, σ)\n\n for i in 1:n\n v = logistic(intercept + student * x[i, 1] + balance * x[i, 2] + income * x[i, 3])\n y[i] ~ Bernoulli(v)\n end\nend;", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#sampling", "href": "tutorials/bayesian-logistic-regression/index.html#sampling", "title": "Bayesian Logistic Regression", "section": "Sampling", "text": "Sampling\nNow we can run our sampler. This time we’ll use NUTS to sample from our posterior.\n\nsetprogress!(false)\n\n\n# Retrieve the number of observations.\nn, _ = size(train)\n\n# Sample using NUTS.\nm = logistic_regression(train, train_label, n, 1)\nchain = sample(m, NUTS(), MCMCThreads(), 1_500, 3)\n\n\n\nChains MCMC chain (1500×18×3 Array{Float64, 3}):\n\nIterations = 751:1:2250\nNumber of chains = 3\nSamples per chain = 1500\nWall duration = 12.96 seconds\nCompute duration = 8.42 seconds\nparameters = intercept, student, balance, income\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\n\n\n\nWarning: Sampling With Multiple Threads\n\n\n\n\n\nThe sample() call above assumes that you have at least nchains threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. 
For more information, see the Turing documentation on sampling multiple chains.\n\n\n\nSince we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.\n\nplot(chain)\n\n\n\n\nLooks good!\nWe can also use the corner function from MCMCChains to show the distributions of the various parameters of our logistic regression.\n\n# The labels to use.\nl = [:student, :balance, :income]\n\n# Use the corner function. Requires StatsPlots and MCMCChains.\ncorner(chain, l)\n\n\n\n\nFortunately the corner plot appears to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter’s sampled values and use them to make predictions.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-logistic-regression/index.html#making-predictions", "href": "tutorials/bayesian-logistic-regression/index.html#making-predictions", "title": "Bayesian Logistic Regression", "section": "Making Predictions", "text": "Making Predictions\nHow do we test how well the model actually predicts whether someone is likely to default? We need to build a prediction function that takes the test object we made earlier and runs it through the average parameter values calculated during sampling.\nThe prediction function below takes a Matrix and a Chain object. It takes the mean of each parameter’s sampled values and re-runs the logistic function using those mean values for every element in the test set.\n\nfunction prediction(x::Matrix, chain, threshold)\n # Pull the means from each parameter's sampled values in the chain.\n intercept = mean(chain[:intercept])\n student = mean(chain[:student])\n balance = mean(chain[:balance])\n income = mean(chain[:income])\n\n # Retrieve the number of rows.\n n, _ = size(x)\n\n # Generate a vector to store our predictions.\n v = Vector{Float64}(undef, n)\n\n # Calculate the logistic function for each element in the test set.\n for i in 1:n\n num = logistic(\n intercept + student * x[i, 1] + balance * x[i, 2] + income * x[i, 3]\n )\n if num >= threshold\n v[i] = 1\n else\n v[i] = 0\n end\n end\n return v\nend;\n
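\nNote that this plugs in a single point estimate (the posterior mean) for each coefficient. A more fully Bayesian alternative is to average the predicted probability over the posterior draws themselves; here is a sketch (the helper name prediction_averaged is ours, not part of the tutorial):\n\nfunction prediction_averaged(x::Matrix, chain, threshold)\n    # Flatten every chain into one vector of posterior draws per parameter.\n    intercepts = vec(chain[:intercept])\n    students = vec(chain[:student])\n    balances = vec(chain[:balance])\n    incomes = vec(chain[:income])\n\n    n, _ = size(x)\n    v = Vector{Float64}(undef, n)\n    for i in 1:n\n        # Average the per-draw probabilities instead of plugging in means.\n        probs = logistic.(intercepts .+ students .* x[i, 1] .+ balances .* x[i, 2] .+ incomes .* x[i, 3])\n        v[i] = mean(probs) >= threshold ? 1.0 : 0.0\n    end\n    return v\nend;\n\nFor this unimodal, roughly symmetric posterior the two approaches give very similar answers, but averaging is the more robust default when posteriors are skewed or multimodal.\n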
\nLet’s see how we did! We run the test matrix through the prediction function, and compute the mean squared error (MSE) for our prediction (since predictions and labels are both 0 or 1, this is simply the misclassification rate). The threshold variable sets the decision boundary for classification. For example, a threshold of 0.07 will predict a default (value of 1) for any predicted probability greater than 0.07 and no default otherwise. Lower thresholds increase sensitivity but may increase false positives.\n\n# Set the prediction threshold.\nthreshold = 0.07\n\n# Make the predictions.\npredictions = prediction(test, chain, threshold)\n\n# Calculate MSE for our test set.\nloss = sum((predictions - test_label) .^ 2) / length(test_label)\n\n0.12894736842105264\n\n\nPerhaps more important is to see what percentage of defaults we correctly predicted. The code below simply counts defaults and predictions and presents the results.\n\ndefaults = sum(test_label)\nnot_defaults = length(test_label) - defaults\n\npredicted_defaults = sum(test_label .== predictions .== 1)\npredicted_not_defaults = sum(test_label .== predictions .== 0)\n\nprintln(\"Defaults: $defaults\n Predictions: $predicted_defaults\n Percentage defaults correct $(predicted_defaults/defaults)\")\n\nprintln(\"Not defaults: $not_defaults\n Predictions: $predicted_not_defaults\n Percentage non-defaults correct $(predicted_not_defaults/not_defaults)\")\n\nDefaults: 316.0\n Predictions: 273\n Percentage defaults correct 0.8639240506329114\nNot defaults: 9184.0\n Predictions: 8002\n Percentage non-defaults correct 0.8712979094076655\n\n\nThe above shows that with a threshold of 0.07, we correctly predict a respectable portion of the defaults, and correctly identify most non-defaults. This is fairly sensitive to the choice of threshold, and you may wish to experiment with it.\nThis tutorial has demonstrated how to use Turing to perform Bayesian logistic regression.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Logistic Regression" ] }, { "objectID": "tutorials/bayesian-poisson-regression/index.html", "href": "tutorials/bayesian-poisson-regression/index.html", "title": "Bayesian Poisson Regression", "section": "", "text": "This notebook is ported from the example notebook of PyMC3 on Poisson Regression.\nPoisson Regression is a technique commonly used to model count data. Some of the applications include predicting the number of people defaulting on their loans or the number of cars running on a highway on a given day. This example describes a method to implement the Bayesian version of this technique using Turing.\nWe will generate the dataset that we will be working on, which describes the relationship between the number of times a person sneezes during the day and their alcohol consumption and medicinal intake.\nWe start by importing the required libraries.\n\n#Import Turing, Distributions and DataFrames\nusing Turing, Distributions, DataFrames, Distributed\n\n# Import MCMCChains, Plots, and StatsPlots for visualisations and diagnostics.\nusing MCMCChains, Plots, StatsPlots\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(12);\n\n\nGenerating data\nWe start off by creating a toy dataset. We take the case of a person who takes medicine to prevent excessive sneezing. Alcohol consumption increases the rate of sneezing for that person. Thus, the two factors affecting the number of sneezes in a given day are alcohol consumption and whether the person has taken their medicine. Both these variables are taken as boolean-valued while the number of sneezes will be a count-valued variable. 
We also take into consideration that the interaction between the two boolean variables will affect the number of sneezes.\nFive random rows are printed from the generated data to give a feel for what was generated.\n\ntheta_noalcohol_meds = 1 # no alcohol, took medicine\ntheta_alcohol_meds = 3 # alcohol, took medicine\ntheta_noalcohol_nomeds = 6 # no alcohol, no medicine\ntheta_alcohol_nomeds = 36 # alcohol, no medicine\n\n# no of samples for each of the above cases\nq = 100\n\n#Generate data from different Poisson distributions\nnoalcohol_meds = Poisson(theta_noalcohol_meds)\nalcohol_meds = Poisson(theta_alcohol_meds)\nnoalcohol_nomeds = Poisson(theta_noalcohol_nomeds)\nalcohol_nomeds = Poisson(theta_alcohol_nomeds)\n\nnsneeze_data = vcat(\n rand(noalcohol_meds, q),\n rand(alcohol_meds, q),\n rand(noalcohol_nomeds, q),\n rand(alcohol_nomeds, q),\n)\nalcohol_data = vcat(zeros(q), ones(q), zeros(q), ones(q))\nmeds_data = vcat(zeros(q), zeros(q), ones(q), ones(q))\n\ndf = DataFrame(;\n nsneeze=nsneeze_data,\n alcohol_taken=alcohol_data,\n nomeds_taken=meds_data,\n product_alcohol_meds=meds_data .* alcohol_data,\n)\ndf[sample(1:nrow(df), 5; replace=false), :]\n\n5×4 DataFrame\n Row │ nsneeze  alcohol_taken  nomeds_taken  product_alcohol_meds\n     │ Int64    Float64        Float64       Float64\n─────┼────────────────────────────────────────────────────────────\n   1 │       2            1.0           0.0                   0.0\n   2 │       1            0.0           0.0                   0.0\n   3 │       2            0.0           0.0                   0.0\n   4 │      30            1.0           1.0                   1.0\n   5 │       0            1.0           0.0                   0.0\n\n\nVisualisation of the dataset\nWe plot the distribution of the number of sneezes for the 4 different cases taken above. As expected, the person sneezes the most when they have taken alcohol and not taken their medicine, and sneezes the least when they don’t consume alcohol and take their medicine.\n\n# Data Plotting\n\np1 = Plots.histogram(\n df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 0), 1];\n title=\"no_alcohol+meds\",\n)\np2 = Plots.histogram(\n (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 0), 1]);\n title=\"alcohol+meds\",\n)\np3 = Plots.histogram(\n (df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 1), 1]);\n title=\"no_alcohol+no_meds\",\n)\np4 = Plots.histogram(\n (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 1), 1]);\n title=\"alcohol+no_meds\",\n)\nplot(p1, p2, p3, p4; layout=(2, 2), legend=false)\n\n\n\n\nWe must convert our DataFrame data into the Matrix form as the manipulations that we are about to perform are designed to work with Matrix data. We also separate the features from the labels, which will be later used by the Turing sampler to generate samples from the posterior.\n\n# Convert the DataFrame object to matrices.\ndata = Matrix(df[:, [:alcohol_taken, :nomeds_taken, :product_alcohol_meds]])\ndata_labels = df[:, :nsneeze]\ndata\n\n400×3 Matrix{Float64}:\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n 0.0 0.0 0.0\n ⋮ \n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n\n\nWe must standardise our data (centring about 0 with unit variance) to help the Turing sampler initialise parameter estimates effectively. We do this by subtracting the mean and dividing by the standard deviation for each column:
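\nIn symbols, writing \\(\\bar{x}_j\\) for the mean of column \\(j\\) and \\(s_j\\) for its standard deviation, each entry is transformed as\n\\[\n\\tilde{x}_{ij} = \\frac{x_{ij} - \\bar{x}_j}{s_j},\n\\]\nwhich is exactly what the broadcasted expression below computes.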
\n\n# Rescale our matrices.\ndata = (data .- mean(data; dims=1)) ./ std(data; dims=1)\n\n400×3 Matrix{Float64}:\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n -0.998749 -0.998749 -0.576628\n ⋮ \n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n 0.998749 0.998749 1.72988\n\n\n\n\nDeclaring the Model: Poisson Regression\nOur model, poisson_regression, takes four arguments:\n\nx is our set of independent variables;\ny is the element we want to predict;\nn is the number of observations we have; and\nσ² is the standard deviation we want to assume for our priors (note that, despite its name, this argument is passed to Normal as a standard deviation, not a variance).\n\nWithin the model, we create four coefficients (b0, b1, b2, and b3) and assign each a normal prior with mean zero and standard deviation σ². We want to find values of these four coefficients to predict any given y.\nIntuitively, we can think of the coefficients as:\n\nb1 is the coefficient which represents the effect of taking alcohol on the number of sneezes;\nb2 is the coefficient which represents the effect of taking in no medicines on the number of sneezes;\nb3 is the coefficient which represents the effect of interaction between taking alcohol and no medicine on the number of sneezes;\n\nThe for block creates a variable theta which is the weighted combination of the input features. We have defined the priors on these weights above. We then observe the actual label y[i] as a Poisson draw with rate exp(theta).\n\n# Bayesian Poisson regression\n@model function poisson_regression(x, y, n, σ²)\n b0 ~ Normal(0, σ²)\n b1 ~ Normal(0, σ²)\n b2 ~ Normal(0, σ²)\n b3 ~ Normal(0, σ²)\n for i in 1:n\n theta = b0 + b1 * x[i, 1] + b2 * x[i, 2] + b3 * x[i, 3]\n y[i] ~ Poisson(exp(theta))\n end\nend;
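\nBecause the model uses a log link, i.e. the Poisson rate is \\(\\exp(\\theta)\\), the coefficients act multiplicatively on the expected count:\n\\[\n\\mathbb{E}[y \\mid x] = \\exp(b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3) = e^{b_0} \\prod_{j=1}^{3} \\left(e^{b_j}\\right)^{x_j}\n\\]\nso a one-unit increase in \\(x_j\\) (here one standard deviation, since we standardised the data) multiplies the expected number of sneezes by \\(e^{b_j}\\). This is why we compare the exponentiated posterior means of the coefficients further below.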
\n\n\nSampling from the posterior\nWe use the NUTS sampler to sample values from the posterior. We run multiple chains using the MCMCThreads() function to reduce the influence of any single problematic chain. We then use the Gelman, Rubin, and Brooks Diagnostic to check the convergence of these multiple chains.\n\n# Retrieve the number of observations.\nn, _ = size(data)\n\n# Sample using NUTS.\n\nnum_chains = 4\nm = poisson_regression(data, data_labels, n, 10)\nchain = sample(m, NUTS(), MCMCThreads(), 2_500, num_chains; discard_adapt=false, progress=false)\n\n\n\nChains MCMC chain (2500×18×4 Array{Union{Missing, Float64}, 3}):\n\nIterations = 1:1:2500\nNumber of chains = 4\nSamples per chain = 2500\nWall duration = 17.31 seconds\nCompute duration = 13.24 seconds\nparameters = b0, b1, b2, b3\ninternals = logprior, loglikelihood, logjoint, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\n\n\n\nWarning: Sampling With Multiple Threads\n\n\n\n\n\nThe sample() call above assumes that you have at least nchains threads available in your Julia instance. If you do not, the multiple chains will run sequentially, and you may notice a warning. For more information, see the Turing documentation on sampling multiple chains.\n\n\n\n\n\nViewing the Diagnostics\nWe use the Gelman, Rubin, and Brooks Diagnostic to check whether our chains have converged. Note that we require multiple chains to use this diagnostic, which analyses the difference between these multiple chains.\nWe expect the chains to have converged because we have taken a sufficient number of iterations (2,500) for the NUTS sampler. However, if the test fails, we will have to take a larger number of iterations, resulting in longer computation time.\n\n# Because some of the sampler statistics are `missing`, we need to extract only\n# the parameters and then concretize the array so that `gelmandiag` can be computed.\nparameter_chain = MCMCChains.concretize(MCMCChains.get_sections(chain, :parameters))\ngelmandiag(parameter_chain)\n\n\nGelman, Rubin, and Brooks diagnostic\n\n parameters psrf psrfci\n Symbol Float64 Float64\n\n b0 1.1295 1.1529\n b1 1.0092 1.0099\n b2 1.0486 1.0687\n b3 1.0127 1.0175\n\n\n\n\n\nFrom the above diagnostic, we can conclude that the chains have converged because the PSRF values of the coefficients are close to 1.\nSo, we have obtained the posterior distributions of the parameters. We transform the coefficients and recover the rate parameters by taking the exponent of the mean values of the coefficients b0, b1, b2 and b3. We take the exponent of the means to get a better comparison of the relative values of the coefficients. We then compare this with the intuitive meaning that was described earlier.\n\n# Taking the first chain\nc1 = chain[:, :, 1]\n\n# Calculating the exponentiated means\nb0_exp = exp(mean(c1[:b0]))\nb1_exp = exp(mean(c1[:b1]))\nb2_exp = exp(mean(c1[:b2]))\nb3_exp = exp(mean(c1[:b3]))\n\nprint(\"The exponent of the mean values of the weights (or coefficients) is: \\n\")\nprintln(\"b0: \", b0_exp)\nprintln(\"b1: \", b1_exp)\nprintln(\"b2: \", b2_exp)\nprintln(\"b3: \", b3_exp)\nprint(\"The posterior distributions obtained after sampling can be visualised as :\\n\")\n\nThe exponent of the mean values of the weights (or coefficients) is: \nb0: 4.831670126174538\nb1: 1.927893605972769\nb2: 2.7004412326927834\nb3: 1.2220771840472897\nThe posterior distributions obtained after sampling can be visualised as :\n\n\nVisualising the posterior by plotting it:\n\nplot(chain)\n\n\n\n\n\n\nInterpreting the Obtained Mean Values\nThe exponentiated mean of the coefficient b1 is noticeably smaller than that of b2. This makes sense because in the data that we generated, the number of sneezes was more sensitive to the medicinal intake as compared to the alcohol consumption. We also get a weaker dependence on the interaction between the alcohol consumption and the medicinal intake, as can be seen from the value of b3.\n\n\nRemoving the Warmup Samples\nAs can be seen from the plots above, the parameters converge to their final distributions after a few iterations. The initial samples during the warmup phase show higher variability and should be discarded before computing posterior statistics. Thus, we remove these warmup samples and view the diagnostics again. 
We discard the first 200 samples, corresponding to the adaptation phase used by the NUTS sampler (which we explicitly chose not to discard earlier with the discard_adapt=false argument).\n\nchains_new = chain[201:end, :, :]\n\nChains MCMC chain (2300×18×4 Array{Union{Missing, Float64}, 3}):\n\nIterations = 201:1:2500\nNumber of chains = 4\nSamples per chain = 2300\nWall duration = 17.31 seconds\nCompute duration = 13.24 seconds\nparameters = b0, b1, b2, b3\ninternals = logprior, loglikelihood, logjoint, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\nplot(chains_new)\n\n\n\n\nAs can be seen from the numeric values and the plots above, the standard deviation values have decreased and all the plotted values are from the estimated posteriors. The exponentiated mean values, with the warmup samples removed, have not changed by much and they are still in accordance with their intuitive meanings as described earlier.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Bayesian Poisson Regression" ] }, { "objectID": "uri/troubleshooting.html", "href": "uri/troubleshooting.html", "title": "Troubleshooting", "section": "", "text": "If you are not redirected, please click here.\n\nNo matching items\n Back to top" }, { "objectID": "uri/growablearray.html", "href": "uri/growablearray.html", "title": "Troubleshooting - GrowableArray", "section": "", "text": "If you are not redirected, please click here.\n\nNo matching items\n Back to top" }, { "objectID": "developers/compiler/design-overview/index.html", "href": "developers/compiler/design-overview/index.html", "title": "Turing Compiler Design (Outdated)", "section": "", "text": "In this section, the current design of Turing’s model “compiler” is described, which enables Turing to perform various types of Bayesian inference without changing the model definition. The “compiler” is essentially just a macro that rewrites the user’s model definition to a function that generates a Model struct that Julia’s dispatch can operate on and that Julia’s compiler can successfully do type inference on for efficient machine code generation." }, { "objectID": "developers/compiler/design-overview/index.html#the-model", "href": "developers/compiler/design-overview/index.html#the-model", "title": "Turing Compiler Design (Outdated)", "section": "The model", "text": "The model\nA model::Model is a callable struct that one can sample from by calling\n#| eval: false\n(model::Model)([rng, varinfo, sampler, context])\nwhere rng is a random number generator (default: Random.default_rng()), varinfo is a data structure that stores information about the random variables (default: DynamicPPL.VarInfo()), sampler is a sampling algorithm (default: DynamicPPL.SampleFromPrior()), and context is a sampling context that can, for example, modify how the log probability is accumulated (default: DynamicPPL.DefaultContext()).\nSampling resets the log joint probability of varinfo and increases the evaluation counter of sampler. If context is a LikelihoodContext, only the log likelihood of D will be accumulated, whereas with PriorContext only the log prior probability of P is. With the DefaultContext the log joint probability of both P and D is accumulated.\nThe Model struct contains the four internal fields f, args, defaults, and context. 
When model::Model is called, the internal function model.f is called as model.f(rng, varinfo, sampler, context, model.args...) (for multithreaded sampling, instead of varinfo a threadsafe wrapper is passed to model.f). The positional and keyword arguments that were passed to the user-defined model function when the model was created are saved as a NamedTuple in model.args. The default values of the positional and keyword arguments of the user-defined model functions, if any, are saved as a NamedTuple in model.defaults. They are used for constructing model instances with different arguments by the logprob and prob string macros. The context variable sets an evaluation context that can be used to control, for instance, whether log probabilities should be evaluated for the prior, likelihood, or joint probability. By default it is set to evaluate the log joint." }, { "objectID": "developers/compiler/design-overview/index.html#step-1-break-up-the-model-definition", "href": "developers/compiler/design-overview/index.html#step-1-break-up-the-model-definition", "title": "Turing Compiler Design (Outdated)", "section": "Step 1: Break up the model definition", "text": "Step 1: Break up the model definition\nFirst, the @model macro breaks up the user-provided function definition using DynamicPPL.build_model_info. This function returns a dictionary consisting of:\n\nallargs_exprs: The expressions of the positional and keyword arguments, without default values.\nallargs_syms: The names of the positional and keyword arguments, e.g., [:x, :y, :TV] above.\nallargs_namedtuple: An expression that constructs a NamedTuple of the positional and keyword arguments, e.g., :((x = x, y = y, TV = TV)) above.\ndefaults_namedtuple: An expression that constructs a NamedTuple of the default positional and keyword arguments, if any, e.g., :((x = missing, y = 1, TV = Vector{Float64})) above.\nmodeldef: A dictionary with the name, arguments, and function body of the model definition, as returned by MacroTools.splitdef.
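\nTo get a feel for what this splitting step works with, here is a standalone sketch using MacroTools.splitdef directly (generic MacroTools usage, not DynamicPPL’s internal code):\n\nusing MacroTools\n\ndef = MacroTools.splitdef(:(function f(x, y = 1)\n    return x + y\nend))\ndef[:name]  # :f\ndef[:args]  # the positional arguments, with any default values still attached\ndef[:body]  # the function body as an Expr\n\nThe modeldef entry listed above is exactly this kind of dictionary.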
" }, { "objectID": "developers/compiler/design-overview/index.html#step-2-generate-the-body-of-the-internal-model-function", "href": "developers/compiler/design-overview/index.html#step-2-generate-the-body-of-the-internal-model-function", "title": "Turing Compiler Design (Outdated)", "section": "Step 2: Generate the body of the internal model function", "text": "Step 2: Generate the body of the internal model function\nIn a second step, DynamicPPL.generate_mainbody generates the main part of the transformed function body using the user-provided function body and the provided function arguments, without default values, which are used to figure out whether a variable denotes an observation or a random variable. Here the function DynamicPPL.generate_tilde replaces the L ~ R lines in the model and the function DynamicPPL.generate_dot_tilde replaces the @. L ~ R and L .~ R lines in the model.\nIn the above example, p[1] ~ InverseGamma(2, 3) is replaced with something similar to\n#| eval: false\n#= REPL[25]:6 =#\nbegin\n var\"##tmpright#323\" = InverseGamma(2, 3)\n var\"##tmpright#323\" isa Union{Distribution,AbstractVector{<:Distribution}} || throw(\n ArgumentError(\n \"Right-hand side of a ~ must be subtype of Distribution or a vector of Distributions.\",\n ),\n )\n var\"##vn#325\" = (DynamicPPL.VarName)(:p, ((1,),))\n var\"##inds#326\" = ((1,),)\n p[1] = (DynamicPPL.tilde_assume)(\n _rng,\n _context,\n _sampler,\n var\"##tmpright#323\",\n var\"##vn#325\",\n var\"##inds#326\",\n _varinfo,\n )\nend\nHere the first line is a so-called line number node that enables more helpful error messages by providing users with the exact location of the error in their model definition. Then the right hand side (RHS) of the ~ is assigned to a variable (with an automatically generated name). We check that the RHS is a distribution or an array of distributions, otherwise an error is thrown. Next we extract a compact representation of the variable with its name and index (or indices). Finally, the ~ expression is replaced with a call to DynamicPPL.tilde_assume since the compiler figured out that p[1] is a random variable using the following heuristic:\n\nIf the symbol on the LHS of ~, :p in this case, is not among the arguments to the model, (:x, :y, :TV) in this case, it is a random variable.\nIf the symbol on the LHS of ~, :p in this case, is among the arguments to the model but has a value of missing, it is a random variable.\nIf the value of the LHS of ~, p[1] in this case, is missing, then it is a random variable.\nOtherwise, it is treated as an observation.\n\nThe DynamicPPL.tilde_assume function takes care of sampling the random variable, if needed, and updating its value and the accumulated log joint probability in the _varinfo object. If L ~ R is an observation, DynamicPPL.tilde_observe is called with the same arguments except the random number generator _rng (since observations are never sampled).\nA similar transformation is performed for expressions of the form @. L ~ R and L .~ R. For instance, @. x[1:2] ~ Normal(p[2], sqrt(p[1])) is replaced with\n#| eval: false\n#= REPL[25]:8 =#\nbegin\n var\"##tmpright#331\" = Normal.(p[2], sqrt.(p[1]))\n var\"##tmpright#331\" isa Union{Distribution,AbstractVector{<:Distribution}} || throw(\n ArgumentError(\n \"Right-hand side of a ~ must be subtype of Distribution or a vector of Distributions.\",\n ),\n )\n var\"##vn#333\" = (DynamicPPL.VarName)(:x, ((1:2,),))\n var\"##inds#334\" = ((1:2,),)\n var\"##isassumption#335\" = begin\n let var\"##vn#336\" = (DynamicPPL.VarName)(:x, ((1:2,),))\n if !((DynamicPPL.inargnames)(var\"##vn#336\", _model)) ||\n (DynamicPPL.inmissings)(var\"##vn#336\", _model)\n true\n else\n x[1:2] === missing\n end\n end\n end\n if var\"##isassumption#335\"\n x[1:2] .= (DynamicPPL.dot_tilde_assume)(\n _rng,\n _context,\n _sampler,\n var\"##tmpright#331\",\n x[1:2],\n var\"##vn#333\",\n var\"##inds#334\",\n _varinfo,\n )\n else\n (DynamicPPL.dot_tilde_observe)(\n _context,\n _sampler,\n var\"##tmpright#331\",\n x[1:2],\n var\"##vn#333\",\n var\"##inds#334\",\n _varinfo,\n )\n end\nend\nThe main difference in the expanded code between L ~ R and @. L ~ R is that the former doesn’t assume L to be defined (it can be a new Julia variable in the scope), while the latter assumes L already exists. 
Moreover, DynamicPPL.dot_tilde_assume and DynamicPPL.dot_tilde_observe are called instead of DynamicPPL.tilde_assume and DynamicPPL.tilde_observe." }, { "objectID": "developers/compiler/design-overview/index.html#step-3-replace-the-user-provided-function-body", "href": "developers/compiler/design-overview/index.html#step-3-replace-the-user-provided-function-body", "title": "Turing Compiler Design (Outdated)", "section": "Step 3: Replace the user-provided function body", "text": "Step 3: Replace the user-provided function body\nFinally, we replace the user-provided function body using DynamicPPL.build_output. This function uses MacroTools.combinedef to reassemble the user-provided function with a new function body. In the modified function body an anonymous function is created whose function body was generated in step 2 above and whose arguments are\n\na random number generator _rng,\na model _model,\na datastructure _varinfo,\na sampler _sampler,\na sampling context _context,\nand all positional and keyword arguments of the user-provided model function as positional arguments without any default values. Finally, in the new function body a model::Model with this anonymous function as its internal function is returned." }, { "objectID": "developers/compiler/design-overview/index.html#overview-1", "href": "developers/compiler/design-overview/index.html#overview-1", "title": "Turing Compiler Design (Outdated)", "section": "Overview", "text": "Overview\nVarInfo is the data structure in Turing that facilitates tracking random variables and certain metadata about them that are required for sampling. For instance, the distribution of every random variable is stored in VarInfo because we need to know the support of every random variable when sampling using HMC for example. Random variables whose distributions have a constrained support are transformed using a bijector from Bijectors.jl so that the sampling happens in the unconstrained space. Different samplers require different metadata about the random variables.\nThe definition of VarInfo in Turing is:\n#| eval: false\nstruct VarInfo{Tmeta, Tlogp} <: AbstractVarInfo\n metadata::Tmeta\n logp::Base.RefValue{Tlogp}\n num_produce::Base.RefValue{Int}\nend\nDepending on the type of metadata, the VarInfo is aliased as either UntypedVarInfo or TypedVarInfo. metadata can be either a subtype of the union type Metadata or a NamedTuple of multiple such subtypes. Let vi be an instance of VarInfo. If vi isa VarInfo{<:Metadata}, then it is called an UntypedVarInfo. If vi isa VarInfo{<:NamedTuple}, then vi.metadata would be a NamedTuple mapping each symbol in P to an instance of Metadata. vi would then be called a TypedVarInfo. The other fields of VarInfo include logp, which is used to accumulate the log probability or log probability density of the variables in P and D. num_produce keeps track of how many observations have been made in the model so far. This is incremented when running a ~ statement when the symbol on the LHS is in D." }, { "objectID": "developers/compiler/design-overview/index.html#metadata", "href": "developers/compiler/design-overview/index.html#metadata", "title": "Turing Compiler Design (Outdated)", "section": "Metadata", "text": "Metadata\nThe Metadata struct stores some metadata about the random variables sampled. This helps query certain information about a variable such as: its distribution, which samplers sample this variable, its value and whether this value is transformed to real space or not. 
Let md be an instance of Metadata:\n\nmd.vns is the vector of all VarName instances. Let vn be an arbitrary element of md.vns.\nmd.idcs is the dictionary that maps each VarName instance to its index in md.vns, md.ranges, md.dists, md.orders and md.flags.\nmd.vns[md.idcs[vn]] == vn.\nmd.dists[md.idcs[vn]] is the distribution of vn.\nmd.gids[md.idcs[vn]] is the set of algorithms used to sample vn. This was used by the Gibbs sampler. Since Turing v0.36 it is unused and will eventually be deleted.\nmd.orders[md.idcs[vn]] is the number of observe statements before vn is sampled.\nmd.ranges[md.idcs[vn]] is the index range of vn in md.vals.\nmd.vals[md.ranges[md.idcs[vn]]] is the linearized vector of values corresponding to vn.\nmd.flags is a dictionary of true/false flags. md.flags[flag][md.idcs[vn]] is the value of flag corresponding to vn.\n\nNote that in order to make md::Metadata type stable, all the md.vns must have the same symbol and distribution type. However, one can have a single Julia variable, e.g. x, that is a matrix or a hierarchical array sampled in partitions, e.g. x[1][:] ~ MvNormal(zeros(2), I); x[2][:] ~ MvNormal(ones(2), I). The symbol x can still be managed by a single md::Metadata without hurting the type stability since all the distributions on the RHS of ~ are of the same type.\nHowever, one cannot impose this restriction on Turing models in general, so we must use a type-unstable Metadata if we want to use a single Metadata instance for the whole model. This is what UntypedVarInfo does. A type-unstable Metadata will still work but will have inferior performance.\nTo strike a balance between flexibility and performance when constructing the spl::Sampler instance, the model is first run by sampling the parameters in P from their priors using an UntypedVarInfo, i.e. a type-unstable Metadata is used for all the variables. Then once all the symbols and distribution types have been identified, a vi::TypedVarInfo is constructed where vi.metadata is a NamedTuple mapping each symbol in P to a specialized instance of Metadata. So as long as each symbol in P is sampled from only one type of distribution, vi::TypedVarInfo will have fully concretely typed fields, which brings out the peak performance of Julia.
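\nTo illustrate why this matters, here is a generic Julia sketch of the underlying idea (ours, not DynamicPPL code):\n\n# An untyped container forces dynamic dispatch on every access,\n# much like a type-unstable Metadata:\nuntyped = Dict{Symbol,Any}(:m => [0.0], :s => [1.0])\nf_untyped(md) = sum(md[:m]) + sum(md[:s])  # md[:m] isa Any, so nothing can be inferred\n\n# A NamedTuple with concretely typed fields lets the compiler infer everything,\n# much like TypedVarInfo's NamedTuple of Metadata instances:\ntyped = (m = [0.0], s = [1.0])\nf_typed(md) = sum(md.m) + sum(md.s)  # fully inferable\n\nRunning @code_warntype on f_untyped(untyped) flags the Any-typed accesses, whereas f_typed(typed) is cleanly inferred.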
" }, { "objectID": "developers/compiler/minituring-compiler/index.html", "href": "developers/compiler/minituring-compiler/index.html", "title": "A Mini Turing Implementation I: Compiler", "section": "", "text": "In this tutorial we develop a very simple probabilistic programming language. The implementation is similar to DynamicPPL. This is intentional as we want to demonstrate some key ideas from Turing’s internal implementation.\nTo make things easy to understand and to implement we restrict our language to a very simple subset of the language that Turing actually supports. Defining an accurate syntax description is not our goal here; instead, we give a simple example and all similar programs should work.\n\nConsider a probabilistic model defined by\n\\[\n\\begin{aligned}\na &\\sim \\operatorname{Normal}(0.5, 1^2) \\\\\nb &\\sim \\operatorname{Normal}(a, 2^2) \\\\\nx &\\sim \\operatorname{Normal}(b, 0.5^2)\n\\end{aligned}\n\\]\nWe assume that x is data, i.e., an observed variable. In our small language this model will be defined as\n\n@mini_model function m(x)\n a ~ Normal(0.5, 1)\n b ~ Normal(a, 2)\n x ~ Normal(b, 0.5)\n return nothing\nend\n\nSpecifically, we demand that\n\nall observed variables are arguments of the program,\nthe model definition does not contain any control flow,\nall variables are scalars, and\nthe function returns nothing.\n\nFirst, we import some required packages:\n\nusing MacroTools, Distributions, Random, AbstractMCMC, MCMCChains\n\nBefore getting to the actual “compiler”, we first build the data structure for the program trace. A program trace for a probabilistic programming language needs to at least record the values of stochastic variables and their log-probabilities.\n\nstruct VarInfo{V,L}\n values::V\n logps::L\nend\n\nVarInfo() = VarInfo(Dict{Symbol,Float64}(), Dict{Symbol,Float64}())\n\nfunction Base.setindex!(varinfo::VarInfo, (value, logp), var_id)\n varinfo.values[var_id] = value\n varinfo.logps[var_id] = logp\n return varinfo\nend\n\nInternally, our probabilistic programming language works with two main functions:\n\nassume for sampling unobserved variables and computing their log-probabilities, and\nobserve for computing log-probabilities of observed variables (but not sampling them).\n\nFor different inference algorithms we may have to use different sampling procedures and different log-probability computations. For instance, in some cases we might want to sample all variables from their prior distributions and in other cases we might only want to compute the log-likelihood of the observations based on a given set of values for the unobserved variables. Thus, depending on the inference algorithm, we want to use different assume and observe implementations. We can achieve this by providing this context information as a function argument to assume and observe.\nNote: Although the context system in this tutorial is inspired by DynamicPPL, it is very simplistic. We expand this mini Turing example in the contexts tutorial with some more complexity, to illustrate how and why contexts are central to Turing’s design. For the full details one still needs to go to the actual source of DynamicPPL though.\nHere we can see the implementation of a sampler that draws values of unobserved variables from the prior and computes the log-probability for every variable.\n\nstruct SamplingContext{S<:AbstractMCMC.AbstractSampler,R<:Random.AbstractRNG}\n rng::R\n sampler::S\nend\n\nstruct PriorSampler <: AbstractMCMC.AbstractSampler end\n\nfunction observe(context::SamplingContext, varinfo, dist, var_id, var_value)\n logp = logpdf(dist, var_value)\n varinfo[var_id] = (var_value, logp)\n return nothing\nend\n\nfunction assume(context::SamplingContext{PriorSampler}, varinfo, dist, var_id)\n sample = Random.rand(context.rng, dist)\n logp = logpdf(dist, sample)\n varinfo[var_id] = (sample, logp)\n return sample\nend;\n
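\nBefore defining the macro itself, here is a quick sketch of ours showing how these pieces fit together when driven by hand, using only the definitions above:\n\nvi = VarInfo()\nctx = SamplingContext(Random.default_rng(), PriorSampler())\na = assume(ctx, vi, Normal(0.5, 1), :a)    # draws a from its prior and records its logp\nobserve(ctx, vi, Normal(a, 0.5), :x, 3.0)  # records the log-likelihood of the observation\nvi.values, vi.logps                        # the trace now holds both variables\n\nThe @mini_model macro defined next generates exactly this kind of code for us.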
\nNext we define the “compiler” for our simple programming language. The term compiler is actually a bit misleading here since its only purpose is to transform the function definition in the @mini_model macro by\n\nadding the context information (context) and the tracing data structure (varinfo) as additional arguments, and\nreplacing tildes with calls to assume and observe.\n\nAfterwards, as usual, the Julia compiler will just-in-time compile the model function when it is called.\nThe manipulation of Julia expressions is an advanced part of the Julia language. The Julia documentation provides an introduction to and more details about this so-called metaprogramming.\n\nmacro mini_model(expr)\n return esc(mini_model(expr))\nend\n\nfunction mini_model(expr)\n # Split the function definition into a dictionary with its name, arguments, body etc.\n def = MacroTools.splitdef(expr)\n\n # Replace tildes in the function body with calls to `assume` or `observe`\n def[:body] = MacroTools.postwalk(def[:body]) do sub_expr\n if MacroTools.@capture(sub_expr, var_ ~ dist_)\n if var in def[:args]\n # If the variable is an argument of the model function, it is observed\n return :($(observe)(context, varinfo, $dist, $(Meta.quot(var)), $var))\n else\n # Otherwise it is unobserved\n return :($var = $(assume)(context, varinfo, $dist, $(Meta.quot(var))))\n end\n else\n return sub_expr\n end\n end\n\n # Add `context` and `varinfo` arguments to the model function\n def[:args] = vcat(:varinfo, :context, def[:args])\n\n # Reassemble the function definition from its name, arguments, body etc.\n return MacroTools.combinedef(def)\nend;\n\nFor inference, we make use of the AbstractMCMC interface. It provides a default implementation of a sample function for sampling a Markov chain. The default implementation already supports e.g. sampling of multiple chains in parallel, thinning of samples, or discarding initial samples.\nThe AbstractMCMC interface requires us to at least\n\ndefine a model that is a subtype of AbstractMCMC.AbstractModel,\ndefine a sampler that is a subtype of AbstractMCMC.AbstractSampler,\nimplement AbstractMCMC.step for our model and sampler.\n\nThus here we define a MiniModel model. In this model we store the model function and the observed data.\n\nstruct MiniModel{F,D} <: AbstractMCMC.AbstractModel\n f::F\n data::D # a NamedTuple of all the data\nend\n\nIn the Turing compiler, the model-specific DynamicPPL.Model is constructed automatically when calling the model function. But for the sake of simplicity here we construct the model manually.\nTo illustrate probabilistic inference with our mini language we implement an extremely simplistic Random-Walk Metropolis-Hastings sampler. We hard-code the proposal step as part of the sampler and only allow normal distributions with zero mean and fixed standard deviation. The Metropolis-Hastings sampler in Turing is more flexible.\n\nstruct MHSampler{T<:Real} <: AbstractMCMC.AbstractSampler\n sigma::T\nend\n\nMHSampler() = MHSampler(1)\n\nfunction assume(context::SamplingContext{<:MHSampler}, varinfo, dist, var_id)\n sampler = context.sampler\n old_value = varinfo.values[var_id]\n\n # propose a random-walk step, i.e., sample a new value from a Normal\n # distribution centred at the current value\n value = rand(context.rng, Normal(old_value, sampler.sigma))\n logp = Distributions.logpdf(dist, value)\n varinfo[var_id] = (value, logp)\n\n return value\nend;\n\nWe need to define two step functions, one for the first step and the other for the following steps. In the first step we sample values from the prior distributions and in the following steps we sample with the random-walk proposal. 
The two functions are identified by the different arguments they take.\n\n# The first step: Sampling from the prior distributions\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG, model::MiniModel, sampler::MHSampler; kwargs...\n)\n vi = VarInfo()\n ctx = SamplingContext(rng, PriorSampler())\n model.f(vi, ctx, values(model.data)...)\n return vi, vi\nend\n\n# The following steps: Sampling with random-walk proposal\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model::MiniModel,\n sampler::MHSampler,\n prev_state::VarInfo; # is just the old trace\n kwargs...\n)\n vi = prev_state\n new_vi = deepcopy(vi)\n ctx = SamplingContext(rng, sampler)\n model.f(new_vi, ctx, values(model.data)...)\n\n # Compute log acceptance probability\n # Since the proposal is symmetric, the computation can be simplified\n logα = sum(values(new_vi.logps)) - sum(values(vi.logps))\n\n # Accept proposal with computed acceptance probability\n if -randexp(rng) < logα\n return new_vi, new_vi\n else\n return prev_state, prev_state\n end\nend;\n\nTo make it easier to analyse the samples and compare them with results from Turing, we additionally define a version of AbstractMCMC.bundle_samples for our model and sampler that returns a MCMCChains.Chains object of samples.\n\nfunction AbstractMCMC.bundle_samples(\n samples, model::MiniModel, ::MHSampler, ::Any, ::Type{Chains}; kwargs...\n)\n # We get a vector of traces\n values = [sample.values for sample in samples]\n params = [key for key in keys(values[1]) if key ∉ keys(model.data)]\n vals = reduce(hcat, [value[p] for value in values] for p in params)\n # Compose the `Chains` data structure, for which analysis infrastructure is provided\n chains = Chains(vals, params)\n return chains\nend;\n\nLet us check how our mini probabilistic programming language works. 
We define the probabilistic model:\n\n@mini_model function m(x)\n a ~ Normal(0.5, 1)\n b ~ Normal(a, 2)\n x ~ Normal(b, 0.5)\n return nothing\nend;\n\nThe @mini_model macro expands this into another function, m, which effectively calls either assume or observe on each variable as needed:\n\n@macroexpand @mini_model function m(x)\n a ~ Normal(0.5, 1)\n b ~ Normal(a, 2)\n x ~ Normal(b, 0.5)\n return nothing\nend\n\n\n:(function m(varinfo, context, x; )\n #= /home/runner/work/docs/docs/developers/compiler/minituring-compiler/index.qmd:276 =#\n #= /home/runner/work/docs/docs/developers/compiler/minituring-compiler/index.qmd:277 =#\n a = (assume)(context, varinfo, Normal(0.5, 1), :a)\n #= /home/runner/work/docs/docs/developers/compiler/minituring-compiler/index.qmd:278 =#\n b = (assume)(context, varinfo, Normal(a, 2), :b)\n #= /home/runner/work/docs/docs/developers/compiler/minituring-compiler/index.qmd:279 =#\n (observe)(context, varinfo, Normal(b, 0.5), :x, x)\n #= /home/runner/work/docs/docs/developers/compiler/minituring-compiler/index.qmd:280 =#\n return nothing\n end)\n\n\n\nWe can use this function to construct the MiniModel, and then perform inference with data x = 3.0:\n\nsample(MiniModel(m, (x=3.0,)), MHSampler(), 1_000_000; chain_type=Chains, progress=false)\n\nChains MCMC chain (1000000×2×1 Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nparameters = a, b\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nWe compare these results with Turing.\n\nusing Turing\nusing PDMats\n\n@model function turing_m(x)\n a ~ Normal(0.5, 1)\n b ~ Normal(a, 2)\n x ~ Normal(b, 0.5)\n return nothing\nend\n\nsample(turing_m(3.0), MH(ScalMat(2, 1.0)), 1_000_000, progress=false)\n\nChains MCMC chain (1000000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nWall duration = 8.93 seconds\nCompute duration = 8.93 seconds\nparameters = a, b\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nAs you can see, with our simple probabilistic programming language and custom samplers we get similar results as Turing.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "A Mini Turing Implementation I: Compiler" ] }, { "objectID": "developers/transforms/bijectors/index.html", "href": "developers/transforms/bijectors/index.html", "title": "Bijectors in MCMC", "section": "", "text": "All the above has been purely a mathematical discussion of how distributions can be transformed. Now, we turn to their implementation in Julia, specifically using the Bijectors.jl package.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Bijectors in MCMC" ] }, { "objectID": "developers/transforms/bijectors/index.html#bijectors.jl", "href": "developers/transforms/bijectors/index.html#bijectors.jl", "title": "Bijectors in MCMC", "section": "Bijectors.jl", "text": "Bijectors.jl\n\nimport Random\nRandom.seed!(468);\n\nusing Distributions: Normal, LogNormal, logpdf\nusing Statistics: mean, var\nusing Plots: histogram\n\nA bijection between two sets (Wikipedia) is, essentially, a one-to-one mapping between the elements of these sets. That is to say, if we have two sets \\(X\\) and \\(Y\\), then a bijection maps each element of \\(X\\) to a unique element of \\(Y\\). 
To return to our univariate example, where we transformed \\(x\\) to \\(y\\) using \\(y = \\exp(x)\\), the exponentiation function is a bijection because every value of \\(x\\) maps to one unique value of \\(y\\). The input set (the domain) is \\((-\\infty, \\infty)\\), and the output set (the codomain) is \\((0, \\infty)\\). (Here, \\((a, b)\\) denotes the open interval from \\(a\\) to \\(b\\) but excluding \\(a\\) and \\(b\\) themselves.)\nSince bijections are a one-to-one mapping between elements, we can also reverse the direction of this mapping to create an inverse function. In the case of \\(y = \\exp(x)\\), the inverse function is \\(x = \\log(y)\\).\n\n\n\n\n\n\nNote\n\n\n\nTechnically, the bijections in Bijectors.jl are functions \\(f: X \\to Y\\) for which:\n\n\\(f\\) is continuously differentiable, i.e. the derivative \\(\\mathrm{d}f(x)/\\mathrm{d}x\\) exists and is continuous (over the domain of interest \\(X\\));\nIf \\(f^{-1}: Y \\to X\\) is the inverse of \\(f\\), then that is also continuously differentiable (over its own domain, i.e. \\(Y\\)).\n\nThe technical mathematical term for this is a diffeomorphism (Wikipedia), but we call them ‘bijectors’.\nWhen thinking about continuous differentiability, it’s important to be conscious of the domains or codomains that we care about. For example, taking the inverse function \\(\\log(y)\\) from above, its derivative is \\(1/y\\), which is not continuous at \\(y = 0\\). However, we specified that the bijection \\(y = \\exp(x)\\) maps values of \\(x \\in (-\\infty, \\infty)\\) to \\(y \\in (0, \\infty)\\), so the point \\(y = 0\\) is not within the domain of the inverse function.\n\n\nSpecifically, one of the primary purposes of Bijectors.jl is to construct bijections which map constrained distributions to unconstrained ones. For example, the log-normal distribution which we saw in the previous page is constrained: its support, i.e. the range over which \\(p(x) > 0\\), is \\((0, \\infty)\\). However, we can transform that to an unconstrained distribution (the normal distribution) using the transformation \\(y = \\log(x)\\).\n\n\n\n\n\n\nNote\n\n\n\nBijectors.jl, as well as DynamicPPL (which we’ll come to later), can work with a much broader class of bijective transformations of variables, not just ones that go to the entire real line. But for the purposes of MCMC, unconstraining is the most common transformation, so we’ll stick with that terminology.\n\n\nThe bijector function, when applied to a distribution, returns a bijection \\(f\\) that can be used to map the constrained distribution to an unconstrained one. Unsurprisingly, for the log-normal distribution, the bijection is (a broadcasted version of) the \\(\\log\\) function.\n\nimport Bijectors as B\n\nf = B.bijector(LogNormal())\n\n(::Base.Fix1{typeof(broadcast), typeof(log)}) (generic function with 1 method)\n\n\nWe can apply this transformation to samples from the original distribution, for example:\n\nsamples_lognormal = rand(LogNormal(), 5)\n\nsamples_normal = f(samples_lognormal)\n\n5-element Vector{Float64}:\n 0.07200886749732066\n -0.07404375655951738\n 0.6327762377562545\n -0.9799776018729268\n 1.6115229499167665\n\n\nWe can also obtain the inverse of a bijection, \\(f^{-1}\\):\n\nf_inv = B.inverse(f)\n\nf_inv(samples_normal) == samples_lognormal\n\ntrue\n\n\nWe know that the transformation \\(y = \\log(x)\\) changes the log-normal distribution to the normal distribution. 
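We can check this empirically as well; a minimal sketch (using the mean and var functions imported above) pushes a fresh batch of log-normal samples through f and confirms that the first two moments look like those of Normal():\n\n# Transformed samples should have mean ≈ 0 and variance ≈ 1.\ncheck_samples = f(rand(LogNormal(), 10_000))\nprintln(\"mean: $(mean(check_samples)), variance: $(var(check_samples))\")\n\n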
Bijectors.jl also gives us a way to access that transformed distribution:\n\ntransformed_dist = B.transformed(LogNormal(), f)\n\nBijectors.UnivariateTransformed{Distributions.LogNormal{Float64}, Base.Fix1{typeof(broadcast), typeof(log)}}(\ndist: Distributions.LogNormal{Float64}(μ=0.0, σ=1.0)\ntransform: Base.Fix1{typeof(broadcast), typeof(log)}(broadcast, log)\n)\n\n\nThis type doesn’t immediately look like a Normal(), but it behaves in exactly the same way. For example, we can sample from it and plot a histogram:\n\nsamples_plot = rand(transformed_dist, 5000)\nhistogram(samples_plot, bins=50)\n\n\n\n\nWe can also obtain the logpdf of the transformed distribution and check that it is the same as that of a normal distribution:\n\nprintln(\"Sample: $(samples_plot[1])\")\nprintln(\"Expected: $(logpdf(Normal(), samples_plot[1]))\")\nprintln(\"Actual: $(logpdf(transformed_dist, samples_plot[1]))\")\n\nSample: -0.2031149013821452\nExpected: -0.9395663647864121\nActual: -0.9395663647864121\n\n\nGiven the discussion in the previous sections, you might not be surprised to find that the logpdf of the transformed distribution is implemented using the Jacobian of the transformation. In particular, it directly uses the formula\n\\[\\log(q(\\mathbf{y})) = \\log(p(\\mathbf{x})) - \\log(|\\det(\\mathbf{J})|).\\]\nYou can access \\(\\log(|\\det(\\mathbf{J})|)\\) (evaluated at the point \\(\\mathbf{x}\\)) using the logabsdetjac function:\n\n# Reiterating the setup, just to be clear\noriginal_dist = LogNormal()\nx = rand(original_dist)\nf = B.bijector(original_dist)\ny = f(x)\ntransformed_dist = B.transformed(LogNormal(), f)\n\nprintln(\"log(q(y)) : $(logpdf(transformed_dist, y))\")\nprintln(\"log(p(x)) : $(logpdf(original_dist, x))\")\nprintln(\"log(|det(J)|) : $(B.logabsdetjac(f, x))\")\n\nlog(q(y)) : -0.9258400203646245\nlog(p(x)) : -0.8083539602557612\nlog(|det(J)|) : 0.11748606010886327\n\n\nfrom which you can see that the equation above holds. There are more functions available in the Bijectors.jl API; for full details do check out the documentation. For example, logpdf_with_trans can directly give us \\(\\log(q(\\mathbf{y}))\\) without going through the effort of constructing the bijector:\n\nB.logpdf_with_trans(original_dist, x, true)\n\n-0.9258400203646245", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Bijectors in MCMC" ] }, { "objectID": "developers/transforms/bijectors/index.html#the-case-for-bijectors-in-mcmc", "href": "developers/transforms/bijectors/index.html#the-case-for-bijectors-in-mcmc", "title": "Bijectors in MCMC", "section": "The case for bijectors in MCMC", "text": "The case for bijectors in MCMC\nConstraints pose a challenge for many numerical methods such as optimisation, and sampling is no exception to this. The problem is that for any value \\(x\\) outside of the support of a constrained distribution, \\(p(x)\\) will be zero, and the logpdf will be \\(-\\infty\\). Thus, any term that involves some ratio of probabilities (or equivalently, the logpdf) will be infinite.\n\nMetropolis with rejection\nTo see the practical impact of this on sampling, let’s attempt to sample from a log-normal distribution using a random walk Metropolis algorithm.\nOne way of handling constraints is to simply reject any steps that would take us out of bounds. This is a barebones implementation which does precisely that:\n\n# Take a step where the proposal is a normal distribution centred around\n# the current value. 
Return the new value, plus a flag to indicate whether\n# the new value was in bounds.\nfunction mh_step(logp, x, in_bounds)\n x_proposed = rand(Normal(x, 1))\n in_bounds(x_proposed) || return (x, false) # bounds check\n acceptance_logp = logp(x_proposed) - logp(x)\n return if log(rand()) < acceptance_logp\n (x_proposed, true) # accepted proposal\n else\n (x, true) # rejected proposal; stay at the current value\n end\nend\n\n# Run a random walk Metropolis sampler.\n# `logp` : a function that takes `x` and returns the log pdf of the\n# distribution we’re trying to sample from (up to a constant\n# additive factor)\n# `n_samples` : the number of samples to draw\n# `in_bounds` : a function that takes `x` and returns whether `x` is within\n# the support of the distribution\n# `x0` : the initial value\n# Returns a vector of samples, plus the number of times we went out of bounds.\nfunction mh(logp, n_samples, in_bounds; x0=1.0)\n samples = [x0]\n x = x0\n n_out_of_bounds = 0\n for _ in 2:n_samples\n x, inb = mh_step(logp, x, in_bounds)\n if !inb\n n_out_of_bounds += 1\n end\n push!(samples, x)\n end\n return (samples, n_out_of_bounds)\nend\n\nmh (generic function with 1 method)\n\n\n\n\n\n\n\nNote\n\n\n\nIn the MH algorithm, we technically do not need to explicitly check the proposal, because for any \\(x \\leq 0\\), we have that \\(p(x) = 0\\); thus, the acceptance probability will be zero. However, doing so here allows us to track how often this happens, and also illustrates the general principle of handling constraints by rejection.\n\n\nNow to actually perform the sampling:\n\nlogp(x) = logpdf(LogNormal(), x)\nsamples, n_out_of_bounds = mh(logp, 10000, x -> x > 0)\nhistogram(samples, bins=0:0.1:5; xlims=(0, 5))\n\n\n\n\nHow do we know that this has sampled correctly? For one, we can check that the mean of the samples is what we expect it to be. From Wikipedia, the mean of a log-normal distribution is given by \\(\\exp[\\mu + (\\sigma^2/2)]\\). For our log-normal distribution, we set \\(\\mu = 0\\) and \\(\\sigma = 1\\), so:\n\nprintln(\"expected mean: $(exp(0 + (1^2/2)))\")\nprintln(\" actual mean: $(mean(samples))\")\n\nexpected mean: 1.6487212707001282\n actual mean: 1.3347941996487\n\n\n\n\nMetropolis with transformation\nThe issue with this is that many of the sampling steps are unproductive, in that they bring us to the region of \\(x \\leq 0\\) and get rejected:\n\nprintln(\"went out of bounds $n_out_of_bounds/10000 times\")\n\nwent out of bounds 1870/10000 times\n\n\nAnd this could have been even worse if we had chosen a wider proposal distribution in the Metropolis step, or if the support of the distribution was narrower! In general, we probably don’t want to have to re-parameterise our proposal distribution each time we sample from a distribution with different constraints.\nThis is where the transformation functions from Bijectors.jl come in: we can use them to map the distribution to an unconstrained one and sample from that instead. Since the sampler only ever sees an unconstrained distribution, it doesn’t have to worry about checking for bounds.\nTo make this happen, instead of passing \\(\\log(p(x))\\) to the sampler, we pass \\(\\log(q(y))\\). 
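Concretely, for our log-normal example we have \\(x = f^{-1}(y) = \\exp(y)\\), so (spelling out the change-of-variables formula from above) the density we hand to the sampler is\n\\[\\log(q(y)) = \\log(p(\\exp(y))) + y,\\]\nsince the Jacobian term is \\(\\log(|\\mathrm{d}x/\\mathrm{d}y|) = \\log(\\exp(y)) = y\\). 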
This can be obtained using the Bijectors.logpdf_with_trans function that was introduced above.\n\nd = LogNormal()\nf = B.bijector(d) # Transformation function\nf_inv = B.inverse(f) # Inverse transformation function\nfunction logq(y)\n x = f_inv(y)\n return B.logpdf_with_trans(d, x, true)\nend\nsamples_transformed, n_oob_transformed = mh(logq, 10000, x -> true);\n\nNow, this process gives us samples that have been transformed, so we need to un-transform them to get the samples from the original distribution:\n\nsamples_untransformed = f_inv(samples_transformed)\nhistogram(samples_untransformed, bins=0:0.1:5; xlims=(0, 5))\n\n\n\n\nWe can check the mean of the samples too, to see that it is what we expect:\n\nprintln(\"expected mean: $(exp(0 + (1^2/2)))\")\nprintln(\" actual mean: $(mean(samples_untransformed))\")\n\nexpected mean: 1.6487212707001282\n actual mean: 1.7184757306010636\n\n\nOn top of that, we can also verify that we don’t ever go out of bounds:\n\nprintln(\"went out of bounds $n_oob_transformed/10000 times\")\n\nwent out of bounds 0/10000 times\n\n\n\n\nWhich one is better?\nIn the subsections above, we’ve seen two different methods of sampling from a constrained distribution:\n\nSample directly from the distribution and reject any samples outside of its support.\nTransform the distribution to an unconstrained one and sample from that instead.\n\n(Note that both of these methods are applicable to other samplers as well, such as Hamiltonian Monte Carlo.)\nOf course, a natural question to then ask is which one of these is better!\nOne option might be to look at the sample means above to see which one is ‘closer’ to the expected mean. However, that’s not a very robust method because the sample mean is itself random, and if we were to use a different random seed we might well reach a different conclusion.\nAnother possibility is to look at the number of times a proposal was rejected. Does a lower rejection rate (as in the transformed case) imply that the method is better? This might seem like an intuitive conclusion, but it’s not necessarily the case: for example, the sampling in unconstrained space could be much less efficient, such that even though we’re not rejecting samples, the ones that we do get are overly correlated and thus not representative of the distribution.\nA robust comparison would involve performing both methods many times and seeing how reliable the sample mean is.\n\nfunction get_sample_mean(; transform)\n if transform\n # Sample from transformed distribution\n samples = f_inv(first(mh(logq, 10000, x -> true)))\n else\n # Sample from original distribution and reject if out of bounds\n samples = first(mh(logp, 10000, x -> x > 0))\n end\n return mean(samples)\nend\n\nget_sample_mean (generic function with 1 method)\n\n\n\nmeans_with_rejection = [get_sample_mean(; transform=false) for _ in 1:1000]\nmean(means_with_rejection), var(means_with_rejection)\n\n(1.652032684314151, 0.30454613712270745)\n\n\n\nmeans_with_transformation = [get_sample_mean(; transform=true) for _ in 1:1000]\nmean(means_with_transformation), var(means_with_transformation)\n\n(1.6489347143276902, 0.003945513418875533)\n\n\nWe can see from this small study that although both methods give us the correct mean (on average), the method with the transformation is more reliable, in that the variance is much lower!\n\n\n\n\n\n\nNote\n\n\n\nAlternatively, we could also try to directly measure how correlated the samples are. 
One way to do this is to calculate the effective sample size (ESS), which is described in the Stan documentation, and implemented in MCMCChains.jl. A larger ESS implies that the samples are less correlated, and thus more representative of the underlying distribution:\n\nusing MCMCChains: Chains, ess\n\nrejection = first(mh(logp, 10000, x -> x > 0))\ntransformation = f_inv(first(mh(logq, 10000, x -> true)))\nchn = Chains(hcat(rejection, transformation), [:rejection, :transformation])\ness(chn)\n\n\nESS\n\n parameters ess ess_per_sec\n Symbol Float64 Missing\n\n rejection 503.4349 missing\n transformation 1106.6909 missing\n\n\n\n\n\n\n\nWhat happens without the Jacobian?\nIn the transformation method above, we used Bijectors.logpdf_with_trans to calculate the log probability density of the transformed distribution. This function makes sure to include the Jacobian term when performing the transformation, and this is what guarantees that when we un-transform the samples, we get the correct distribution.\nThe next code block shows what happens if we don’t include the Jacobian term. In this logq_wrong, we’ve un-transformed y to x and calculated the logpdf with respect to its original distribution. This is exactly the same mistake that we made in the previous article with naive_logpdf.\n\nfunction logq_wrong(y)\n x = f_inv(y)\n return logpdf(d, x) # no Jacobian term!\nend\nsamples_questionable, _ = mh(logq_wrong, 100000, x -> true)\nsamples_questionable_untransformed = f_inv(samples_questionable)\n\nprintln(\"mean: $(mean(samples_questionable_untransformed))\")\n\nmean: 0.5919166187308191\n\n\nYou can see that even though we used ten times more samples, the mean is quite wrong, which implies that our samples are not being drawn from the correct distribution.\nIn the next page, we’ll see how to use these transformations in the context of a probabilistic programming language, paying particular attention to their handling in DynamicPPL.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Bijectors in MCMC" ] }, { "objectID": "developers/transforms/distributions/index.html", "href": "developers/transforms/distributions/index.html", "title": "Distributions and the Jacobian", "section": "", "text": "This series of articles will seek to motivate the Bijectors.jl package, which provides the tools for transforming distributions in the Turing.jl probabilistic programming language.\nIt assumes:\nimport Random\nRandom.seed!(468);\n\nusing Distributions: Normal, LogNormal, logpdf, Distributions\nusing Plots: histogram", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#sampling-from-a-distribution", "href": "developers/transforms/distributions/index.html#sampling-from-a-distribution", "title": "Distributions and the Jacobian", "section": "Sampling from a distribution", "text": "Sampling from a distribution\nTo sample from a distribution (as defined in Distributions.jl), we can use the rand function. Let’s sample from a normal distribution and then plot a histogram of the samples.\n\nsamples = rand(Normal(), 5000)\nhistogram(samples, bins=50)\n\n\n\n\n(Calling Normal() without any arguments, as we do here, gives us a normal distribution with mean 0 and standard deviation 1.) 
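As a quick sanity check on these draws, the sample mean and variance should be close to 0 and 1 respectively; a minimal sketch (note that the Statistics standard library is not loaded in this article’s preamble):\n\nusing Statistics: mean, var\nprintln(\"mean: $(mean(samples)), variance: $(var(samples))\")\n\n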
If you want to know the log probability density of observing any of the samples, you can use logpdf:\n\nprintln(\"sample: $(samples[1])\")\nprintln(\"logpdf: $(logpdf(Normal(), samples[1]))\")\n\nsample: 0.04374853981619864\nlogpdf: -0.9198955005726975\n\n\nThe probability density function for the normal distribution with mean 0 and standard deviation 1 is\n\\[p(x) = \\frac{1}{\\sqrt{2\\pi}} \\exp{\\left(-\\frac{x^2}{2}\\right)},\\]\nso we could also have calculated this manually using:\n\nlog(1 / sqrt(2π) * exp(-samples[1]^2 / 2))\n\n-0.9198955005726974\n\n\n(or more efficiently, -(samples[1]^2 + log2π) / 2, where log2π is from the IrrationalConstants.jl package).", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#sampling-from-a-transformed-distribution", "href": "developers/transforms/distributions/index.html#sampling-from-a-transformed-distribution", "title": "Distributions and the Jacobian", "section": "Sampling from a transformed distribution", "text": "Sampling from a transformed distribution\nSay that \\(x\\) is distributed according to Normal(), and we want to draw samples of \\(y = \\exp(x)\\). Now, \\(y\\) is itself a random variable, and like any other random variable, will have a probability distribution, which we’ll call \\(q(y)\\).\nIn this specific case, the distribution of \\(y\\) is known as a log-normal distribution. For the purposes of this tutorial, let’s implement our own MyLogNormal distribution that we can sample from. (Distributions.jl already defines its own LogNormal, so we have to use a different name.) To do this, we need to overload Base.rand for our new distribution.\n\nstruct MyLogNormal <: Distributions.ContinuousUnivariateDistribution\n μ::Float64\n σ::Float64\nend\nMyLogNormal() = MyLogNormal(0.0, 1.0)\n\nfunction Base.rand(rng::Random.AbstractRNG, d::MyLogNormal)\n exp(rand(rng, Normal(d.μ, d.σ)))\nend\n\nNow we can do the same as above:\n\nsamples_lognormal = rand(MyLogNormal(), 5000)\n# Cut off the tail for clearer visualisation\nhistogram(samples_lognormal, bins=0:0.1:5; xlims=(0, 5))\n\n\n\n\nHow do we implement logpdf for our new distribution, though? Or in other words, if we observe a sample \\(y\\), how do we know what the probability of drawing that sample was?\nNaively, we might think to just un-transform the variable y by reversing the exponential, i.e. taking the logarithm. 
We could then use the logpdf of the original distribution of x.\n\nnaive_logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y))\n\nnaive_logpdf (generic function with 1 method)\n\n\nWe can compare this function against the logpdf implemented in Distributions.jl:\n\nprintln(\"Sample : $(samples_lognormal[1])\")\nprintln(\"Expected : $(logpdf(LogNormal(), samples_lognormal[1]))\")\nprintln(\"Actual : $(naive_logpdf(MyLogNormal(), samples_lognormal[1]))\")\n\nSample : 2.2331001636281114\nExpected : -2.0450477723405234\nActual : -1.2416569444078478\n\n\nClearly this approach is not quite correct!", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#the-derivative", "href": "developers/transforms/distributions/index.html#the-derivative", "title": "Distributions and the Jacobian", "section": "The derivative", "text": "The derivative\nThe reason why this doesn’t work is because transforming a (continuous) distribution causes probability density to be stretched and otherwise moved around. For example, in the normal distribution, half of the probability density is between \\(-\\infty\\) and \\(0\\), and half is between \\(0\\) and \\(\\infty\\). When exponentiated (i.e. in the log-normal distribution), the first half of the density is mapped to the interval \\((0, 1)\\), and the second half to \\((1, \\infty)\\).\nThis ‘explanation’ on its own does not really mean much, though. A perhaps more useful approach is to not talk about probability densities, but instead to make it more concrete by relating them to actual probabilities. If we think about the normal distribution as a continuous curve, what the probability density function \\(p(x)\\) really tells us is that: for any two points \\(a\\) and \\(b\\) (where \\(a \\leq b\\)), the probability of drawing a sample between \\(a\\) and \\(b\\) is the corresponding area under the curve, i.e.\n\\[\\int_a^b p(x) \\, \\mathrm{d}x.\\]\nFor example, if \\((a, b) = (-\\infty, \\infty)\\), then the probability of drawing a sample between \\(a\\) and \\(b\\) is 1.\nLet’s say that the probability density function of the log-normal distribution is \\(q(y)\\). Then, the area under the curve between the two points \\(\\exp(a)\\) and \\(\\exp(b)\\) is:\n\\[\\int_{\\exp(a)}^{\\exp(b)} q(y) \\, \\mathrm{d}y.\\]\nThis integral should be equal to the one above, because the probability of drawing from \\([a, b]\\) in the original distribution should be the same as the probability of drawing from \\([\\exp(a), \\exp(b)]\\) in the transformed distribution. The question we have to solve here is: how do we find a function \\(q(y)\\) such that this equality holds?\nWe can approach this by making the substitution \\(y = \\exp(x)\\) in the first integral (see Wikipedia for a refresher on substitutions in integrals, if needed). We have that:\n\\[\\frac{\\mathrm{d}y}{\\mathrm{d}x} = \\exp(x) = y \\implies \\mathrm{d}x = \\frac{1}{y}\\,\\mathrm{d}y\\]\nand so\n\\[\\int_{x=a}^{x=b} p(x) \\, \\mathrm{d}x\n = \\int_{y=\\exp(a)}^{y=\\exp(b)} p(\\log(y)) \\frac{1}{y} \\,\\mathrm{d}y\n = \\int_{\\exp(a)}^{\\exp(b)} q(y) \\, \\mathrm{d}y,\n\\]\nfrom which we can read off \\(q(y) = p(\\log(y)) / y\\).\nIn contrast, when we implemented naive_logpdf\n\nnaive_logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y))\n\nnaive_logpdf (generic function with 1 method)\n\n\nthat was the equivalent of saying that \\(q(y) = p(\\log(y))\\). 
We left out a factor of \\(1/y\\)!\nIndeed, now we can define the correct logpdf function. Since everything is a logarithm here, instead of multiplying by \\(1/y\\) we subtract \\(\\log(y)\\):\n\nDistributions.logpdf(d::MyLogNormal, y) = logpdf(Normal(d.μ, d.σ), log(y)) - log(y)\n\nand check that it works:\n\nprintln(\"Sample : $(samples_lognormal[1])\")\nprintln(\"Expected : $(logpdf(LogNormal(), samples_lognormal[1]))\")\nprintln(\"Actual : $(logpdf(MyLogNormal(), samples_lognormal[1]))\")\n\nSample : 2.2331001636281114\nExpected : -2.0450477723405234\nActual : -2.0450477723405234\n\n\nThe same process can be applied to any kind of (invertible) transformation. If we have some transformation from \\(x\\) to \\(y\\), and the probability density functions of \\(x\\) and \\(y\\) are \\(p(x)\\) and \\(q(y)\\) respectively, then we have the general formula:\n\\[q(y) = p(x) \\left| \\frac{\\mathrm{d}x}{\\mathrm{d}y} \\right|.\\]\nIn this case, we had \\(y = \\exp(x)\\), so \\(\\mathrm{d}x/\\mathrm{d}y = 1/y\\). (This equation is (11.5) in Bishop’s textbook.)\n\n\n\n\n\n\nNote\n\n\n\nThe absolute value here takes care of the case where \\(f\\) is a decreasing function, i.e., \\(f(x) > f(y)\\) when \\(x < y\\). You can try this out with the transformation \\(y = -\\exp(x)\\). If \\(a < b\\), then \\(-\\exp(a) > -\\exp(b)\\), and so you will have to swap the integration limits to ensure that the integral comes out positive.\n\n\nNote that \\(\\mathrm{d}y/\\mathrm{d}x\\) is equal to \\((\\mathrm{d}x/\\mathrm{d}y)^{-1}\\), so the formula above can also be written as:\n\\[q(y) \\left| \\frac{\\mathrm{d}y}{\\mathrm{d}x} \\right| = p(x).\\]", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/transforms/distributions/index.html#the-jacobian", "href": "developers/transforms/distributions/index.html#the-jacobian", "title": "Distributions and the Jacobian", "section": "The Jacobian", "text": "The Jacobian\nIn general, we may have transforms that act on multivariate distributions: for example, something mapping \\(p(x_1, x_2)\\) to \\(q(y_1, y_2)\\). In this case, the rule above has to be extended: the derivative \\(\\mathrm{d}x/\\mathrm{d}y\\) is replaced by the determinant of the inverse of what is known as the Jacobian matrix,\n\\[\\mathbf{J} = \\begin{pmatrix}\n\\partial y_1/\\partial x_1 & \\partial y_1/\\partial x_2 \\\\\n\\partial y_2/\\partial x_1 & \\partial y_2/\\partial x_2\n\\end{pmatrix}.\\]\nThis allows us to write the direct generalisation as:\n\\[q(y_1, y_2) \\left| \\det(\\mathbf{J}) \\right| = p(x_1, x_2),\\]\nor equivalently,\n\\[q(y_1, y_2) = p(x_1, x_2) \\left| \\det(\\mathbf{J}^{-1}) \\right|,\\]\nwhere \\(\\mathbf{J}^{-1}\\) is the inverse of the Jacobian matrix. This is the same as equation (11.9) in Bishop.\n\n\n\n\n\n\nNote\n\n\n\nInstead of inverting the original Jacobian matrix to get \\(\\mathbf{J}^{-1}\\), we could also use the Jacobian of the inverse function:\n\\[\\mathbf{J}_\\text{inv} = \\begin{pmatrix}\n\\partial x_1/\\partial y_1 & \\partial x_1/\\partial y_2 \\\\\n\\partial x_2/\\partial y_1 & \\partial x_2/\\partial y_2\n\\end{pmatrix}.\\]\nAs it turns out, these are entirely equivalent: the Jacobian of the inverse function is the inverse of the original Jacobian matrix.\n\n\nThe rest of this section will be devoted to an example to show that this works, and contains some slightly less pretty mathematics. 
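Before that, note that you do not have to derive \\(\\mathrm{d}x/\\mathrm{d}y\\) by hand to sanity-check such a formula. Here is a quick sketch (it uses ForwardDiff.jl, which is not loaded in this article) that applies \\(q(y) = p(x) \\left| \\mathrm{d}x/\\mathrm{d}y \\right|\\) with a numerically computed derivative:\n\nusing ForwardDiff: derivative\n\ny = 2.5\nx = log(y) # un-transform: x = log(y)\ndxdy = derivative(log, y) # d(log(y))/dy = 1/y, computed numerically\nprintln(logpdf(LogNormal(), y)) # log q(y)\nprintln(logpdf(Normal(), x) + log(abs(dxdy))) # log p(x) + log|dx/dy|\n\nBoth lines print the same number. 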
If you are already suitably convinced by this stage, then you can skip the rest of this section. (Or if you prefer something more formal, the Wikipedia article on integration by substitution discusses the multivariate case as well.)\n\nAn example: the Box–Muller transform\nA motivating example where one might like to use a Jacobian is the Box–Muller transform, which is a technique for sampling from a normal distribution.\nThe Box–Muller transform works by first sampling two random variables from the uniform distribution between 0 and 1:\n\\[\\begin{align}\nx_1 &\\sim U(0, 1) \\\\\nx_2 &\\sim U(0, 1).\n\\end{align}\\]\nBoth of these have a probability density function of \\(p(x) = 1\\) for \\(0 < x \\leq 1\\), and 0 otherwise. Because they are independent, we can write that\n\\[p(x_1, x_2) = p(x_1) p(x_2) = \\begin{cases}\n1 & \\text{if } 0 < x_1 \\leq 1 \\text{ and } 0 < x_2 \\leq 1, \\\\\n0 & \\text{otherwise}.\n\\end{cases}\\]\nThe next step is to perform the transforms\n\\[\\begin{align}\ny_1 &= \\sqrt{-2 \\log(x_1)} \\cos(2\\pi x_2); \\\\\ny_2 &= \\sqrt{-2 \\log(x_1)} \\sin(2\\pi x_2),\n\\end{align}\\]\nand it turns out that with these transforms, both \\(y_1\\) and \\(y_2\\) are independent and normally distributed with mean 0 and standard deviation 1, i.e.\n\\[q(y_1, y_2) = \\frac{1}{2\\pi} \\exp{\\left(-\\frac{y_1^2}{2}\\right)} \\exp{\\left(-\\frac{y_2^2}{2}\\right)}.\\]\nHow can we show that this is the case?\nThere are many ways to work out the required calculus. Some are more elegant and some rather less so! One of the less headache-inducing ways is to define the intermediate variables:\n\\[r = \\sqrt{-2 \\log(x_1)}; \\quad \\theta = 2\\pi x_2,\\]\nfrom which we can see that \\(y_1 = r\\cos\\theta\\) and \\(y_2 = r\\sin\\theta\\), and hence\n\\[\\begin{align}\nx_1 &= \\exp{\\left(-\\frac{r^2}{2}\\right)} = \\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)}; \\\\\nx_2 &= \\frac{\\theta}{2\\pi} = \\frac{1}{2\\pi} \\, \\arctan\\left(\\frac{y_2}{y_1}\\right).\n\\end{align}\\]\nThis lets us obtain the requisite partial derivatives in a way that doesn’t involve too much algebra. As an example, we have\n\\[\\frac{\\partial x_1}{\\partial y_1} = -y_1 \\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)} = -y_1 x_1,\\]\n(where we used the chain rule), and\n\\[\\frac{\\partial x_2}{\\partial y_1} = \\frac{1}{2\\pi} \\left(\\frac{1}{1 + (y_2/y_1)^2}\\right) \\left(-\\frac{y_2}{y_1^2}\\right),\\]\n(where we used the chain rule again, together with the derivative \\(\\mathrm{d}(\\arctan(a))/\\mathrm{d}a = 1/(1 + a^2)\\)).\nPutting together the Jacobian matrix, we have:\n\\[\\mathbf{J} = \\begin{pmatrix}\n-y_1 x_1 & -y_2 x_1 \\\\\n-cy_2/y_1^2 & c/y_1 \\\\\n\\end{pmatrix},\\]\nwhere \\(c = [2\\pi(1 + (y_2/y_1)^2)]^{-1}\\). 
The determinant of this matrix is\n\\[\\begin{align}\n\\det(\\mathbf{J}) &= -cx_1 - cx_1(y_2/y_1)^2 \\\\\n&= -cx_1\\left[1 + \\left(\\frac{y_2}{y_1}\\right)^2\\right] \\\\\n&= -\\frac{1}{2\\pi} x_1 \\\\\n&= -\\frac{1}{2\\pi}\\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)}.\n\\end{align}\\]\nComing right back to our probability density, we have that\n\\[\\begin{align}\nq(y_1, y_2) &= p(x_1, x_2) \\cdot |\\det(\\mathbf{J})| \\\\\n&= \\frac{1}{2\\pi}\\exp{\\left(-\\frac{y_1^2}{2}\\right)}\\exp{\\left(-\\frac{y_2^2}{2}\\right)},\n\\end{align}\\]\nas desired.\n\n\n\n\n\n\nNote\n\n\n\nWe haven’t yet explicitly accounted for the fact that \\(p(x_1, x_2)\\) is 0 if either \\(x_1\\) or \\(x_2\\) are outside the range \\((0, 1]\\). For example, if this constraint on \\(x_1\\) and \\(x_2\\) were to result in inaccessible values of \\(y_1\\) or \\(y_2\\), then \\(q(y_1, y_2)\\) should be 0 for those values. Formally, for the transformation \\(f: X \\to Y\\) where \\(X\\) is the unit square (i.e. \\(0 < x_1, x_2 \\leq 1\\)), \\(q(y_1, y_2)\\) should only take the above value for the image of \\(f\\), and anywhere outside of the image it should be 0.\nIn our case, the \\(\\sqrt{-2 \\log(x_1)}\\) term in the transform varies between 0 and \\(\\infty\\), and the \\(\\cos(2\\pi x_2)\\) term ranges from \\(-1\\) to \\(1\\). Hence \\(y_1\\), which is the product of these two terms, ranges from \\(-\\infty\\) to \\(\\infty\\), and likewise for \\(y_2\\). So the image of \\(f\\) is the entire real plane, and we don’t have to worry about this.\n\n\nHaving seen the theory that underpins how distributions can be transformed, let’s now turn to how this is implemented in the Turing ecosystem.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Distributions and the Jacobian" ] }, { "objectID": "developers/inference/abstractmcmc-interface/index.html", "href": "developers/inference/abstractmcmc-interface/index.html", "title": "Interface Guide", "section": "", "text": "Turing implements a sampling interface (hosted at AbstractMCMC) that is intended to provide a common framework for Markov chain Monte Carlo samplers. The interface presents several structures and functions that one needs to overload in order to implement an interface-compatible sampler.\nThis guide will demonstrate how to implement the interface without Turing.\n\n\nAny implementation of an inference method that uses the AbstractMCMC interface should implement a subset of the following types and functions:\n\nA subtype of AbstractSampler, defined as a mutable struct containing state information or sampler parameters.\nA function sample_init! which performs any necessary set-up (default: do not perform any set-up).\nA function step! which returns a transition that represents a single draw from the sampler.\nA function transitions_init which returns a container for the transitions obtained from the sampler (default: return a Vector{T} of length N where T is the type of the transition obtained in the first step and N is the number of requested samples).\nA function transitions_save! which saves transitions to the container (default: save the transition of iteration i at position i in the vector of transitions).\nA function sample_end! 
which handles any sampler wrap-up (default: do not perform any wrap-up).\nA function bundle_samples which accepts the container of transitions and returns a collection of samples (default: return the vector of transitions).\n\nThe interface methods with exclamation points are those that are intended to allow for state mutation. Any mutating function is meant to allow mutation where needed – you might use:\n\nsample_init! to run some kind of sampler preparation, before sampling begins. This could mutate a sampler’s state.\nstep! might mutate a sampler flag after each sample.\nsample_end! contains any wrap-up you might need to do. If you were sampling in a transformed space, this might be where you convert everything back to a constrained space.\n\n\n\n\nThe motivation for the interface is to allow Julia’s fantastic probabilistic programming language community to have a set of standards and common implementations so we can all thrive together. Markov chain Monte Carlo methods tend to have a very similar framework to one another, and so a common interface should help great inference methods built in single-purpose packages see more use across the community.\n\n\n\nMetropolis-Hastings is often the first sampling method that people are exposed to. It is a very straightforward algorithm and is accordingly the easiest to implement, so it makes for a good example. In this section, you will learn how to use the types and functions listed above to implement the Metropolis-Hastings sampler using the MCMC interface.\nThe full code for this implementation is housed in AdvancedMH.jl.\n\n\nLet’s begin by importing the relevant libraries. We’ll import AbstractMCMC, which contains the interface framework we’ll fill out. We also need Distributions, Random, and LinearAlgebra (for the identity matrix I used below).\n\n# Import the relevant libraries.\nusing AbstractMCMC: AbstractMCMC\nusing Distributions\nusing LinearAlgebra: I # identity matrix, needed for the MvNormal proposal\nusing Random\n\nAn interface extension (like the one we’re writing right now) typically requires that you overload or implement several functions. Specifically, you should import the functions you intend to overload. This next code block accomplishes that.\nFrom Distributions, we need Sampleable, VariateForm, and ValueSupport, three abstract types that define a distribution. Models in the interface are assumed to be subtypes of Sampleable{VariateForm, ValueSupport}. In this section our model is going to be extremely simple, so we will not end up using these except to make sure that the inference functions are dispatching correctly.\n\n\n\nLet’s begin our sampler definition by defining a sampler called MetropolisHastings which is a subtype of AbstractSampler. Correct typing is very important for proper interface implementation – if you are missing a subtype, your method may not be dispatched to when you call sample.\n\n# Define a sampler type.\nstruct MetropolisHastings{T,D} <: AbstractMCMC.AbstractSampler\n init_θ::T\n proposal::D\nend\n\n# Default constructors.\nMetropolisHastings(init_θ::Real) = MetropolisHastings(init_θ, Normal(0, 1))\nfunction MetropolisHastings(init_θ::Vector{<:Real})\n return MetropolisHastings(init_θ, MvNormal(zero(init_θ), I))\nend\n\nMetropolisHastings\n\n\nAbove, we have defined a sampler that stores the initial parameterisation of the prior, and a distribution object from which proposals are drawn. 
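For instance (a usage sketch; the variable names are ours, for illustration only):\n\n# A univariate sampler starting at 0.0, with the default Normal(0, 1) proposal:\nspl_uni = MetropolisHastings(0.0)\n\n# A two-parameter sampler with the default MvNormal proposal:\nspl_multi = MetropolisHastings([0.0, 0.0])\n\n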
You can have a struct that has no fields, and simply use it for dispatching onto the relevant functions, or you can store a large amount of state information in your sampler.\nThe general intuition for what to store in your sampler struct is that anything you may need to perform inference between samples but you don’t want to store in a transition should go into the sampler struct. It’s the only way you can carry non-sample related state information between step! calls.\n\n\n\nNext, we need to have a model of some kind. A model is a struct that’s a subtype of AbstractModel that contains whatever information is necessary to perform inference on your problem. In our case we want to know the mean and variance parameters of a Normal distribution, so our model need only store the log density of a Normal.\nNote that we only have to do this because we are not yet integrating the sampler with Turing – Turing has a very sophisticated modelling engine that removes the need to define custom model structs.\n\n# Define a model type. Stores the log density function.\nstruct DensityModel{F<:Function} <: AbstractMCMC.AbstractModel\n ℓπ::F\nend\n\n\n\n\nThe next step is to define some transition which we will return from each step! call. We’ll keep it simple by just defining a wrapper struct that contains the parameter draws and the log density of that draw:\n\n# Create a very basic Transition type, only stores the \n# parameter draws and the log probability of the draw.\nstruct Transition{T,L}\n θ::T\n lp::L\nend\n\n# Store the new draw and its log density.\nTransition(model::DensityModel, θ) = Transition(θ, ℓπ(model, θ))\n\nTransition\n\n\nTransition can now store any type of parameter, whether it’s a vector of draws from multiple parameters or a single univariate draw.\n\n\n\nNow it’s time to get into the actual inference. We’ve defined all of the core pieces we need, but we need to implement the step! function which actually performs inference.\nAs a refresher, Metropolis-Hastings implements a very basic algorithm:\n\nPick some initial state, \\(\\theta_0\\).\nFor \\(t\\) in \\([1, N]\\), do\n\nGenerate a proposal parameterisation \\(\\theta'_t \\sim q(\\theta'_t \\mid \\theta_{t-1})\\).\nCalculate the acceptance probability, \\(\\alpha = \\min\\left[1, \\frac{\\pi(\\theta'_t)}{\\pi(\\theta_{t-1})} \\frac{q(\\theta_{t-1} \\mid \\theta'_t)}{q(\\theta'_t \\mid \\theta_{t-1})} \\right]\\).\nIf \\(U \\le \\alpha\\) where \\(U \\sim \\mathrm{Uniform}(0, 1)\\), then \\(\\theta_t = \\theta'_t\\). Otherwise, \\(\\theta_t = \\theta_{t-1}\\).\n\n\nOf course, it’s much easier to do this in the log space, so the acceptance probability is more commonly written as\n\\[\\log \\alpha = \\min\\left[0, \\log \\pi(\\theta'_t) - \\log \\pi(\\theta_{t-1}) + \\log q(\\theta_{t-1} \\mid \\theta'_t) - \\log q(\\theta'_t \\mid \\theta_{t-1}) \\right].\\]\nIn interface terms, we should do the following:\n\nMake a new transition containing a proposed sample.\nCalculate the acceptance probability.\nIf we accept, return the new transition, otherwise, return the old one.\n\n\n\n\nThe step! function is the function that performs the bulk of your inference. In our case, we will implement two step! functions – one for the very first iteration, and one for every subsequent iteration.\n\n# Define the first step! function, which is called at the \n# beginning of sampling. 
Return the initial parameter used\n# to define the sampler.\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n N::Integer,\n ::Nothing;\n kwargs...,\n)\n return Transition(model, spl.init_θ)\nend\n\nThe first step! function just packages up the initial parameterisation inside the sampler, and returns it. We implicitly accept the very first parameterisation.\nThe other step! function performs the usual steps from Metropolis-Hastings. Included are several helper functions, propose and q, which are designed to replicate the functions in the pseudocode above.\n\npropose generates a new proposal in the form of a Transition, which can be univariate if the value passed in is univariate, or it can be multivariate if the Transition given is multivariate. Proposals use a basic Normal or MvNormal proposal distribution.\nq returns the log density of one parameterisation conditional on another, according to the proposal distribution.\nstep! generates a new proposal, checks the acceptance probability, and then returns either the previous transition or the proposed transition.\n\n(Since the Normal and MvNormal proposals used here are symmetric, the two q terms in the acceptance probability cancel; they are included anyway so that the code matches the general Metropolis-Hastings recipe.)\n\n# Define a function that makes a basic proposal depending on a univariate\n# parameterisation or a multivariate parameterisation.\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Real)\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Vector{<:Real})\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, t::Transition)\n return propose(spl, model, t.θ)\nend\n\n# Calculates the log probability `q(θ|θcond)`, using the proposal distribution `spl.proposal`.\nq(spl::MetropolisHastings, θ::Real, θcond::Real) = logpdf(spl.proposal, θ - θcond)\nfunction q(spl::MetropolisHastings, θ::Vector{<:Real}, θcond::Vector{<:Real})\n return logpdf(spl.proposal, θ - θcond)\nend\nq(spl::MetropolisHastings, t1::Transition, t2::Transition) = q(spl, t1.θ, t2.θ)\n\n# Calculate the density of the model given some parameterisation.\nℓπ(model::DensityModel, θ) = model.ℓπ(θ)\nℓπ(model::DensityModel, t::Transition) = t.lp\n\n# Define the other step function. Returns a Transition containing\n# either a new proposal (if accepted) or the previous proposal \n# (if not accepted).\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n ::Integer,\n θ_prev::Transition;\n kwargs...,\n)\n # Generate a new proposal.\n θ = propose(spl, model, θ_prev)\n\n # Calculate the log acceptance probability.\n α = ℓπ(model, θ) - ℓπ(model, θ_prev) + q(spl, θ_prev, θ) - q(spl, θ, θ_prev)\n\n # Decide whether to return the previous θ or the new one.\n if log(rand(rng)) < min(α, 0.0)\n return θ\n else\n return θ_prev\n end\nend\n\n\n\n\nIn the default implementation, sample just returns a vector of all transitions. If instead you would like to obtain a Chains object (e.g., to simplify downstream analysis), you have to implement the bundle_samples function as well. It accepts the vector of transitions and returns a collection of samples. 
Fortunately, our Transition is incredibly simple, and we only need to build a little bit of functionality to accept custom parameter names passed in by the user.\n\n# A basic chains constructor that works with the Transition struct we defined.\n# `Chains` lives in MCMCChains.jl, which must be loaded for this to work.\nusing MCMCChains: Chains\n\nfunction AbstractMCMC.bundle_samples(\n rng::AbstractRNG,\n ℓ::DensityModel,\n s::MetropolisHastings,\n N::Integer,\n ts::Vector{<:Transition},\n chain_type::Type{Any};\n param_names=missing,\n kwargs...,\n)\n # Turn all the transitions into an N×(k+1) matrix of parameter draws,\n # with the log density in the final column.\n vals = copy(reduce(hcat, [vcat(t.θ, t.lp) for t in ts])')\n\n # Check if we received any parameter names.\n if ismissing(param_names)\n param_names = [\"Parameter $i\" for i in 1:(size(vals, 2) - 1)]\n end\n\n # Add the log density field to the parameter names.\n push!(param_names, \"lp\")\n\n # Bundle everything up and return a Chains struct.\n return Chains(vals, param_names, (internals=[\"lp\"],))\nend\n\nAll done!\nYou can even implement different output formats by implementing bundle_samples for different chain_types, which can be provided as keyword argument to sample. By default, sample uses chain_type = Any.\n\n\n\nNow that we have all the pieces, we should test the implementation by defining a model to calculate the mean and variance parameters of a Normal distribution. We can do this by constructing a target density function, providing a sample of data, and then running the sampler with sample.\n\n# Generate a set of data from the distribution whose parameters we want to estimate.\ndata = rand(Normal(5, 3), 30)\n\n# Define the components of a basic model.\ninsupport(θ) = θ[2] >= 0\ndist(θ) = Normal(θ[1], θ[2])\ndensity(θ) = insupport(θ) ? sum(logpdf.(dist(θ), data)) : -Inf\n\n# Construct a DensityModel.\nmodel = DensityModel(density)\n\n# Set up our sampler with initial parameters.\nspl = MetropolisHastings([0.0, 0.0])\n\n# Sample from the posterior.\nchain = sample(model, spl, 100000; param_names=[\"μ\", \"σ\"])\n\nIf all the interface functions have been extended properly, you should get an output from display(chain) that looks something like this:\nObject of type Chains, with data of type 100000×3×1 Array{Float64,3}\n\nIterations = 1:100000\nThinning interval = 1\nChains = 1\nSamples per chain = 100000\ninternals = lp\nparameters = μ, σ\n\n2-element Array{ChainDataFrame,1}\n\nSummary Statistics\n\n│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │\n├─────┼────────────┼─────────┼──────────┼────────────┼────────────┼─────────┼─────────┤\n│ 1 │ μ │ 5.33157 │ 0.854193 │ 0.0027012 │ 0.00893069 │ 8344.75 │ 1.00009 │\n│ 2 │ σ │ 4.54992 │ 0.632916 │ 0.00200146 │ 0.00534942 │ 14260.8 │ 1.00005 │\n\nQuantiles\n\n│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │\n├─────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n│ 1 │ μ │ 3.6595 │ 4.77754 │ 5.33182 │ 5.89509 │ 6.99651 │\n│ 2 │ σ │ 3.5097 │ 4.09732 │ 4.47805 │ 4.93094 │ 5.96821 │\nIt looks like we’re extremely close to our true parameters of Normal(5,3), though with a fairly high variance due to the low sample size.\n\n\n\n\nWe’ve seen how to implement the sampling interface for general projects. Turing’s interface methods are ever-evolving, so please open an issue at AbstractMCMC with feature requests or problems."
}, { "objectID": "developers/inference/abstractmcmc-interface/index.html#interface-overview", "href": "developers/inference/abstractmcmc-interface/index.html#interface-overview", "title": "Interface Guide", "section": "", "text": "Any implementation of an inference method that uses the AbstractMCMC interface should implement a subset of the following types and functions:\n\nA subtype of AbstractSampler, defined as a mutable struct containing state information or sampler parameters.\nA function sample_init! which performs any necessary set-up (default: do not perform any set-up).\nA function step! which returns a transition that represents a single draw from the sampler.\nA function transitions_init which returns a container for the transitions obtained from the sampler (default: return a Vector{T} of length N where T is the type of the transition obtained in the first step and N is the number of requested samples).\nA function transitions_save! which saves transitions to the container (default: save the transition of iteration i at position i in the vector of transitions).\nA function sample_end! which handles any sampler wrap-up (default: do not perform any wrap-up).\nA function bundle_samples which accepts the container of transitions and returns a collection of samples (default: return the vector of transitions).\n\nThe interface methods with exclamation points are those that are intended to allow for state mutation. Any mutating function is meant to allow mutation where needed – you might use:\n\nsample_init! to run some kind of sampler preparation, before sampling begins. This could mutate a sampler’s state.\nstep! might mutate a sampler flag after each sample.\nsample_end! contains any wrap-up you might need to do. If you were sampling in a transformed space, this might be where you convert everything back to a constrained space." }, { "objectID": "developers/inference/abstractmcmc-interface/index.html#why-do-you-have-an-interface", "href": "developers/inference/abstractmcmc-interface/index.html#why-do-you-have-an-interface", "title": "Interface Guide", "section": "", "text": "The motivation for the interface is to allow Julia’s fantastic probabilistic programming language community to have a set of standards and common implementations so we can all thrive together. Markov chain Monte Carlo methods tend to have a very similar framework to one another, and so a common interface should help more great inference methods built in single-purpose packages to experience more use among the community." }, { "objectID": "developers/inference/abstractmcmc-interface/index.html#implementing-metropolis-hastings-without-turing", "href": "developers/inference/abstractmcmc-interface/index.html#implementing-metropolis-hastings-without-turing", "title": "Interface Guide", "section": "", "text": "Metropolis-Hastings is often the first sampling method that people are exposed to. It is a very straightforward algorithm and is accordingly the easiest to implement, so it makes for a good example. In this section, you will learn how to use the types and functions listed above to implement the Metropolis-Hastings sampler using the MCMC interface.\nThe full code for this implementation is housed in AdvancedMH.jl.\n\n\nLet’s begin by importing the relevant libraries. We’ll import AbstractMCMC, which contains the interface framework we’ll fill out. 
We also need Distributions and Random.\n\n# Import the relevant libraries.\nusing AbstractMCMC: AbstractMCMC\nusing Distributions\nusing Random\n\nAn interface extension (like the one we’re writing right now) typically requires that you overload or implement several functions. Specifically, you should import the functions you intend to overload. This next code block accomplishes that.\nFrom Distributions, we need Sampleable, VariateForm, and ValueSupport, three abstract types that define a distribution. Models in the interface are assumed to be subtypes of Sampleable{VariateForm, ValueSupport}. In this section our model is going to be extremely simple, so we will not end up using these except to make sure that the inference functions are dispatching correctly.\n\n\n\nLet’s begin our sampler definition by defining a sampler called MetropolisHastings which is a subtype of AbstractSampler. Correct typing is very important for proper interface implementation – if you are missing a subtype, your method may not be dispatched to when you call sample.\n\n# Define a sampler type.\nstruct MetropolisHastings{T,D} <: AbstractMCMC.AbstractSampler\n init_θ::T\n proposal::D\nend\n\n# Default constructors.\nMetropolisHastings(init_θ::Real) = MetropolisHastings(init_θ, Normal(0, 1))\nfunction MetropolisHastings(init_θ::Vector{<:Real})\n return MetropolisHastings(init_θ, MvNormal(zero(init_θ), I))\nend\n\nMetropolisHastings\n\n\nAbove, we have defined a sampler that stores the initial parameterisation of the prior, and a distribution object from which proposals are drawn. You can have a struct that has no fields, and simply use it for dispatching onto the relevant functions, or you can store a large amount of state information in your sampler.\nThe general intuition for what to store in your sampler struct is that anything you may need to perform inference between samples but you don’t want to store in a transition should go into the sampler struct. It’s the only way you can carry non-sample related state information between step! calls.\n\n\n\nNext, we need to have a model of some kind. A model is a struct that’s a subtype of AbstractModel that contains whatever information is necessary to perform inference on your problem. In our case we want to know the mean and variance parameters for a standard Normal distribution, so we can keep our model to the log density of a Normal.\nNote that we only have to do this because we are not yet integrating the sampler with Turing – Turing has a very sophisticated modelling engine that removes the need to define custom model structs.\n\n# Define a model type. Stores the log density function.\nstruct DensityModel{F<:Function} <: AbstractMCMC.AbstractModel\n ℓπ::F\nend\n\n\n\n\nThe next step is to define some transition which we will return from each step! call. We’ll keep it simple by just defining a wrapper struct that contains the parameter draws and the log density of that draw:\n\n# Create a very basic Transition type, only stores the \n# parameter draws and the log probability of the draw.\nstruct Transition{T,L}\n θ::T\n lp::L\nend\n\n# Store the new draw and its log density.\nTransition(model::DensityModel, θ) = Transition(θ, ℓπ(model, θ))\n\nTransition\n\n\nTransition can now store any type of parameter, whether it’s a vector of draws from multiple parameters or a single univariate draw.\n\n\n\nNow it’s time to get into the actual inference. We’ve defined all of the core pieces we need, but we need to implement the step! 
function which actually performs inference.\nAs a refresher, Metropolis-Hastings implements a very basic algorithm:\n\nPick some initial state, \\theta_0.\nFor t in [1,N], do\n\nGenerate a proposal parameterisation \\theta^\\prime_t \\sim q(\\theta^\\prime_t \\mid \\theta_{t-1}).\nCalculate the acceptance probability, \\alpha = \\text{min}\\left[1,\\frac{\\pi(\\theta'_t)}{\\pi(\\theta_{t-1})} \\frac{q(\\theta_{t-1} \\mid \\theta'_t)}{q(\\theta'_t \\mid \\theta_{t-1})}) \\right].\nIf U \\le \\alpha where U \\sim [0,1], then \\theta_t = \\theta'_t. Otherwise, \\theta_t = \\theta_{t-1}.\n\n\nOf course, it’s much easier to do this in the log space, so the acceptance probability is more commonly written as\n\\log \\alpha = \\min\\left[0, \\log \\pi(\\theta'_t) - \\log \\pi(\\theta_{t-1}) + \\log q(\\theta_{t-1} \\mid \\theta^\\prime_t) - \\log q(\\theta\\prime_t \\mid \\theta_{t-1}) \\right].\nIn interface terms, we should do the following:\n\nMake a new transition containing a proposed sample.\nCalculate the acceptance probability.\nIf we accept, return the new transition, otherwise, return the old one.\n\n\n\n\nThe step! function is the function that performs the bulk of your inference. In our case, we will implement two step! functions – one for the very first iteration, and one for every subsequent iteration.\n\n# Define the first step! function, which is called at the \n# beginning of sampling. Return the initial parameter used\n# to define the sampler.\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n N::Integer,\n ::Nothing;\n kwargs...,\n)\n return Transition(model, spl.init_θ)\nend\n\nThe first step! function just packages up the initial parameterisation inside the sampler, and returns it. We implicitly accept the very first parameterisation.\nThe other step! function performs the usual steps from Metropolis-Hastings. Included are several helper functions, proposal and q, which are designed to replicate the functions in the pseudocode above.\n\nproposal generates a new proposal in the form of a Transition, which can be univariate if the value passed in is univariate, or it can be multivariate if the Transition given is multivariate. Proposals use a basic Normal or MvNormal proposal distribution.\nq returns the log density of one parameterisation conditional on another, according to the proposal distribution.\nstep! 
generates a new proposal, checks the acceptance probability, and then returns either the previous transition or the proposed transition.\n\n\n# Define a function that makes a basic proposal depending on a univariate\n# parameterisation or a multivariate parameterisation.\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Real)\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, θ::Vector{<:Real})\n return Transition(model, θ + rand(spl.proposal))\nend\nfunction propose(spl::MetropolisHastings, model::DensityModel, t::Transition)\n return propose(spl, model, t.θ)\nend\n\n# Calculates the probability `q(θ|θcond)`, using the proposal distribution `spl.proposal`.\nq(spl::MetropolisHastings, θ::Real, θcond::Real) = logpdf(spl.proposal, θ - θcond)\nfunction q(spl::MetropolisHastings, θ::Vector{<:Real}, θcond::Vector{<:Real})\n return logpdf(spl.proposal, θ - θcond)\nend\nq(spl::MetropolisHastings, t1::Transition, t2::Transition) = q(spl, t1.θ, t2.θ)\n\n# Calculate the density of the model given some parameterisation.\nℓπ(model::DensityModel, θ) = model.ℓπ(θ)\nℓπ(model::DensityModel, t::Transition) = t.lp\n\n# Define the other step function. Returns a Transition containing\n# either a new proposal (if accepted) or the previous proposal \n# (if not accepted).\nfunction AbstractMCMC.step!(\n rng::AbstractRNG,\n model::DensityModel,\n spl::MetropolisHastings,\n ::Integer,\n θ_prev::Transition;\n kwargs...,\n)\n # Generate a new proposal.\n θ = propose(spl, model, θ_prev)\n\n # Calculate the log acceptance probability.\n α = ℓπ(model, θ) - ℓπ(model, θ_prev) + q(spl, θ_prev, θ) - q(spl, θ, θ_prev)\n\n # Decide whether to return the previous θ or the new one.\n if log(rand(rng)) < min(α, 0.0)\n return θ\n else\n return θ_prev\n end\nend\n\n\n\n\nIn the default implementation, sample just returns a vector of all transitions. If instead you would like to obtain a Chains object (e.g., to simplify downstream analysis), you have to implement the bundle_samples function as well. It accepts the vector of transitions and returns a collection of samples. Fortunately, our Transition is incredibly simple, and we only need to build a little bit of functionality to accept custom parameter names passed in by the user.\n\n# A basic chains constructor that works with the Transition struct we defined.\nfunction AbstractMCMC.bundle_samples(\n rng::AbstractRNG,\n ℓ::DensityModel,\n s::MetropolisHastings,\n N::Integer,\n ts::Vector{<:Transition},\n chain_type::Type{Any};\n param_names=missing,\n kwargs...,\n)\n # Turn all the transitions into a vector-of-vectors.\n vals = copy(reduce(hcat, [vcat(t.θ, t.lp) for t in ts])')\n\n # Check if we received any parameter names.\n if ismissing(param_names)\n param_names = [\"Parameter $i\" for i in 1:(length(first(vals)) - 1)]\n end\n\n # Add the log density field to the parameter names.\n push!(param_names, \"lp\")\n\n # Bundle everything up and return a Chains struct.\n return Chains(vals, param_names, (internals=[\"lp\"],))\nend\n\nAll done!\nYou can even implement different output formats by implementing bundle_samples for different chain_types, which can be provided as keyword argument to sample. As default sample uses chain_type = Any.\n\n\n\nNow that we have all the pieces, we should test the implementation by defining a model to calculate the mean and variance parameters of a Normal distribution. 
\n\n\n\nNow that we have all the pieces, we should test the implementation by defining a model to estimate the mean and standard deviation parameters of a Normal distribution. We can do this by constructing a target density function, providing a sample of data, and then running the sampler with sample.\n\n# Generate a set of data from the distribution we want to estimate.\ndata = rand(Normal(5, 3), 30)\n\n# Define the components of a basic model.\ninsupport(θ) = θ[2] >= 0\ndist(θ) = Normal(θ[1], θ[2])\ndensity(θ) = insupport(θ) ? sum(logpdf.(dist(θ), data)) : -Inf\n\n# Construct a DensityModel.\nmodel = DensityModel(density)\n\n# Set up our sampler with initial parameters.\nspl = MetropolisHastings([0.0, 0.0])\n\n# Sample from the posterior.\nchain = sample(model, spl, 100000; param_names=[\"μ\", \"σ\"])\n\nIf all the interface functions have been extended properly, you should get an output from display(chain) that looks something like this:\nObject of type Chains, with data of type 100000×3×1 Array{Float64,3}\n\nIterations = 1:100000\nThinning interval = 1\nChains = 1\nSamples per chain = 100000\ninternals = lp\nparameters = μ, σ\n\n2-element Array{ChainDataFrame,1}\n\nSummary Statistics\n\n│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │\n├─────┼────────────┼─────────┼──────────┼────────────┼────────────┼─────────┼─────────┤\n│ 1 │ μ │ 5.33157 │ 0.854193 │ 0.0027012 │ 0.00893069 │ 8344.75 │ 1.00009 │\n│ 2 │ σ │ 4.54992 │ 0.632916 │ 0.00200146 │ 0.00534942 │ 14260.8 │ 1.00005 │\n\nQuantiles\n\n│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │\n│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │\n├─────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n│ 1 │ μ │ 3.6595 │ 4.77754 │ 5.33182 │ 5.89509 │ 6.99651 │\n│ 2 │ σ │ 3.5097 │ 4.09732 │ 4.47805 │ 4.93094 │ 5.96821 │\nIt looks like we’re reasonably close to our true parameters of Normal(5,3), though with a fairly high variance due to the low sample size." }, { "objectID": "developers/inference/abstractmcmc-interface/index.html#conclusion", "href": "developers/inference/abstractmcmc-interface/index.html#conclusion", "title": "Interface Guide", "section": "", "text": "We’ve seen how to implement the sampling interface for general projects. Turing’s interface methods are ever-evolving, so please open an issue at AbstractMCMC with feature requests or problems." }, { "objectID": "developers/inference/variational-inference/index.html", "href": "developers/inference/variational-inference/index.html", "title": "Variational Inference", "section": "", "text": "In this post, we’ll examine variational inference (VI), a family of approximate Bayesian inference methods. 
We will focus on one of the more standard VI methods, Automatic Differentiation Variational Inference (ADVI).\nHere, we’ll examine the theory behind VI, but if you’re interested in using ADVI in Turing, check out this tutorial.", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Variational Inference" ] }, { "objectID": "developers/inference/variational-inference/index.html#computing-kl-divergence-without-knowing-the-posterior", "href": "developers/inference/variational-inference/index.html#computing-kl-divergence-without-knowing-the-posterior", "title": "Variational Inference", "section": "Computing KL-divergence without knowing the posterior", "text": "Computing KL-divergence without knowing the posterior\nFirst off, recall that\n\n\\[\np(z \\mid x\\_i) = \\frac{p(x\\_i, z)}{p(x\\_i)}\n\\]\n\nso we can write\n\n\\[\n\\begin{align*}\n\\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) &= \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] - \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) - \\log p(x\\_i) \\right] \\\\\n &= \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] - \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] + \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i) \\right] \\\\\n &= \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] - \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] + \\sum\\_{i = 1}^n \\log p(x\\_i),\n\\end{align*}\n\\]\n\nwhere in the last equality we used the fact that \\(p(x_i)\\) is independent of \\(z\\).\nNow you’re probably thinking “Oh great! Now you’ve introduced \\(p(x_i)\\) which we also can’t compute (in general)!”. Woah. Calm down human. Let’s do some more algebra. The above expression can be rearranged to\n\n\\[\n\\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) + \\underbrace{\\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] - \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right]}\\_{=: \\mathrm{ELBO}(q)} = \\underbrace{\\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i) \\right]}\\_{\\text{constant}}.\n\\]\n\nSee? The right-hand side is constant and, as we mentioned before, \\(\\mathrm{D_{KL}} \\ge 0\\). What happens if we try to maximize the term we just gave the completely arbitrary name \\(\\mathrm{ELBO}\\)? Well, if \\(\\mathrm{ELBO}\\) goes up while \\(p(x_i)\\) stays constant then \\(\\mathrm{D_{KL}}\\) has to go down! That is, the \\(q(z)\\) which minimizes the KL-divergence is the same \\(q(z)\\) which maximizes \\(\\mathrm{ELBO}(q)\\):\n\n\\[\n\\underset{q}{\\mathrm{argmin}} \\ \\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) = \\underset{q}{\\mathrm{argmax}} \\ \\mathrm{ELBO}(q)\n\\]\n\nwhere\n\n\\[\n\\begin{align*}\n\\mathrm{ELBO}(q) &:= \\left( \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] \\right) - \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log q(z) \\right] \\\\\n &= \\left( \\sum\\_{i = 1}^n \\mathbb{E}\\_{z \\sim q(z)} \\left[ \\log p(x\\_i, z) \\right] \\right) + \\mathbb{H}\\left( q(z) \\right)\n\\end{align*}\n\\]\n\nand \\(\\mathbb{H} \\left(q(z) \\right)\\) denotes the (differential) entropy of \\(q(z)\\).\nAssuming the joint \\(p(x_i, z)\\) and the entropy \\(\\mathbb{H}\\left(q(z)\\right)\\) are both tractable, we can use a Monte Carlo estimate for the remaining expectation. 
This leaves us with the following tractable expression\n\n\\[\n\\underset{q}{\\mathrm{argmin}} \\ \\mathrm{D\\_{KL}} \\left( q(z), p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right) \\approx \\underset{q}{\\mathrm{argmax}} \\ \\widehat{\\mathrm{ELBO}}(q)\n\\]\n\nwhere\n\n\\[\n\\widehat{\\mathrm{ELBO}}(q) = \\frac{1}{m} \\left( \\sum\\_{k = 1}^m \\sum\\_{i = 1}^n \\log p(x\\_i, z\\_k) \\right) + \\mathbb{H} \\left(q(z)\\right) \\quad \\text{where} \\quad z\\_k \\sim q(z) \\quad \\forall k = 1, \\dots, m.\n\\]\n\nHence, as long as we can sample from \\(q(z)\\) somewhat efficiently, we can indeed minimize the KL-divergence! Neat, eh?\nSidenote: in the case where \\(q(z)\\) is tractable but \\(\\mathbb{H} \\left(q(z) \\right)\\) is not, we can use a Monte Carlo estimate for this term too, but this generally results in a higher-variance estimate.\nAlso, I fooled you real good: the ELBO isn’t an arbitrary name, hah! In fact it’s an abbreviation for the evidence lower bound (ELBO) because it is, uhmm, well, a lower bound on the (log) evidence (remember \\(\\mathrm{D_{KL}} \\ge 0\\)). Yup.", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Variational Inference" ] }, { "objectID": "developers/inference/variational-inference/index.html#maximizing-the-elbo", "href": "developers/inference/variational-inference/index.html#maximizing-the-elbo", "title": "Variational Inference", "section": "Maximizing the ELBO", "text": "Maximizing the ELBO\nFinding the optimal \\(q\\) over all possible densities of course isn’t feasible. Instead we consider a family of parameterized densities \\(\\mathscr{Q}\\_{\\Theta}\\) where \\(\\Theta\\) denotes the space of possible parameters. Each density in this family \\(q\\_{\\theta} \\in \\mathscr{Q}\\_{\\Theta}\\) is parameterized by a unique \\(\\theta \\in \\Theta\\). Moreover, we’ll assume\n\n\\(q\\_{\\theta}(z)\\), i.e. evaluating the probability density \\(q\\) at any point \\(z\\), is differentiable\n\\(z \\sim q\\_{\\theta}(z)\\), i.e. the process of sampling from \\(q\\_{\\theta}(z)\\), is differentiable\n\nAssumption (1) is fairly straightforward, but (2) is a bit tricky. What does it even mean for a sampling process to be differentiable? This is quite an interesting problem in its own right and would require something like a 50-page paper to properly review the different approaches (highly recommended read).\n\nWe’re going to make use of a particular such approach which goes under a bunch of different names: reparametrization trick, path derivative, etc. This refers to making the assumption that all elements \\(q\\_{\\theta} \\in \\mathscr{Q}\\_{\\Theta}\\) can be considered as reparameterizations of some base density, say \\(\\bar{q}(z)\\). That is, if \\(q\\_{\\theta} \\in \\mathscr{Q}\\_{\\Theta}\\) then\n\n\\[\nz \\sim q\\_{\\theta}(z) \\quad \\iff \\quad z := g\\_{\\theta}(\\bar{z}) \\quad \\text{where} \\quad \\bar{z} \\sim \\bar{q}(z)\n\\]\n\nfor some function \\(g\\_{\\theta}\\) differentiable wrt. \\(\\theta\\). So all \\(q_{\\theta} \\in \\mathscr{Q}\\_{\\Theta}\\) are using the same reparameterization-function \\(g\\) but each \\(q\\_{\\theta}\\) corresponds to a different choice of \\(\\theta\\) for \\(g\\_{\\theta}\\).\nUnder this assumption we can differentiate the sampling process by taking the derivative of \\(g\\_{\\theta}\\) wrt. \\(\\theta\\), and thus we can differentiate the entire \\(\\widehat{\\mathrm{ELBO}}(q\\_{\\theta})\\) wrt. \\(\\theta\\)! 
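\nTo make the reparameterization trick concrete, here is a toy sketch of a differentiable ELBO estimate for a one-dimensional Gaussian variational family (the model, all names, and the use of ForwardDiff here are illustrative assumptions, not Turing’s actual implementation):\n\nusing Distributions, ForwardDiff\n\n# Toy joint density: prior z ~ Normal(0, 1), likelihood x_i ~ Normal(z, 1).\nlogjoint(z, xs) = logpdf(Normal(0, 1), z) + sum(logpdf.(Normal(z, 1), xs))\n\n# Reparameterised ELBO estimate for q(z) = Normal(μ, exp(logσ)): draw\n# ε_k ~ Normal(0, 1) once, set z_k = μ + exp(logσ) * ε_k, and add the\n# Gaussian entropy 0.5 * (1 + log(2π)) + logσ, which is known analytically.\nfunction elbo_hat(θ, ε, xs)\n μ, logσ = θ\n zs = μ .+ exp(logσ) .* ε\n entropy = 0.5 * (1 + log(2π)) + logσ\n return mean(logjoint.(zs, Ref(xs))) + entropy\nend\n\nxs = randn(20) .+ 2.0\nε = randn(10) # fixing the base samples makes elbo_hat deterministic in θ\nForwardDiff.gradient(θ -> elbo_hat(θ, ε, xs), [0.0, 0.0])\n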
With the gradient available we can either try to solve for optimality directly by setting the gradient equal to zero, or maximise \\(\\widehat{\\mathrm{ELBO}}(q\\_{\\theta})\\) stepwise by traversing \\(\\mathscr{Q}\\_{\\Theta}\\) in the direction of steepest ascent. For the sake of generality, we’re going to go with the stepwise approach.\nWith all this nailed down, we eventually reach the section on Automatic Differentiation Variational Inference (ADVI).", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Variational Inference" ] }, { "objectID": "developers/inference/variational-inference/index.html#automatic-differentiation-variational-inference-advi", "href": "developers/inference/variational-inference/index.html#automatic-differentiation-variational-inference-advi", "title": "Variational Inference", "section": "Automatic Differentiation Variational Inference (ADVI)", "text": "Automatic Differentiation Variational Inference (ADVI)\nSo let’s revisit the assumptions we’ve made at this point:\n\nThe variational posterior \\(q\\_{\\theta}\\) is in a parameterized family of densities denoted \\(\\mathscr{Q}\\_{\\Theta}\\), with \\(\\theta \\in \\Theta\\).\n\\(\\mathscr{Q}\\_{\\Theta}\\) is a space of reparameterizable densities with \\(\\bar{q}(z)\\) as the base-density.\nThe parameterisation function \\(g\\_{\\theta}\\) is differentiable wrt. \\(\\theta\\).\nEvaluation of the probability density \\(q\\_{\\theta}(z)\\) is differentiable wrt. \\(\\theta\\).\n\\(\\mathbb{H}\\left(q\\_{\\theta}(z)\\right)\\) is tractable.\nEvaluation of the joint density \\(p(x, z)\\) is tractable and differentiable wrt. \\(z\\).\nThe support of \\(q(z)\\) is a subspace of the support of \\(p(z \\mid x)\\): \\(\\mathrm{supp}\\left(q(z)\\right) \\subseteq \\mathrm{supp}\\left(p(z \\mid x)\\right)\\).\n\nNot all of these are necessary to do VI, but they are very convenient and result in a fairly flexible approach. One distribution which has a density satisfying all of the above assumptions except (7) (we’ll get back to this in a second) for any tractable and differentiable \\(p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n)\\) is the good ole’ Gaussian/normal distribution:\n\n\\[\nz \\sim \\mathcal{N}(\\mu, \\Sigma) \\quad \\iff \\quad z = g\\_{\\mu, L}(\\bar{z}) := \\mu + L^T \\bar{z} \\quad \\text{where} \\quad \\bar{z} \\sim \\bar{q}(z) := \\mathcal{N}(0\\_d, I\\_{d \\times d})\n\\]\n\nwhere \\(\\Sigma = L L^T,\\) with \\(L\\) obtained from the Cholesky-decomposition. Abusing notation a bit, we’re going to write\n\n\\[\n\\theta = (\\mu, \\Sigma) := (\\mu\\_1, \\dots, \\mu\\_d, L\\_{11}, \\dots, L\\_{1, d}, L\\_{2, 1}, \\dots, L\\_{2, d}, \\dots, L\\_{d, 1}, \\dots, L\\_{d, d}).\n\\]\n\nWith this assumption we finally have a tractable expression for \\(\\widehat{\\mathrm{ELBO}}(q_{\\mu, \\Sigma})\\)! Well, assuming (7) holds. Since a Gaussian has non-zero probability on the entirety of \\(\\mathbb{R}^d\\), we also require \\(p(z \\mid \\\\{ x_i \\\\}_{i = 1}^n)\\) to have non-zero probability on all of \\(\\mathbb{R}^d\\).\nThough not necessary, we’ll often make a mean-field assumption for the variational posterior \\(q(z)\\), i.e. assume independence between the latent variables. In this case, we’ll write\n\n\\[\n\\theta = (\\mu, \\sigma^2) := (\\mu\\_1, \\dots, \\mu\\_d, \\sigma\\_1^2, \\dots, \\sigma\\_d^2).\n\\]\n
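\nAs a minimal illustration of this parameterisation (a hypothetical sketch, not Turing’s implementation), a mean-field Gaussian family and its reparameterised sampler could look like:\n\n# Mean-field Gaussian variational family, θ = (μ, σ²), with independent\n# coordinates; sampling is reparameterised through a standard normal base\n# distribution, as described above.\nstruct MeanFieldGaussian\n μ::Vector{Float64}\n σ::Vector{Float64} # standard deviations, so σ .^ 2 are the variances\nend\n\n# z = g_θ(z̄) := μ + σ .* z̄ with z̄ ~ N(0, I)\nsample_q(q::MeanFieldGaussian) = q.μ .+ q.σ .* randn(length(q.μ))\n\nq = MeanFieldGaussian(zeros(2), ones(2))\nsample_q(q)\n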
\n\nExamples\nAs a (trivial) example we could apply the approach described above to the following generative model for \\(p(z \\mid \\\\{ x_i \\\\}\\_{i = 1}^n)\\):\n\n\\[\n\\begin{align*}\n m &\\sim \\mathcal{N}(0, 1) \\\\\n x\\_i &\\overset{\\text{i.i.d.}}{\\sim} \\mathcal{N}(m, 1), \\quad i = 1, \\dots, n.\n\\end{align*}\n\\]\n\nIn this case \\(z = m\\) and we have the posterior defined, up to its normalising constant, by \\(p(m \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\propto p(m) \\prod\\_{i = 1}^n p(x\\_i \\mid m)\\). Then the variational posterior would be\n\n\\[\nq\\_{\\mu, \\sigma} = \\mathcal{N}(\\mu, \\sigma^2), \\quad \\text{where} \\quad \\mu \\in \\mathbb{R}, \\ \\sigma^2 \\in \\mathbb{R}^{ + }.\n\\]\n\nAnd since the prior of \\(m\\), \\(\\mathcal{N}(0, 1)\\), has non-zero probability on the entirety of \\(\\mathbb{R}\\), the same as \\(q(m)\\), i.e. assumption (7) above holds, everything is fine and life is good.\nBut what about this generative model for \\(p(z \\mid \\\\{ x_i \\\\}_{i = 1}^n)\\):\n\n\\[\n\\begin{align*}\n s &\\sim \\mathrm{InverseGamma}(2, 3), \\\\\n m &\\sim \\mathcal{N}(0, s), \\\\\n x\\_i &\\overset{\\text{i.i.d.}}{\\sim} \\mathcal{N}(m, s), \\quad i = 1, \\dots, n,\n\\end{align*}\n\\]\n\nwith posterior, again up to normalisation, \\(p(s, m \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\propto p(s) p(m \\mid s) \\prod\\_{i = 1}^n p(x\\_i \\mid s, m)\\), and the mean-field variational posterior \\(q(s, m)\\) will be\n\n\\[\nq\\_{\\mu\\_1, \\mu\\_2, \\sigma\\_1^2, \\sigma\\_2^2}(s, m) = p\\_{\\mathcal{N}(\\mu\\_1, \\sigma\\_1^2)}(s)\\ p\\_{\\mathcal{N}(\\mu\\_2, \\sigma\\_2^2)}(m),\n\\]\n\nwhere we’ve denoted the evaluation of the probability density of a Gaussian as \\(p_{\\mathcal{N}(\\mu, \\sigma^2)}(x)\\).\nObserve that \\(\\mathrm{InverseGamma}(2, 3)\\) has non-zero probability only on \\(\\mathbb{R}^{ + } := (0, \\infty)\\) which is clearly not all of \\(\\mathbb{R}\\) like \\(q(s, m)\\) has, i.e.\n\n\\[\n\\mathrm{supp} \\left( q(s, m) \\right) \\not\\subseteq \\mathrm{supp} \\left( p(z \\mid \\\\{ x\\_i \\\\}\\_{i = 1}^n) \\right).\n\\]\n\nRecall from the definition of the KL-divergence that when this is the case, the KL-divergence isn’t well defined. This gets us to the automatic part of ADVI.\n\n\n“Automatic”? How?\nFor a lot of the standard (continuous) densities \\(p\\) we can actually construct a probability density \\(\\tilde{p}\\) with non-zero probability on all of \\(\\mathbb{R}\\) by transforming the “constrained” probability density \\(p\\) to \\(\\tilde{p}\\). In fact, in these cases this is a one-to-one relationship. As we’ll see, this helps solve the support-issue we’ve been going on and on about.\n\nTransforming densities using change of variables\nIf we want to compute the probability of \\(x\\) taking a value in some set \\(A \\subseteq \\mathrm{supp} \\left( p(x) \\right)\\), we have to integrate \\(p(x)\\) over \\(A\\), i.e.\n\n\\[\n\\mathbb{P}_p(x \\in A) = \\int_A p(x) \\mathrm{d}x.\n\\]\n\nThis means that if we have a differentiable bijection \\(f: \\mathrm{supp} \\left( p(x) \\right) \\to \\mathbb{R}^d\\) with differentiable inverse \\(f^{-1}: \\mathbb{R}^d \\to \\mathrm{supp} \\left( p(x) \\right)\\), we can perform a change of variables\n\n\\[\n\\mathbb{P}\\_p(x \\in A) = \\int\\_{f(A)} p \\left(f^{-1}(y) \\right) \\ \\left| \\det \\mathcal{J}\\_{f^{-1}}(y) \\right| \\mathrm{d}y,\n\\]\n\nwhere \\(\\mathcal{J}_{f^{-1}}(x)\\) denotes the Jacobian of \\(f^{-1}\\) evaluated at \\(x\\). 
Observe that this defines a probability distribution\n\n\\[\n\\mathbb{P}\\_{\\tilde{p}}\\left(y \\in f(A) \\right) = \\int\\_{f(A)} \\tilde{p}(y) \\mathrm{d}y,\n\\]\n\nsince \\(f\\left(\\mathrm{supp} (p(x)) \\right) = \\mathbb{R}^d\\), which has probability 1. This probability distribution has density \\(\\tilde{p}(y)\\) with \\(\\mathrm{supp} \\left( \\tilde{p}(y) \\right) = \\mathbb{R}^d\\), defined\n\n\\[\n\\tilde{p}(y) = p \\left( f^{-1}(y) \\right) \\ \\left| \\det \\mathcal{J}\\_{f^{-1}}(y) \\right|\n\\]\n\nor equivalently\n\n\\[\n\\tilde{p} \\left( f(x) \\right) = \\frac{p(x)}{\\big| \\det \\mathcal{J}\\_{f}(x) \\big|}\n\\]\n\ndue to the fact that\n\n\\[\n\\big| \\det \\mathcal{J}\\_{f^{-1}}(y) \\big| = \\big| \\det \\mathcal{J}\\_{f}(x) \\big|^{-1}\n\\]\n\nNote: it’s also necessary that the Jacobian determinant is non-vanishing, so that the log-abs-det-jacobian term is well defined. This can for example be accomplished by assuming \\(f\\) to also be elementwise monotonic.\n\n\nBack to VI\nSo why is this useful? Well, we’re looking to generalise our approach using a normal distribution to cases where the supports don’t match up. How about defining \\(q(z)\\) by\n\n\\[\n\\begin{align*}\n \\eta &\\sim \\mathcal{N}(\\mu, \\Sigma), \\\\\\\\\n z &= f^{-1}(\\eta),\n\\end{align*}\n\\]\n\nwhere \\(f^{-1}: \\mathbb{R}^d \\to \\mathrm{supp} \\left( p(z \\mid x) \\right)\\) is a differentiable bijection with differentiable inverse. Then \\(z \\sim q_{\\mu, \\Sigma}(z) \\implies z \\in \\mathrm{supp} \\left( p(z \\mid x) \\right)\\) as we wanted. The resulting variational density is\n\n\\[\nq\\_{\\mu, \\Sigma}(z) = p\\_{\\mathcal{N}(\\mu, \\Sigma)}\\left( f(z) \\right) \\ \\big| \\det \\mathcal{J}\\_{f}(z) \\big|.\n\\]\n\nNote that the way we’ve constructed \\(q(z)\\) here is basically a reverse of the approach we described above. Here we sample from a distribution with support on \\(\\mathbb{R}\\) and transform to \\(\\mathrm{supp} \\left( p(z \\mid x) \\right)\\).\nIf we want to write the ELBO explicitly in terms of \\(\\eta\\) rather than \\(z\\), the first term is simply a change of variables in the expectation,\n\n\\[\n\\mathbb{E}\\_{z \\sim q_{\\mu, \\Sigma}(z)} \\left[ \\log p(x\\_i, z) \\right] = \\mathbb{E}\\_{\\eta \\sim \\mathcal{N}(\\mu, \\Sigma)} \\left[ \\log p\\left(x\\_i, f^{-1}(\\eta) \\right) \\right],\n\\]\n\nwhile the entropy picks up the log-abs-det-jacobian correction from the same change of variables,\n\n\\[\n\\mathbb{H} \\left(q\\_{\\mu, \\Sigma}(z)\\right) = \\mathbb{H} \\left(p\\_{\\mathcal{N}(\\mu, \\Sigma)}(z)\\right) + \\mathbb{E}\\_{\\eta \\sim \\mathcal{N}(\\mu, \\Sigma)} \\left[ \\log \\big| \\det \\mathcal{J}\\_{f^{-1}}(\\eta) \\big| \\right],\n\\]\n\nwhere the entropy of the normal distribution is known analytically.\nHence, the resulting empirical estimate of the ELBO is\n\n\\[\n\\begin{align*}\n\\widehat{\\mathrm{ELBO}}(q\\_{\\mu, \\Sigma}) &= \\frac{1}{m} \\sum\\_{k = 1}^m \\left( \\sum\\_{i = 1}^n \\log p\\left(x\\_i, f^{-1}(\\eta\\_k)\\right) + \\log \\big| \\det \\mathcal{J}\\_{f^{-1}}(\\eta\\_k) \\big| \\right) + \\mathbb{H} \\left(p\\_{\\mathcal{N}(\\mu, \\Sigma)}(z)\\right) \\\\\n& \\text{where} \\quad \\eta\\_k \\sim \\mathcal{N}(\\mu, \\Sigma) \\quad \\forall k = 1, \\dots, m\n\\end{align*}.\n\\]\n\nAnd maximizing this wrt. 
\\(\\mu\\) and \\(\\Sigma\\) is what’s referred to as Automatic Differentiation Variational Inference (ADVI)!\nNow if you want to try it out, check out the tutorial on how to use ADVI in Turing.jl!", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Variational Inference" ] }, { "objectID": "developers/models/varinfo-overview/index.html", "href": "developers/models/varinfo-overview/index.html", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "", "text": "Once you have defined a model using the @model macro, Turing.jl provides high-level interfaces for applying MCMC sampling, variational inference, optimisation, and other inference algorithms. Suppose, however, that you want to work more directly with the model. A common use case for this is if you are developing your own inference algorithm.\nThis page describes how you can evaluate DynamicPPL models and obtain information about variable values, log densities, and other quantities of interest. In particular, this provides a high-level overview of what we call VarInfo: this is a data structure that holds information about the execution state while traversing a model.\nTo begin, let’s define a simple model.\nusing DynamicPPL, Distributions\n\n@model function simple()\n @info \" --- Executing model --- \"\n x ~ Normal() # Prior\n 2.0 ~ Normal(x) # Likelihood\n return (; xplus1 = x + 1) # Return value\nend\n\nmodel = simple()\n\nModel{typeof(simple), (), (), (), Tuple{}, Tuple{}, DefaultContext, false}(simple, NamedTuple(), NamedTuple(), DefaultContext())", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "developers/models/varinfo-overview/index.html#the-outputs-of-a-model", "href": "developers/models/varinfo-overview/index.html#the-outputs-of-a-model", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "The outputs of a model", "text": "The outputs of a model\nA DynamicPPL model has similar characteristics to Julia functions (which should not come as a surprise, since the @model macro is applied to a Julia function). However, an ordinary function only has a return value, whereas DynamicPPL models can have both return values as well as latent variables (i.e., the random variables in the model).\nIn general, both of these are of interest. We can obtain the return value by calling the model as if it were a function:\n\nretval = model()\n\n\n[ Info: --- Executing model --- \n\n\n\n\n(xplus1 = 0.7697206155989985,)\n\n\nand the latent variables using rand():\n\nlatents = rand(Dict, model)\n\n\n[ Info: --- Executing model --- \n\n\n\n\nDict{VarName, Any} with 1 entry:\n x => 0.0645677\n\n\n\n\n\n\n\n\nNoteWhy Dict?\n\n\n\nSimply calling rand(model), by default, returns a NamedTuple. This is fine for simple models where all variables on the left-hand side of tilde statements are standalone variables like x. However, if you have indices or fields such as x[1] or x.a on the left-hand side, then the NamedTuple will not be able to represent these variables properly. Feeding such a NamedTuple back into the model will lead to errors.\nIn general, Dict{VarName} will always avoid such correctness issues.\n\n\nBefore proceeding, it is worth mentioning that both of these calls generate values for random variables by sampling from their prior distributions. 
We will see how to use different sampling strategies later.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "developers/models/varinfo-overview/index.html#passing-latent-values-into-a-model", "href": "developers/models/varinfo-overview/index.html#passing-latent-values-into-a-model", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "Passing latent values into a model", "text": "Passing latent values into a model\nHaving considered what one can obtain from a model, we now turn to how we can use it.\nSuppose you now want to obtain the log probability (prior, likelihood, or joint) of a model, given certain parameters. For this purpose, DynamicPPL provides the logprior, loglikelihood, and logjoint functions:\n\nlogprior(model, latents)\n\n\n[ Info: --- Executing model --- \n\n\n\n\n-0.9210230246003822\n\n\nOne can check this against the expected log prior:\n\nlogpdf(Normal(), latents[@varname(x)])\n\n-0.9210230246003822\n\n\nLikewise, you can evaluate the return value of the model given the latent variables:\n\nreturned(model, latents)\n\n\n[ Info: --- Executing model --- \n\n\n\n\n(xplus1 = 1.064567660569505,)", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "developers/models/varinfo-overview/index.html#varinfo", "href": "developers/models/varinfo-overview/index.html#varinfo", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "VarInfo", "text": "VarInfo\nThe above functions are convenient, but for many ‘serious’ applications they might not be flexible enough. For example, if you wanted to obtain the return value and the log joint, you would have to execute the model twice: once with returned and once with logjoint.\nIf you want to avoid this duplicate work, you need to use a lower-level interface, which is DynamicPPL.evaluate!!. At its core, evaluate!! takes a model and a VarInfo object, and returns a tuple of the return value and the new VarInfo. So, before we even get to evaluate!!, we need to understand what a VarInfo is.\nA VarInfo is a container that tracks the state of model execution, as well as any outputs related to its latent variables, such as log probabilities. DynamicPPL’s source code contains many different kinds of VarInfos, each with different trade-offs. The details of these are somewhat arcane and unfortunately cannot be fully abstracted away, mainly due to performance considerations.\nFor the vast majority of users, it suffices to know that you can generate one of them for a model with the constructor VarInfo([rng, ]model). 
Note that this construction executes the model once (sampling new parameter values from the prior in the process).\n\nv = VarInfo(model)\n\n\n[ Info: --- Executing model --- \n\n\n\n\nVarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}((x = DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}(Dict(x => 1), [x], UnitRange{Int64}[1:1], [0.29738036067561885], Normal{Float64}[Distributions.Normal{Float64}(μ=0.0, σ=1.0)], Bool[0]),), DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}((LogPrior = DynamicPPL.LogPriorAccumulator(-0.9631560726624534), LogJacobian = DynamicPPL.LogJacobianAccumulator(0.0), LogLikelihood = DynamicPPL.LogLikelihoodAccumulator(-2.3683953513112157))))\n\n\n(Don’t worry about the printout of the VarInfo object: we won’t need to understand its internal structure.) We can index into a VarInfo:\n\nv[@varname(x)]\n\n0.29738036067561885\n\n\nTo access the values of log-probabilities, DynamicPPL provides the getlogprior, getloglikelihood, and getlogjoint functions:\n\nDynamicPPL.getlogprior(v)\n\n-0.9631560726624534\n\n\nWhat about the return value? Well, the VarInfo does not store this directly: recall that evaluate!! gives us back the return value separately from the updated VarInfo. So, let’s try calling it to see what happens. The default behaviour of evaluate!! is to use the parameter values stored in the VarInfo during model execution. That is, when it sees x ~ Normal(), it will use the value of x stored in v. We will see later how to change this behaviour.\n\nretval, vout = DynamicPPL.evaluate!!(model, v)\n\n\n[ Info: --- Executing model --- \n\n\n\n\n((xplus1 = 1.297380360675619,), VarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}((x = DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}(Dict(x => 1), [x], UnitRange{Int64}[1:1], [0.29738036067561885], Normal{Float64}[Distributions.Normal{Float64}(μ=0.0, σ=1.0)], Bool[0]),), DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}((LogPrior = DynamicPPL.LogPriorAccumulator(-0.9631560726624534), LogJacobian = DynamicPPL.LogJacobianAccumulator(0.0), LogLikelihood = DynamicPPL.LogLikelihoodAccumulator(-2.3683953513112157)))))\n\n\nSo here in a single call we have obtained both the return value and an updated VarInfo vout, from which we can again extract log probabilities and variable values. We can see from this that the value of vout[@varname(x)] is the same as v[@varname(x)]:\n\nvout[@varname(x)] == v[@varname(x)]\n\ntrue\n\n\nwhich is in line with the statement above that by default evaluate!! 
uses the values stored in the VarInfo.\nAt this point, the keen reader will notice that we have not really solved the problem here. Although the call to DynamicPPL.evaluate!! does indeed only execute the model once, we also had to do this once more at the beginning when constructing the VarInfo.\nBesides, we don’t know how to control the parameter values used during model execution: they were simply whatever we got in the original VarInfo.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "developers/models/varinfo-overview/index.html#specifying-parameter-values", "href": "developers/models/varinfo-overview/index.html#specifying-parameter-values", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "Specifying parameter values", "text": "Specifying parameter values\nWe will first tackle the problem of specifying our own parameter values. To do this, we need to use DynamicPPL.init!! instead of DynamicPPL.evaluate!!.\nThe difference is that instead of using the values stored in the VarInfo (which evaluate!! does by default), init!! uses a strategy for generating new values, and overwrites the values in the VarInfo accordingly. For example, InitFromPrior() says that any time a tilde-statement x ~ dist is encountered, a new value for x should be sampled from dist:\n\nretval, v_new = DynamicPPL.init!!(model, v, InitFromPrior())\n\n\n[ Info: --- Executing model --- \n\n\n\n\n((xplus1 = 1.300253417909387,), VarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}((x = DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}(Dict(x => 1), [x], UnitRange{Int64}[1:1], [0.30025341790938703], Normal{Float64}[Distributions.Normal{Float64}(μ=0.0, σ=1.0)], Bool[0]),), DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}((LogPrior = DynamicPPL.LogPriorAccumulator(-0.9640145906878073), LogJacobian = DynamicPPL.LogJacobianAccumulator(0.0), LogLikelihood = DynamicPPL.LogLikelihoodAccumulator(-2.3635077548690333)))))\n\n\nThis updates v_new with the new values that were sampled, and also means that log probabilities are computed using these new values.\n\n\n\n\n\n\nNoteRandom number generator\n\n\n\nYou can also provide an AbstractRNG as the first argument to init!! to control the reproducibility of the sampling: here we have omitted it.\n\n\nAlternatively, to provide specific sets of values, we can use InitFromParams(...) to specify them. 
InitFromParams can wrap either a NamedTuple or an AbstractDict{<:VarName}, but Dict is generally much preferred as this guarantees correct behaviour even for complex variable names.\n\nretval, v_new = DynamicPPL.init!!(\n model, v, InitFromParams(Dict(@varname(x) => 3.0))\n)\n\n\n[ Info: --- Executing model --- \n\n\n\n\n((xplus1 = 4.0,), VarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}((x = DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}(Dict(x => 1), [x], UnitRange{Int64}[1:1], [3.0], Normal{Float64}[Distributions.Normal{Float64}(μ=0.0, σ=1.0)], Bool[0]),), DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}((LogPrior = DynamicPPL.LogPriorAccumulator(-5.418938533204673), LogJacobian = DynamicPPL.LogJacobianAccumulator(0.0), LogLikelihood = DynamicPPL.LogLikelihoodAccumulator(-1.4189385332046727)))))\n\n\nWe now find that if we look into v_new, the value of x is indeed 3.0:\n\nv_new[@varname(x)]\n\n3.0\n\n\nand we can extract the return value and log probabilities exactly as before.\nNote that init!! always ignores any values that are already present in the VarInfo, and overwrites them with new values according to the specified strategy.\nIf you have a loop in which you want to repeatedly evaluate a model with different parameter values, then the workflow shown here is recommended:\n\nFirst generate a VarInfo using VarInfo(model);\nThen call DynamicPPL.init!!(model, v, InitFromParams(...)) to evaluate the model using those parameters.\n\nThis requires you to pay a one-time cost at the very beginning to generate the VarInfo, but subsequent evaluations will be efficient. DynamicPPL uses this approach when implementing functions such as predict(model, chain).\n\n\n\n\n\n\nTip\n\n\n\nIf you want to avoid even the first model evaluation, you will need to read on to the ‘Advanced’ section below. However, for most applications this should not be necessary.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "developers/models/varinfo-overview/index.html#parameters-in-the-form-of-vectors", "href": "developers/models/varinfo-overview/index.html#parameters-in-the-form-of-vectors", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "Parameters in the form of Vectors", "text": "In general, one problem with init!! is that it is often slower than evaluate!!. 
This is primarily because it does more work: it has to not only read from the provided parameters, but also overwrite existing values in the VarInfo.\n\nusing Chairmarks, Logging\n# We need to silence the 'executing model' message, or else it will\n# fill up the entire screen!\nwith_logger(ConsoleLogger(stderr, Logging.Warn)) do\n median(@be DynamicPPL.evaluate!!(model, v_new))\nend\n\n368.911 ns (7 allocs: 288 bytes)\n\n\n\nwith_logger(ConsoleLogger(stderr, Logging.Warn)) do\n median(@be DynamicPPL.init!!(model, v_new, InitFromParams(Dict(@varname(x) => 3.0))))\nend\n\n467.292 ns (12 allocs: 624 bytes)\n\n\nWhen evaluating models in tight loops, as is often the case in inference algorithms, this overhead can be quite unwanted. DynamicPPL provides a rather dangerous, but powerful, way to get around this, which is the DynamicPPL.unflatten function. unflatten allows you to directly modify the internal storage of a VarInfo, without having to go through init!! and model evaluation. Its input is a vector of parameters.\n\nxs = [7.0]\nv_unflattened = DynamicPPL.unflatten(v_new, xs)\nv_unflattened[@varname(x)]\n\n7.0\n\n\nWe can then directly use v_unflattened in evaluate!!, which will use the value 7.0 for x:\n\nretval, vout = DynamicPPL.evaluate!!(model, v_unflattened)\n\n\n[ Info: --- Executing model --- \n\n\n\n\n((xplus1 = 8.0,), VarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}((x = DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}(Dict(x => 1), [x], UnitRange{Int64}[1:1], [7.0], Normal{Float64}[Distributions.Normal{Float64}(μ=0.0, σ=1.0)], Bool[0]),), DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}((LogPrior = DynamicPPL.LogPriorAccumulator(-25.418938533204674), LogJacobian = DynamicPPL.LogJacobianAccumulator(0.0), LogLikelihood = DynamicPPL.LogLikelihoodAccumulator(-13.418938533204672)))))\n\n\nEven the combination of unflatten and evaluate!! tends to be faster than a single call to init!!, especially for larger models.\nHowever, there are several reasons why this function is dangerous. If you use it, you must pay close attention to correctness:\n\nFor models with multiple variables, the order in which these variables occur in the vector is not obvious. The short answer is that it depends on the order in which the variables are added to the VarInfo during its initialisation. If you have models where the order of variables can vary from one execution to another, then unflatten can easily lead to incorrect results.\nThe meaning of the values passed in will generally depend on whether the VarInfo is linked or not (see the Variable Transformations page for more information about linked VarInfos). You must make sure that the values passed in are consistent with the link status of the VarInfo. In contrast, InitFromParams always uses unlinked values.\nWhile unflatten modifies the parameter values stored in the VarInfo, it does not modify any other information, such as log probabilities. 
Thus, after calling unflatten, your VarInfo will be in an inconsistent state, and you should not attempt to read any other information from it until you have called evaluate!! again (which recomputes e.g. log probabilities).\n\nThe inverse operation of unflatten is DynamicPPL.getindex_internal(v, :):\n\nDynamicPPL.getindex_internal(v_unflattened, :)\n\n1-element Vector{Float64}:\n 7.0", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "developers/models/varinfo-overview/index.html#logdensityfunction", "href": "developers/models/varinfo-overview/index.html#logdensityfunction", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "LogDensityFunction", "text": "LogDensityFunction\nThere is one place where unflatten is (unfortunately) quite indispensable, namely, the implementation of the LogDensityProblems.jl interface for Turing models.\nThe LogDensityProblems interface defines interface functions such as\nLogDensityProblems.logdensity(f, x::AbstractVector)\nwhich evaluates the log density of a model f given a vector of parameters x.\nGiven what we have seen above, this can be done by wrapping a model and a VarInfo together inside a struct. Here is a rough sketch of how this can be implemented:\n\nusing LogDensityProblems\n\nstruct MyModelLogDensity{M<:DynamicPPL.Model,V<:DynamicPPL.VarInfo}\n model::M\n varinfo::V\nend\n\nfunction LogDensityProblems.logdensity(f::MyModelLogDensity, x::AbstractVector)\n v_new = DynamicPPL.unflatten(f.varinfo, x)\n _, vout = DynamicPPL.evaluate!!(f.model, v_new)\n return DynamicPPL.getlogjoint(vout)\nend\n\n# Usage\nmy_ldf = MyModelLogDensity(model, VarInfo(model))\nLogDensityProblems.logdensity(my_ldf, [2.5])\n\n\n[ Info: --- Executing model --- \n[ Info: --- Executing model --- \n\n\n\n\n-5.087877066409346\n\n\nDynamicPPL contains a LogDensityFunction type that, at its core, is essentially the same as the above.\n\n# the varinfo object defaults to VarInfo(model)\nldf = DynamicPPL.LogDensityFunction(model)\nLogDensityProblems.logdensity(ldf, [2.5])\n\n\n[ Info: --- Executing model --- \n[ Info: --- Executing model --- \n\n\n\n\n-5.087877066409346\n\n\nThe real implementation is a bit more complicated as it provides more options, as well as support for gradients with automatic differentiation.\nIn this way, any Turing model can be converted into an object that you can use with LogDensityProblems-compatible optimisers, samplers, and other algorithms. This is very powerful as it allows the algorithms to completely ignore the internal structure of the model, and simply treat it as an opaque log-density function. For example, Turing’s external sampler interface makes heavy use of this.\nHowever, it should be noted that because this uses unflatten under the hood, it suffers from exactly the same limitations as described above. 
For example, models that do not have a fixed number or order of latent variables can lead to incorrect results or errors.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "developers/models/varinfo-overview/index.html#advanced-typed-and-untyped-varinfo", "href": "developers/models/varinfo-overview/index.html#advanced-typed-and-untyped-varinfo", "title": "Evaluation of DynamicPPL Models with VarInfo", "section": "Advanced: Typed and untyped VarInfo", "text": "Advanced: Typed and untyped VarInfo\nThe discussion above suffices for many applications of DynamicPPL, but one question remains: how to avoid the initial overhead of constructing a VarInfo object before we can do anything useful with it. This is important when implementing a function such as logjoint(model, params): in principle, only a single evaluation should be needed.\nTo tackle this, we need to understand a little bit more about two kinds of VarInfo. Conceptually, DynamicPPL has both typed and untyped VarInfos. This distinction is also described in section 4.2.4 of our recent Turing.jl paper.\nEvaluating a model with an existing typed VarInfo is generally much faster, and once you have a typed VarInfo it is a good idea to stick with it. However, when instantiating a new VarInfo, it is often better to start with an untyped VarInfo, fill in the values, and then convert it to a typed VarInfo.\n\n\n\n\n\n\nNoteWhy is untyped initialisation better?\n\n\n\nInitialising a fresh VarInfo requires adding variables to it as they are encountered during model execution. There are two main reasons for preferring untyped VarInfo: firstly, compilation time with typed VarInfo scales poorly with the number of variables; and secondly, typed VarInfos can error with certain kinds of models. See this issue for more information.\n\n\nTo see this in action, let’s begin by constructing an empty untyped VarInfo. This does not execute the model, and so the resulting object has no stored variable values. If we try to index into it, we will get an error:\n\nv_empty_untyped = VarInfo()\nv_empty_untyped[@varname(x)]\n\n\nKeyError: key x not found\nStacktrace:\n [1] getindex\n @ ./dict.jl:477 [inlined]\n [2] getidx\n @ ~/.julia/packages/DynamicPPL/oIycL/src/varinfo.jl:635 [inlined]\n [3] is_transformed\n @ ~/.julia/packages/DynamicPPL/oIycL/src/varinfo.jl:811 [inlined]\n [4] is_transformed\n @ ~/.julia/packages/DynamicPPL/oIycL/src/varinfo.jl:810 [inlined]\n [5] from_maybe_linked_internal_transform\n @ ~/.julia/packages/DynamicPPL/oIycL/src/abstract_varinfo.jl:1099 [inlined]\n [6] getindex(vi::VarInfo{DynamicPPL.Metadata{Dict{VarName, Int64}, Vector{Distribution}, Vector{VarName}, Vector{Real}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}, vn::VarName{:x, typeof(identity)})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/varinfo.jl:1417\n [7] top-level scope\n @ ~/work/docs/docs/developers/models/varinfo-overview/index.qmd:323\n\n\n\n\n\n\n\n\n\nNoteVarInfo(model) returns a typed VarInfo\n\n\n\nAlthough VarInfo() with no arguments returns an untyped VarInfo, note that calling VarInfo(model) returns a typed VarInfo. This is a slightly awkward aspect of DynamicPPL’s current API.\n\n\nTo generate new values for it, we will use DynamicPPL.init!! 
as before.\n\n_, v_filled_untyped = DynamicPPL.init!!(model, v_empty_untyped, InitFromParams(Dict(@varname(x) => 5.0)))\n\n\n[ Info: --- Executing model --- \n\n\n\n\n((xplus1 = 6.0,), VarInfo (1 variable (x), dimension 1; accumulators: (LogPrior = DynamicPPL.LogPriorAccumulator(-13.418938533204672), LogJacobian = DynamicPPL.LogJacobianAccumulator(0.0), LogLikelihood = DynamicPPL.LogLikelihoodAccumulator(-5.418938533204673))))\n\n\nNow that we have filled in the untyped VarInfo, we can access parameter values, log probabilities, and so on:\n\nDynamicPPL.getlogprior(v_filled_untyped)\n\n-13.418938533204672\n\n\nSo, putting this all together, this is how an implementation of logprior(model, params) could look:\n\nfunction mylogprior(model, params)\n # Create empty untyped VarInfo\n v_empty_untyped = VarInfo()\n # Fill in values from given params\n _, v_filled_untyped = DynamicPPL.init!!(model, v_empty_untyped, InitFromParams(params))\n # Extract log prior\n return DynamicPPL.getlogprior(v_filled_untyped)\nend\n\nmylogprior(model, Dict(@varname(x) => 5.0))\n\n\n[ Info: --- Executing model --- \n\n\n\n\n-13.418938533204672\n\n\nNotice that the above only required a single model evaluation.\nIf we later want to convert the untyped VarInfo into a typed VarInfo (for example, for later reuse), we can do so using DynamicPPL.typed_varinfo:\n\nv_filled_typed = DynamicPPL.typed_varinfo(v_filled_untyped)\n\nVarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}((x = DynamicPPL.Metadata{Dict{VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, typeof(identity)}}, Vector{Float64}}(Dict(x => 1), [x], UnitRange{Int64}[1:1], [5.0], Normal{Float64}[Distributions.Normal{Float64}(μ=0.0, σ=1.0)], Bool[0]),), DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}((LogPrior = DynamicPPL.LogPriorAccumulator(-13.418938533204672), LogJacobian = DynamicPPL.LogJacobianAccumulator(0.0), LogLikelihood = DynamicPPL.LogLikelihoodAccumulator(-5.418938533204673))))\n\n\nThis allows us to demonstrate how VarInfo(model) is implemented:\n\nfunction myvarinfo(model)\n # Create empty untyped VarInfo\n v_empty_untyped = VarInfo()\n # Sample values from prior\n _, v_filled_untyped = DynamicPPL.init!!(model, v_empty_untyped, InitFromPrior())\n # Convert to typed VarInfo\n return DynamicPPL.typed_varinfo(v_filled_untyped)\nend\n\nmyvarinfo (generic function with 1 method)\n\n\nNotice here that evaluate!! runs much faster with a typed VarInfo than with untyped: this is why generally for repeated evaluation you should use a typed VarInfo. 
The same is true of init!!.\n\nwith_logger(ConsoleLogger(stderr, Logging.Warn)) do\n median(@be DynamicPPL.evaluate!!(model, v_filled_untyped))\nend\n\n2.039 μs (32 allocs: 1.328 KiB)\n\n\n\nwith_logger(ConsoleLogger(stderr, Logging.Warn)) do\n median(@be DynamicPPL.evaluate!!(model, v_filled_typed))\nend\n\n358.585 ns (7 allocs: 288 bytes)", "crumbs": [ "Get Started", "Developers", "DynamicPPL Models", "Evaluation of DynamicPPL Models with VarInfo" ] }, { "objectID": "usage/performance-tips/index.html", "href": "usage/performance-tips/index.html", "title": "Performance Tips", "section": "", "text": "This section briefly summarises a few common techniques to ensure good performance when using Turing. We refer to the Julia documentation for general techniques to ensure good performance of Julia programs.", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/performance-tips/index.html#use-multivariate-distributions", "href": "usage/performance-tips/index.html#use-multivariate-distributions", "title": "Performance Tips", "section": "Use multivariate distributions", "text": "Use multivariate distributions\nIt is generally preferable to use multivariate distributions if possible.\nThe following example:\n\nusing Turing\n@model function gmodel(x)\n m ~ Normal()\n for i in eachindex(x)\n x[i] ~ Normal(m, 0.2)\n end\nend\n\ngmodel (generic function with 2 methods)\n\n\ncan be directly expressed more efficiently with a simple transformation:\n\nusing FillArrays\n\n@model function gmodel(x)\n m ~ Normal()\n return x ~ MvNormal(Fill(m, length(x)), 0.04 * I)\nend\n\ngmodel (generic function with 2 methods)", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/performance-tips/index.html#choose-your-ad-backend", "href": "usage/performance-tips/index.html#choose-your-ad-backend", "title": "Performance Tips", "section": "Choose your AD backend", "text": "Choose your AD backend\nAutomatic differentiation (AD) makes it possible to use modern, efficient gradient-based samplers like NUTS and HMC. This, however, also means that using a performant AD system is incredibly important. Turing currently supports several AD backends, including ForwardDiff (the default), Mooncake, and ReverseDiff.\nFor many common types of models, the default ForwardDiff backend performs well, and there is no need to worry about changing it. However, if you need more speed, you can try different backends via the standard ADTypes interface by passing an AbstractADType to the sampler with the optional adtype argument, e.g. NUTS(; adtype = AutoMooncake()).\nGenerally, adtype = AutoForwardDiff() is likely to be the fastest and most reliable for models with few parameters (say, less than 20 or so), while reverse-mode backends such as AutoMooncake() or AutoReverseDiff() will perform better for models with many parameters or linear algebra operations. If in doubt, you can benchmark your model with different backends to see which one performs best. See the Automatic Differentiation page for details.\n\nSpecial care for ReverseDiff with a compiled tape\nFor large models, the fastest option is often ReverseDiff with a compiled tape, specified as adtype=AutoReverseDiff(; compile=true). However, it is important to note that if your model contains any branching code, such as if-else statements, the gradients from a compiled tape may be inaccurate, leading to erroneous results. 
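\nTo illustrate the kind of branching in question, consider this hypothetical sketch of a model whose likelihood depends on the sign of a sampled parameter:\n\n@model function branching_demo(x)\n m ~ Normal()\n # A tape compiled while m > 0 keeps replaying that branch even for\n # values with m <= 0, silently producing incorrect gradients.\n if m > 0\n x ~ Normal(m, 1.0)\n else\n x ~ Normal(-m, 2.0)\n end\nend\n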
If you use this option for the (considerable) speedup it can provide, make sure to check your code for branching and ensure that it does not affect the gradients. It is also a good idea to verify your gradients with another backend.", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/performance-tips/index.html#ensure-that-types-in-your-model-can-be-inferred", "href": "usage/performance-tips/index.html#ensure-that-types-in-your-model-can-be-inferred", "title": "Performance Tips", "section": "Ensure that types in your model can be inferred", "text": "Ensure that types in your model can be inferred\nFor efficient gradient-based inference, e.g. using HMC, NUTS or ADVI, it is important to ensure the types in your model can be inferred.\nThe following example with abstract types\n\n@model function tmodel(x, y)\n p, n = size(x)\n params = Vector{Real}(undef, n)\n for i in 1:n\n params[i] ~ truncated(Normal(); lower=0)\n end\n\n a = x * params\n return y ~ MvNormal(a, I)\nend\n\ntmodel (generic function with 2 methods)\n\n\ncan be transformed into the following representation with concrete types:\n\n@model function tmodel(x, y, ::Type{T}=Float64) where {T}\n p, n = size(x)\n params = Vector{T}(undef, n)\n for i in 1:n\n params[i] ~ truncated(Normal(); lower=0)\n end\n\n a = x * params\n return y ~ MvNormal(a, I)\nend\n\ntmodel (generic function with 4 methods)\n\n\nAlternatively, you could use filldist in this example:\n\n@model function tmodel(x, y)\n params ~ filldist(truncated(Normal(); lower=0), size(x, 2))\n a = x * params\n return y ~ MvNormal(a, I)\nend\n\ntmodel (generic function with 4 methods)\n\n\nYou can use DynamicPPL’s debugging utilities to find types in your model definition that the compiler cannot infer. These will be marked in red in the Julia REPL (much like when using the @code_warntype macro).\nFor example, consider the following model:\n\n@model function tmodel(x)\n p = Vector{Real}(undef, 1)\n p[1] ~ Normal()\n p = p .+ 1\n return x ~ Normal(p[1])\nend\n\ntmodel (generic function with 6 methods)\n\n\nBecause the element type of p is an abstract type (Real), the compiler cannot infer a concrete type for p[1]. To detect this, we can use\n\nmodel = tmodel(1.0)\n\nusing DynamicPPL\nDynamicPPL.DebugUtils.model_warntype(model)\n\nIn this particular model, the following call to getindex should be highlighted in red (the exact numbers may vary):\n[...]\n│ %120 = p::AbstractVector\n│ %121 = Base.getindex(%120, 1)::Any\n[...]", "crumbs": [ "Get Started", "User Guide", "Performance Tips" ] }, { "objectID": "usage/custom-distribution/index.html", "href": "usage/custom-distribution/index.html", "title": "Custom Distributions", "section": "", "text": "Turing.jl supports the use of distributions from the Distributions.jl package. 
By extension, it also supports the use of customised distributions, which can be defined as subtypes of the Distribution type from the Distributions.jl package, together with implementations of the corresponding functions.\nThis page shows the workflow for defining a customised distribution, using our own implementation of the Uniform distribution as a simple example.\nusing Distributions, Turing, Random, Bijectors", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/custom-distribution/index.html#define-the-distribution-type", "href": "usage/custom-distribution/index.html#define-the-distribution-type", "title": "Custom Distributions", "section": "Define the Distribution Type", "text": "Define the Distribution Type\nFirst, define a type for the distribution, as a subtype of the corresponding distribution type in the Distributions.jl package.\n\nstruct CustomUniform <: ContinuousUnivariateDistribution end", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/custom-distribution/index.html#implement-sampling-and-evaluation-of-the-log-pdf", "href": "usage/custom-distribution/index.html#implement-sampling-and-evaluation-of-the-log-pdf", "title": "Custom Distributions", "section": "Implement Sampling and Evaluation of the log-pdf", "text": "Implement Sampling and Evaluation of the log-pdf\nSecond, implement the rand and logpdf functions for your new distribution, which will be used to run the model.\n\n# sample in [0, 1]\nDistributions.rand(rng::AbstractRNG, d::CustomUniform) = rand(rng)\n\n# p(x) = 1 → log[p(x)] = 0\nDistributions.logpdf(d::CustomUniform, x::Real) = zero(x)", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/custom-distribution/index.html#define-helper-functions", "href": "usage/custom-distribution/index.html#define-helper-functions", "title": "Custom Distributions", "section": "Define Helper Functions", "text": "Define Helper Functions\nIn most cases, it may also be necessary to define some helper functions.\n\nDomain Transformation\nCertain samplers, such as HMC, require the domain of the priors to be unbounded. Therefore, to use our CustomUniform as a prior in a model we also need to define how to transform samples from [0, 1] to ℝ. To do this, we need to define the corresponding Bijector from Bijectors.jl, which is what Turing.jl uses internally to deal with constrained distributions.\nTo transform from [0, 1] to ℝ we can use the Logit bijector:\n\nBijectors.bijector(d::CustomUniform) = Logit(0.0, 1.0)\n\nIn the present example, CustomUniform is a subtype of ContinuousUnivariateDistribution. The procedure for subtypes of ContinuousMultivariateDistribution and ContinuousMatrixDistribution is exactly the same. For example, Wishart defines a distribution over positive-definite matrices and so the bijector returns a PDBijector when called with a Wishart distribution as an argument. For discrete distributions, there is no need to define a bijector; the Identity bijector is used by default.\nAs an alternative to the above, for UnivariateDistribution we could define the minimum and maximum of the distribution:\n\nDistributions.minimum(d::CustomUniform) = 0.0\nDistributions.maximum(d::CustomUniform) = 1.0\n\nand Bijectors.jl will return a default Bijector called TruncatedBijector which makes use of minimum and maximum to derive the correct transformation.\nInternally, Turing basically does the following when it needs to convert a constrained distribution to an unconstrained distribution, e.g. 
when sampling using HMC:\n\ndist = Gamma(2, 3)\nb = bijector(dist)\ntransformed_dist = transformed(dist, b) # results in distribution with transformed support + correction for logpdf\n\nBijectors.UnivariateTransformed{Distributions.Gamma{Float64}, Base.Fix1{typeof(broadcast), typeof(log)}}(\ndist: Distributions.Gamma{Float64}(α=2.0, θ=3.0)\ntransform: Base.Fix1{typeof(broadcast), typeof(log)}(broadcast, log)\n)\n\n\nand then we can call rand and logpdf as usual, where:\n\nrand(transformed_dist) returns a sample in the unconstrained space, and\nlogpdf(transformed_dist, y) returns the log density of the original distribution, but with y living in the unconstrained space.\n\nTo read more about Bijectors.jl, check out its documentation.", "crumbs": [ "Get Started", "User Guide", "Custom Distributions" ] }, { "objectID": "usage/mode-estimation/index.html", "href": "usage/mode-estimation/index.html", "title": "Mode Estimation", "section": "", "text": "After defining a statistical model, in addition to sampling from its distributions, one may be interested in finding the parameter values that maximise, for instance, the posterior density function or the likelihood. This is called mode estimation. Turing provides support for two mode estimation techniques, maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation.\nTo demonstrate mode estimation, let us load Turing and declare a model:\nusing Turing\n\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\ngdemo (generic function with 2 methods)\nOnce the model is defined, we can construct a model instance as we normally would:\n# Instantiate the gdemo model with our data.\ndata = [1.5, 2.0]\nmodel = gdemo(data)\n\nDynamicPPL.Model{typeof(gdemo), (:x,), (), (), Tuple{Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext, false}(gdemo, (x = [1.5, 2.0],), NamedTuple(), DynamicPPL.DefaultContext())\nFinding the maximum a posteriori or maximum likelihood parameters is as simple as\n# Generate an MLE estimate.\nmle_estimate = maximum_likelihood(model)\n\n# Generate a MAP estimate.\nmap_estimate = maximum_a_posteriori(model)\n\nModeResult with maximized lp of -4.62\n[0.9074074059420846, 1.1666666685519136]\nThe estimates are returned as instances of the ModeResult type. It has the fields values for the parameter values found and lp for the log probability at the optimum, as well as f for the objective function and optim_result for more detailed results of the optimisation procedure.\n@show mle_estimate.values\n@show mle_estimate.lp;\n\nmle_estimate.values = [0.06249999999985329, 1.750000000000848]\nmle_estimate.lp = -0.0652883441695642", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/mode-estimation/index.html#controlling-the-optimisation-process", "href": "usage/mode-estimation/index.html#controlling-the-optimisation-process", "title": "Mode Estimation", "section": "Controlling the optimisation process", "text": "Controlling the optimisation process\nUnder the hood, maximum_likelihood and maximum_a_posteriori use the Optimization.jl package, which provides a unified interface to many other optimisation packages. 
By default Turing typically uses the LBFGS method from Optim.jl to find the mode estimate, but we can easily change that:\n\nusing OptimizationOptimJL: NelderMead\n@show maximum_likelihood(model, NelderMead())\n\nusing OptimizationNLopt: NLopt.LD_TNEWTON_PRECOND_RESTART\n@show maximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART());\n\n\nmaximum_likelihood(model, NelderMead()) = [0.06249845612998427, 1.749999831824458]\n┌ Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided.\n│ So a `SecondOrder` with ADTypes.AutoForwardDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so\n│ an explicit `SecondOrder` ADtype is recommended.\n└ @ OptimizationBase ~/.julia/packages/OptimizationBase/mYxHK/src/cache.jl:58\nmaximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART()) = [0.062499999968155424, 1.7499999999080402]\n\n\n\n\nThe above are just two examples; Optimization.jl supports many more.\nWe can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method:\n\nimport Mooncake\n\nmaximum_likelihood(\n model, NelderMead(); initial_params=[0.1, 2], adtype=AutoMooncake()\n)\n\nModeResult with maximized lp of -0.07\n[0.062494553692639856, 1.7500042095865365]\n\n\nWhen providing values to arguments like initial_params, the parameters are typically specified in the order in which they appear in the code of the model, so in this case first s² then m. More precisely, it is the order returned by Turing.Inference.getparams(model, DynamicPPL.VarInfo(model)).\n\n\n\n\n\n\nNoteInitialisation strategies and consistency with MCMC sampling\n\n\n\nSince Turing v0.41, for MCMC sampling, the initial_params argument must be a DynamicPPL.AbstractInitStrategy (as described in the sampling options page). The optimisation interface has not yet been updated to use this; thus, initial parameters are still specified as Vectors. We expect that this will be changed in the near future.\n\n\nWe can also do constrained optimisation, by providing either intervals within which the parameters must stay, or constraint functions that they need to respect. For instance, here’s how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1:\n\nmaximum_likelihood(model; lb=[0.0, -1.0], ub=[0.01, 1.0])\n\nModeResult with maximized lp of -59.73\n[0.009999999999058257, 0.9999999999613886]\n\n\nThe arguments for lower (lb) and upper (ub) bounds follow the arguments of Optimization.OptimizationProblem, as do other parameters for providing constraints, such as cons. Any extraneous keyword arguments given to maximum_likelihood or maximum_a_posteriori are passed to Optimization.solve. 
Commonly useful ones are maxiters for controlling the maximum number of iterations, and abstol and reltol for the absolute and relative convergence tolerances:\n\nbadly_converged_mle = maximum_likelihood(\n model, NelderMead(); maxiters=10, reltol=1e-9\n)\n\nModeResult with maximized lp of -0.97\n[0.23472843802029694, 2.024145688452318]\n\n\nWe can check whether the optimisation converged using the optim_result field of the result:\n\n@show badly_converged_mle.optim_result;\n\nbadly_converged_mle.optim_result = retcode: Failure\nu: [-1.449326015931066, 2.024145688452318]\nFinal objective value: 0.9749983495698895\n\n\n\nFor more details, such as a full list of possible arguments, we encourage the reader to read the docstring of the function Turing.Optimisation.estimate_mode, which is what maximum_likelihood and maximum_a_posteriori call, and the documentation of Optimization.jl.", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/mode-estimation/index.html#analyzing-your-mode-estimate", "href": "usage/mode-estimation/index.html#analyzing-your-mode-estimate", "title": "Mode Estimation", "section": "Analyzing your mode estimate", "text": "Analyzing your mode estimate\nTuring extends several methods from StatsBase that can be used to analyse your mode estimation results. Methods implemented include vcov, informationmatrix, coeftable, params, and coef, among others.\nFor example, let’s examine our ML estimate from above using coeftable:\n\nusing StatsBase: coeftable\ncoeftable(mle_estimate)\n\n\nCoef.    Std. Error   z        Pr(>|z|)     Lower 95%    Upper 95%\ns²   0.0625   0.0625     1.0      0.317311     -0.0599977   0.184998\nm    1.75     0.176777   9.89949  4.18383e-23  1.40352      2.09648\n\n\nStandard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). Note that standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them.", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/mode-estimation/index.html#sampling-with-the-mapmle-as-initial-states", "href": "usage/mode-estimation/index.html#sampling-with-the-mapmle-as-initial-states", "title": "Mode Estimation", "section": "Sampling with the MAP/MLE as initial states", "text": "Sampling with the MAP/MLE as initial states\nYou can begin sampling your chain from an MLE/MAP estimate by wrapping it in InitFromParams and providing it to the sample function with the keyword initial_params. For example, here is how to sample from the full posterior using the MAP estimate as the starting point:\n\nmap_estimate = maximum_a_posteriori(model)\nchain = sample(model, NUTS(), 1_000; initial_params=InitFromParams(map_estimate))", "crumbs": [ "Get Started", "User Guide", "Mode Estimation" ] }, { "objectID": "usage/submodels/index.html", "href": "usage/submodels/index.html", "title": "Submodels", "section": "", "text": "using Turing\nusing Random: Xoshiro, seed!\nseed!(468)\n\nRandom.TaskLocalRNG()\nIn Turing.jl, you can define models and use them as components of larger models (i.e., submodels), using the to_submodel function. In this way, you can (for example) define a model once and use it in multiple places:\n@model function inner()\n a ~ Normal()\n return a + 100\nend\n\n@model function outer()\n # This line adds the variable `x.a` to the chain.\n # The inner variable `a` is prefixed with the\n # left-hand side of the `~` operator, i.e. 
`x`.\n x ~ to_submodel(inner())\n # Here, the value of x will be `a + 100` because\n # that is the return value of the submodel.\n b ~ Normal(x)\nend\n\nouter (generic function with 2 methods)\nIf we sample from this model, we would expect that x.a should be close to zero, and b close to 100:\nrand(outer())\n\n(var\"x.a\" = 0.07200886749732076, b = 99.9979651109378)", "crumbs": [ "Get Started", "User Guide", "Submodels" ] }, { "objectID": "usage/submodels/index.html#manipulating-submodels", "href": "usage/submodels/index.html#manipulating-submodels", "title": "Submodels", "section": "Manipulating submodels", "text": "Manipulating submodels\n\nConditioning\nIn general, everything that can be done to a model ‘carries over’ to when it is used as a submodel. For example, you can condition a variable in a submodel in two ways:\n\n# From the outside; the prefix `x` must be applied because\n# from the perspective of `outer`, the variable is called\n# `x.a`.\nouter_conditioned1 = outer() | (@varname(x.a) => 1);\nrand(Xoshiro(468), outer_conditioned1)\n\n(b = 101.07200886749732,)\n\n\nOr equivalently, from the inside:\n\n@model function outer_conditioned2()\n # The prefix doesn't need to be applied here because\n # `inner` itself has no knowledge of the prefix.\n x ~ to_submodel(inner() | (@varname(a) => 1))\n b ~ Normal(x)\nend\nrand(Xoshiro(468), outer_conditioned2())\n\n(b = 101.07200886749732,)\n\n\nIn both cases, the conditioned variable x.a no longer appears in the output.\nNote, however, that you cannot condition on the return value of a submodel. Thus, for example, if we had:\n\n@model function inner_sensible()\n a ~ Normal()\n return a\nend\n\n@model function outer()\n x ~ to_submodel(inner_sensible())\n b ~ Normal(x)\nend\n\nouter (generic function with 2 methods)\n\n\nand we tried to condition on x, it would be silently ignored, even though x is equal to a.\nThe reason is that it is entirely coincidental that the return value of the submodel is equal to a. In general, a return value can be anything, and conditioning on it is not a meaningful operation.\n\n\nPrefixing\nPrefixing is the only place where submodel behaviour is ‘special’ compared to that of ordinary models.\nBy default, all variables in a submodel are prefixed with the left-hand side of the tilde-statement. This is done to avoid clashes if the same submodel is used multiple times in a model.\nYou can disable this by passing false as the second argument to to_submodel:\n\n@model function outer_no_prefix()\n x ~ to_submodel(inner(), false)\n b ~ Normal(x)\nend\nrand(outer_no_prefix())\n\n(a = 0.6327762377562545, b = 99.65279863588333)", "crumbs": [ "Get Started", "User Guide", "Submodels" ] }, { "objectID": "usage/submodels/index.html#accessing-submodel-variables", "href": "usage/submodels/index.html#accessing-submodel-variables", "title": "Submodels", "section": "Accessing submodel variables", "text": "Accessing submodel variables\nIn all of the examples above, x is equal to a + 100 because that is the return value of the submodel. 
To access the actual latent variables in the submodel itself, the simplest option is to include the variable in the return value of the submodel:\n\n@model function inner_with_retval()\n a ~ Normal()\n # You can return anything you like from the model,\n # but if you want to access the latent variables, they\n # should be included in the return value.\n return (; a=a, a_plus_100=a + 100)\nend\n@model function outer_with_retval()\n # The variable `x` will now contain the return value of the submodel,\n # which is a named tuple with `a` and `a_plus_100`.\n x ~ to_submodel(inner_with_retval())\n # You can access the value of x.a directly, because\n # x is a NamedTuple which contains `a`. Since `b` is\n # centred on `x.a`, it should be close to 0, not 100.\n b ~ Normal(x.a)\nend\nrand(Xoshiro(468), outer_with_retval())\n\n(var\"x.a\" = 0.07200886749732076, b = -0.0020348890621966487)\n\n\nYou can also manually access the value by looking inside the special __varinfo__ object.\n\n\n\n\n\n\nWarning\n\n\n\nThis relies on DynamicPPL internals and we do not recommend doing this unless you have no other option, e.g., if the submodel is defined in a different package which you do not control.\n\n\n\n@model function outer_with_varinfo()\n x ~ to_submodel(inner())\n # Access the value of x.a\n a_value = __varinfo__[@varname(x.a)]\n b ~ Normal(a_value)\nend\nrand(Xoshiro(468), outer_with_varinfo())\n\n(var\"x.a\" = 0.07200886749732076, b = -0.0020348890621966487)", "crumbs": [ "Get Started", "User Guide", "Submodels" ] }, { "objectID": "usage/submodels/index.html#example-linear-models", "href": "usage/submodels/index.html#example-linear-models", "title": "Submodels", "section": "Example: linear models", "text": "Example: linear models\nHere is a motivating example for the use of submodels. Suppose we want to fit a (very simplified) regression model to some data \(x\) and \(y\), where\n\[\begin{align}\nc_0 &\sim \text{Normal}(0, 5) \\\nc_1 &\sim \text{Normal}(0, 5) \\\n\mu &= c_0 + c_1x \\\ny &\sim d\n\end{align}\]\nwhere \(d\) is some distribution parameterised by the value of \(\mu\), which we don’t know the exact form of.\nIn practice, what we would do is write several different models, one for each distribution \(d\):\n\n@model function normal(x, y)\n c0 ~ Normal(0, 5)\n c1 ~ Normal(0, 5)\n mu = c0 .+ c1 .* x\n # Assume that y = mu, and that the noise in `y` is\n # normally distributed with standard deviation sigma\n sigma ~ truncated(Cauchy(0, 3); lower=0)\n for i in eachindex(y)\n y[i] ~ Normal(mu[i], sigma)\n end\nend\n\n@model function logpoisson(x, y)\n c0 ~ Normal(0, 5)\n c1 ~ Normal(0, 5)\n mu = c0 .+ c1 .* x\n # exponentiate mu because the rate parameter of\n # a Poisson distribution must be positive\n for i in eachindex(y)\n y[i] ~ Poisson(exp(mu[i]))\n end\nend\n\n# and so on...\n\nlogpoisson (generic function with 2 methods)\n\n\n\n\n\n\n\n\nNote\n\n\n\nYou could use arraydist to avoid the loops: for example, in logpoisson, one could write y ~ arraydist(Poisson.(exp.(mu))), but for simplicity in this tutorial we spell it out fully.\n\n\nWe would then fit all of our models and use some criterion to test which model is most suitable (see e.g. Wikipedia, or section 3.4 of Bishop’s Pattern Recognition and Machine Learning).\nHowever, the code above is quite repetitive. For example, if we wanted to adjust the priors on c0 and c1, we would have to do it in each model separately. 
If this were any other kind of code, we would naturally think of extracting the common parts into a separate function. In this case we can do exactly that with a submodel:\n\n@model function priors(x)\n c0 ~ Normal(0, 5)\n c1 ~ Normal(0, 5)\n mu = c0 .+ c1 .* x\n return (; c0=c0, c1=c1, mu=mu)\nend\n\n@model function normal(x, y)\n ps ~ to_submodel(priors(x))\n sigma ~ truncated(Cauchy(0, 3); lower=0)\n for i in eachindex(y)\n y[i] ~ Normal(ps.mu[i], sigma)\n end\nend\n\n@model function logpoisson(x, y)\n ps ~ to_submodel(priors(x))\n for i in eachindex(y)\n y[i] ~ Poisson(exp(ps.mu[i]))\n end\nend\n\nlogpoisson (generic function with 2 methods)\n\n\nOne could go even further and extract the y section into its own submodel as well, which would bring us to a generalised linear modelling interface that does not actually require the user to define their own Turing models at all:\n\n@model function normal_family(mu, y)\n sigma ~ truncated(Cauchy(0, 3); lower=0)\n for i in eachindex(y)\n y[i] ~ Normal(mu[i], sigma)\n end\n return nothing\nend\n\n@model function logpoisson_family(mu, y)\n for i in eachindex(y)\n y[i] ~ Poisson(exp(mu[i]))\n end\n return nothing\nend\n\n# An end-user could just use this function. Of course,\n# a more thorough interface would also allow the user to\n# specify priors, etc.\nfunction make_model(x, y, family::Symbol)\n if family == :normal\n family_model = normal_family\n elseif family == :logpoisson\n family_model = logpoisson_family\n else\n error(\"unknown family: `$family`\")\n end\n\n @model function general(x, y)\n ps ~ to_submodel(priors(x), false)\n _n ~ to_submodel(family_model(ps.mu, y), false)\n end\n return general(x, y)\nend\n\nsample(make_model([1, 2, 3], [1, 2, 3], :normal), NUTS(), 1000; progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.8\n\n\n\n\nChains MCMC chain (1000×17×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 8.42 seconds\nCompute duration = 8.42 seconds\nparameters = c0, c1, sigma\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nWhile this final example really showcases the composability of submodels, it also illustrates a minor syntactic drawback. When we create a submodel from family_model(ps.mu, y), in principle, we do not really care about its return value because it is not used anywhere else in the model. Ideally, we should therefore not need to place anything on the left-hand side of to_submodel. However, because the special behaviour of to_submodel relies on the tilde operator, and the tilde operator requires a left-hand side, we have to use a dummy variable (here _n).\nFurthermore, because the left-hand side of a tilde-statement must be a valid variable name, we cannot use destructuring syntax on the left-hand side of to_submodel, even if the return value is a NamedTuple. 
Thus, for example, the following is not allowed:\n(; c0, c1, mu) ~ to_submodel(priors(x))\nTo use destructuring syntax, you would have to add a separate line:\nps ~ to_submodel(priors(x))\n(; c0, c1, mu) = ps", "crumbs": [ "Get Started", "User Guide", "Submodels" ] }, { "objectID": "usage/submodels/index.html#submodels-versus-distributions", "href": "usage/submodels/index.html#submodels-versus-distributions", "title": "Submodels", "section": "Submodels versus distributions", "text": "Submodels versus distributions\nFinally, we end with a discussion of why some of the behaviour for submodels above has come about. This is slightly more behind-the-scenes and therefore will likely be of most interest to Turing developers.\nFundamentally, submodels are to be compared against distributions: both of them can appear on the right-hand side of a tilde statement. However, distributions only have one ‘output’, i.e., the value that is sampled from them:\n\ndist = Normal()\nrand(dist)\n\n1.6626589581960542\n\n\nAnother point to bear in mind is that, given a sample from dist, asking for its log-probability is a meaningful calculation.\n\nlogpdf(dist, rand(dist))\n\n-1.1091782613614343\n\n\nIn contrast, models (and hence submodels) have two different outputs: the latent variables, and the return value. These are accessed respectively using rand(model) and model():\n\n@model function f()\n a ~ Normal()\n return \"hello, world.\"\nend\n\nmodel = f()\n\nDynamicPPL.Model{typeof(f), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}(f, NamedTuple(), NamedTuple(), DynamicPPL.DefaultContext())\n\n\n\n# Latent variables\nrand(model)\n\n(a = 0.8966845352544693,)\n\n\n\n# Return value\nmodel()\n\n\"hello, world.\"\n\n\nJust like for distributions, one can indeed ask for the log-probability of the latent variables (although we have to specify whether we want the joint, likelihood, or prior):\n\nlogjoint(model, rand(model))\n\n-1.201325437800081\n\n\nBut it does not make sense to ask for the log-probability of the return value (which in this case is a string, and in general, could be literally any object).\nThe fact that we have what looks like a unified notation for these is a bit of a lie, since it hides this distinction. In particular, for x ~ distr, x is assigned the value of rand(distr); but for y ~ submodel, y is assigned the value of submodel(). This is why, for example, it is impossible to condition on y in y ~ ...; we can only condition on x in x ~ dist.\nEventually we would like to make this more logically consistent. In particular, it is clear that y ~ submodel should return not one but two objects: the latent variables and the return value. Furthermore, it should be possible to condition on the latent variables, but not on the return value. See this issue for an ongoing discussion of the best way to accomplish this.\nIt should be mentioned that extracting the latent variables from a submodel is not entirely trivial since the submodel is run using the same VarInfo as the parent model (i.e., we would have to do a before-and-after comparison to see which new variables were added by the submodel).\nAlso, we are still working out the exact data structure that should be used to represent the latent variables. In the examples above, rand(model) returns a NamedTuple, but this actually causes loss of information because the keys of a NamedTuple are Symbols, whereas we really want to use VarNames. 
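For example (an illustrative sketch using the outer model from the top of this page), the key var\"x.a\" in rand(outer()) is just the flat Symbol(\"x.a\"), whereas the VarName @varname(x.a) retains the structure of the prefix x and the inner variable a:\n\nnt = rand(outer())\nkeys(nt) # (Symbol(\"x.a\"), :b), i.e. flat Symbols\n@varname(x.a) # a structured VarName\n\n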
See this issue for a current proposal.", "crumbs": [ "Get Started", "User Guide", "Submodels" ] }, { "objectID": "usage/threadsafe-evaluation/index.html", "href": "usage/threadsafe-evaluation/index.html", "title": "Threadsafe Evaluation", "section": "", "text": "A common technique to speed up Julia code is to use multiple threads to run computations in parallel. The Julia manual has a section on multithreading, which is a good introduction to the topic.\nWe assume that the reader is familiar with some threading constructs in Julia, and the general concept of data races. This page specifically discusses Turing’s support for threadsafe model evaluation.\nprintln(\"This notebook is being run with $(Threads.nthreads()) threads.\")\n\nThis notebook is being run with 4 threads.", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#threading-in-turing-models", "href": "usage/threadsafe-evaluation/index.html#threading-in-turing-models", "title": "Threadsafe Evaluation", "section": "Threading in Turing models", "text": "Threading in Turing models\nGiven that Turing models mostly contain ‘plain’ Julia code, one might expect that all threading constructs such as Threads.@threads or Threads.@spawn can be used inside Turing models.\nThis is, to some extent, true: you can use threading constructs to speed up deterministic computations. For example, here we use parallelism to speed up a transformation of x:\n\nusing Turing\n\n@model function parallel(y)\n x ~ dist\n x_transformed = similar(x)\n Threads.@threads for i in eachindex(x)\n x_transformed[i] = some_expensive_function(x[i])\n end\n y ~ some_likelihood(x_transformed)\nend\n\n\n┌ Warning: It looks like you are using `Threads.@threads` in your model definition.\n│ \n│ Note that since version 0.39 of DynamicPPL, threadsafe evaluation of models is disabled by default. If you need it, you will need to explicitly enable it by creating the model, and then running `model = setthreadsafe(model, true)`.\n│ \n│ Threadsafe model evaluation is only needed when parallelising tilde-statements (not arbitrary Julia code), and avoiding it can often lead to significant performance improvements.\n│ \n│ Please see https://turinglang.org/docs/usage/threadsafe-evaluation/ for more details of when threadsafe evaluation is actually required.\n└ @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/compiler.jl:383\n\n\n\n\nparallel (generic function with 2 methods)\n\n\nIn general, for code that does not involve tilde-statements (x ~ dist), threading works exactly as it does in regular Julia code.\nHowever, extra care must be taken when using tilde-statements (x ~ dist), or @addlogprob!, inside threaded blocks.\n\n\n\n\n\n\nNoteWhy are tilde-statements special?\n\n\n\nTilde-statements are expanded by the @model macro into something that modifies the internal VarInfo object used for model evaluation. Essentially, x ~ dist expands to something like\nx, __varinfo__ = DynamicPPL.tilde_assume!!(..., __varinfo__)\nand writing into __varinfo__ is, in general, not threadsafe. 
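The failure mode is the same as for any shared, non-threadsafe structure mutated from multiple threads; here is a plain-Julia sketch of such a data race (nothing DynamicPPL-specific is assumed):\n\nd = Dict{Int,Float64}()\nThreads.@threads for i in 1:1000\n d[i] = sqrt(i) # concurrent writes to a Dict are a data race\nend\nlength(d) # may be less than 1000, or the loop may crash outright\n\n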
Thus, parallelising tilde-statements can lead to data races as described in the Julia manual.", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#threaded-observations", "href": "usage/threadsafe-evaluation/index.html#threaded-observations", "title": "Threadsafe Evaluation", "section": "Threaded observations", "text": "Threaded observations\nAs of version 0.42, Turing only supports the use of tilde-statements inside threaded blocks when these are observations (i.e., likelihood terms).\nHowever, such models must be marked by the user as requiring threadsafe evaluation, using setthreadsafe.\nThis means that the following code is safe to use:\n\n@model function threaded_obs(N)\n x ~ Normal()\n y = Vector{Float64}(undef, N)\n Threads.@threads for i in 1:N\n y[i] ~ Normal(x)\n end\nend\n\nN = 100\ny = randn(N)\nthreadunsafe_model = threaded_obs(N) | (; y = y)\nthreadsafe_model = setthreadsafe(threadunsafe_model, true)\n\n\n┌ Warning: It looks like you are using `Threads.@threads` in your model definition.\n│ \n│ Note that since version 0.39 of DynamicPPL, threadsafe evaluation of models is disabled by default. If you need it, you will need to explicitly enable it by creating the model, and then running `model = setthreadsafe(model, true)`.\n│ \n│ Threadsafe model evaluation is only needed when parallelising tilde-statements (not arbitrary Julia code), and avoiding it can often lead to significant performance improvements.\n│ \n│ Please see https://turinglang.org/docs/usage/threadsafe-evaluation/ for more details of when threadsafe evaluation is actually required.\n└ @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/compiler.jl:383\n\n\n\n\nDynamicPPL.Model{typeof(threaded_obs), (:N,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{y::Vector{Float64}}, DynamicPPL.DefaultContext}, true}(threaded_obs, (N = 100,), NamedTuple(), ConditionContext((y = [0.6157227548409125, 1.0462155141931704, -0.24091937971417118, 0.347608886387371, -0.20536740908243986, -0.7167829796673898, -0.0821860192632437, 0.5938552191307076, 0.4173776302912096, -0.05295080354607168, 1.2575320913284038, -1.2863380367361552, 1.054271727637123, 0.07480009410789022, 0.052477055762680115, -0.4284167206511119, -1.4172146064585498, -0.04082496516276575, 0.38871790855097, -0.1417760213358877, -1.0254448355965495, -1.467914624890712, -0.6287933685115594, 0.08982061454138729, 1.4043604031934744, 0.4204693395879604, 1.5478390178365018, 0.6128542432422197, 0.42710129047416706, -0.6109801392920082, -2.207884098819997, -1.2147512321901626, 0.605272908051513, 0.2726083463922303, 0.2346665034554911, 1.0145017597139154, 0.7797695743840005, -0.3626568826974562, -0.2385123363042218, -0.24790184741139124, -0.24056473736811984, -0.7949838831632657, 0.9724103370309812, -0.45001670565323015, 0.14199158317091654, -1.1185949474448997, -1.497157638000686, -1.155526704836487, -1.3522157782751871, -0.26359288212400483, -0.702501281558662, 0.6854996404760917, 0.28882824336135204, -0.7793579760413759, -1.6238403708521503, -0.8526503514396271, 0.5804563395216211, 0.20156591834999157, 1.1326573488233298, 0.41415148727101164, -0.1297963739728851, -2.175886480679348, 0.19050068853642302, -0.10956050271828323, 0.9424417218711935, 0.9463635522700503, 0.17838206558937053, 0.5475759395073111, 0.8655261832643574, -0.08930319126826322, 1.3147112627680377, 0.8421269067766447, -2.022072536829743, 0.9340525979905604, -0.08751703474324263, -0.12366652037493893, 
-0.28499628629918666, 0.686550895629059, -0.568697198950328, -0.8541209725716037, -1.4601716494718802, 1.540154380131499, 0.4174640117532761, -1.1116491997590532, -1.8779224416638187, -0.6070972658902858, -1.559928780284108, -1.6276445591210194, -0.5983569175620345, 0.5400404223573743, -0.12212573212851616, -0.33961029913246016, -0.27237176462278395, -1.163424752258261, 1.7141473337477955, 1.1007264052210153, 0.28677175722432985, 1.1808456969100567, 0.6578279176322354, 0.26813615536427515],), DynamicPPL.DefaultContext()))\n\n\nEvaluating this model is threadsafe, in that Turing guarantees to provide the correct result in functions such as:\n\nlogjoint(threadsafe_model, (; x = 0.0))\n\n-134.1374832528087\n\n\n(we can compare with the true value)\n\nlogpdf(Normal(), 0.0) + sum(logpdf.(Normal(0.0), y))\n\n-134.13748325280866\n\n\nNote that if you do not use setthreadsafe, the above code may give wrong results, or even error:\n\nlogjoint(threadunsafe_model, (; x = 0.0))\n\n-34.85681637618924\n\n\nYou can sample from this model and safely use functions such as predict or returned, as long as the model is always marked as threadsafe:\n\nmodel = setthreadsafe(threaded_obs(N) | (; y = y), true)\nchn = sample(model, NUTS(), 100; check_model=false, progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\n\n\nChains MCMC chain (100×15×1 Array{Float64, 3}):\n\nIterations = 51:1:150\nNumber of chains = 1\nSamples per chain = 100\nWall duration = 6.7 seconds\nCompute duration = 6.7 seconds\nparameters = x\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\npmodel = setthreadsafe(threaded_obs(N), true) # don't condition on data\npredict(pmodel, chn)\n\nChains MCMC chain (100×100×1 Array{Float64, 3}):\n\nIterations = 1:1:100\nNumber of chains = 1\nSamples per chain = 100\nparameters = y[1], y[2], y[3], y[4], y[5], y[6], y[7], y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15], y[16], y[17], y[18], y[19], y[20], y[21], y[22], y[23], y[24], y[25], y[51], y[52], y[53], y[54], y[55], y[56], y[57], y[58], y[59], y[60], y[61], y[62], y[63], y[64], y[65], y[66], y[67], y[68], y[69], y[70], y[71], y[72], y[73], y[74], y[75], y[26], y[27], y[28], y[29], y[30], y[31], y[32], y[33], y[34], y[35], y[36], y[37], y[38], y[39], y[40], y[41], y[42], y[43], y[44], y[45], y[46], y[47], y[48], y[49], y[50], y[76], y[77], y[78], y[79], y[80], y[81], y[82], y[83], y[84], y[85], y[86], y[87], y[88], y[89], y[90], y[91], y[92], y[93], y[94], y[95], y[96], y[97], y[98], y[99], y[100]\ninternals = \n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\n\n\n\n\nWarningPrevious versions\n\n\n\nUp until Turing v0.41, you did not need to use setthreadsafe to enable threadsafe evaluation, and it was automatically enabled whenever Julia was launched with more than one thread.\nThere were several reasons for changing this: one major one is because threadsafe evaluation comes with a performance cost, which can sometimes be substantial (see below).\nFurthermore, the number of threads is not an appropriate way to determine whether threadsafe evaluation is needed!", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#threaded-assumptions-sampling-latent-values", "href": 
"usage/threadsafe-evaluation/index.html#threaded-assumptions-sampling-latent-values", "title": "Threadsafe Evaluation", "section": "Threaded assumptions / sampling latent values", "text": "Threaded assumptions / sampling latent values\nOn the other hand, parallelising the sampling of latent values is not supported. Attempting to do this will either error or give wrong results.\n\n@model function threaded_assume_bad(N)\n x = Vector{Float64}(undef, N)\n Threads.@threads for i in 1:N\n x[i] ~ Normal()\n end\n return x\nend\n\nmodel = threaded_assume_bad(100)\n\n# This will throw an error (and probably a different error\n# each time it's run...)\nmodel()\n\n\n┌ Warning: It looks like you are using `Threads.@threads` in your model definition.\n│ \n│ Note that since version 0.39 of DynamicPPL, threadsafe evaluation of models is disabled by default. If you need it, you will need to explicitly enable it by creating the model, and then running `model = setthreadsafe(model, true)`.\n│ \n│ Threadsafe model evaluation is only needed when parallelising tilde-statements (not arbitrary Julia code), and avoiding it can often lead to significant performance improvements.\n│ \n│ Please see https://turinglang.org/docs/usage/threadsafe-evaluation/ for more details of when threadsafe evaluation is actually required.\n└ @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/compiler.jl:383\n\n\n\n\n\nTaskFailedException\n\n nested task error: BoundsError: attempt to access 11-element BitVector at index [13]\n Stacktrace:\n [1] throw_boundserror(A::BitVector, I::Tuple{Int64})\n @ Base ./essentials.jl:14\n [2] checkbounds\n @ ./abstractarray.jl:699 [inlined]\n [3] getindex\n @ ./bitarray.jl:681 [inlined]\n [4] is_transformed\n @ ~/.julia/packages/DynamicPPL/oIycL/src/varinfo.jl:811 [inlined]\n [5] is_transformed\n @ ~/.julia/packages/DynamicPPL/oIycL/src/varinfo.jl:810 [inlined]\n [6] (::DynamicPPL.var\"#123#124\"{DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{AbstractPPL.VarName, Int64}, Vector{Distribution}, Vector{AbstractPPL.VarName}, Vector{Real}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}})(vn::AbstractPPL.VarName{:x, Accessors.IndexLens{Tuple{Int64}}})\n @ DynamicPPL ./none:0\n [7] iterate\n @ ./generator.jl:48 [inlined]\n [8] _any(f::typeof(identity), itr::Base.Generator{Vector{AbstractPPL.VarName}, DynamicPPL.var\"#123#124\"{DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{AbstractPPL.VarName, Int64}, Vector{Distribution}, Vector{AbstractPPL.VarName}, Vector{Real}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ::Colon)\n @ Base ./reduce.jl:1243\n [9] any\n @ ./reduce.jl:1228 [inlined]\n [10] any\n @ ./reduce.jl:1154 [inlined]\n [11] is_transformed\n @ ~/.julia/packages/DynamicPPL/oIycL/src/varinfo.jl:1411 [inlined]\n [12] tilde_assume!!(ctx::DynamicPPL.InitContext{Random.TaskLocalRNG, InitFromPrior}, dist::Normal{Float64}, vn::AbstractPPL.VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, vi::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{AbstractPPL.VarName, Int64}, Vector{Distribution}, Vector{AbstractPPL.VarName}, Vector{Real}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, 
LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/contexts/init.jl:329\n [13] (::var\"#85#threadsfor_fun#8\"{var\"#85#threadsfor_fun#7#9\"{DynamicPPL.Model{typeof(threaded_assume_bad), (:N,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.InitContext{Random.TaskLocalRNG, InitFromPrior}, false}, UnitRange{Int64}}})(tid::Int64; onethread::Bool)\n @ Main.Notebook ./threadingconstructs.jl:253\n [14] #85#threadsfor_fun\n @ ./threadingconstructs.jl:220 [inlined]\n [15] (::Base.Threads.var\"#1#2\"{var\"#85#threadsfor_fun#8\"{var\"#85#threadsfor_fun#7#9\"{DynamicPPL.Model{typeof(threaded_assume_bad), (:N,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.InitContext{Random.TaskLocalRNG, InitFromPrior}, false}, UnitRange{Int64}}}, Int64})()\n @ Base.Threads ./threadingconstructs.jl:154\n\n...and 3 more exceptions.\n\nStacktrace:\n [1] threading_run(fun::var\"#85#threadsfor_fun#8\"{var\"#85#threadsfor_fun#7#9\"{DynamicPPL.Model{typeof(threaded_assume_bad), (:N,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.InitContext{Random.TaskLocalRNG, InitFromPrior}, false}, UnitRange{Int64}}}, static::Bool)\n @ Base.Threads ./threadingconstructs.jl:173\n [2] macro expansion\n @ ./threadingconstructs.jl:190 [inlined]\n [3] threaded_assume_bad\n @ ~/work/docs/docs/usage/threadsafe-evaluation/index.qmd:140 [inlined]\n [4] _evaluate!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:997 [inlined]\n [5] evaluate!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:983 [inlined]\n [6] init!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:938 [inlined]\n [7] init!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:936 [inlined]\n [8] Model\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:911 [inlined]\n [9] (::DynamicPPL.Model{typeof(threaded_assume_bad), (:N,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.DefaultContext, false})()\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:904\n [10] top-level scope\n @ ~/work/docs/docs/usage/threadsafe-evaluation/index.qmd:150", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#when-is-threadsafe-evaluation-really-needed", "href": "usage/threadsafe-evaluation/index.html#when-is-threadsafe-evaluation-really-needed", "title": "Threadsafe Evaluation", "section": "When is threadsafe evaluation really needed?", "text": "When is threadsafe evaluation really needed?\nYou only need to enable threadsafe evaluation if you are using tilde-statements or @addlogprob! 
inside threaded blocks.\nSpecifically, you do not need to enable threadsafe evaluation if:\n\nYou have parallelism inside the model, but it does not involve tilde-statements or @addlogprob!.\n@model function parallel_no_tilde(y)\n x ~ Normal()\n fy = similar(y)\n Threads.@threads for i in eachindex(y)\n fy[i] = some_expensive_function(x, y[i])\n end\nend\n# This does not need setthreadsafe\nmodel = parallel_no_tilde(y)\nYou are sampling from a model using MCMCThreads(), but the model itself does not contain any parallel tilde-statements or @addlogprob!.\n@model function no_parallel(y)\n x ~ Normal()\n y ~ Normal(x)\nend\n\n# This does not need setthreadsafe\nmodel = no_parallel(1.0)\nchn = sample(model, NUTS(), MCMCThreads(), 100)", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#performance-considerations", "href": "usage/threadsafe-evaluation/index.html#performance-considerations", "title": "Threadsafe Evaluation", "section": "Performance considerations", "text": "Performance considerations\nAs described above, one of the major considerations behind the introduction of setthreadsafe is that threadsafe evaluation comes with a performance cost.\nConsider a simple model that does not use threading:\n\n@model function gdemo()\n s ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s))\n 1.5 ~ Normal(m, sqrt(s))\n 2.0 ~ Normal(m, sqrt(s))\nend\nmodel_no_threadsafe = gdemo()\nmodel_threadsafe = setthreadsafe(gdemo(), true)\n\nDynamicPPL.Model{typeof(gdemo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, true}(gdemo, NamedTuple(), NamedTuple(), DynamicPPL.DefaultContext())\n\n\nOne can see that evaluation of the threadsafe model is substantially slower:\n\nusing Chairmarks, DynamicPPL\n\nfunction benchmark_eval(m)\n vi = VarInfo(m)\n display(median(@be DynamicPPL.evaluate!!($m, $vi)))\nend\n\nbenchmark_eval(model_no_threadsafe)\nbenchmark_eval(model_threadsafe)\n\n284.782 ns (8 allocs: 464 bytes)\n\n\n3.271 μs (49 allocs: 2.766 KiB)\n\n\nIn previous versions of Turing, this cost would always be incurred whenever Julia was launched with multiple threads, even if the model did not use any threading at all!", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#alternatives-to-threaded-observation", "href": "usage/threadsafe-evaluation/index.html#alternatives-to-threaded-observation", "title": "Threadsafe Evaluation", "section": "Alternatives to threaded observation", "text": "Alternatives to threaded observation\nAn alternative to using threaded observations is to manually calculate the log-likelihood term (which can be parallelised using any of Julia’s standard mechanisms), and then outside of the threaded block, add it to the model using @addlogprob!.\nFor example:\n\n# Note that `y` has to be passed as an argument; you can't\n# condition on it because otherwise `y[i]` won't be defined.\n@model function threaded_obs_addlogprob(N, y)\n x ~ Normal()\n\n # Instead of this:\n # Threads.@threads for i in 1:N\n # y[i] ~ Normal(x)\n # end\n\n # Do this instead:\n lls = map(1:N) do i\n Threads.@spawn begin\n logpdf(Normal(x), y[i])\n end\n end\n @addlogprob! sum(fetch.(lls))\nend\n\nthreaded_obs_addlogprob (generic function with 2 methods)\n\n\nIn a similar way, you can also use your favourite parallelism package, such as FLoops.jl or OhMyThreads.jl. 
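For instance, here is a sketch of the same pattern written with OhMyThreads.jl's tmapreduce (assuming OhMyThreads is installed; only plain Julia code is parallelised, and @addlogprob! stays outside the parallel region):\n\nusing OhMyThreads: tmapreduce\n\n@model function threaded_obs_omt(N, y)\n x ~ Normal()\n # Parallel reduction over the per-observation log-likelihoods.\n ll = tmapreduce(i -> logpdf(Normal(x), y[i]), +, 1:N)\n @addlogprob! ll\nend\n\n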
See this Discourse post for some examples.\nWe make no promises about the use of tilde-statements with these packages (indeed it will most likely error), but as long as you use them only to parallelise regular Julia code (i.e., not tilde-statements), they will work as intended.\nThe main downsides of this approach are:\n\nYou can’t use conditioning syntax to provide data; it has to be passed as an argument or otherwise included inside the model.\nYou can’t use predict to sample new data.\n\nOn the other hand, one benefit of rewriting the model this way is that sampling from this model with MCMCThreads() will always be reproducible.\n\nusing Random\nN = 100\ny = randn(N)\n# Note that since `@addlogprob!` is outside of the threaded block, we don't\n# need to use `setthreadsafe`.\nmodel = threaded_obs_addlogprob(N, y)\nnuts_kwargs = (progress=false, verbose=false)\n\nchain1 = sample(Xoshiro(468), model, NUTS(), MCMCThreads(), 1000, 4; nuts_kwargs...)\nchain2 = sample(Xoshiro(468), model, NUTS(), MCMCThreads(), 1000, 4; nuts_kwargs...)\nmean(chain1[:x]), mean(chain2[:x]) # should be identical\n\n(-0.05366125300886113, -0.05366125300886113)\n\n\nIn contrast, the original threaded_obs (which used tilde inside Threads.@threads) is not reproducible when using MCMCThreads(). (In principle, we would like to fix this bug, but we haven’t yet investigated where it stems from.)\n\nmodel = setthreadsafe(threaded_obs(N) | (; y = y), true)\nnuts_kwargs = (progress=false, verbose=false)\nchain1 = sample(Xoshiro(468), model, NUTS(), MCMCThreads(), 1000, 4; nuts_kwargs...)\nchain2 = sample(Xoshiro(468), model, NUTS(), MCMCThreads(), 1000, 4; nuts_kwargs...)\nmean(chain1[:x]), mean(chain2[:x]) # oops!\n\n(-0.047996272969382116, -0.05017649031371641)", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#ad-support", "href": "usage/threadsafe-evaluation/index.html#ad-support", "title": "Threadsafe Evaluation", "section": "AD support", "text": "AD support\nFinally, if you are using Turing with automatic differentiation, you also need to keep track of which AD backends support threadsafe evaluation.\nForwardDiff is the only AD backend that we find to work reliably with threaded model evaluation.\nIn particular:\n\nReverseDiff sometimes gives the right results, but quite often gives incorrect gradients.\nMooncake currently does not support multithreading at all.\nEnzyme mostly gives the right result, but sometimes gives incorrect gradients.", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/threadsafe-evaluation/index.html#under-the-hood", "href": "usage/threadsafe-evaluation/index.html#under-the-hood", "title": "Threadsafe Evaluation", "section": "Under the hood", "text": "Under the hood\n\n\n\n\n\n\nNote\n\n\n\nThis part will likely only be of interest to DynamicPPL developers and the very curious user.\n\n\n\nWhy is VarInfo not threadsafe?\nAs alluded to above, the issue with threaded tilde-statements stems from the fact that these tilde-statements modify the VarInfo object used for model evaluation, leading to potential data races.\nTraditionally, VarInfo objects contain both metadata as well as accumulators. Metadata is where information about the random variables’ values is stored. 
It is a Dict-like structure, and pushing to it from multiple threads is therefore not threadsafe (Julia’s Dict has similar limitations).\nOn the other hand, accumulators are used to store outputs of the model, such as log-probabilities. The way DynamicPPL’s threadsafe evaluation works is to create one set of accumulators per thread, and then combine the results at the end of model evaluation.\nIn this way, any function call that involves only accumulators can be made threadsafe. For example, this is why observations are supported: there is no need to modify metadata, and only the log-likelihood accumulator needs to be updated.\nHowever, ‘assume’ tilde-statements (i.e., those for latent variables) always modify the metadata, and thus cannot currently be made threadsafe.\n\n\nOnlyAccsVarInfo\nAs it happens, much of what is needed in DynamicPPL can be constructed such that it relies only on accumulators.\nFor example, as long as there is no need to sample new values of random variables, it is actually fine to completely omit the metadata object. This is the case for LogDensityFunction: since values are provided as the input vector, there is no need to store them in metadata. We need only calculate the associated log-prior probability, which is stored in an accumulator. Thus, since DynamicPPL v0.39, LogDensityFunction itself is completely threadsafe.\nTechnically speaking, this is achieved using OnlyAccsVarInfo, which is a subtype of VarInfo that only contains accumulators, and no metadata at all. It implements enough of the VarInfo interface to be used in model evaluation, but will error if any functions attempt to modify or read its metadata.\nThere is currently an ongoing push to use OnlyAccsVarInfo in as many settings as we possibly can. For example, this is why predict is threadsafe in DynamicPPL v0.39: instead of modifying metadata to store the predicted values, we store them inside a ValuesAsInModelAccumulator, and combine them at the end of evaluation.\nHowever, propagating these changes up to Turing will require a substantial amount of additional work, since there are many places in Turing which currently rely on a full VarInfo (with metadata). See, e.g., this PR for more information.", "crumbs": [ "Get Started", "User Guide", "Threadsafe Evaluation" ] }, { "objectID": "usage/varnamedtuple/index.html", "href": "usage/varnamedtuple/index.html", "title": "VarNamedTuple", "section": "", "text": "WarningThis page refers to a future version of DynamicPPL.jl\n\n\n\nThe changes on this page are being implemented in DynamicPPL v0.40. They are not currently available in released versions of DynamicPPL.jl and Turing.jl. 
The documentation is being written in advance to minimise the delay between the release of the new version and the availability of documentation.\nPlease see this PR and this milestone for ongoing progress.\nusing DynamicPPL\nif pkgversion(DynamicPPL) >= v\"0.40\"\n error(\"This page needs to be updated\")\nend\nIn many places Turing.jl uses a custom data structure, VarNamedTuple, to represent mappings of VarNames to arbitrary values.\nThis completely replaces the usage of NamedTuples or OrderedDict{VarName} in previous versions.\nCurrently, VarNamedTuple is defined in DynamicPPL.jl; it may be moved to AbstractPPL.jl in the future once its functionality has stabilised.", "crumbs": [ "Get Started", "User Guide", "VarNamedTuple" ] }, { "objectID": "usage/varnamedtuple/index.html#using-varnamedtuples", "href": "usage/varnamedtuple/index.html#using-varnamedtuples", "title": "VarNamedTuple", "section": "Using VarNamedTuples", "text": "Using VarNamedTuples\nVery often, VarNamedTuples are constructed automatically inside Turing.jl models, and you do not need to create them yourself. Here is a simple example of a VarNamedTuple created automatically by Turing.jl when running mode estimation:\nusing Turing\n\n@model function demo_model()\n x = Vector{Float64}(undef, 2)\n x[1] ~ Normal()\n x[2] ~ Beta(2, 2)\n y ~ Normal(x[1] + x[2], 1)\nend\nmodel = demo_model() | (; y = 1.0)\n\nres = maximum_a_posteriori(model)\n\n# This is a VarNamedTuple.\nres.params\nAs far as using VarNamedTuples goes, they behave very similarly to Dict{VarName}s. You can access the stored values using getindex:\nres.params[@varname(x[1])]\nThe nice thing about VarNamedTuples is that they contain knowledge about the structure of the variables inside them (which is stored during the model evaluation). For example, this particular VarNamedTuple knows that x is a length-2 vector, so you can access\nres.params[@varname(x)]\neven though x itself was never on the left-hand side of a tilde-statement (only x[1] and x[2] were). This is not possible with a Dict{VarName}. You can even do things like:\nres.params[@varname(x[end])]\nand it will work ‘as expected’.\nPut simply, indexing into a variable in a VarNamedTuple mimics indexing into the original variable itself as far as possible.", "crumbs": [ "Get Started", "User Guide", "VarNamedTuple" ] }, { "objectID": "usage/varnamedtuple/index.html#creating-varnamedtuples", "href": "usage/varnamedtuple/index.html#creating-varnamedtuples", "title": "VarNamedTuple", "section": "Creating VarNamedTuples", "text": "Creating VarNamedTuples\nIf you only ever need to read from a VarNamedTuple, then the above section will suffice. However, there are also some cases where we ask users to construct a VarNamedTuple themselves, including the following:\n\nProviding initial parameters for MCMC sampling or optimisation;\nProviding parameters to condition models on, or to fix.\n\n\n\n\n\n\n\nNoteA deeper dive into VarNamedTuples\n\n\n\nIf you are developing against Turing or DynamicPPL (e.g. if you are writing custom inference algorithms), you will also probably need to create VarNamedTuples. In this case you will likely have to understand their lower-level APIs. 
We strongly recommend reading the DynamicPPL docs, where we explain the design and implementation of VarNamedTuples in much more detail.\n\n\nTo create a VarNamedTuple, you can use the VarNamedTuple constructor directly:\nVarNamedTuple(x = 1, y = \"a\", z = [1, 2, 3])\nHowever, this direct constructor only works for variables that are top-level symbols. If you have VarNames that contain indexing or field access, we recommend using the @vnt macro, which is exported from DynamicPPL and Turing.\nusing Turing\n\nvnt = @vnt begin\n x := 1\n y.a.b := \"a\"\n z[1] := 10\nend\nHere, each line with := indicates that we are setting that VarName to the corresponding value. You can have any valid VarName on the left-hand side. (Note that you must use colon-equals; we reserve the syntax x = y for future use.)\n\nGrowableArrays\nIn the above example, vnt is a VarNamedTuple with three entries. However, you may have noticed the warning issued about a GrowableArray. What does that mean?\nThe problem with the above call is that when setting z[1], the VarNamedTuple does not yet know what z is supposed to be. It is probably a vector, but in principle it could be a matrix (where z[1] is using linear indexing). Furthermore, we don’t know what type of array it is. It could be Base.Array, or it could be some custom array type, like OffsetArray.\nGrowableArray is DynamicPPL’s way of representing an array whose size and type are not yet known. When you set z[1] := 10, DynamicPPL creates a one-dimensional GrowableArray for z, which can then be ‘grown’ as more entries are set. However, this is a heuristic, and may not always be correct; hence the warning.\n\n\nTemplating\nTo avoid this, we strongly recommend that whenever you have variables that are arrays or structs, you provide a ‘template’ for them. A template is an array that has the same type and shape as the variable that will eventually be used in the model.\nFor example, if your model looks like this:\n@model function demo_template()\n # ...\n z = zeros(2, 2, 2)\n z[1] ~ Normal()\n # ...\nend\nthen the template for z should be any Base.Array{T,3} of size (2, 2, 2). (The element type does not matter, as it will be inferred from the values you set.)\nTo specify a template, you can use the @template macro inside the @vnt block. The following example, for example, says that z inside the model will be a 3-dimensional Base.Array of size (2, 2, 2). The fact that it contains zeros is irrelevant, so you can provide any template that is structurally the same.\nvnt = @vnt begin\n @template z = zeros(2, 2, 2)\n z[1] := 1.0\nend\nNotice now that the created VarNamedTuple knows that z is a 3-dimensional array, so no warnings are issued. Furthermore, you can now index into it as if it were a 3D array:\nvnt[@varname(z[1, 1, 1])]\n(With a GrowableArray, this would have errored.)\nWhen setting a template, you can use any valid Julia expression on the right-hand side (such as variables from the surrounding scope). 
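For instance (a sketch against the API described above):\nn = 4\nvnt = @vnt begin\n # `n` is an ordinary variable from the surrounding scope.\n @template z = zeros(n, n)\n z[1, 1] := 1.0\nend\n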
Any expressions in templates are only evaluated once.\nYou can also omit the right-hand side, in which case the template will be assumed to be the variable with that name:\n# Declare this variable outside.\nz = zeros(2, 2, 2)\n\n# The following is equivalent to `@template z = z`.\nvnt = @vnt begin\n @template z\n z[1] := 1.0\nend\nMultiple templates can also be set on the same line, using space-separated assignments: @template x = expr1 y = expr2 ....\n\n\nNested values\nIf you have nested structs or arrays, you need to provide templates for the top-level symbol.\nvnt = @vnt begin\n @template y = (a = zeros(2), b = zeros(3))\n y.a[1] := 1.0\n y.b[2] := 2.0\nend\nThis restriction will probably be lifted in future versions; for example if you are trying to set a value y.a[1], you could provide a template for y.a without providing one for y.", "crumbs": [ "Get Started", "User Guide", "VarNamedTuple" ] }, { "objectID": "usage/external-samplers/index.html", "href": "usage/external-samplers/index.html", "title": "Using External Samplers", "section": "", "text": "Turing provides several wrapped samplers from external sampling libraries, e.g., HMC samplers from AdvancedHMC. These wrappers allow new users to seamlessly sample statistical models without leaving Turing. However, these wrappers might not always be complete, missing some functionality from the wrapped sampling library. Moreover, users might want to use samplers currently not wrapped within Turing.\nFor these reasons, Turing also makes running external samplers on Turing models easy without any necessary modifications or wrapping! Throughout, we will use a 10-dimensional Neal’s funnel as a running example:\n\n# Import libraries.\nusing Turing, Random, LinearAlgebra\n\nd = 10\n@model function funnel()\n θ ~ Truncated(Normal(0, 3), -3, 3)\n z ~ MvNormal(zeros(d - 1), exp(θ) * I)\n return x ~ MvNormal(z, I)\nend\n\nfunnel (generic function with 2 methods)\n\n\nNow we sample the model to generate some observations, which we can then condition on.\n\n(; x) = rand(funnel() | (θ=0,))\nmodel = funnel() | (; x);\n\nUsers can use any sampling algorithm to sample this model as long as it follows the AbstractMCMC API. Before discussing how this is done in practice, it is worth giving a high-level description of the process. Imagine that we created an instance of an external sampler that we will call spl such that typeof(spl)<:AbstractMCMC.AbstractSampler. In order to avoid type ambiguity within Turing, at the moment, it is necessary to declare spl as an external sampler to Turing with espl = externalsampler(spl), where externalsampler(s::AbstractMCMC.AbstractSampler) is a Turing function that types our external sampler adequately.\nAn excellent place to start showing how this is done in practice is the sampling library AdvancedMH (AdvancedMH’s GitHub) for Metropolis-Hastings (MH) methods. Let’s say we want to use a random walk Metropolis-Hastings sampler without specifying the proposal distributions. 
The code below constructs an MH sampler using a multivariate Gaussian distribution with zero mean and unit variance in d dimensions as a random walk proposal.\n\n# Importing the sampling library\nusing AdvancedMH\nrwmh = AdvancedMH.RWMH(d)\n\nMetropolisHastings{RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}}(RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}(ZeroMeanIsoNormal(\ndim: 10\nμ: Zeros(10)\nΣ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]\n)\n))\n\n\n\nsetprogress!(false)\n\nSampling is then as easy as:\n\nchain = sample(model, externalsampler(rwmh), 10_000)\n\nChains MCMC chain (10000×14×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 4.44 seconds\nCompute duration = 4.44 seconds\nparameters = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]\ninternals = accepted, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#using-external-samplers-on-turing-models", "href": "usage/external-samplers/index.html#using-external-samplers-on-turing-models", "title": "Using External Samplers", "section": "", "text": "Turing provides several wrapped samplers from external sampling libraries, e.g., HMC samplers from AdvancedHMC. These wrappers allow new users to seamlessly sample statistical models without leaving Turing. However, these wrappers might not always be complete, missing some functionality from the wrapped sampling library. Moreover, users might want to use samplers currently not wrapped within Turing.\nFor these reasons, Turing also makes running external samplers on Turing models easy without any necessary modifications or wrapping! Throughout, we will use a 10-dimensional Neal’s funnel as a running example:\n\n# Import libraries.\nusing Turing, Random, LinearAlgebra\n\nd = 10\n@model function funnel()\n θ ~ Truncated(Normal(0, 3), -3, 3)\n z ~ MvNormal(zeros(d - 1), exp(θ) * I)\n return x ~ MvNormal(z, I)\nend\n\nfunnel (generic function with 2 methods)\n\n\nNow we sample the model to generate some observations, which we can then condition on.\n\n(; x) = rand(funnel() | (θ=0,))\nmodel = funnel() | (; x);\n\nUsers can use any sampling algorithm to sample this model, as long as it follows the AbstractMCMC API. Before discussing how this is done in practice, it is worth giving a high-level description of the process. Imagine that we created an instance of an external sampler that we will call spl such that typeof(spl)<:AbstractMCMC.AbstractSampler. In order to avoid type ambiguity within Turing, at the moment, it is necessary to declare spl as an external sampler to Turing via espl = externalsampler(spl), where externalsampler(s::AbstractMCMC.AbstractSampler) is a Turing function that gives our external sampler the appropriate type.\nAn excellent starting point for seeing how this is done in practice is the sampling library AdvancedMH (AdvancedMH’s GitHub) for Metropolis-Hastings (MH) methods. Let’s say we want to use a random walk Metropolis-Hastings sampler without specifying the proposal distributions. 
The code below constructs an MH sampler using a multivariate Gaussian distribution with zero mean and unit variance in d dimensions as a random walk proposal.\n\n# Importing the sampling library\nusing AdvancedMH\nrwmh = AdvancedMH.RWMH(d)\n\nMetropolisHastings{RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}}(RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}(ZeroMeanIsoNormal(\ndim: 10\nμ: Zeros(10)\nΣ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]\n)\n))\n\n\n\nsetprogress!(false)\n\nSampling is then as easy as:\n\nchain = sample(model, externalsampler(rwmh), 10_000)\n\nChains MCMC chain (10000×14×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 4.44 seconds\nCompute duration = 4.44 seconds\nparameters = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]\ninternals = accepted, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#going-beyond-the-turing-api", "href": "usage/external-samplers/index.html#going-beyond-the-turing-api", "title": "Using External Samplers", "section": "Going beyond the Turing API", "text": "Going beyond the Turing API\nAs previously mentioned, the Turing wrappers can often limit the capabilities of the sampling libraries they wrap. AdvancedHMC1 (AdvancedHMC’s GitHub) is a clear example of this. A common practice when performing HMC is to provide an initial guess for the mass matrix. However, the native HMC sampler within Turing only allows the user to specify the type of the mass matrix, even though AdvancedHMC supports both options. Thankfully, we can use Turing’s support for external samplers to define an HMC sampler with a custom mass matrix in AdvancedHMC and then use it to sample our Turing model.\nWe can use the library Pathfinder2 (Pathfinder’s GitHub) to construct an estimate of the mass matrix. Pathfinder is a variational inference algorithm that first finds the maximum a posteriori (MAP) estimate of a target posterior distribution and then uses the trace of the optimisation to construct a sequence of multivariate normal approximations to the target distribution. In this process, Pathfinder computes an estimate of the mass matrix, which the user can access. You can see an example of how to use Pathfinder with Turing in Pathfinder’s docs.", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#using-new-inference-methods", "href": "usage/external-samplers/index.html#using-new-inference-methods", "title": "Using External Samplers", "section": "Using new inference methods", "text": "Using new inference methods\nSo far we have used Turing’s support for external samplers to go beyond the capabilities of the wrappers. This is made possible by an interface for external samplers, which is described in the Turing.jl documentation here: if you are implementing your own sampler and would like it to work with Turing.jl models, that link describes the methods that you need to overload.\nFor an example of an ‘external sampler’ that works in this way with Turing, we recommend the SliceSampling.jl library. 
Note that although this library is hosted under the TuringLang GitHub organisation, it is not a Turing.jl dependency, and thus from Turing’s perspective it is truly an ‘external’ sampler.\nIn this section, we will briefly go through the interface requirements for external samplers. First and foremost, the sampler MySampler should be a subtype of AbstractMCMC.AbstractSampler. Second, the stepping function of the MCMC algorithm must be defined as new methods of AbstractMCMC.step following the structure below:\n# First step\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model::AbstractMCMC.LogDensityModel,\n spl::MySampler;\n kwargs...,\n)\n [...]\n return transition, state\nend\n\n# N+1 step\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model::AbstractMCMC.LogDensityModel,\n sampler::MySampler,\n state;\n kwargs...,\n)\n [...]\n return transition, state\nend\nNote that the model argument here must be an AbstractMCMC.LogDensityModel. This is a thin wrapper around an object which satisfies the LogDensityProblems.jl interface. Thus, in your external sampler, you can access the inner object with model.logdensity and call LogDensityProblems.logdensity(model.logdensity, params) to calculate the (unnormalised) log density of the model at params.\nAs shown above, there must be two step methods:\n\nA method that performs the first step, performing any initialisation it needs to; and\nA method that performs the following steps and takes an extra input, state, which carries the initialisation information.\n\nThe output of both of these methods must be a tuple containing:\n\na ‘transition’, which is essentially the ‘visible output’ of the sampler: this object is later used to construct an MCMCChains.Chains;\na ‘state’, representing the current state of the sampler, which is passed to the next step of the MCMC algorithm.\n\nApart from this, your sampler should also implement Turing.Inference.getparams(model, transition) to return the parameters of the model as a vector. Here, transition represents the first output of the step function.\nfunction Turing.Inference.getparams(model::DynamicPPL.Model, transition::MyTransition)\n # Return a vector containing the parameters of the model.\nend\nThese functions are the bare minimum that your external sampler must implement to work with Turing models. There are other methods which can be overloaded to improve the performance or other features of the sampler; please refer to the documentation linked above for more details.\nIn general, we recommend that the AbstractMCMC interface is implemented directly in your library. However, any DynamicPPL- or Turing-specific functionality is best implemented in a MySamplerTuringExt extension.", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/external-samplers/index.html#footnotes", "href": "usage/external-samplers/index.html#footnotes", "title": "Using External Samplers", "section": "Footnotes", "text": "Footnotes\n\n\nXu et al., AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms, 2019↩︎\nZhang et al., Pathfinder: Parallel quasi-Newton variational inference, 2021↩︎", "crumbs": [ "Get Started", "User Guide", "Using External Samplers" ] }, { "objectID": "usage/sampler-visualisation/index.html", "href": "usage/sampler-visualisation/index.html", "title": "Sampler Visualization", "section": "", "text": "For each sampler, we will use the same code to plot sampler paths. 
The block below loads the relevant libraries and defines a function for plotting the sampler’s trajectory across the posterior.\nThe Turing model definition used here is not especially practical, but it is designed in such a way as to produce visually interesting posterior surfaces to show how different samplers move along the distribution.\n\nENV[\"GKS_ENCODING\"] = \"utf-8\" # Allows the use of unicode characters in Plots.jl\nusing Plots\nusing StatsPlots\nusing Turing\nusing Random\nusing Bijectors\n\n# Set a seed.\nRandom.seed!(0)\n\n# Define a strange model.\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n bumps = sin(m) + cos(m)\n m = m + 5 * bumps\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\n return s², m\nend\n\n# Define our data points.\nx = [1.5, 2.0, 13.0, 2.1, 0.0]\n\n# Set up the model call, sample from the prior.\nmodel = gdemo(x)\n\n# Evaluate surface at coordinates.\nevaluate(m1, m2) = logjoint(model, (m=m2, s²=invlink.(Ref(InverseGamma(2, 3)), m1)))\n\nfunction plot_sampler(chain; label=\"\")\n # Extract values from chain.\n val = get(chain, [:s², :m, :logjoint])\n ss = link.(Ref(InverseGamma(2, 3)), val.s²)\n ms = val.m\n lps = val.logjoint\n\n # How many surface points to sample.\n granularity = 100\n\n # Range start/stop points.\n spread = 0.5\n σ_start = minimum(ss) - spread * std(ss)\n σ_stop = maximum(ss) + spread * std(ss)\n μ_start = minimum(ms) - spread * std(ms)\n μ_stop = maximum(ms) + spread * std(ms)\n σ_rng = collect(range(σ_start; stop=σ_stop, length=granularity))\n μ_rng = collect(range(μ_start; stop=μ_stop, length=granularity))\n\n # Make surface plot.\n p = surface(\n σ_rng,\n μ_rng,\n evaluate;\n camera=(30, 65),\n # ticks=nothing,\n colorbar=false,\n color=:inferno,\n title=label,\n )\n\n line_range = 1:length(ms)\n\n scatter3d!(\n ss[line_range],\n ms[line_range],\n lps[line_range];\n mc=:viridis,\n marker_z=collect(line_range),\n msw=0,\n legend=false,\n colorbar=false,\n alpha=0.5,\n xlabel=\"σ\",\n ylabel=\"μ\",\n zlabel=\"Log probability\",\n title=label,\n )\n\n return p\nend;\n\n\nsetprogress!(false)", "crumbs": [ "Get Started", "User Guide", "Sampler Visualization" ] }, { "objectID": "usage/sampler-visualisation/index.html#introduction", "href": "usage/sampler-visualisation/index.html#introduction", "title": "Sampler Visualization", "section": "", "text": "For each sampler, we will use the same code to plot sampler paths. 
The block below loads the relevant libraries and defines a function for plotting the sampler’s trajectory across the posterior.\nThe Turing model definition used here is not especially practical, but it is designed in such a way as to produce visually interesting posterior surfaces to show how different samplers move along the distribution.\n\nENV[\"GKS_ENCODING\"] = \"utf-8\" # Allows the use of unicode characters in Plots.jl\nusing Plots\nusing StatsPlots\nusing Turing\nusing Random\nusing Bijectors\n\n# Set a seed.\nRandom.seed!(0)\n\n# Define a strange model.\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n bumps = sin(m) + cos(m)\n m = m + 5 * bumps\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\n return s², m\nend\n\n# Define our data points.\nx = [1.5, 2.0, 13.0, 2.1, 0.0]\n\n# Set up the model call, sample from the prior.\nmodel = gdemo(x)\n\n# Evaluate surface at coordinates.\nevaluate(m1, m2) = logjoint(model, (m=m2, s²=invlink.(Ref(InverseGamma(2, 3)), m1)))\n\nfunction plot_sampler(chain; label=\"\")\n # Extract values from chain.\n val = get(chain, [:s², :m, :logjoint])\n ss = link.(Ref(InverseGamma(2, 3)), val.s²)\n ms = val.m\n lps = val.logjoint\n\n # How many surface points to sample.\n granularity = 100\n\n # Range start/stop points.\n spread = 0.5\n σ_start = minimum(ss) - spread * std(ss)\n σ_stop = maximum(ss) + spread * std(ss)\n μ_start = minimum(ms) - spread * std(ms)\n μ_stop = maximum(ms) + spread * std(ms)\n σ_rng = collect(range(σ_start; stop=σ_stop, length=granularity))\n μ_rng = collect(range(μ_start; stop=μ_stop, length=granularity))\n\n # Make surface plot.\n p = surface(\n σ_rng,\n μ_rng,\n evaluate;\n camera=(30, 65),\n # ticks=nothing,\n colorbar=false,\n color=:inferno,\n title=label,\n )\n\n line_range = 1:length(ms)\n\n scatter3d!(\n ss[line_range],\n ms[line_range],\n lps[line_range];\n mc=:viridis,\n marker_z=collect(line_range),\n msw=0,\n legend=false,\n colorbar=false,\n alpha=0.5,\n xlabel=\"σ\",\n ylabel=\"μ\",\n zlabel=\"Log probability\",\n title=label,\n )\n\n return p\nend;\n\n\nsetprogress!(false)", "crumbs": [ "Get Started", "User Guide", "Sampler Visualization" ] }, { "objectID": "usage/sampler-visualisation/index.html#samplers", "href": "usage/sampler-visualisation/index.html#samplers", "title": "Sampler Visualization", "section": "Samplers", "text": "Samplers\n\nGibbs\nGibbs sampling tends to exhibit a “jittery” trajectory. The example below combines HMC and PG sampling to traverse the posterior.\n\nc = sample(model, Gibbs(:s² => HMC(0.01, 5), :m => PG(20)), 1000)\nplot_sampler(c)\n\n\n\n\n\n\nHMC\nHamiltonian Monte Carlo (HMC) sampling is a typical sampler to use, as it tends to be fairly good at converging in an efficient manner. It can often be tricky to set the correct parameters for this sampler, however, and the NUTS sampler is often easier to run if you don’t want to spend too much time fiddling with step size and the number of steps to take. Note, however, that HMC does not explore the positive values of μ very well, likely due to the leapfrog and step size parameter settings.\n\nc = sample(model, HMC(0.01, 10), 1000)\nplot_sampler(c)\n\n\n\n\n\n\nHMCDA\nThe HMCDA sampler is an implementation of the Hamiltonian Monte Carlo with Dual Averaging algorithm found in the paper “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” by Hoffman and Gelman (2011). 
The paper can be found on arXiv for the interested reader.\n\nc = sample(model, HMCDA(200, 0.65, 0.3), 1000)\nplot_sampler(c)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\n\n\n\n\n\n\n\nMH\nMetropolis-Hastings (MH) sampling is one of the earliest Markov Chain Monte Carlo methods. MH sampling does not “move” a lot, unlike many of the other samplers implemented in Turing. Typically a much longer chain is required to converge to an appropriate parameter estimate.\nThe plot below only uses 1,000 iterations of Metropolis-Hastings.\n\nc = sample(model, MH(), 1000)\nplot_sampler(c)\n\n\n\n\nAs you can see, the MH sampler doesn’t move parameter estimates very often.\n\n\nNUTS\nThe No U-Turn Sampler (NUTS) is an implementation of the algorithm found in the paper “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” by Hoffman and Gelman (2011). The paper can be found on arXiv for the interested reader.\nNUTS tends to be very good at traversing complex posteriors quickly.\n\nc = sample(model, NUTS(0.65), 1000)\nplot_sampler(c)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.05\n\n\n\n\n\n\n\nThe only parameter that needs to be set other than the number of iterations to run is the target acceptance rate. In the Hoffman and Gelman paper, they note that a target acceptance rate of 0.65 is typical.\nHere is a plot showing a very high acceptance rate. Note that it appears to “stick” to a mode and is not particularly good at exploring the posterior as compared to the 0.65 target acceptance ratio case.\n\nc = sample(model, NUTS(0.95), 1000)\nplot_sampler(c)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\n\n\n\n\n\nAn exceptionally low acceptance rate will show very few moves on the posterior:\n\nc = sample(model, NUTS(0.2), 1000)\nplot_sampler(c)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.8\n\n\n\n\n\n\n\n\n\nPG\nThe Particle Gibbs (PG) sampler is an implementation of an algorithm from the paper “Particle Markov chain Monte Carlo methods” by Andrieu, Doucet, and Holenstein (2010). The interested reader can learn more here.\nThe two parameters are the number of particles, and the number of iterations. The plot below shows the use of 20 particles.\n\nc = sample(model, PG(20), 1000)\nplot_sampler(c)\n\n\n\n\nNext, we plot using 50 particles.\n\nc = sample(model, PG(50), 1000)\nplot_sampler(c)", "crumbs": [ "Get Started", "User Guide", "Sampler Visualization" ] }, { "objectID": "faq/index.html", "href": "faq/index.html", "title": "Frequently Asked Questions", "section": "", "text": "This is a common source of confusion. In Turing.jl, you can only condition or fix expressions that explicitly appear on the left-hand side (LHS) of a ~ statement.\nFor example, if your model contains:\nx ~ filldist(Normal(), 2)\nYou cannot directly condition on x[2] using condition(model, @varname(x[2]) => 1.0) because x[2] never appears on the LHS of a ~ statement. 
Only x as a whole appears there.\nHowever, there is an important exception: when you use the broadcasting operator .~ with a univariate distribution, each element is treated as being separately drawn from that distribution, allowing you to condition on individual elements:\n@model function f1()\n x = Vector{Float64}(undef, 3)\n x .~ Normal() # Each element is a separate draw\nend\n\nm1 = f1() | (@varname(x[1]) => 1.0)\nsample(m1, NUTS(), 100) # This works!\nIn contrast, you cannot condition on parts of a multivariate distribution because it represents a single distribution over the entire vector:\n@model function f2()\n x = Vector{Float64}(undef, 3)\n x ~ MvNormal(zeros(3), I) # Single multivariate distribution\nend\n\nm2 = f2() | (@varname(x[1]) => 1.0)\nsample(m2, NUTS(), 100) # This doesn't work!\nThe key insight is that filldist creates a single distribution (not N independent distributions), which is why you cannot condition on individual elements. The distinction is not just about what appears on the LHS of ~, but whether you’re dealing with separate distributions (.~ with univariate) or a single distribution over multiple values (~ with multivariate or filldist).\nTo understand more about how Turing determines whether a variable is treated as random or observed, see:\n\nCore Functionality - basic explanation of the ~ notation and conditioning", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "faq/index.html#why-is-this-variable-being-treated-as-random-instead-of-observed", "href": "faq/index.html#why-is-this-variable-being-treated-as-random-instead-of-observed", "title": "Frequently Asked Questions", "section": "", "text": "This is a common source of confusion. In Turing.jl, you can only condition or fix expressions that explicitly appear on the left-hand side (LHS) of a ~ statement.\nFor example, if your model contains:\nx ~ filldist(Normal(), 2)\nYou cannot directly condition on x[2] using condition(model, @varname(x[2]) => 1.0) because x[2] never appears on the LHS of a ~ statement. Only x as a whole appears there.\nHowever, there is an important exception: when you use the broadcasting operator .~ with a univariate distribution, each element is treated as being separately drawn from that distribution, allowing you to condition on individual elements:\n@model function f1()\n x = Vector{Float64}(undef, 3)\n x .~ Normal() # Each element is a separate draw\nend\n\nm1 = f1() | (@varname(x[1]) => 1.0)\nsample(m1, NUTS(), 100) # This works!\nIn contrast, you cannot condition on parts of a multivariate distribution because it represents a single distribution over the entire vector:\n@model function f2()\n x = Vector{Float64}(undef, 3)\n x ~ MvNormal(zeros(3), I) # Single multivariate distribution\nend\n\nm2 = f2() | (@varname(x[1]) => 1.0)\nsample(m2, NUTS(), 100) # This doesn't work!\nThe key insight is that filldist creates a single distribution (not N independent distributions), which is why you cannot condition on individual elements. 
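Note that you can still condition on the variable as a whole, since x itself does appear on the LHS of ~. A minimal sketch of this (repeating the f2 model from above; the conditioning values are arbitrary):\nusing Turing, LinearAlgebra\n\n@model function f2()\n x = Vector{Float64}(undef, 3)\n x ~ MvNormal(zeros(3), I) # one distribution over the whole vector\nend\n\n# Conditioning on the entire vector works, because `x` as a whole\n# appears on the LHS of a `~` statement:\nm2_whole = f2() | (@varname(x) => [1.0, 2.0, 3.0])\n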
The distinction is not just about what appears on the LHS of ~, but whether you’re dealing with separate distributions (.~ with univariate) or a single distribution over multiple values (~ with multivariate or filldist).\nTo understand more about how Turing determines whether a variable is treated as random or observed, see:\n\nCore Functionality - basic explanation of the ~ notation and conditioning", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "faq/index.html#can-i-use-parallelism-threads-in-my-model", "href": "faq/index.html#can-i-use-parallelism-threads-in-my-model", "title": "Frequently Asked Questions", "section": "Can I use parallelism / threads in my model?", "text": "Can I use parallelism / threads in my model?\nYes, but with some important caveats:\n\n1. Parallel Sampling (Multiple Chains)\nTuring.jl fully supports sampling multiple chains in parallel:\n\nMultithreaded sampling: Use MCMCThreads() to run one chain per thread\nDistributed sampling: Use MCMCDistributed() for distributed computing\n\nSee the Core Functionality guide for examples.\n\n\n2. Threading Within Models\nUsing threads inside your model (e.g., Threads.@threads) requires more care. In particular, only threaded observe statements are safe to use; threaded assume statements can lead to crashes or incorrect results. Please see the Threadsafe Evaluation page for complete details.\n@model function f(y)\n x = Vector{Float64}(undef, length(y))\n Threads.@threads for i in eachindex(y)\n # This would be unsafe!\n # x[i] ~ Normal()\n # This is safe:\n y[i] ~ Normal()\n end\nend\n# If you have parallel tilde-statements or `@addlogprob!` in a model, \n# you must mark the model as threadsafe:\nmodel = setthreadsafe(f(y), true)\nImportant limitations:\n\nObserve statements: Generally safe to use in threaded loops\nAssume statements (sampling statements): Often crash unpredictably or produce incorrect results\nAD backend compatibility: Many AD backends don’t support threading. Check the multithreaded column in ADTests for compatibility", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "faq/index.html#how-do-i-check-the-type-stability-of-my-turing-model", "href": "faq/index.html#how-do-i-check-the-type-stability-of-my-turing-model", "title": "Frequently Asked Questions", "section": "How do I check the type stability of my Turing model?", "text": "How do I check the type stability of my Turing model?\nType stability is crucial for performance. 
Check out:\n\nPerformance Tips - includes specific advice on type stability\nUse DynamicPPL.DebugUtils.model_warntype to check type stability of your model", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "faq/index.html#how-do-i-debug-my-turing-model", "href": "faq/index.html#how-do-i-debug-my-turing-model", "title": "Frequently Asked Questions", "section": "How do I debug my Turing model?", "text": "How do I debug my Turing model?\nFor debugging both statistical and syntactical issues:\n\nTroubleshooting Guide - common errors and their solutions\nFor more advanced debugging, DynamicPPL provides the DynamicPPL.DebugUtils module for inspecting model internals", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "faq/index.html#what-are-the-main-differences-between-turing-bugs-and-stan-syntax", "href": "faq/index.html#what-are-the-main-differences-between-turing-bugs-and-stan-syntax", "title": "Frequently Asked Questions", "section": "What are the main differences between Turing, BUGS, and Stan syntax?", "text": "What are the main differences between Turing, BUGS, and Stan syntax?\nKey syntactic differences include:\n\nParameter blocks: Stan requires explicit data, parameters, and model blocks. In Turing, everything is defined within the @model macro\nVariable declarations: Stan requires upfront type declarations in parameter blocks. Turing infers types from the sampling statements\nTransformed data: Stan has a transformed data block for preprocessing. In Turing, data transformations should be done before defining the model\nGenerated quantities: Stan has a generated quantities block. In Turing, use the approach described in Tracking Extra Quantities\n\nExample comparison:\n// Stan\ndata {\n real y;\n}\nparameters {\n real mu;\n real<lower=0> sigma;\n}\nmodel {\n mu ~ normal(0, 1);\n sigma ~ normal(0, 1);\n y ~ normal(mu, sigma);\n}\n# Turing\n@model function my_model(y)\n mu ~ Normal(0, 1)\n sigma ~ truncated(Normal(0, 1); lower=0)\n y ~ Normal(mu, sigma)\nend", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "faq/index.html#which-automatic-differentiation-backend-should-i-use", "href": "faq/index.html#which-automatic-differentiation-backend-should-i-use", "title": "Frequently Asked Questions", "section": "Which automatic differentiation backend should I use?", "text": "Which automatic differentiation backend should I use?\nThe choice of AD backend can significantly impact performance. See:\n\nAutomatic Differentiation Guide - comprehensive comparison of ForwardDiff, Mooncake, ReverseDiff, and other backends\nPerformance Tips - quick guide on choosing backends\nAD Backend Benchmarks - performance comparisons across various models", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "faq/index.html#i-changed-one-line-of-my-model-and-now-its-so-much-slower-why", "href": "faq/index.html#i-changed-one-line-of-my-model-and-now-its-so-much-slower-why", "title": "Frequently Asked Questions", "section": "I changed one line of my model and now it’s so much slower; why?", "text": "I changed one line of my model and now it’s so much slower; why?\nSmall changes can have big performance impacts. 
Common culprits include:\n\nType instability introduced by the change\nSwitching from vectorised to scalar operations (or vice versa)\nInadvertently causing AD backend incompatibilities\nBreaking assumptions that allowed compiler optimisations\n\nSee our Performance Tips and Troubleshooting Guide for debugging performance regressions.", "crumbs": [ "Get Started", "Frequently Asked Questions" ] }, { "objectID": "usage/modifying-logprob/index.html", "href": "usage/modifying-logprob/index.html", "title": "Modifying the Log Probability", "section": "", "text": "Turing accumulates log probabilities in an internal data structure that is accessible through the internal variable __varinfo__ within the model definition. To avoid users having to deal with internal data structures, Turing provides the @addlogprob! macro which increases the accumulated log probability. For instance, this allows you to include arbitrary terms in the likelihood\n\nusing Turing\n\nmyloglikelihood(x, μ) = loglikelihood(Normal(μ, 1), x)\n\n@model function demo(x)\n μ ~ Normal()\n @addlogprob! myloglikelihood(x, μ)\nend\n\ndemo (generic function with 2 methods)\n\n\nand to force a sampler to reject a sample:\n\nusing Turing\nusing LinearAlgebra\n\n@model function demo(x)\n m ~ MvNormal(zero(x), I)\n if dot(m, x) < 0\n @addlogprob! -Inf\n # Exit the model evaluation early\n return nothing\n end\n\n x ~ MvNormal(m, I)\n return nothing\nend\n\ndemo (generic function with 2 methods)\n\n\nNote that @addlogprob! (p::Float64) adds p to the log likelihood. If instead you want to add to the log prior, you can use\n\n@addlogprob! (; logprior=value_goes_here)", "crumbs": [ "Get Started", "User Guide", "Modifying the Log Probability" ] }, { "objectID": "usage/automatic-differentiation/index.html", "href": "usage/automatic-differentiation/index.html", "title": "Automatic Differentiation", "section": "", "text": "Automatic differentiation (AD) is a technique used in Turing.jl to evaluate the gradient of a function at a given set of arguments. In the context of Turing.jl, the function being differentiated is the log probability density of a model, and the arguments are the parameters of the model (i.e. the values of the random variables). The gradient of the log probability density is used by various algorithms in Turing.jl, such as HMC (including NUTS), mode estimation (which uses gradient-based optimisation), and variational inference.\nThe Julia ecosystem has a number of AD libraries. You can switch between these using the unified ADTypes.jl interface, which, for a given AD backend, provides types such as AutoBackend (see the documentation for more details). 
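Concretely, these are lightweight objects that you construct and hand to a sampler via its adtype keyword. A minimal sketch, restricted to constructors that Turing re-exports (the variable names are illustrative only):\nusing Turing\n\nadtype_fwd = AutoForwardDiff() # forward-mode AD (the default)\nadtype_rev = AutoReverseDiff() # reverse-mode AD\n\n# Either object can be passed to a gradient-based sampler,\n# e.g. HMC(0.1, 5; adtype=adtype_rev).\n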
For example, to use the Mooncake.jl package for AD, you can run the following:\n\n# Turing re-exports AutoEnzyme, AutoForwardDiff, AutoReverseDiff, and AutoMooncake.\n# Other ADTypes must be explicitly imported from ADTypes.jl or\n# DifferentiationInterface.jl.\nusing Turing\nsetprogress!(false)\n\n# Note that if you specify a custom AD backend, you must also import it.\nimport Mooncake\n\n@model function f()\n x ~ Normal()\n # Rest of your model here\nend\n\nsample(f(), HMC(0.1, 5; adtype=AutoMooncake()), 100)\n\n\n[ Info: [Turing]: progress logging is disabled globally\n\n\n\n\nChains MCMC chain (100×13×1 Array{Union{Missing, Float64}, 3}):\n\nIterations = 1:1:100\nNumber of chains = 1\nSamples per chain = 100\nWall duration = 59.16 seconds\nCompute duration = 59.16 seconds\nparameters = x\ninternals = logprior, loglikelihood, logjoint, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nIf you do not specify a backend, Turing will default to ForwardDiff.jl. In this case, you do not need to import ForwardDiff, as it is already a dependency of Turing.", "crumbs": [ "Get Started", "User Guide", "Automatic Differentiation" ] }, { "objectID": "usage/automatic-differentiation/index.html#what-is-automatic-differentiation", "href": "usage/automatic-differentiation/index.html#what-is-automatic-differentiation", "title": "Automatic Differentiation", "section": "", "text": "Automatic differentiation (AD) is a technique used in Turing.jl to evaluate the gradient of a function at a given set of arguments. In the context of Turing.jl, the function being differentiated is the log probability density of a model, and the arguments are the parameters of the model (i.e. the values of the random variables). The gradient of the log probability density is used by various algorithms in Turing.jl, such as HMC (including NUTS), mode estimation (which uses gradient-based optimisation), and variational inference.\nThe Julia ecosystem has a number of AD libraries. You can switch between these using the unified ADTypes.jl interface, which, for a given AD backend, provides types such as AutoBackend (see the documentation for more details). For example, to use the Mooncake.jl package for AD, you can run the following:\n\n# Turing re-exports AutoEnzyme, AutoForwardDiff, AutoReverseDiff, and AutoMooncake.\n# Other ADTypes must be explicitly imported from ADTypes.jl or\n# DifferentiationInterface.jl.\nusing Turing\nsetprogress!(false)\n\n# Note that if you specify a custom AD backend, you must also import it.\nimport Mooncake\n\n@model function f()\n x ~ Normal()\n # Rest of your model here\nend\n\nsample(f(), HMC(0.1, 5; adtype=AutoMooncake()), 100)\n\n\n[ Info: [Turing]: progress logging is disabled globally\n\n\n\n\nChains MCMC chain (100×13×1 Array{Union{Missing, Float64}, 3}):\n\nIterations = 1:1:100\nNumber of chains = 1\nSamples per chain = 100\nWall duration = 59.16 seconds\nCompute duration = 59.16 seconds\nparameters = x\ninternals = logprior, loglikelihood, logjoint, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nIf you do not specify a backend, Turing will default to ForwardDiff.jl. 
In this case, you do not need to import ForwardDiff, as it is already a dependency of Turing.", "crumbs": [ "Get Started", "User Guide", "Automatic Differentiation" ] }, { "objectID": "usage/automatic-differentiation/index.html#choosing-an-ad-backend", "href": "usage/automatic-differentiation/index.html#choosing-an-ad-backend", "title": "Automatic Differentiation", "section": "Choosing an AD Backend", "text": "Choosing an AD Backend\nThere are two aspects to choosing an AD backend: firstly, what backends are available; and secondly, which backend is best for your model.\n\nUsable AD Backends\nTuring.jl uses the functionality in DifferentiationInterface.jl (‘DI’) to interface with AD libraries in a unified way. In principle, any AD library that DI provides an interface for can be used with Turing; you should consult the DI documentation for an up-to-date list of compatible AD libraries.\nNote, however, that not all of the AD libraries listed there are thoroughly tested on Turing models. Thus, it is possible that some of them will either error (because they don’t know how to differentiate through Turing’s code), or maybe even silently give incorrect results (if you are very unlucky). Turing is most extensively tested with ForwardDiff.jl (the default), Enzyme.jl, Mooncake.jl, and ReverseDiff.jl.\n\n\n\n\n\n\nNote: Gradient preparation\n\n\n\nUsers of DifferentiationInterface.jl will have seen that it provides functions such as prepare_gradient, which allow you to perform a one-time setup to make subsequent gradient computations faster. Turing will automatically perform gradient preparation for you when calling functions such as sample or optimize, so you do not need to worry about this step.\n\n\n\n\nADTests\nBefore describing how to choose the best AD backend for your model, we should mention that we also publish a table of benchmarks for various models and AD backends in the ADTests website. These models aim to capture a variety of different features of Turing.jl and Julia in general, so that you can see which AD backends may be compatible with your model. Benchmarks are also included, although it should be noted that many of the models in ADTests are small and thus the timings may not be representative of larger, real-life models.\nIf you have suggestions for other models to include, please do let us know by creating an issue on GitHub!\n\n\nThe Best AD Backend for Your Model\nGiven the number of possible backends, how do you choose the best one for your model?\nA simple heuristic is to look at the number of parameters in your model. The log density of the model, i.e. the function being differentiated, is a function that goes from \\(\\mathbb{R}^n \\to \\mathbb{R}\\), where \\(n\\) is the number of parameters in your model. For models with a small number of parameters (say up to ~ 20), forward-mode AD (e.g. ForwardDiff) is generally faster due to a smaller overhead. On the other hand, for models with a large number of parameters, reverse-mode AD (e.g. 
ReverseDiff or Mooncake) is generally faster as it computes the gradients with respect to all parameters in a single pass.\nThe most reliable way to ensure you are using the fastest AD backend that works for your problem is to benchmark the candidates using the functionality in DynamicPPL (see the API documentation):\n\nusing ADTypes\nusing DynamicPPL.TestUtils.AD: run_ad, ADResult\nusing ForwardDiff, ReverseDiff\n\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\nmodel = gdemo(1.5, 2)\n\nfor adtype in [AutoForwardDiff(), AutoReverseDiff()]\n result = run_ad(model, adtype; benchmark=true)\n @show result.grad_time / result.primal_time\nend\n\n\n[ Info: Running AD on gdemo with ADTypes.AutoForwardDiff()\n params : [1.4828467993120018, -0.08736551392635497]\n actual : (-7.211879923233122, [-2.0376748966366827, 0.8539604425994357])\n expected : (-7.211879923233122, [-2.0376748966366827, 0.8539604425994357])\n evaluation : 98.956 ns (2 allocs: 64 bytes)\n gradient : 190.762 ns (5 allocs: 208 bytes)\n grad / eval : 1.928\nresult.grad_time / result.primal_time = 1.927751882984\n[ Info: Running AD on gdemo with ADTypes.AutoReverseDiff()\n params : [0.4680014534183954, 0.161570321393336]\n actual : (-5.703772397883004, [0.006176288640820982, 1.8883326389888757])\n expected : (-5.703772397883004, [0.006176288640821093, 1.8883326389888757])\n evaluation : 98.666 ns (2 allocs: 64 bytes)\n gradient : 14.728 μs (199 allocs: 6.953 KiB)\n grad / eval : 149.3\nresult.grad_time / result.primal_time = 149.2669063516521\n\n\n\n\nIn this specific instance, ForwardDiff is clearly faster (due to the small size of the model).\n\n\n\n\n\n\nNote: ReverseDiff’s compile argument\n\n\n\nThe additional keyword argument compile=true for AutoReverseDiff specifies whether to pre-record the tape only once and reuse it later. By default, this is set to false, which means no pre-recording. Setting compile=true can substantially improve performance, but risks silently incorrect results if not used with care. Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.", "crumbs": [ "Get Started", "User Guide", "Automatic Differentiation" ] }, { "objectID": "usage/automatic-differentiation/index.html#compositional-sampling-with-differing-ad-modes", "href": "usage/automatic-differentiation/index.html#compositional-sampling-with-differing-ad-modes", "title": "Automatic Differentiation", "section": "Compositional Sampling with Differing AD Modes", "text": "Compositional Sampling with Differing AD Modes\nWhen using Gibbs sampling, Turing also supports mixed automatic differentiation methods for different variable spaces. 
The following snippet shows how one can use ForwardDiff to sample the mean (m) parameter, and ReverseDiff for the variance (s²) parameter:\n\nusing Turing\nusing ReverseDiff\n\n# Sample using Gibbs and varying autodiff backends.\nc = sample(\n gdemo(1.5, 2),\n Gibbs(\n :m => HMC(0.1, 5; adtype=AutoForwardDiff()),\n :s² => HMC(0.1, 5; adtype=AutoReverseDiff()),\n ),\n 1000,\n progress=false,\n)\n\nChains MCMC chain (1000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 6.06 seconds\nCompute duration = 6.06 seconds\nparameters = s², m\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.", "crumbs": [ "Get Started", "User Guide", "Automatic Differentiation" ] }, { "objectID": "usage/predictive-distributions/index.html", "href": "usage/predictive-distributions/index.html", "title": "Predictive Distributions", "section": "", "text": "Standard MCMC sampling methods return values of the parameters of the model. However, it is often also useful to generate new data points using the model, given a distribution of the parameters. Turing.jl allows you to do this using the predict function, along with conditioning syntax.\nConsider the following simple model, where we observe some normally-distributed data X and want to learn about its mean m.\nusing Turing\n@model function f(N)\n m ~ Normal()\n X ~ filldist(Normal(m), N)\nend\n\nf (generic function with 2 methods)\nNotice first how we have not specified X as an argument to the model. This allows us to use Turing’s conditioning syntax to specify whether we want to provide observed data or not.\n# Generate some synthetic data\nN = 5\ntrue_m = 3.0\nX = rand(Normal(true_m), N)\n\n# Instantiate the model with observed data\nmodel = f(N) | (; X = X)\n\n# Sample from the posterior\nchain = sample(model, NUTS(), 1_000; progress=false)\nmean(chain[:m])\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.80078125\n\n\n\n\n2.588777482803658", "crumbs": [ "Get Started", "User Guide", "Predictive Distributions" ] }, { "objectID": "usage/predictive-distributions/index.html#posterior-predictive-distribution", "href": "usage/predictive-distributions/index.html#posterior-predictive-distribution", "title": "Predictive Distributions", "section": "Posterior predictive distribution", "text": "Posterior predictive distribution\nchain[:m] now contains samples from the posterior distribution of m. If we use these samples of the parameters to generate new data points, we obtain the posterior predictive distribution. Statistically, this is defined as\n\\[\np(\\tilde{x} | \\mathbf{X}) = \\int p(\\tilde{x} | \\theta) p(\\theta | \\mathbf{X}) d\\theta,\n\\]\nwhere \\(\\tilde{x}\\) are the new data which you wish to draw, \\(\\theta\\) are the model parameters, and \\(\\mathbf{X}\\) are the observed data. \\(p(\\tilde{x} | \\theta)\\) is the distribution of the new data given the parameters, which is specified in the Turing.jl model (the X ~ ... line); and \\(p(\\theta | \\mathbf{X})\\) is the posterior distribution, as given by the Markov chain.\nTo obtain samples of \\(\\tilde{x}\\), we need to first remove the observed data from the model (or ‘decondition’ the model). This means that when the model is evaluated, it will sample a new value for X. 
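To see the effect concretely, here is a small sketch (reusing model from above; rand draws values only for unobserved variables, and the outputs shown in the comments are illustrative):\nrand(model) # X is observed, so only m is drawn: e.g. (m = 2.9,)\nrand(decondition(model)) # both m and X are drawn afresh: (m = ..., X = [...])\n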
If you don’t decondition the model, then X will remain fixed to the observed data, and no new samples will be generated.\n\npredictive_model = decondition(model)\n\nDynamicPPL.Model{typeof(f), (:N,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.DefaultContext, false}(f, (N = 5,), NamedTuple(), DynamicPPL.DefaultContext())\n\n\n\n\n\n\n\nTip: Selective deconditioning\n\n\n\nIf you only want to decondition a single variable X, you can use decondition(model, @varname(X)).\n\n\nTo demonstrate how this deconditioned model can generate new data, we can fix the value of m to be its mean and evaluate the model:\n\npredictive_model_with_mean_m = predictive_model | (; m = mean(chain[:m]))\nrand(predictive_model_with_mean_m)\n\n(X = [1.740090262677792, 1.7808565708002924, 3.944837785134824, 2.9235136985944625, 1.8442284825075173],)\n\n\nThis has given us a single sample of X given the mean value of m. Of course, to take our Bayesian uncertainty into account, we want to use the full posterior distribution of m, not just its mean. To do so, we use predict, which effectively does the same as above but for every sample in the chain:\n\npredictive_samples = predict(predictive_model, chain)\n\nChains MCMC chain (1000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nparameters = X[1], X[2], X[3], X[4], X[5]\ninternals = \n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\n\n\n\n\nTip: Reproducibility\n\n\n\npredict, like many other Julia functions that involve randomness, takes an optional rng as its first argument. This controls the generation of new X samples, and makes your results reproducible.\n\n\n\n\n\n\n\n\nNote\n\n\n\npredict returns a Chains object itself, which will only contain the newly predicted variables. If you want to also retain the original parameters, you can use predict(rng, predictive_model, chain; include_all=true).\n\n\nWe can visualise the predictive distribution by combining all the samples and making a density plot:\n\nusing StatsPlots: density, density!, vline!\n\npredicted_X = vcat([predictive_samples[Symbol(\"X[$i]\")] for i in 1:N]...)\ndensity(predicted_X, label=\"Posterior predictive\")\n\n\n\n\nDepending on your data, you may naturally want to create different visualisations. 
For example, perhaps X contains some time-series data, in which case you can plot each prediction individually as a line against time.", "crumbs": [ "Get Started", "User Guide", "Predictive Distributions" ] }, { "objectID": "usage/predictive-distributions/index.html#prior-predictive-distribution", "href": "usage/predictive-distributions/index.html#prior-predictive-distribution", "title": "Predictive Distributions", "section": "Prior predictive distribution", "text": "Prior predictive distribution\nAlternatively, if we use the prior distribution of the parameters \\(p(\\theta)\\), we obtain the prior predictive distribution:\n\\[\np(\\tilde{x}) = \\int p(\\tilde{x} | \\theta) p(\\theta) d\\theta,\n\\]\nIn an exactly analogous fashion to above, you could sample from the prior distribution of the conditioned model, and then pass that to predict:\n\nprior_params = sample(model, Prior(), 1_000; progress=false)\nprior_predictive_samples = predict(predictive_model, prior_params)\n\nChains MCMC chain (1000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nparameters = X[1], X[2], X[3], X[4], X[5]\ninternals = \n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nIn fact there is a simpler way: you can directly sample from the deconditioned model, using Turing’s Prior sampler. This will, in a single call, generate prior samples for both the parameters as well as the new data.\n\nprior_predictive_samples = sample(predictive_model, Prior(), 1_000; progress=false)\n\nChains MCMC chain (1000×9×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 0.09 seconds\nCompute duration = 0.09 seconds\nparameters = m, X[1], X[2], X[3], X[4], X[5]\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nWe can visualise the prior predictive distribution in the same way as before. Let’s compare the two predictive distributions:\n\nprior_predicted_X = vcat([prior_predictive_samples[Symbol(\"X[$i]\")] for i in 1:N]...)\ndensity(prior_predicted_X, label=\"Prior predictive\")\ndensity!(predicted_X, label=\"Posterior predictive\")\nvline!([true_m], label=\"True mean\", linestyle=:dash, color=:black)\n\n\n\n\nWe can see here that the prior predictive distribution is:\n\nWider than the posterior predictive distribution;\nCentred on the prior mean of m (which is 0), rather than the posterior mean (which is close to the true mean of 3).\n\nBoth of these are because the posterior predictive distribution has been informed by the observed data.", "crumbs": [ "Get Started", "User Guide", "Predictive Distributions" ] }, { "objectID": "usage/tracking-extra-quantities/index.html", "href": "usage/tracking-extra-quantities/index.html", "title": "Tracking Extra Quantities", "section": "", "text": "Often, there are quantities in models that we might be interested in viewing the values of, but which are not random variables in the model that are explicitly drawn from a distribution.\nAs a motivating example, the most natural parameterisation for a model might not be the most computationally feasible. 
Consider the following (efficiently reparametrized) implementation of Neal’s funnel (Neal, 2003):\nusing Turing\nsetprogress!(false)\n\n@model function Neal()\n # Raw draws\n y_raw ~ Normal(0, 1)\n x_raw ~ arraydist([Normal(0, 1) for i in 1:9])\n\n # Transform:\n y = 3 * y_raw\n x = exp.(y ./ 2) .* x_raw\n return nothing\nend\n\n\n[ Info: [Turing]: progress logging is disabled globally\n\n\n\n\nNeal (generic function with 2 methods)\nIn this case, the random variables exposed in the chain (x_raw, y_raw) are not in a helpful form — what we’re after are the deterministically transformed variables x and y.\nThere are two ways to track these extra quantities in Turing.jl.", "crumbs": [ "Get Started", "User Guide", "Tracking Extra Quantities" ] }, { "objectID": "usage/tracking-extra-quantities/index.html#using-during-inference", "href": "usage/tracking-extra-quantities/index.html#using-during-inference", "title": "Tracking Extra Quantities", "section": "Using := (during inference)", "text": "Using := (during inference)\nThe first way is to use the := operator, which behaves exactly like = except that the values of the variables on its left-hand side are automatically added to the chain returned by the sampler. For example:\n\n@model function Neal_coloneq()\n # Raw draws\n y_raw ~ Normal(0, 1)\n x_raw ~ arraydist([Normal(0, 1) for i in 1:9])\n\n # Transform:\n y := 3 * y_raw\n x := exp.(y ./ 2) .* x_raw\nend\n\nsample(Neal_coloneq(), NUTS(), 1000)\n\n\n┌ Info: Found initial step size\n└ ϵ = 1.6\n\n\n\n\nChains MCMC chain (1000×34×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 7.24 seconds\nCompute duration = 7.24 seconds\nparameters = y_raw, x_raw[1], x_raw[2], x_raw[3], x_raw[4], x_raw[5], x_raw[6], x_raw[7], x_raw[8], x_raw[9], y, x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], x[9]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.", "crumbs": [ "Get Started", "User Guide", "Tracking Extra Quantities" ] }, { "objectID": "usage/tracking-extra-quantities/index.html#using-returned-post-inference", "href": "usage/tracking-extra-quantities/index.html#using-returned-post-inference", "title": "Tracking Extra Quantities", "section": "Using returned (post-inference)", "text": "Using returned (post-inference)\nAlternatively, one can specify the extra quantities as part of the model function’s return statement:\n\n@model function Neal_return()\n # Raw draws\n y_raw ~ Normal(0, 1)\n x_raw ~ arraydist([Normal(0, 1) for i in 1:9])\n\n # Transform and return as a NamedTuple\n y = 3 * y_raw\n x = exp.(y ./ 2) .* x_raw\n return (x=x, y=y)\nend\n\nchain = sample(Neal_return(), NUTS(), 1000)\n\n\n┌ Info: Found initial step size\n└ ϵ = 1.6\n\n\n\n\nChains MCMC chain (1000×24×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 2.0 seconds\nCompute duration = 2.0 seconds\nparameters = y_raw, x_raw[1], x_raw[2], x_raw[3], x_raw[4], x_raw[5], x_raw[6], x_raw[7], x_raw[8], x_raw[9]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics 
and quantiles.\n\n\nThe sampled chain does not contain x and y, but we can extract the values using the returned function. Calling this function outputs an array:\n\nnts = returned(Neal_return(), chain)\n\n1000×1 Matrix{@NamedTuple{x::Vector{Float64}, y::Float64}}:\n (x = [16.093042430164353, 17.537940461754452, -14.937628740833924, 3.938356702613503, -12.298725116878712, -22.968408524442165, 1.2568606968090792, -0.42515752519040906, 21.539139902005655], y = 5.190553755757926)\n (x = [-0.017931242411617684, -0.27170842246567734, -0.026765756446959763, -0.22023422890720115, 0.014329447824711268, 0.13050101291028926, -0.20799943604336366, -0.16066679708652828, 0.17919845716541374], y = -3.785535625933215)\n (x = [0.16019921438789686, -5.825304693417111, 4.289531766136059, 3.1455763635888294, 5.458641061284198, -0.45423945083430556, 1.3420065475514327, 11.071383271053358, -5.048144064554042], y = 4.315337625380619)\n (x = [0.009140050951862884, 0.04469384684601337, -0.024523254069064682, -0.0006778603195849657, -0.033467017175372704, 0.00018919606255929567, 0.015039540120257645, -0.055561447286951594, 0.0034524591819524975], y = -6.1348006735538965)\n (x = [-2.1504026205280957, -3.5506959935240263, 3.2773198077443486, 2.545944124553593, 3.3614444618107457, -0.9070586519772262, -0.5350953976227709, 4.798970756981464, 5.113472757718003], y = 2.697455727483043)\n (x = [0.11555247180665768, 0.23185805094014644, -0.14526488057329218, -0.1646926623660305, -0.12739168515125368, 0.06276846492657225, 0.05162970646819434, -0.27569830394037254, -0.122124119706429], y = -3.112702486716184)\n (x = [-5.544207611793742, -4.368682817535509, 4.27464553992173, 4.249327805289077, 1.4815400194261001, -1.0363169755383876, -3.3828972784693447, 6.083384758152638, 3.1484611790914268], y = 2.8908326370678674)\n (x = [-6.327601389734798, -5.700007850179025, -2.30152230301352, 2.6624105363513855, -5.5990448182824215, -5.313742033639329, 4.337502947284694, 1.1811766466313658, 6.258709524625601], y = 2.6475948565766236)\n (x = [-1.4060402117178088, 0.777029787245751, 1.909660735207402, 1.0562382349072303, -1.0462096787740782, -1.7245800508058033, 0.6728153564631478, 0.8110578674355746, -1.7590494427234564], y = 1.134795727976113)\n (x = [1.401707916935463, 0.507292281892993, -1.8299327792432754, -1.9110637486405955, -1.3532407713520387, 0.7988072055293202, -0.20236432982286529, 2.218731860531455, 0.8775702346513833], y = 0.5085181322738653)\n ⋮\n (x = [0.32272377619135934, 0.13952017746950535, -0.2530024929518929, 0.4171735746440024, -0.7088857266255505, 0.0764196818508393, -0.05611835583463498, -0.1367835653450475, 0.17469218376144405], y = -1.8473913478854909)\n (x = [-0.09803761336379173, 0.37917581628915087, 0.14932065194587854, -0.07513413016792236, -0.11616701929728064, -0.3656238559055986, -0.544353424815364, 0.19334422631290035, 0.5113754731049156], y = -2.871634508900776)\n (x = [-5.601496857959673, -2.2150004572789275, 2.6232636650427787, 0.6438809891668956, -6.087311573624611, -1.444393295917136, -1.7742535465123495, 3.7029484770887686, -7.115133010281724], y = 2.7333536699234355)\n (x = [1.1615887843029173, 0.30459923514960646, 0.7100125486124826, -1.5448262616849506, 1.0584100329123864, 0.8188429395976751, 1.2291806989360756, 2.5508449307596015, 0.6720022745875682], y = 0.6018175551098451)\n (x = [-0.6906896936261838, 0.44588672061996026, -0.22668662687049115, -0.4705053370288234, -0.128491659310193, 0.4405930563549582, -0.22021571814212146, 0.9570140872766507, 0.2924639501948545], y = -1.597481981283457)\n 
(x = [-0.6065911924071316, 0.2791018230805245, -0.7429087308889564, -0.4303056030189477, -0.5098929294582188, 0.35840024100637374, -0.6288623930054859, 1.1895062586618474, -0.2658132040801591], y = -1.3002735119614013)\n (x = [-0.43154003540910485, -0.11328103154620067, -0.2106260313652945, 0.4669560370095127, 0.08433950136764555, 0.009045088345351981, -0.0004959309662598563, 0.5726081893085772, -0.17638438661851116], y = -1.912397017886154)\n (x = [-0.6983008438887204, 1.7852508408694736, 1.6594974610325648, -0.7363266078602845, 1.2470462357212786, 0.17378623790926034, 0.6275809389156121, 0.8486538114011325, -0.8541912155584325], y = 0.6250041969905122)\n (x = [-0.6660269087908998, -0.2853816762102294, 1.1155638725576782, 0.4368376350816759, 0.3751570361904651, -0.05333299977628082, 0.5438765307794208, 0.13146253462016977, -1.7091846207505637], y = -0.7893667292854132)\n\n\nwhere each element is a NamedTuple, as specified in the return statement of the model.\n\nnts[1]\n\n(x = [16.093042430164353, 17.537940461754452, -14.937628740833924, 3.938356702613503, -12.298725116878712, -22.968408524442165, 1.2568606968090792, -0.42515752519040906, 21.539139902005655], y = 5.190553755757926)", "crumbs": [ "Get Started", "User Guide", "Tracking Extra Quantities" ] }, { "objectID": "usage/tracking-extra-quantities/index.html#which-to-use", "href": "usage/tracking-extra-quantities/index.html#which-to-use", "title": "Tracking Extra Quantities", "section": "Which to use?", "text": "Which to use?\nThere are some pros and cons of using returned, as opposed to :=.\nFirstly, returned is more flexible, as it allows you to track any type of object; := only works with variables that can be inserted into an MCMCChains.Chains object. (Notice that x is a vector, and in the first case where we used :=, reconstructing the vector value of x can also be rather annoying as the chain stores each individual element of x separately.)\nA drawback is that naively using returned can lead to unnecessary computation during inference. This is because during the sampling process, the return values are also calculated (since they are part of the model function), but then thrown away. So, if the extra quantities are expensive to compute, this can be a problem.\nTo avoid this, you will essentially have to create two different models, one for inference and one for post-inference. 
\nA drawback is that naively using returned can lead to unnecessary computation during inference. This is because during the sampling process, the return values are also calculated (since they are part of the model function), but then thrown away. So, if the extra quantities are expensive to compute, this can be a problem.\nTo avoid this, you will essentially have to create two different models, one for inference and one for post-inference. The simplest way of doing this is to add an extra argument to the model:\n\n@model function Neal_coloneq_optional(track::Bool)\n # Raw draws\n y_raw ~ Normal(0, 1)\n x_raw ~ arraydist([Normal(0, 1) for i in 1:9])\n\n if track\n y = 3 * y_raw\n x = exp.(y ./ 2) .* x_raw\n return (x=x, y=y)\n else\n return nothing\n end\nend\n\nchain = sample(Neal_coloneq_optional(false), NUTS(), 1000)\n\n\n┌ Info: Found initial step size\n└ ϵ = 1.6062500000000002\n\n\n\n\nChains MCMC chain (1000×24×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 1.51 seconds\nCompute duration = 1.51 seconds\nparameters = y_raw, x_raw[1], x_raw[2], x_raw[3], x_raw[4], x_raw[5], x_raw[6], x_raw[7], x_raw[8], x_raw[9]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nThe above ensures that x and y are not calculated during inference, but allows us to still use returned to extract them:\n\nreturned(Neal_coloneq_optional(true), chain)\n\n1000×1 Matrix{@NamedTuple{x::Vector{Float64}, y::Float64}}:\n (x = [-1.992199453600517, -1.6800725076901717, 1.8690575185515594, -1.3763130261283558, 1.0513453868734934, -0.5972771033451459, -0.5608474638002378, 1.3254698580597148, -0.9161380486387226], y = 0.6909209647736932)\n (x = [3.3450404871932964, 0.6737860485094308, 1.192972377605166, 2.344637887183635, 0.9254057225222564, 1.7922796925061997, -0.260085728019659, -2.0472919116373145, 2.3537870133142595], y = 1.2924382451258005)\n (x = [-0.21850636211054456, -0.017020060633758785, 0.04860053848247633, -0.01372136117982968, -0.07341707562777142, -0.10194688332510131, -0.1769467897652722, 0.03233953715070416, -0.31218359157061243], y = -4.423104224959675)\n (x = [6.794484719989526, 7.378427998272473, -8.074983046140371, 6.928987418952442, -8.867675250371, -1.2495532556036226, 11.290897285810924, -6.784964678901296, -6.4302993641223285], y = 5.30582308989325)\n (x = [9.558281053993804, -30.595542441846945, 8.169240303363564, -14.528984430522495, -36.17113872784534, 13.996953979285053, 11.968769272342742, 28.673919577115004, -2.6053967512324214], y = 6.510134436075642)\n (x = [-3.5220813297960527, -10.629982537148523, 13.439888875470425, -18.318895069160277, -7.813721093467511, 4.54816436892008, -0.9964435953497641, 1.2677397078350698, -9.71746133274626], y = 4.727038499366014)\n (x = [-4.1710593633599995, -1.8816641348731384, 6.1710220271068446, -4.447577206115633, -0.5814571986292438, 3.5398650200754376, -5.007885306008143, 0.446764261802381, -1.9638820095538314], y = 2.5289043888111697)\n (x = [-0.13428729436072756, -0.32999621089857484, -0.4393746680354193, -0.47228308573561306, -0.17362029153498923, -0.3274191643634486, 0.21017331066330236, 0.09050347016568389, 0.13297048301446346], y = -3.1185833658321065)\n (x = [0.3414543236481031, -0.28525194388324493, -0.10557594589111469, -0.2904384599170601, 0.01447420424290864, -0.06354991418420607, -0.03056983019931312, -0.13797840033648298, 0.019421611679804614], y = -3.3309869138801593)\n (x = [0.3580330538352434, -0.5381665518035542, 0.044292063256371944, -0.10599964638101989, 0.27689162857245103, -0.08020714286569151, -0.3961825380662015, 0.06743702953170684, 0.1587942581836392], y = -2.139791184162528)\n ⋮\n (x = [-1.600026052448329, -0.2408545790200107, 1.5405201873310876, 
1.8539887296099329, -0.5467462007820474, -0.39981367011569435, 2.228553250271379, -1.1546933940416575, 0.9193626167358475], y = 0.8790202116675154)\n (x = [0.6553358041107036, 0.7782186638284925, -0.711965986510562, -0.8467596367923177, 0.5413134885570362, 0.143205346884316, -1.0940249758901164, 0.5535629709232218, -0.3315798435717492], y = -0.572203924615672)\n (x = [0.6553358041107036, 0.7782186638284925, -0.711965986510562, -0.8467596367923177, 0.5413134885570362, 0.143205346884316, -1.0940249758901164, 0.5535629709232218, -0.3315798435717492], y = -0.572203924615672)\n (x = [-1.0570774566203909, -0.08566511726740073, -0.5251507301468408, 0.8936059980427603, 1.7126451277348376, 0.3507152911562657, -0.28115449481963556, -1.2276151015438708, -0.2994952653301086], y = -0.31020974276648494)\n (x = [0.37866701471026165, -0.14976201003712836, 0.9010171056884527, 0.37635518997106954, 0.29248946820911, -0.7247502038582723, -0.09724976992824359, -0.27419523402426493, -0.1320211247981086], y = -2.036503695421155)\n (x = [-0.12763500067711087, -0.06577748986105811, -1.0314904621332737, 0.3639185132566253, -0.09629580390978185, 0.6793264733677378, 0.12847099739652065, -0.9092757114972808, 0.17129078458455427], y = -1.7209735792268481)\n (x = [-3.686874520521673, -4.005735035528217, -2.891502769951285, 2.9966598845571566, 1.9522150146426198, -0.03336063127097652, 2.0227819725018907, -4.86050857108523, -1.9797533692619753], y = 2.4433673771752566)\n (x = [-3.686874520521673, -4.005735035528217, -2.891502769951285, 2.9966598845571566, 1.9522150146426198, -0.03336063127097652, 2.0227819725018907, -4.86050857108523, -1.9797533692619753], y = 2.4433673771752566)\n (x = [0.5308795263849885, 0.22934868694606708, 0.3485700673614027, -0.3232650048759509, -0.2958934654062842, 0.08082431492559825, -0.22792902345787328, 0.5600242164227642, 0.2852548828192424], y = -1.8117150195106975)\n\n\nAnother equivalent option is to use a submodel:\n\n@model function Neal()\n y_raw ~ Normal(0, 1)\n x_raw ~ arraydist([Normal(0, 1) for i in 1:9])\n return (x_raw=x_raw, y_raw=y_raw)\nend\n\nchain = sample(Neal(), NUTS(), 1000)\n\n@model function Neal_with_extras()\n neal ~ to_submodel(Neal(), false)\n y = 3 * neal.y_raw\n x = exp.(y ./ 2) .* neal.x_raw\n return (x=x, y=y)\nend\n\nreturned(Neal_with_extras(), chain)\n\n\n┌ Info: Found initial step size\n└ ϵ = 1.6\n\n\n\n\n1000×1 Matrix{@NamedTuple{x::Vector{Float64}, y::Float64}}:\n (x = [-0.07566706064062392, 0.10311483572005757, 0.058677270994976584, -0.13865555442308447, -0.038454700951950235, 0.20481203478212628, 0.04516105502675028, -0.16938009355757616, -0.12481090434296249], y = -4.221965845632317)\n (x = [0.19290120702690605, 0.18615047459035977, -0.09498430099548254, 0.2771605485640082, 0.08684617737638237, -0.41045019124241977, 0.14748260472939398, 0.37434815670203486, 0.2959194277188531], y = -2.6386155898842416)\n (x = [-2.1752891711699256, -1.4539550838424695, 0.6867090134805061, -0.777430948650298, -1.3035121199440616, 3.1016392283456615, -1.1033274372882482, -2.6168307660325083, -2.39067343069985], y = 1.4864449779172129)\n (x = [0.10459943674424664, -0.022362914107168768, -0.5096610809090144, 0.05288745933675216, -0.13871967690720774, 0.8477329048119763, 0.03709689696717068, 0.0779711678116002, -0.06403873415455363], y = -2.1585232763088715)\n (x = [-10.83247733774592, -23.840575600875816, -14.978755135917957, -13.03788136726405, 9.062472085318305, -38.40308069559727, -0.48610654599260705, 11.287631481580643, -8.709459023282223], y = 6.2303139532838685)\n (x = 
[-0.06596587793208697, 0.07803284854484112, -0.05202227581520887, 0.06919077252753845, -0.20863698867186153, -0.18714220009170815, 0.04607874383626437, 0.006079559909273133, 0.08127612959213834], y = -4.692521426906612)\n (x = [-24.380535112597837, -9.280190091777035, 18.853758971865922, -20.65476049441104, 13.236696528794338, 15.974089009403256, 14.370337080721658, -16.617758099772896, -16.39597484758365], y = 6.174682308118474)\n (x = [-0.215075957107738, 0.01775476789141629, -0.22661016661738204, -0.06294429390810571, -0.2150206068952867, 0.0055901614387845735, -0.1777627353307627, 0.03682055712555352, 0.23654495742400974], y = -2.593218734983645)\n (x = [3.995344803948596, -1.500178192903815, 1.564919063595, 2.6819303276451607, 4.518254901700405, -0.6143242522504189, 2.963827331115327, -4.187062580725833, -6.280379154478004], y = 3.20962769475073)\n (x = [-0.28159988711150596, 0.08180857762164455, 0.12083395655177959, -0.19274253467654714, -0.23518423017482434, -0.07426195765883103, -0.18152851426115035, -0.03851955717238401, 0.2295892559322241], y = -2.084041272240995)\n ⋮\n (x = [-0.011296842674278964, -0.04941641170308534, 0.186104120279406, 0.2506451669767975, 0.14741463705464206, 0.07224734430936611, 0.047277163974074124, -0.05901339612473888, 0.13976943072229484], y = -3.99382813084304)\n (x = [1.6159394641708849, 2.2409401572012313, 0.7046286794330993, -2.9291033062989444, -3.1856814720740556, 0.19075300461599093, 0.8644676396987089, 1.5304435199077069, -0.33907179247724556], y = 0.9370613596635249)\n (x = [-0.17669162701299018, 0.010823777221550935, -0.013165734083525091, 0.03830701886504061, 0.3150178811121543, 0.051442726550694644, -0.051506183474092464, 0.07129939896216292, 0.012966226005874517], y = -4.340321834404614)\n (x = [14.081458912669822, -3.540787016523896, -1.1929711535882113, -8.817903318841363, -17.67379281896953, -9.299249205031424, 5.181143932082994, -3.520800661033317, 2.378577290551241], y = 4.670550105125785)\n (x = [-0.21696615398728686, 0.23715923257620372, -0.13804606900638794, -0.00467499083690512, -0.01214059159709238, -0.19417613678977352, -0.05535942318930767, 0.021110711459599252, 0.1869272829100717], y = -3.4829145715546925)\n (x = [0.2234177614553074, -0.05069309340924619, -0.10100119912916802, 0.14504111417749066, 0.34343646575355025, -0.5079536953787023, 0.7971502193994792, 0.19094699917603958, 0.5485653390822689], y = -2.1644297164138706)\n (x = [0.135424962708822, 0.0016117772525612954, -0.07230560622722985, -0.15516790014111895, -0.3305715293788884, 0.0020231436539595226, -0.012776964031678934, -0.1784815969519963, -0.2821912797302038], y = -3.537711233762788)\n (x = [0.26034050456531893, -0.059661221070328556, -0.02005205223858047, -0.12850470456314136, -0.2589029519193505, -0.05661549618178163, -0.25108887978190797, -0.15015466960023785, -0.24023528103446162], y = -3.9453635703147794)\n (x = [29.907405641378922, 17.752067007400527, -4.442333745521303, 23.16956730156063, 53.18515432113202, -4.538030731679161, 19.800288140320916, 50.94970046135725, 40.804250940215645], y = 6.542369532415909)\n\n\nNote that for the returned call to work, the Neal_with_extras() model must have the same variable names as stored in chain. This means the submodel Neal() must not be prefixed, i.e. 
to_submodel() must be passed false as its second argument.", "crumbs": [ "Get Started", "User Guide", "Tracking Extra Quantities" ] }, { "objectID": "usage/sampling-options/index.html", "href": "usage/sampling-options/index.html", "title": "MCMC Sampling Options", "section": "", "text": "Markov chain Monte Carlo sampling in Turing.jl is performed using the sample() function. As described on the Core Functionality page, single-chain and multiple-chain sampling can be done using, respectively, sample(model, sampler, niters) and sample(model, sampler, MCMCThreads(), niters, nchains).\nOn top of this, both methods also accept a number of keyword arguments that allow you to control the sampling process. This page will detail these options.\nTo begin, let’s create a simple model:\nusing Turing\n\n@model function demo_model()\n x ~ Normal()\n y ~ Normal(x)\n 4.0 ~ Normal(y)\n return nothing\nend\n\ndemo_model (generic function with 2 methods)", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#controlling-logging", "href": "usage/sampling-options/index.html#controlling-logging", "title": "MCMC Sampling Options", "section": "Controlling logging", "text": "Controlling logging\nProgress bars can be controlled with the progress keyword argument. The exact values that can be used depend on whether you are using single-chain or multi-chain sampling.\nFor single-chain sampling, progress=true and progress=false enable and disable the progress bar, respectively.\nFor multi-chain sampling, progress can take the following values:\n\n:none or false: no progress bar\n:overall or true (the default): creates one overall progress bar for all chains\n:perchain: creates one overall progress bar, plus one extra progress bar per chain (note that this can lead to visual clutter if you have many chains)
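\nFor example, to get one progress bar per chain when sampling three chains in parallel (an illustrative sketch reusing the demo_model defined above together with the :perchain option from the list above):\n\nsample(demo_model(), MH(), MCMCThreads(), 1000, 3; progress=:perchain)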
\n\nIf you want to globally enable or disable the progress bar, you can use:\n\nTuring.setprogress!(false); # or true\n\n\n[ Info: [Turing]: progress logging is disabled globally\n\n\n\n\n(This handily also disables progress logging for the rest of this document.)\nFor NUTS in particular, you can also specify verbose=false to disable the “Found initial step size” info message.", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#ensuring-sampling-reproducibility", "href": "usage/sampling-options/index.html#ensuring-sampling-reproducibility", "title": "MCMC Sampling Options", "section": "Ensuring sampling reproducibility", "text": "Ensuring sampling reproducibility\nLike many other Julia functions, a Random.AbstractRNG object can be passed as the first argument to sample() to ensure reproducibility of results.\n\nusing Random\nchn1 = sample(Xoshiro(468), demo_model(), MH(), 5);\nchn2 = sample(Xoshiro(468), demo_model(), MH(), 5);\n(chn1[:x] == chn2[:x], chn1[:y] == chn2[:y])\n\n(true, true)\n\n\nAlternatively, you can set the global RNG using Random.seed!(), although we recommend this less as it modifies global state.\n\nRandom.seed!(468)\nchn3 = sample(demo_model(), MH(), 5);\nRandom.seed!(468)\nchn4 = sample(demo_model(), MH(), 5);\n(chn3[:x] == chn4[:x], chn3[:y] == chn4[:y])\n\n(true, true)\n\n\n\n\n\n\n\n\nNote\n\n\n\nThe outputs of pseudorandom number generators in the standard Random library are not guaranteed to be the same across different Julia versions or platforms. If you require absolute reproducibility, you should use the StableRNGs.jl package.", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#switching-the-output-type", "href": "usage/sampling-options/index.html#switching-the-output-type", "title": "MCMC Sampling Options", "section": "Switching the output type", "text": "Switching the output type\nBy default, the results of MCMC sampling are bundled up in an MCMCChains.Chains object.\n\nchn = sample(demo_model(), HMC(0.1, 20), 5)\ntypeof(chn)\n\nChains{Union{Missing, Float64}, AxisArrays.AxisArray{Union{Missing, Float64}, 3, Array{Union{Missing, Float64}, 3}, Tuple{AxisArrays.Axis{:iter, StepRange{Int64, Int64}}, AxisArrays.Axis{:var, Vector{Symbol}}, AxisArrays.Axis{:chain, UnitRange{Int64}}}}, Missing, @NamedTuple{parameters::Vector{Symbol}, internals::Vector{Symbol}}, @NamedTuple{varname_to_symbol::OrderedDict{AbstractPPL.VarName, Symbol}, start_time::Float64, stop_time::Float64}}\n\n\nIf you wish to use a different chain format provided in another package, you can specify the chain_type keyword argument. You should refer to the documentation of the respective package for exact details.\nAnother situation where specifying chain_type can be useful is when you want to obtain the raw MCMC outputs as a vector of transitions. This can be used for profiling or debugging purposes (often, chain construction can take a surprising amount of time compared to sampling, especially for very simple models). To do so, you can use chain_type=Any (i.e., do not convert the output to any specific chain format):\n\ntransitions = sample(demo_model(), MH(), 5; chain_type=Any)\ntypeof(transitions)\n\n\nVector{ParamsWithStats{OrderedDict{VarName, Any}, @NamedTuple{logprior::Float64, loglikelihood::Float64, logjoint::Float64}}} (alias for Array{DynamicPPL.ParamsWithStats{OrderedDict{AbstractPPL.VarName, Any}, @NamedTuple{logprior::Float64, loglikelihood::Float64, logjoint::Float64}}, 1})", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#specifying-initial-parameters", "href": "usage/sampling-options/index.html#specifying-initial-parameters", "title": "MCMC Sampling Options", "section": "Specifying initial parameters", "text": "Specifying initial parameters\nIn Turing.jl, initial parameters for MCMC sampling can be specified using the initial_params keyword argument.\n\n\n\n\n\n\nImportant: New initial_params in Turing v0.41\n\n\n\nIn Turing v0.41, the permitted values for initial_params are different. In particular, Vectors are no longer permitted, because they are semantically ambiguous (the way in which indices correspond to parameters relies on DynamicPPL internals). This page describes the new behaviour.\n\n\nFor single-chain sampling with Turing, the initial_params keyword argument should be a DynamicPPL.AbstractInitStrategy. There are several options; all the InitFrom... 
structs are re-exported by Turing.\n\nInitFromPrior(): generate initial parameters by sampling from the prior\nInitFromUniform(lower, upper): generate initial parameters by sampling uniformly from the given bounds in linked space\nInitFromParams(namedtuple_or_dict): use the provided initial parameters, supplied either as a NamedTuple or a Dict{<:VarName} (a Dict-based sketch is shown at the end of this section)\nInitFromParams(mode_estimate): use the parameters in the optimisation result obtained via maximum_a_posteriori or maximum_likelihood\n\nIf initial_params is unspecified, each sampler will use its own default initialisation strategy: for most samplers this is InitFromPrior but for Hamiltonian samplers it is InitFromUniform(-2, 2) (which mimics the behaviour of Stan).\n\nchn = sample(demo_model(), MH(), 5; initial_params=InitFromParams((x = 1.0, y = -5.0)))\nchn[:x][1], chn[:y][1]\n\n(1.0, -5.0)\n\n\n\n\n\n\n\nNote\n\n\n\nNote that a number of samplers use warm-up steps by default (see the Thinning and Warmup section below), so chn[:param][1] may not correspond to the exact initial parameters you provided. MH() does not do this, which is why we use it here.\n\n\nThis approach scales to parameters with more complex types.\n\n@model function demo_complex()\n x ~ LKJCholesky(3, 0.5)\n y ~ MvNormal(zeros(3), I)\nend\ninit_x, init_y = rand(LKJCholesky(3, 0.5)), rand(MvNormal(zeros(3), I))\nchn = sample(demo_complex(), MH(), 5; initial_params=InitFromParams((x=init_x, y=init_y)));\n\nFor multiple-chain sampling, the initial_params keyword argument should be a vector with length equal to the number of chains being sampled. Each element of this vector should be the initial parameters for the corresponding chain, as described above. Thus, for example, you can supply a vector of AbstractInitStrategy objects. If you want to use the same initial parameters for all chains, you can use fill:\n\ninitial_params = fill(InitFromParams((x=1.0, y=-5.0)), 3)\nchn = sample(demo_model(), MH(), MCMCThreads(), 5, 3; initial_params=initial_params)\nchn[:x][1,:], chn[:y][1,:]\n\n\n┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel\n└ @ AbstractMCMC ~/.julia/packages/AbstractMCMC/XBmfQ/src/sample.jl:432\n\n\n\n\n([1.0, 1.0, 1.0], [-5.0, -5.0, -5.0])\n\n\nIn Turing v0.41, initialisation with a raw NamedTuple is still supported (it will simply be wrapped in InitFromParams()); but we expect to remove this eventually, so it will likely be more future-proof to wrap this in InitFromParams() yourself.
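\nFor parameters whose names are easier to construct programmatically, the Dict form mentioned in the list above can be used instead of a NamedTuple. A minimal sketch, assuming the @varname macro from DynamicPPL (available when using Turing) for building the keys:\n\ninit_dict = Dict(@varname(x) => 1.0, @varname(y) => -5.0)\nchn = sample(demo_model(), MH(), 5; initial_params=InitFromParams(init_dict))\n# chn[:x][1] and chn[:y][1] should again be 1.0 and -5.0, as in the NamedTuple example.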
", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#saving-and-resuming-sampling", "href": "usage/sampling-options/index.html#saving-and-resuming-sampling", "title": "MCMC Sampling Options", "section": "Saving and resuming sampling", "text": "Saving and resuming sampling\nBy default, MCMC sampling starts from scratch, using the initial parameters provided. You can, however, resume sampling from a previous chain. This is useful if you want to, for example, perform sampling in batches, or inspect intermediate results.\nFirstly, the previous chain must have been run using the save_state=true argument.\n\nrng = Xoshiro(468)\n\nchn1 = sample(rng, demo_model(), MH(), 5; save_state=true);\n\nFor MCMCChains.Chains, this results in the final sampler state being stored inside the chain metadata. You can access it using Turing.loadstate:\n\nsaved_state = Turing.loadstate(chn1)\ntypeof(saved_state)\n\nTuring.Inference.MHState{DynamicPPL.VarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}}, y::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:y, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:y, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}, Float64}\n\n\n\n\n\n\n\nNote\n\n\n\nYou can also directly access the saved sampler state with chn1.info.samplerstate, but we recommend not using this as it relies on the internal structure of MCMCChains.Chains.\n\n\nSampling can then be resumed from this state by providing it as the initial_state keyword argument.\n\nchn2 = sample(demo_model(), MH(), 5; initial_state=saved_state)\n\nChains MCMC chain (5×5×1 Array{Float64, 3}):\n\nIterations = 1:1:5\nNumber of chains = 1\nSamples per chain = 5\nWall duration = 0.08 seconds\nCompute duration = 0.08 seconds\nparameters = x, y\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nNote that the exact format saved in chn.info.samplerstate, and that expected by initial_state, depends on the invocation of sample used. For single-chain sampling, the saved state, and the required initial state, is just a single sampler state. For multiple-chain sampling, it is a vector of states, one per chain.\n\n\n\n\n\n\nWarning: resume_from\n\n\n\nThe resume_from argument has been removed in Turing v0.41; please use initial_state=loadstate(chn) instead, as described here. In v0.41, loadstate is also exported from Turing rather than DynamicPPL.\n\n\nThis means that, for example, after sampling a single chain, you could sample three chains that branch off from that final state:\n\ninitial_states = fill(saved_state, 3)\nchn3 = sample(demo_model(), MH(), MCMCThreads(), 5, 3; initial_state=initial_states)\n\n\n┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel\n└ @ AbstractMCMC ~/.julia/packages/AbstractMCMC/XBmfQ/src/sample.jl:432\n\n\n\n\nChains MCMC chain (5×5×3 Array{Float64, 3}):\n\nIterations = 1:1:5\nNumber of chains = 3\nSamples per chain = 5\nWall duration = 0.04 seconds\nCompute duration = 0.02 seconds\nparameters = x, y\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\n\n\n\n\nNote: Initial states versus initial parameters\n\n\n\nThe initial_state and initial_params keyword arguments are mutually exclusive. If both are provided, initial_params will be silently ignored.\n\nchn2 = sample(rng, demo_model(), MH(), 5;\n initial_state=saved_state, initial_params=InitFromParams((x=0.0, y=0.0))\n)\nchn2[:x][1], chn2[:y][1]\n\n(0.6039098670263787, 3.189622703262277)\n\n\nIn general, the saved state will contain a set of parameters (which will be the last parameters in the previous chain). However, the saved state not only specifies parameters but also other internal variables required by the sampler. 
For example, the MH state contains a cached log-density of the current parameters, which is later used for calculating the acceptance ratio.\nFinally, note that the first sample in the resumed chain will not be the same as the last sample in the previous chain; it will be the sample immediately after that.\n\n# In general these will not be the same (although it _could_ be if the MH step\n# was rejected -- that is why we seed the sampling in this section).\nchn1[:x][end], chn2[:x][1]\n\n(-0.3719817960998872, 0.6039098670263787)", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#thinning-and-warmup", "href": "usage/sampling-options/index.html#thinning-and-warmup", "title": "MCMC Sampling Options", "section": "Thinning and warmup", "text": "Thinning and warmup\nThe num_warmup and discard_initial keyword arguments can be used to control MCMC warmup. Both of these are integers, and respectively specify the number of warmup steps to perform, and the number of iterations at the start of the chain to discard. Note that the value of discard_initial should also include the num_warmup steps if you want the warmup steps to be discarded.\nHere are some examples of how these two keyword arguments interact:\n\nnum_warmup=10, discard_initial=10: Perform 10 warmup steps, discard them; the chain starts from the first non-warmup step\nnum_warmup=10, discard_initial=15: Perform 10 warmup steps, discard them and the next 5 steps; the chain starts from the 6th non-warmup step\nnum_warmup=10, discard_initial=5: Perform 10 warmup steps, discard the first 5; the chain will contain 5 warmup steps followed by the rest of the chain\nnum_warmup=0, discard_initial=10: No warmup steps, discard the first 10 steps; the chain starts from the 11th step\nnum_warmup=0, discard_initial=0: No warmup steps, do not discard any steps; the chain starts from the 1st step (corresponding to the initial parameters)\n\nEach sampler has its own default value for num_warmup, but discard_initial always defaults to num_warmup.\nWarmup steps and ‘regular’ non-warmup steps differ in that warmup steps call AbstractMCMC.step_warmup, whereas regular steps call AbstractMCMC.step. For all the samplers defined in Turing, these two functions are identical; however, they may in general differ for other samplers. Please consult the documentation of the respective sampler for details.\nA thinning factor can be specified using the thinning keyword argument. For example, thinning=10 will keep every tenth sample, discarding the other nine.\nNote that thinning is not applied to the first discard_initial samples; it is only applied to the remaining samples. Thus, for example, if you use discard_initial=50 and thinning=10, the chain will contain samples 51, 61, 71, and so on.
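\nAs a concrete sketch combining these options (reusing the demo_model and MH() from earlier on this page; the specific numbers are arbitrary):\n\nchn = sample(demo_model(), MH(), 100;\n num_warmup=10, # perform 10 warmup steps...\n discard_initial=10, # ...and discard them from the chain\n thinning=2) # afterwards, keep every 2nd sample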
", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#performing-model-checks", "href": "usage/sampling-options/index.html#performing-model-checks", "title": "MCMC Sampling Options", "section": "Performing model checks", "text": "Performing model checks\nDynamicPPL by default performs a number of checks on the model before any sampling is done. This catches a number of potential errors in a model, such as having repeated variables (see the DynamicPPL documentation for details).\nIf you wish to disable this, you can pass check_model=false to sample().", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#callbacks", "href": "usage/sampling-options/index.html#callbacks", "title": "MCMC Sampling Options", "section": "Callbacks", "text": "Callbacks\nThe callback keyword argument can be used to specify a function that is called at the end of each sampler iteration. This function should have the signature callback(rng, model, sampler, sample, iteration::Int; kwargs...).\nIf you are performing multi-chain sampling, kwargs will additionally contain chain_number::Int, which ranges from 1 to the number of chains.\nThe TuringCallbacks.jl package contains a TensorBoardCallback, which can be used to obtain live progress visualisations using TensorBoard.
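\nFor instance, a callback matching the signature above could log a message every 100 iterations (a minimal sketch; the body is purely illustrative):\n\nfunction my_callback(rng, model, sampler, sample, iteration::Int; kwargs...)\n # This runs at the end of every sampler iteration.\n iteration % 100 == 0 && println("finished iteration $iteration")\n return nothing\nend\n\nchn = sample(demo_model(), MH(), 500; callback=my_callback)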
", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/sampling-options/index.html#automatic-differentiation", "href": "usage/sampling-options/index.html#automatic-differentiation", "title": "MCMC Sampling Options", "section": "Automatic differentiation", "text": "Automatic differentiation\nFinally, please note that for samplers which use automatic differentiation (e.g., HMC and NUTS), the AD type should be specified in the sampler constructor itself, rather than as a keyword argument to sample().\nIn other words, this is correct:\n\nspl = NUTS(; adtype=AutoForwardDiff())\nchn = sample(demo_model(), spl, 10);\n\n\n┌ Info: Found initial step size\n└ ϵ = 3.2\n\n\n\n\nand not this:\nspl = NUTS()\nchn = sample(demo_model(), spl, 10; adtype=AutoForwardDiff())", "crumbs": [ "Get Started", "User Guide", "MCMC Sampling Options" ] }, { "objectID": "usage/dynamichmc/index.html", "href": "usage/dynamichmc/index.html", "title": "Using DynamicHMC", "section": "", "text": "Turing supports the use of DynamicHMC as a sampler by wrapping DynamicHMC.NUTS() in the externalsampler() function.\nTo use it, you must import the DynamicHMC package as well as Turing. Turing does not formally require DynamicHMC but will include additional functionality if both packages are present.\nHere is a brief example of how to apply it:\n\n# Import Turing and DynamicHMC.\nusing DynamicHMC, Turing\n\n# Model definition.\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\n# Pull 2,000 samples using DynamicHMC's NUTS sampler.\ndynamic_nuts = externalsampler(DynamicHMC.NUTS())\nchn = sample(gdemo(1.5, 2.0), dynamic_nuts, 2000, progress=false)\n\nChains MCMC chain (2000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:2000\nNumber of chains = 1\nSamples per chain = 2000\nWall duration = 6.58 seconds\nCompute duration = 6.58 seconds\nparameters = s², m\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.", "crumbs": [ "Get Started", "User Guide", "Using DynamicHMC" ] }, { "objectID": "usage/troubleshooting/index.html", "href": "usage/troubleshooting/index.html", "title": "Troubleshooting", "section": "", "text": "This page collects a number of common error messages observed when using Turing, along with suggestions on how to fix them.\nIf the suggestions here do not resolve your problem, please do feel free to open an issue.\nusing Turing\nTuring.setprogress!(false)\n\n\n[ Info: [Turing]: progress logging is disabled globally\n\n\n\n\nfalse", "crumbs": [ "Get Started", "User Guide", "Troubleshooting" ] }, { "objectID": "usage/troubleshooting/index.html#initial-parameters", "href": "usage/troubleshooting/index.html#initial-parameters", "title": "Troubleshooting", "section": "Initial parameters", "text": "Initial parameters\n\nfailed to find valid initial parameters in {N} tries. This may indicate an error with the model or AD backend…\n\nThis error is seen when a Hamiltonian Monte Carlo sampler is unable to determine a valid set of initial parameters for the sampling. Here, ‘valid’ means that the log probability density of the model, as well as its gradient with respect to each parameter, is finite and not NaN.\n\nNaN gradient\nOne of the most common causes of this error is having a NaN gradient. To find out whether this is happening, you can evaluate the gradient manually. Here is an example with a model that is known to be problematic:\n\nusing Turing\nusing DynamicPPL.TestUtils.AD: run_ad\n\n@model function initial_bad()\n a ~ Normal()\n x ~ truncated(Normal(a), 0, Inf)\nend\n\nmodel = initial_bad()\nadtype = AutoForwardDiff()\nresult = run_ad(model, adtype; test=false, benchmark=false)\nresult.grad_actual\n\n\n[ Info: Running AD on initial_bad with ADTypes.AutoForwardDiff()\n params : [-0.38240649964883394, -0.6714508364415822]\n actual : (-1.934761768432398, [NaN, NaN])\n\n\n\n\n2-element Vector{Float64}:\n NaN\n NaN\n\n\n(See the DynamicPPL docs for more details on the run_ad function and its return type.)\nIn this case, the NaN gradient is caused by the Inf argument to truncated. (See, e.g., this issue on Distributions.jl.) 
Here, the upper bound of Inf is not needed, so it can be removed:\n\n@model function initial_good()\n a ~ Normal()\n x ~ truncated(Normal(a); lower=0)\nend\n\nmodel = initial_good()\nadtype = AutoForwardDiff()\nrun_ad(model, adtype; test=false, benchmark=false).grad_actual\n\n\n[ Info: Running AD on initial_good with ADTypes.AutoForwardDiff()\n params : [2.741352402331662, 0.5904089858010693]\n actual : (-5.440544706343022, [-3.687319068126727, 2.6903536976918887])\n\n\n\n\n2-element Vector{Float64}:\n -3.687319068126727\n 2.6903536976918887\n\n\nMore generally, you could try using a different AD backend; if you don’t know why a model is returning NaN gradients, feel free to open an issue.\n\n\n-Inf log density\nAnother cause of this error is having models with very extreme parameters. This example is taken from this Turing.jl issue:\n\n@model function initial_bad2()\n x ~ Exponential(100)\n y ~ Uniform(0, x)\nend\nmodel = initial_bad2() | (y = 50.0,)\n\nDynamicPPL.Model{typeof(initial_bad2), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{y::Float64}, DynamicPPL.DefaultContext}, false}(initial_bad2, NamedTuple(), NamedTuple(), ConditionContext((y = 50.0,), DynamicPPL.DefaultContext()))\n\n\nThe problem here is that HMC’s default initialisation strategy is InitFromUniform(-2, 2): in other words, it attempts to find initial values for transformed parameters inside the region of [-2, 2]. For a distribution of Exponential(100), the appropriate transformation is log(x) (see the variable transformation docs for more info).\nThus, HMC attempts to find initial values of log(x) in the region of [-2, 2], which corresponds to x in the region of [exp(-2), exp(2)] = [0.135, 7.39]. However, all of these values of x will give rise to a zero probability density for y because the value of y = 50.0 is outside the support of Uniform(0, x). 
Thus, the log density of the model is -Inf, as can be seen with logjoint:\n\nlogjoint(model, (x = exp(-2),))\n\n-Inf\n\n\n\nlogjoint(model, (x = exp(2),))\n\n-Inf\n\n\nYou can fix this by overriding the default initialisation strategy (discussed in more detail in the sampling options page).\n\n# Use initial parameters drawn from the model's prior.\nsample(model, NUTS(), 1000; initial_params=InitFromPrior())\n\n\n┌ Info: Found initial step size\n└ ϵ = 3.2\n\n\n\n\nChains MCMC chain (1000×15×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 3.27 seconds\nCompute duration = 3.27 seconds\nparameters = x\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n# Use manually specified initial parameters (always in untransformed space).\nsample(model, NUTS(), 1000; initial_params=InitFromParams((x = 60.0,)))\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\nChains MCMC chain (1000×15×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 0.3 seconds\nCompute duration = 0.3 seconds\nparameters = x\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nMore generally, you may also consider reparameterising the model to avoid such issues.", "crumbs": [ "Get Started", "User Guide", "Troubleshooting" ] }, { "objectID": "usage/troubleshooting/index.html#forwarddiff-type-parameters", "href": "usage/troubleshooting/index.html#forwarddiff-type-parameters", "title": "Troubleshooting", "section": "ForwardDiff type parameters", "text": "ForwardDiff type parameters\n\nMethodError: no method matching Float64(::ForwardDiff.Dual{… The type Float64 exists, but no method is defined for this combination of argument types when trying to construct it.\n\nA common error with ForwardDiff looks like this:\n\n@model function forwarddiff_fail()\n x = Float64[0.0, 1.0]\n a ~ Normal()\n @show typeof(a)\n x[1] = a\n b ~ MvNormal(x, I)\nend\nsample(forwarddiff_fail(), NUTS(; adtype=AutoForwardDiff()), 10)\n\ntypeof(a) = Float64\ntypeof(a) = Float64\ntypeof(a) = Float64\ntypeof(a) = ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}\n\n\n\nMethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3})\nThe type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.\n\nClosest candidates are:\n (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat\n @ Base rounding.jl:265\n (::Type{T})(::T) where T<:Number\n @ Core boot.jl:900\n Float64(::UInt128)\n @ Base float.jl:260\n ...\n\nStacktrace:\n [1] convert(::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3})\n @ Base ./number.jl:7\n [2] setindex!(A::Vector{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}, i::Int64)\n @ Base ./array.jl:987\n [3] forwarddiff_fail(__model__::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, 
Tuple{}, DynamicPPL.InitContext{Random.TaskLocalRNG, InitFromParams{DynamicPPL.VectorWithRanges{true, @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, Nothing}}, false}, __varinfo__::DynamicPPL.OnlyAccsVarInfo{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}})\n @ Main.Notebook ~/work/docs/docs/usage/troubleshooting/index.qmd:123\n [4] _evaluate!!(model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.InitContext{Random.TaskLocalRNG, InitFromParams{DynamicPPL.VectorWithRanges{true, @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, Nothing}}, false}, varinfo::DynamicPPL.OnlyAccsVarInfo{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:997\n [5] evaluate!!(model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.InitContext{Random.TaskLocalRNG, InitFromParams{DynamicPPL.VectorWithRanges{true, @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, Nothing}}, false}, varinfo::DynamicPPL.OnlyAccsVarInfo{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:983\n [6] init!!(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, vi::DynamicPPL.OnlyAccsVarInfo{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}, strategy::InitFromParams{DynamicPPL.VectorWithRanges{true, @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, Nothing})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:938\n [7] init!!(model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, vi::DynamicPPL.OnlyAccsVarInfo{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}, strategy::InitFromParams{DynamicPPL.VectorWithRanges{true, @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, Nothing})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:943\n [8] logdensity_at(params::Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}, ::Val{true}, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), 
(), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, getlogdensity::typeof(DynamicPPL.getlogjoint_internal), iden_varname_ranges::@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, varname_ranges::Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}, accs::DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/logdensityfunction.jl:291\n [9] (::DifferentiationInterface.FixTail{typeof(DynamicPPL.logdensity_at), Tuple{Val{true}, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}})(args::Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}})\n @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/MgcE4/src/utils/context.jl:172\n [10] vector_mode_dual_eval!(f::DifferentiationInterface.FixTail{typeof(DynamicPPL.logdensity_at), Tuple{Val{true}, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, x::Vector{Float64})\n @ ForwardDiff ~/.julia/packages/ForwardDiff/9ocoj/src/apiutils.jl:24\n [11] vector_mode_gradient!(result::DiffResults.MutableDiffResult{1, Float64, Tuple{Vector{Float64}}}, f::DifferentiationInterface.FixTail{typeof(DynamicPPL.logdensity_at), Tuple{Val{true}, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}})\n @ ForwardDiff ~/.julia/packages/ForwardDiff/9ocoj/src/gradient.jl:105\n [12] gradient!(result::DiffResults.MutableDiffResult{1, Float64, Tuple{Vector{Float64}}}, f::DifferentiationInterface.FixTail{typeof(DynamicPPL.logdensity_at), Tuple{Val{true}, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, typeof(DynamicPPL.getlogjoint_internal), 
@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, ::Val{false})\n @ ForwardDiff ~/.julia/packages/ForwardDiff/9ocoj/src/gradient.jl:39\n [13] value_and_gradient(::typeof(DynamicPPL.logdensity_at), ::DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, ::AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, ::Vector{Float64}, ::DifferentiationInterface.Constant{Val{true}}, ::DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, ::DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, ::DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, ::DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, ::DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}})\n @ DifferentiationInterfaceForwardDiffExt ~/.julia/packages/DifferentiationInterface/MgcE4/ext/DifferentiationInterfaceForwardDiffExt/onearg.jl:419\n [14] logdensity_and_gradient(ldf::LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, 
DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}, params::Vector{Float64})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/logdensityfunction.jl:373\n [15] (::Base.Fix1{typeof(LogDensityProblems.logdensity_and_gradient), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}})(y::Vector{Float64})\n @ Base ./operators.jl:1127\n [16] ∂H∂θ(h::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, 
Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, Base.Fix1{typeof(LogDensityProblems.logdensity_and_gradient), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, θ::Vector{Float64})\n @ AdvancedHMC ~/.julia/packages/AdvancedHMC/kEVkt/src/hamiltonian.jl:46\n [17] phasepoint(h::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, 
DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, Base.Fix1{typeof(LogDensityProblems.logdensity_and_gradient), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, θ::Vector{Float64}, r::Vector{Float64})\n @ AdvancedHMC ~/.julia/packages/AdvancedHMC/kEVkt/src/hamiltonian.jl:103\n [18] phasepoint(rng::Random.TaskLocalRNG, θ::Vector{Float64}, h::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, 
AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, Base.Fix1{typeof(LogDensityProblems.logdensity_and_gradient), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}})\n @ AdvancedHMC ~/.julia/packages/AdvancedHMC/kEVkt/src/hamiltonian.jl:185\n [19] find_initial_params(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, 
varinfo::DynamicPPL.VarInfo{@NamedTuple{a::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:a, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:a, typeof(identity)}}, Vector{Float64}}, b::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:b, typeof(identity)}, Int64}, Vector{IsoNormal}, Vector{AbstractPPL.VarName{:b, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}, hamiltonian::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, Base.Fix1{typeof(LogDensityProblems.logdensity_and_gradient), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, 
@NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, init_strategy::InitFromUniform{Float64}; max_attempts::Int64)\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/hmc.jl:164\n [20] find_initial_params(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, varinfo::DynamicPPL.VarInfo{@NamedTuple{a::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:a, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:a, typeof(identity)}}, Vector{Float64}}, b::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:b, typeof(identity)}, Int64}, Vector{IsoNormal}, Vector{AbstractPPL.VarName{:b, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}, hamiltonian::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}, Base.Fix1{typeof(LogDensityProblems.logdensity_and_gradient), LogDensityFunction{true, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, 
false}, AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, typeof(DynamicPPL.getlogjoint_internal), @NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}, DifferentiationInterfaceForwardDiffExt.ForwardDiffGradientPrep{Tuple{typeof(DynamicPPL.logdensity_at), AutoForwardDiff{3, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}, Vector{Float64}, Tuple{DifferentiationInterface.Constant{Val{true}}, DifferentiationInterface.Constant{DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}}, DifferentiationInterface.Constant{typeof(DynamicPPL.getlogjoint_internal)}, DifferentiationInterface.Constant{@NamedTuple{a::DynamicPPL.RangeAndLinked, b::DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{Dict{AbstractPPL.VarName, DynamicPPL.RangeAndLinked}}, DifferentiationInterface.Constant{DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 3}}}, NTuple{6, Nothing}}, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}}}, init_strategy::InitFromUniform{Float64})\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/hmc.jl:152\n [21] initialstep(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, spl::NUTS{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.DiagEuclideanMetric}, vi_original::DynamicPPL.VarInfo{@NamedTuple{a::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:a, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:a, typeof(identity)}}, Vector{Float64}}, b::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:b, typeof(identity)}, Int64}, Vector{IsoNormal}, Vector{AbstractPPL.VarName{:b, typeof(identity)}}, Vector{Float64}}}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::DynamicPPL.LogPriorAccumulator{Float64}, LogJacobian::DynamicPPL.LogJacobianAccumulator{Float64}, LogLikelihood::DynamicPPL.LogLikelihoodAccumulator{Float64}}}}; initial_params::InitFromUniform{Float64}, nadapts::Int64, verbose::Bool, kwargs::@Kwargs{})\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/hmc.jl:216\n [22] step(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, spl::NUTS{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.DiagEuclideanMetric}; initial_params::InitFromUniform{Float64}, kwargs::@Kwargs{nadapts::Int64})\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/abstractmcmc.jl:182\n [23] step\n @ ~/.julia/packages/Turing/gwHwZ/src/mcmc/abstractmcmc.jl:164 [inlined]\n [24] macro expansion\n @ ~/.julia/packages/AbstractMCMC/XBmfQ/src/sample.jl:188 [inlined]\n [25] (::AbstractMCMC.var\"#29#30\"{Nothing, Int64, Int64, Int64, UnionAll, Nothing, @Kwargs{nadapts::Int64, initial_params::InitFromUniform{Float64}}, Random.TaskLocalRNG, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, NUTS{AutoForwardDiff{nothing, Nothing}, 
AdvancedHMC.DiagEuclideanMetric}, Int64, Float64, Int64, Int64})()\n @ AbstractMCMC ~/.julia/packages/AbstractMCMC/XBmfQ/src/logging.jl:134\n [26] with_logstate(f::AbstractMCMC.var\"#29#30\"{Nothing, Int64, Int64, Int64, UnionAll, Nothing, @Kwargs{nadapts::Int64, initial_params::InitFromUniform{Float64}}, Random.TaskLocalRNG, DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, NUTS{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.DiagEuclideanMetric}, Int64, Float64, Int64, Int64}, logstate::Base.CoreLogging.LogState)\n @ Base.CoreLogging ./logging/logging.jl:524\n [27] with_logger(f::Function, logger::LoggingExtras.TeeLogger{Tuple{LoggingExtras.EarlyFilteredLogger{TerminalLoggers.TerminalLogger, AbstractMCMC.var\"#1#3\"{Module}}, LoggingExtras.EarlyFilteredLogger{Base.CoreLogging.ConsoleLogger, AbstractMCMC.var\"#2#4\"{Module}}}})\n @ Base.CoreLogging ./logging/logging.jl:635\n [28] with_progresslogger(f::Function, _module::Module, logger::Base.CoreLogging.ConsoleLogger)\n @ AbstractMCMC ~/.julia/packages/AbstractMCMC/XBmfQ/src/logging.jl:157\n [29] macro expansion\n @ ~/.julia/packages/AbstractMCMC/XBmfQ/src/logging.jl:133 [inlined]\n [30] mcmcsample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, sampler::NUTS{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.DiagEuclideanMetric}, N::Int64; progress::Bool, progressname::String, callback::Nothing, num_warmup::Int64, discard_initial::Int64, thinning::Int64, chain_type::Type, initial_state::Nothing, kwargs::@Kwargs{nadapts::Int64, initial_params::InitFromUniform{Float64}})\n @ AbstractMCMC ~/.julia/packages/AbstractMCMC/XBmfQ/src/sample.jl:168\n [31] sample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, sampler::NUTS{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.DiagEuclideanMetric}, N::Int64; check_model::Bool, chain_type::Type, initial_params::InitFromUniform{Float64}, initial_state::Nothing, progress::Bool, nadapts::Int64, discard_adapt::Bool, discard_initial::Int64, kwargs::@Kwargs{})\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/hmc.jl:121\n [32] sample\n @ ~/.julia/packages/Turing/gwHwZ/src/mcmc/hmc.jl:88 [inlined]\n [33] #sample#1\n @ ~/.julia/packages/Turing/gwHwZ/src/mcmc/abstractmcmc.jl:73 [inlined]\n [34] sample(model::DynamicPPL.Model{typeof(forwarddiff_fail), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext, false}, spl::NUTS{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.DiagEuclideanMetric}, N::Int64)\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/abstractmcmc.jl:70\n [35] top-level scope\n @ ~/work/docs/docs/usage/troubleshooting/index.qmd:126\n\n\n\nThe problem here is the line x[1] = a. When the log probability density of the model is calculated, a is sampled from a normal distribution and is thus a Float64; however, when ForwardDiff calculates the gradient of the log density, a is a ForwardDiff.Dual object. However, x is always a Vector{Float64}, and the call x[1] = a attempts to insert a Dual object into a Vector{Float64}, which is not allowed.\n\n\n\n\n\n\nNote\n\n\n\nIn more detail: the basic premise of ForwardDiff is that functions have to accept Real parameters instead of Float64 (since Dual is a subtype of Real). Here, the line x[1] = a is equivalent to setindex!(x, a, 1), and although the method setindex!(::Vector{Float64}, ::Real, ...) 
does exist, it attempts to convert the Real into a Float64, which is where it fails.\n\n\nThere are two ways around this.\nFirstly, you could broaden the type of the container:\n\n@model function forwarddiff_working1()\n x = Real[0.0, 1.0]\n a ~ Normal()\n x[1] = a\n b ~ MvNormal(x, I)\nend\nsample(forwarddiff_working1(), NUTS(; adtype=AutoForwardDiff()), 10)\n\n\n┌ Info: Found initial step size\n└ ϵ = 1.6\n\n\n\n\nChains MCMC chain (10×17×1 Array{Float64, 3}):\n\nIterations = 6:1:15\nNumber of chains = 1\nSamples per chain = 10\nWall duration = 2.44 seconds\nCompute duration = 2.44 seconds\nparameters = a, b[1], b[2]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nThis is generally unfavourable because the Vector{Real} type contains an abstract type parameter. As a result, memory allocation is less efficient (because the compiler does not know the size of each vector’s elements). Furthermore, the compiler cannot infer the type of x[1], which can lead to type stability issues (to see this in action, run x = Real[0.0, 1.0]; @code_warntype x[1] in the Julia REPL).\nA better solution is to pass a type as a parameter to the model:\n\n@model function forwarddiff_working2(::Type{T}=Float64) where T\n x = T[0.0, 1.0]\n a ~ Normal()\n x[1] = a\n b ~ MvNormal(x, I)\nend\nsample(forwarddiff_working2(), NUTS(; adtype=AutoForwardDiff()), 10)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.8\n\n\n\n\nChains MCMC chain (10×17×1 Array{Float64, 3}):\n\nIterations = 6:1:15\nNumber of chains = 1\nSamples per chain = 10\nWall duration = 1.35 seconds\nCompute duration = 1.35 seconds\nparameters = a, b[1], b[2]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nAlternatively, you can use a different AD backend such as Mooncake.jl, which does not rely on dual numbers.", "crumbs": [ "Get Started", "User Guide", "Troubleshooting" ] }, { "objectID": "usage/troubleshooting/index.html#growablearray-warnings", "href": "usage/troubleshooting/index.html#growablearray-warnings", "title": "Troubleshooting", "section": "GrowableArray warnings", "text": "GrowableArray warnings\n\n\n\n\n\n\nWarning: This section refers to a future version of DynamicPPL.jl\n\n\n\nThis warning is introduced in the upcoming release of DynamicPPL v0.40. It is not present in currently released versions of DynamicPPL.jl and Turing.jl.\n\n\n\nusing DynamicPPL\nif pkgversion(DynamicPPL) >= v\"0.40\"\n error(\"This page needs to be updated\")\nend\n\n\nReturning a Base.Array with a presumed size based on the indices used to set values; but this may not be the actual shape or size of the actual AbstractArray that was inside the DynamicPPL model. 
You should inspect the returned result to make sure that it has the correct value.\n\nThis warning is seen when using a VarNamedTuple — a mapping of VarNames to values — that contains indexed variables (such as x[1]) but does not know what the type of x is.\nGenerally, this warning is likely to occur when you have provided initial parameters or conditioning values as a Dict{VarName}, or a VarNamedTuple without template information. To fix this, it is recommended that you supply a VarNamedTuple with template information. That is, instead of\n\nusing DynamicPPL\n\nDict(@varname(x[1]) => 1.0, @varname(x[2]) => 2.0)\n\nDict{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, Float64} with 2 entries:\n x[1] => 1.0\n x[2] => 2.0\n\n\nor\n@vnt begin\n x[1] = 1.0\n x[2] = 2.0\nend\nyou should use\n@vnt begin\n @template x = Vector{Float64}(undef, 2)\n x[1] = 1.0\n x[2] = 2.0\nend\nwhere the template Vector{Float64}(undef, 2) informs DynamicPPL of the type and size of x that will be used inside the model; i.e., your model looks something like\n\n@model function mymodel()\n x = Vector{Float64}(undef, 2)\n x[1] ~ Normal()\n x[2] ~ Normal()\n return nothing\nend\n\nmymodel (generic function with 2 methods)\n\n\nPlease see the VarNamedTuple documentation page for more information about what this means.", "crumbs": [ "Get Started", "User Guide", "Troubleshooting" ] }, { "objectID": "usage/probability-interface/index.html", "href": "usage/probability-interface/index.html", "title": "Querying Model Probabilities", "section": "", "text": "The easiest way to manipulate and query Turing models is via the DynamicPPL probability interface.\nLet’s use a simple model of normally-distributed data as an example.\nusing Turing\nusing DynamicPPL\nusing Random\n\n@model function gdemo(n)\n μ ~ Normal(0, 1)\n x ~ MvNormal(fill(μ, n), I)\nend\n\ngdemo (generic function with 2 methods)\nWe generate some data using μ = 0:\nRandom.seed!(1776)\ndataset = randn(100)\ndataset[1:5]\n\n5-element Vector{Float64}:\n 0.8488780584442736\n -0.31936138249336765\n -1.3982098801744465\n -0.05198933163879332\n -1.1465116601038348", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#conditioning-and-deconditioning", "href": "usage/probability-interface/index.html#conditioning-and-deconditioning", "title": "Querying Model Probabilities", "section": "Conditioning and Deconditioning", "text": "Conditioning and Deconditioning\nBayesian models can be transformed with two main operations, conditioning and deconditioning (also known as marginalisation). Conditioning takes a variable and fixes its value as known. 
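For the gdemo model above, conditioning on observations of \\(x\\) amounts to replacing the joint density \\(p(\\mu, x) = p(\\mu) \\, p(x \\mid \\mu)\\) with the unnormalised posterior \\(p(\\mu \\mid x) \\propto p(\\mu) \\, p(x \\mid \\mu)\\), evaluated at the fixed data; this is just Bayes’ rule, written out here for orientation. 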
We do this by passing a model and a collection of conditioned variables to |, or its alias, condition:\n\n# (equivalently)\n# conditioned_model = condition(gdemo(length(dataset)), (x=dataset, μ=0))\nconditioned_model = gdemo(length(dataset)) | (x=dataset, μ=0)\n\nModel{typeof(gdemo), (:n,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{x::Vector{Float64}, μ::Int64}, DefaultContext}, false}(gdemo, (n = 100,), NamedTuple(), ConditionContext((x = [0.8488780584442736, -0.31936138249336765, -1.3982098801744465, -0.05198933163879332, -1.1465116601038348, -0.6306168227545849, 0.6862766694322289, -0.5485073478947856, -0.17212004616875684, 1.2883226251958486, -0.13661316034377538, 2.4316115122026973, 0.2251319215717449, -0.5115708179083417, -0.7810712258995324, -1.0191704692490737, 1.1210038448250719, -1.6944509713762377, -0.27314823183454695, 0.25273963222687423, 1.3914215917992434, 0.7525340831125464, 0.847154387311101, -0.7130402796655171, 0.2983575202861233, -0.1785631526879386, 0.08659477535701691, -0.5167265137098563, 2.111309740316035, 0.3957655443124509, -0.0804390853521051, 1.255042471667049, -0.07882822403959532, 1.2261373761992618, 0.43953618247769816, -0.40640013183427787, -0.6868635949523503, 1.7380713294668497, 0.13685965156352295, 0.1485185624825999, -0.7798816720822024, 2.2595105995080846, -0.13609014938597142, 0.22785777205259913, -2.1005250433485725, 0.44205288222935385, -1.238456637875994, -2.3727125492433427, -0.24406624959402184, -0.04488042525902438, 0.27510026183444175, 0.42472846594528796, 1.0337924022589282, 0.9126364433535069, -0.9006583845907805, 0.8665471057463393, 1.4924737539852484, 1.2886591566091432, 1.037264411147446, 1.4731954133339449, -0.31874662373651885, 1.2255399151799211, -1.6642044048811695, -0.5717328092786154, -1.2700237196779645, 0.5748199649058684, 0.16467729820692942, -1.195290550625328, -0.37133526877621703, -0.3018979982049836, -2.0183406292097397, -0.9588803575112745, 0.7177183994733006, -1.0133440177662316, -1.0881357990941283, 1.0487446580734279, 2.627227367991459, -1.59963908284846, -0.3122512299247273, -1.0265333654194488, 0.5557085182114885, -0.3206725445321106, -1.4314746067673778, 1.5740113510560039, -0.6566477752702335, 0.31342313477927125, 0.33135361418686027, -1.0489180508346863, -0.2670759024309527, 0.4683952221006179, 0.04918061587657951, 1.239814741442417, 2.2239462179369296, 1.8507671783064434, 1.756319462015174, -0.6577450354719728, 2.2795431083561626, -0.492273906928334, 0.7045614632761499, 0.11260553216111485], μ = 0), DynamicPPL.DefaultContext()))\n\n\nThis operation can be reversed by applying decondition:\n\noriginal_model = decondition(conditioned_model)\n\nModel{typeof(gdemo), (:n,), (), (), Tuple{Int64}, Tuple{}, DefaultContext, false}(gdemo, (n = 100,), NamedTuple(), DefaultContext())\n\n\nWe can also decondition only some of the variables:\n\npartially_conditioned = decondition(conditioned_model, :μ)\n\nModel{typeof(gdemo), (:n,), (), (), Tuple{Int64}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{x::Vector{Float64}}, DefaultContext}, false}(gdemo, (n = 100,), NamedTuple(), ConditionContext((x = [0.8488780584442736, -0.31936138249336765, -1.3982098801744465, -0.05198933163879332, -1.1465116601038348, -0.6306168227545849, 0.6862766694322289, -0.5485073478947856, -0.17212004616875684, 1.2883226251958486, -0.13661316034377538, 2.4316115122026973, 0.2251319215717449, -0.5115708179083417, -0.7810712258995324, -1.0191704692490737, 1.1210038448250719, -1.6944509713762377, -0.27314823183454695, 
0.25273963222687423, 1.3914215917992434, 0.7525340831125464, 0.847154387311101, -0.7130402796655171, 0.2983575202861233, -0.1785631526879386, 0.08659477535701691, -0.5167265137098563, 2.111309740316035, 0.3957655443124509, -0.0804390853521051, 1.255042471667049, -0.07882822403959532, 1.2261373761992618, 0.43953618247769816, -0.40640013183427787, -0.6868635949523503, 1.7380713294668497, 0.13685965156352295, 0.1485185624825999, -0.7798816720822024, 2.2595105995080846, -0.13609014938597142, 0.22785777205259913, -2.1005250433485725, 0.44205288222935385, -1.238456637875994, -2.3727125492433427, -0.24406624959402184, -0.04488042525902438, 0.27510026183444175, 0.42472846594528796, 1.0337924022589282, 0.9126364433535069, -0.9006583845907805, 0.8665471057463393, 1.4924737539852484, 1.2886591566091432, 1.037264411147446, 1.4731954133339449, -0.31874662373651885, 1.2255399151799211, -1.6642044048811695, -0.5717328092786154, -1.2700237196779645, 0.5748199649058684, 0.16467729820692942, -1.195290550625328, -0.37133526877621703, -0.3018979982049836, -2.0183406292097397, -0.9588803575112745, 0.7177183994733006, -1.0133440177662316, -1.0881357990941283, 1.0487446580734279, 2.627227367991459, -1.59963908284846, -0.3122512299247273, -1.0265333654194488, 0.5557085182114885, -0.3206725445321106, -1.4314746067673778, 1.5740113510560039, -0.6566477752702335, 0.31342313477927125, 0.33135361418686027, -1.0489180508346863, -0.2670759024309527, 0.4683952221006179, 0.04918061587657951, 1.239814741442417, 2.2239462179369296, 1.8507671783064434, 1.756319462015174, -0.6577450354719728, 2.2795431083561626, -0.492273906928334, 0.7045614632761499, 0.11260553216111485],), DynamicPPL.DefaultContext()))\n\n\nWe can see which of the variables in a model have been conditioned with DynamicPPL.conditioned:\n\nDynamicPPL.conditioned(partially_conditioned)\n\n(x = [0.8488780584442736, -0.31936138249336765, -1.3982098801744465, -0.05198933163879332, -1.1465116601038348, -0.6306168227545849, 0.6862766694322289, -0.5485073478947856, -0.17212004616875684, 1.2883226251958486 … 0.04918061587657951, 1.239814741442417, 2.2239462179369296, 1.8507671783064434, 1.756319462015174, -0.6577450354719728, 2.2795431083561626, -0.492273906928334, 0.7045614632761499, 0.11260553216111485],)\n\n\n\n\n\n\n\n\nNote\n\n\n\nSometimes it is helpful to define convenience functions for conditioning on some variable(s). For instance, in this example we might want to define a version of gdemo that conditions on some observations of x:\ngdemo(x::AbstractVector{<:Real}) = gdemo(length(x)) | (; x)\nFor illustrative purposes, however, we do not use this function in the examples below.", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#probabilities-and-densities", "href": "usage/probability-interface/index.html#probabilities-and-densities", "title": "Querying Model Probabilities", "section": "Probabilities and Densities", "text": "Probabilities and Densities\nWe often want to calculate the (unnormalised) probability density for an event. This probability might be a prior, a likelihood, or a posterior (joint) density. DynamicPPL provides convenient functions for this. To begin, let’s define a model gdemo, condition it on a dataset, and draw a sample. 
The returned sample only contains μ, since the value of x has already been fixed:\n\nmodel = gdemo(length(dataset)) | (x=dataset,)\n\nRandom.seed!(124)\nsample = rand(model)\n\n(μ = -0.6680014719649068,)\n\n\nWe can then calculate the joint probability of a set of samples (here drawn from the prior) with logjoint.\n\nlogjoint(model, sample)\n\n-181.7247437162069\n\n\nFor models with many variables rand(model) can be prohibitively slow since it returns a NamedTuple of samples from the prior distribution of the unconditioned variables. We recommend working with samples of type DataStructures.OrderedDict in this case (which Turing re-exports, so can be used directly):\n\nRandom.seed!(124)\nsample_dict = rand(OrderedDict, model)\n\nOrderedDict{VarName, Any} with 1 entry:\n μ => -0.668001\n\n\nlogjoint can also be used on this sample:\n\nlogjoint(model, sample_dict)\n\n-181.7247437162069\n\n\nThe prior probability and the likelihood of a set of samples can be calculated with the functions logprior and loglikelihood respectively. The log joint probability is the sum of these two quantities:\n\nlogjoint(model, sample) ≈ loglikelihood(model, sample) + logprior(model, sample)\n\ntrue\n\n\n\nlogjoint(model, sample_dict) ≈ loglikelihood(model, sample_dict) + logprior(model, sample_dict)\n\ntrue", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#example-cross-validation", "href": "usage/probability-interface/index.html#example-cross-validation", "title": "Querying Model Probabilities", "section": "Example: Cross-validation", "text": "Example: Cross-validation\nTo give an example of the probability interface in use, we can use it to estimate the performance of our model using cross-validation. In cross-validation, we split the dataset into several equal parts. Then, we choose one of these sets to serve as the validation set. Here, we measure fit using the cross entropy (Bayes loss).1 (For the sake of simplicity, in the following code, we enforce that nfolds must divide the number of data points. 
For a more competent implementation, see MLUtils.jl.)\n\n# Calculate the train/validation splits across `nfolds` partitions, assuming `nfolds` divides `length(dataset)`\nfunction kfolds(dataset::Array{<:Real}, nfolds::Int)\n fold_size, remaining = divrem(length(dataset), nfolds)\n if remaining != 0\n error(\"The number of folds must divide the number of data points.\")\n end\n first_idx = firstindex(dataset)\n last_idx = lastindex(dataset)\n splits = map(0:(nfolds - 1)) do i\n start_idx = first_idx + i * fold_size\n end_idx = start_idx + fold_size\n train_set_indices = [first_idx:(start_idx - 1); end_idx:last_idx]\n return (view(dataset, train_set_indices), view(dataset, start_idx:(end_idx - 1)))\n end\n return splits\nend\n\nfunction cross_val(\n dataset::Vector{<:Real};\n nfolds::Int=5,\n nsamples::Int=1_000,\n rng::Random.AbstractRNG=Random.default_rng(),\n)\n # Initialize `loss` in a way such that the loop below does not change its type\n model = gdemo(1) | (x=[first(dataset)],)\n loss = zero(logjoint(model, rand(rng, model)))\n\n for (train, validation) in kfolds(dataset, nfolds)\n # First, we train the model on the training set, i.e., we obtain samples from the posterior.\n # For normally-distributed data, the posterior can be computed in closed form.\n # For general models, however, typically samples will be generated using MCMC with Turing.\n posterior = Normal(mean(train), 1)\n samples = rand(rng, posterior, nsamples)\n\n # Evaluation on the validation set.\n validation_model = gdemo(length(validation)) | (x=validation,)\n loss += sum(samples) do sample\n logjoint(validation_model, (μ=sample,))\n end\n end\n\n return loss\nend\n\ncross_val(dataset)\n\n-212760.30282411768", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "usage/probability-interface/index.html#footnotes", "href": "usage/probability-interface/index.html#footnotes", "title": "Querying Model Probabilities", "section": "Footnotes", "text": "Footnotes\n\n\nSee ParetoSmooth.jl for a faster and more accurate implementation of cross-validation than the one provided here.↩︎", "crumbs": [ "Get Started", "User Guide", "Querying Model Probabilities" ] }, { "objectID": "versions.html", "href": "versions.html", "title": "Latest Version", "section": "", "text": "Latest Version\n\n\n\nv0.42\nDocumentation\nChangelog\n\n\n\n\n\nPrevious Versions\n\n\n\nv0.41\nDocumentation\n\n\nv0.40\nDocumentation\n\n\n\n\n\nArchived Versions\nDocumentation for archived versions is available on our deprecated documentation site.\n\n\n\nv0.31\nDocumentation\n\n\nv0.30\nDocumentation\n\n\nv0.29\nDocumentation\n\n\nv0.28\nDocumentation\n\n\nv0.27\nDocumentation\n\n\nv0.26\nDocumentation\n\n\nv0.25\nDocumentation\n\n\nv0.24\nDocumentation\n\n\n\n\n\n\n\n Back to top" }, { "objectID": "developers/inference/implementing-samplers/index.html", "href": "developers/inference/implementing-samplers/index.html", "title": "Implementing Samplers", "section": "", "text": "In this tutorial, we’ll go through step-by-step how to implement a “simple” sampler in AbstractMCMC.jl in such a way that it can be easily applied to Turing.jl models.\nIn particular, we’re going to implement a version of the Metropolis-adjusted Langevin algorithm (MALA).\nNote that we will implement this sampler in the AbstractMCMC.jl framework, completely “ignoring” Turing.jl until the very end of the tutorial, at which point we’ll use a single line of code to make the resulting sampler available to Turing.jl. 
This is to really drive home the point that one can implement samplers in a way that is accessible to all of Turing.jl’s users without having to use Turing.jl yourself.", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#quick-overview-of-mala", "href": "developers/inference/implementing-samplers/index.html#quick-overview-of-mala", "title": "Implementing Samplers", "section": "Quick overview of MALA", "text": "Quick overview of MALA\nWe can view MALA as a single step of the leapfrog integrator with resampling of momentum \\(p\\) at every step.1 To make that statement a bit more concrete, we first define the extended target \\(\\bar{\\gamma}(x, p)\\) as\n\\[\\begin{equation*}\n\\log \\bar{\\gamma}(x, p) \\propto \\log \\gamma(x) + \\log \\gamma_{\\mathcal{N}(0, M)}(p)\n\\end{equation*}\\]\nwhere \\(\\gamma_{\\mathcal{N}(0, M)}\\) denotes the density for a zero-centred Gaussian with covariance matrix \\(M\\). We then consider targeting this joint distribution over both \\(x\\) and \\(p\\) as follows. First we define the map\n\\[\\begin{equation*}\n\\begin{split}\n L_{\\epsilon}: \\quad & \\mathbb{R}^d \\times \\mathbb{R}^d \\to \\mathbb{R}^d \\times \\mathbb{R}^d \\\\\n & (x, p) \\mapsto (\\tilde{x}, \\tilde{p}) := L_{\\epsilon}(x, p)\n\\end{split}\n\\end{equation*}\\]\nas\n\\[\\begin{equation*}\n\\begin{split}\n p_{1 / 2} &:= p + \\frac{\\epsilon}{2} \\nabla \\log \\gamma(x) \\\\\n \\tilde{x} &:= x + \\epsilon M^{-1} p_{1 /2 } \\\\\n p_1 &:= p_{1 / 2} + \\frac{\\epsilon}{2} \\nabla \\log \\gamma(\\tilde{x}) \\\\\n \\tilde{p} &:= - p_1\n\\end{split}\n\\end{equation*}\\]\nThis might be familiar for some readers as a single step of the Leapfrog integrator. We then define the MALA kernel as follows: given the current iterate \\(x_i\\), we sample the next iterate \\(x_{i + 1}\\) as\n\\[\\begin{equation*}\n\\begin{split}\n p &\\sim \\mathcal{N}(0, M) \\\\\n (\\tilde{x}, \\tilde{p}) &:= L_{\\epsilon}(x_i, p) \\\\\n \\alpha &:= \\min \\left\\{ 1, \\frac{\\bar{\\gamma}(\\tilde{x}, \\tilde{p})}{\\bar{\\gamma}(x_i, p)} \\right\\} \\\\\n x_{i + 1} &:=\n \\begin{cases}\n \\tilde{x} \\quad & \\text{ with prob. } \\alpha \\\\\n x_i \\quad & \\text{ with prob. } 1 - \\alpha\n \\end{cases}\n\\end{split}\n\\end{equation*}\\]\ni.e. 
we accept the proposal \\(\\tilde{x}\\) with probability \\(\\alpha\\) and reject it, thus sticking with our current iterate, with probability \\(1 - \\alpha\\).", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#what-we-need-from-a-model-logdensityproblems.jl", "href": "developers/inference/implementing-samplers/index.html#what-we-need-from-a-model-logdensityproblems.jl", "title": "Implementing Samplers", "section": "What we need from a model: LogDensityProblems.jl", "text": "What we need from a model: LogDensityProblems.jl\nThere are a few things we need from the “target” / “model” / density that we want to sample from:\n\nWe need access to log-density evaluations \\(\\log \\gamma(x)\\) so we can compute the acceptance ratio involving \\(\\log \\bar{\\gamma}(x, p)\\).\nWe need access to log-density gradients \\(\\nabla \\log \\gamma(x)\\) so we can compute the Leapfrog steps \\(L_{\\epsilon}(x, p)\\).\nWe also need access to the “size” of the model so we can determine the size of \\(M\\).\n\nLuckily for us, there is a package called LogDensityProblems.jl which provides an interface for exactly this!\nTo demonstrate how one can implement the “LogDensityProblems.jl interface”2 we will use a simple Gaussian model as an example:\n\nusing LogDensityProblems: LogDensityProblems\n\n# Let's define some type that represents the model.\nstruct IsotropicNormalModel{M<:AbstractVector{<:Real}}\n \"mean of the isotropic Gaussian\"\n mean::M\nend\n\n# Specifies what input length the model expects.\nLogDensityProblems.dimension(model::IsotropicNormalModel) = length(model.mean)\n# Implementation of the log-density evaluation of the model.\nfunction LogDensityProblems.logdensity(model::IsotropicNormalModel, x::AbstractVector{<:Real})\n return - sum(abs2, x .- model.mean) / 2\nend\n\nThis gives us all of the properties we want for our MALA sampler with the exception of the computation of the gradient \\(\\nabla \\log \\gamma(x)\\). There is the method LogDensityProblems.logdensity_and_gradient which should return a 2-tuple where the first entry is the evaluation of the logdensity \\(\\log \\gamma(x)\\) and the second entry is the gradient \\(\\nabla \\log \\gamma(x)\\).\nThere are two ways to “implement” this method: 1) we implement it by hand, which is feasible in the case of our IsotropicNormalModel, or 2) we defer the implementation of this to an automatic differentiation backend.\nTo implement it by hand we can simply do\n\n# Tell LogDensityProblems.jl that first-order, i.e. 
gradient information, is available.\nLogDensityProblems.capabilities(::Type{<:IsotropicNormalModel}) = LogDensityProblems.LogDensityOrder{1}()\n\n# Implement `logdensity_and_gradient`.\nfunction LogDensityProblems.logdensity_and_gradient(model::IsotropicNormalModel, x)\n logγ_x = LogDensityProblems.logdensity(model, x)\n # The gradient of -sum(abs2, x .- model.mean) / 2 with respect to x.\n ∇logγ_x = -(x .- model.mean)\n return logγ_x, ∇logγ_x\nend\n\nLet’s just try it out:\n\n# Instantiate the problem.\nmodel = IsotropicNormalModel([-5., 0., 5.])\n# Create some example input that we can test on.\nx_example = randn(LogDensityProblems.dimension(model))\n# Evaluate!\nLogDensityProblems.logdensity(model, x_example)\n\n-28.168790858005828\n\n\nTo defer it to an automatic differentiation backend, we can do\n\n# Tell LogDensityProblems.jl we only have access to 0-th order information.\nLogDensityProblems.capabilities(::Type{<:IsotropicNormalModel}) = LogDensityProblems.LogDensityOrder{0}()\n\n# Use `LogDensityProblemsAD`'s `ADgradient` in combination with some AD backend to implement `logdensity_and_gradient`.\nusing LogDensityProblemsAD, ADTypes, ForwardDiff\nmodel_with_grad = ADgradient(AutoForwardDiff(), model)\nLogDensityProblems.logdensity(model_with_grad, x_example)\n\n-28.168790858005828\n\n\nWe’ll continue with the second approach in this tutorial since this is typically what one does in practice, because there are better hobbies to spend time on than deriving gradients by hand.\nAt this point, one might wonder how we’re going to tie this back to Turing.jl in the end. Effectively, when working with inference methods that only require log-density evaluations and / or higher-order information of the log-density, Turing.jl actually converts the user-provided Model into an object implementing the above methods for LogDensityProblems.jl. As a result, most samplers provided by Turing.jl are actually implemented to work with LogDensityProblems.jl, enabling their use both within Turing.jl and outside of Turing.jl! Moreover, there exist similar conversions for Stan through BridgeStan and StanLogDensityProblems.jl, which means that a sampler supporting the LogDensityProblems.jl interface can easily be used on both Turing.jl and Stan models (in addition to user-provided models, such as our IsotropicNormalModel above)!", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#implementing-mala-in-abstractmcmc.jl", "href": "developers/inference/implementing-samplers/index.html#implementing-mala-in-abstractmcmc.jl", "title": "Implementing Samplers", "section": "Implementing MALA in AbstractMCMC.jl", "text": "Implementing MALA in AbstractMCMC.jl\nNow that we’ve established that a model implementing the LogDensityProblems.jl interface provides us with all the information we need from \\(\\log \\gamma(x)\\), we can address the question: given an object that implements the LogDensityProblems.jl interface, how can we define a sampler for it?\nWe’re going to do this by making our sampler a sub-type of AbstractMCMC.AbstractSampler in addition to implementing a few methods from AbstractMCMC.jl. Why? 
Because it gets us a lot of functionality for free, as we will see later.\nMoreover, AbstractMCMC.jl provides a very natural interface for MCMC algorithms.\nFirst, we’ll define our MALA type\n\nusing AbstractMCMC\n\nstruct MALA{T,A} <: AbstractMCMC.AbstractSampler\n \"stepsize used in the leapfrog step\"\n ϵ_init::T\n \"covariance matrix used for the momentum\"\n M_init::A\nend\n\nNotice how we’ve added the suffix _init to both the stepsize and the covariance matrix. We’ve done this because a AbstractMCMC.AbstractSampler should be immutable. Of course there might be many scenarios where we want to allow something like the stepsize and / or the covariance matrix to vary between iterations, e.g. during the burn-in / adaptation phase of the sampling process we might want to adjust the parameters using statistics computed from these initial iterations. But information which can change between iterations should not go in the sampler itself! Instead, this information should go in the sampler state.\nThe sampler state should at the very least contain all the necessary information to perform the next MCMC iteration, but usually contains further information, e.g. quantities and statistics useful for evaluating whether the sampler has converged.\nWe will use the following sampler state for our MALA sampler:\n\nstruct MALAState{A<:AbstractVector{<:Real}}\n \"current position\"\n x::A\n \"whether the proposal was accepted\"\n accepted::Bool\nend\n\nIf we also wanted to adapt the parameters of our MALA, e.g. alter the stepsize depending on acceptance rates, we could also put ϵ in the state, but for now we’ll keep things simple.\nMoreover, we also want a sample type, which is a type meant for “public consumption”, i.e. the end-user. This is generally going to contain a subset of the information present in the state. But in such a simple scenario as this, we similarly only have a AbstractVector{<:Real}:\n\nstruct MALASample{A<:AbstractVector{<:Real}}\n \"current position\"\n x::A\nend\n\nWe currently have three things:\n\nA AbstractMCMC.AbstractSampler implementation called MALA.\nA state MALAState for our sampler MALA.\nA sample MALASample for our sampler MALA.\n\nThat means that we’re ready to implement the only thing that really matters: AbstractMCMC.step.\nAbstractMCMC.step defines the MCMC iteration of our MALA given the current MALAState. Specifically, the signature of the function is as follows:\nfunction AbstractMCMC.step(\n # The RNG to ensure reproducibility.\n rng::Random.AbstractRNG,\n # The model that defines our target.\n model::AbstractMCMC.AbstractModel,\n # The sampler for which we're taking a `step`.\n sampler::AbstractMCMC.AbstractSampler,\n # The current sampler `state`.\n state;\n # Additional keyword arguments that we may or may not need.\n kwargs...\n)\nMoreover, there is a specific AbstractMCMC.AbstractModel which is used to indicate that the model that is provided implements the LogDensityProblems.jl interface: AbstractMCMC.LogDensityModel.\nSince, as we discussed earlier, in our case we’re indeed going to work with types that support the LogDensityProblems.jl interface, we’ll define AbstractMCMC.step for such a AbstractMCMC.LogDensityModel.\nNote that AbstractMCMC.LogDensityModel has no other purpose; it has a single field called logdensity, and it does nothing else. 
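To make this concrete: the wrapper amounts to little more than the following sketch (shown here for intuition; the actual definition lives in AbstractMCMC.jl):\n\nstruct LogDensityModel{L} <: AbstractMCMC.AbstractModel\n \"the wrapped object, assumed to implement the LogDensityProblems.jl interface\"\n logdensity::L\nend\n\n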
But by wrapping the model in AbstractMCMC.LogDensityModel, it allows samplers that want to work with LogDensityProblems.jl to define their AbstractMCMC.step on this type without running into method ambiguities.\nAll in all, that means that the signature for our AbstractMCMC.step is going to be the following:\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n # `LogDensityModel` so we know we're working with LogDensityProblems.jl model.\n model::AbstractMCMC.LogDensityModel,\n # Our sampler.\n sampler::MALA,\n # Our sampler state.\n state::MALAState;\n kwargs...\n)\nGreat! Now let’s actually implement the full AbstractMCMC.step for our MALA.\nLet’s remind ourselves what we’re going to do:\n\nSample a new momentum \\(p\\).\nCompute the log-density of the extended target \\(\\log \\bar{\\gamma}(x, p)\\).\nTake a single leapfrog step \\((\\tilde{x}, \\tilde{p}) = L_{\\epsilon}(x, p)\\).\nAccept or reject the proposed \\((\\tilde{x}, \\tilde{p})\\).\n\nAll in all, this results in the following:\n\nusing Random: Random\nusing Distributions # so we get the `MvNormal`\n\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model_wrapper::AbstractMCMC.LogDensityModel,\n sampler::MALA,\n state::MALAState;\n kwargs...\n)\n # Extract the wrapped model which implements LogDensityProblems.jl.\n model = model_wrapper.logdensity\n # Let's just extract the sampler parameters to make our lives easier.\n ϵ = sampler.ϵ_init\n M = sampler.M_init\n # Extract the current parameters.\n x = state.x\n # Sample the momentum.\n p_dist = MvNormal(zeros(LogDensityProblems.dimension(model)), M)\n p = rand(rng, p_dist)\n # Propose using a single leapfrog step.\n x̃, p̃ = leapfrog_step(model, x, p, ϵ, M)\n # Accept or reject proposal.\n logp = LogDensityProblems.logdensity(model, x) + logpdf(p_dist, p)\n logp̃ = LogDensityProblems.logdensity(model, x̃) + logpdf(p_dist, p̃)\n logα = logp̃ - logp\n state_new = if log(rand(rng)) < logα\n # Accept.\n MALAState(x̃, true)\n else\n # Reject.\n MALAState(x, false)\n end\n # Return the \"sample\" and the sampler state.\n return MALASample(state_new.x), state_new\nend\n\nFairly straight-forward.\nOf course, we haven’t defined the leapfrog_step method yet, so let’s do that:\n\nfunction leapfrog_step(model, x, p, ϵ, M)\n # Update momentum `p` using \"position\" `x`.\n ∇logγ_x = last(LogDensityProblems.logdensity_and_gradient(model, x))\n p1 = p + (ϵ / 2) .* ∇logγ_x\n # Update the \"position\" `x` using momentum `p1`.\n x̃ = x + ϵ .* (M \\ p1)\n # Update momentum `p1` using position `x̃`\n ∇logγ_x̃ = last(LogDensityProblems.logdensity_and_gradient(model, x̃))\n p2 = p1 + (ϵ / 2) .* ∇logγ_x̃\n # Flip momentum `p2`.\n p̃ = -p2\n return x̃, p̃\nend\n\nleapfrog_step (generic function with 1 method)\n\n\nWith all of this, we’re technically ready to sample!\n\nusing Random, LinearAlgebra\n\nrng = Random.default_rng()\nsampler = MALA(1, I)\nstate = MALAState(zeros(LogDensityProblems.dimension(model)), true)\n\nx_next, state_next = AbstractMCMC.step(\n rng,\n AbstractMCMC.LogDensityModel(model),\n sampler,\n state\n)\n\n(MALASample{Vector{Float64}}([0.0, 0.0, 0.0]), MALAState{Vector{Float64}}([0.0, 0.0, 0.0], false))\n\n\nGreat, it works!\nAnd I promised we would get quite some functionality for free if we implemented AbstractMCMC.step, and so we can now simply call sample to perform standard MCMC sampling:\n\n# Perform 1000 iterations with our `MALA` sampler.\nsamples = sample(model_with_grad, sampler, 10_000; initial_state=state, progress=false)\n# Concatenate into a 
matrix.\nsamples_matrix = stack(sample -> sample.x, samples)\n\n3×10000 Matrix{Float64}:\n -1.67252 -3.81411 -4.62147 … -3.30707 -4.63974 -4.63974\n 0.875303 0.0883427 1.27871 0.269654 -0.253632 -0.253632\n 3.58251 4.08553 4.80916 4.60535 4.52596 4.52596\n\n\n\n# Compute the marginal means and standard deviations.\nhcat(mean(samples_matrix; dims=2), std(samples_matrix; dims=2))\n\n3×2 Matrix{Float64}:\n -4.98855 0.999406\n -0.00329862 1.0096\n 4.97488 1.01242\n\n\nLet’s visualise the samples\n\nusing StatsPlots\nplot(transpose(samples_matrix[:, 1:10:end]), alpha=0.5, legend=false)\n\n\n\n\nLook at that! Things are working; amazin’.\nWe can also exploit AbstractMCMC.jl’s parallel sampling capabilities:\n\n# Run separate 4 chains for 10 000 iterations using threads to parallelize.\nnum_chains = 4\nsamples = sample(\n model_with_grad,\n sampler,\n MCMCThreads(),\n 10_000,\n num_chains;\n # Note we need to provide an initial state for every chain.\n initial_state=fill(state, num_chains),\n progress=false\n)\nsamples_array = stack(map(Base.Fix1(stack, sample -> sample.x), samples))\n\n3×10000×4 Array{Float64, 3}:\n[:, :, 1] =\n -1.58217 -3.33151 -3.15142 … -5.47863 -6.61153 -5.69779\n -0.142273 0.774179 0.161176 0.208187 0.877072 1.28301\n 1.99605 3.39636 4.10356 5.61721 5.19758 3.33529\n\n[:, :, 2] =\n -1.88265 -1.39037 -2.75676 … -4.60941 -4.60941 -5.13744\n -1.49305 -0.341701 0.059757 -0.0564321 -0.0564321 0.304737\n 2.28062 2.81969 5.45502 5.03587 5.03587 5.62103\n\n[:, :, 3] =\n -3.29735 -2.31687 -3.12642 -4.88533 … -5.27027 -6.81827 -6.46193\n -2.30501 -0.350151 -0.580339 -0.212002 -0.261682 0.374526 -1.01262\n 2.66778 4.92536 6.03951 3.97291 6.82846 6.55197 5.44195\n\n[:, :, 4] =\n -2.77513 -2.54917 -3.31506 -3.31506 … -4.63497 -3.24562 -2.42345\n 1.27932 1.92767 -0.402341 -0.402341 0.792634 0.795813 -1.19676\n 1.55639 2.21654 4.81564 4.81564 3.95741 4.68559 5.06683\n\n\nBut the fact that we have to provide the AbstractMCMC.sample call, etc. with an initial_state to get started is a bit annoying. 
We can avoid this by also defining a AbstractMCMC.step without the state argument:\n\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model_wrapper::AbstractMCMC.LogDensityModel,\n ::MALA;\n # NOTE: No state provided!\n kwargs...\n)\n model = model_wrapper.logdensity\n # Let's just create the initial state by sampling using a Gaussian.\n x = randn(rng, LogDensityProblems.dimension(model))\n\n return MALASample(x), MALAState(x, true)\nend\n\nEquipped with this, we no longer need to provide the initial_state everywhere:\n\nsamples = sample(model_with_grad, sampler, 10_000; progress=false)\nsamples_matrix = stack(sample -> sample.x, samples)\nhcat(mean(samples_matrix; dims=2), std(samples_matrix; dims=2))\n\n3×2 Matrix{Float64}:\n -4.98637 1.00402\n 0.00155562 1.00473\n 4.99783 1.00079", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#using-our-sampler-with-turing.jl", "href": "developers/inference/implementing-samplers/index.html#using-our-sampler-with-turing.jl", "title": "Implementing Samplers", "section": "Using our sampler with Turing.jl", "text": "Using our sampler with Turing.jl\nAs we promised, all of this hassle of implementing our MALA sampler in a way that uses LogDensityProblems.jl and AbstractMCMC.jl gets us something more than just an “automatic” implementation of AbstractMCMC.sample.\nIt also enables use with Turing.jl through the externalsampler, but we need to do one final thing first: we need to tell Turing.jl how to extract a vector of parameters from the state returned in our implementation of AbstractMCMC.step. In our case, the state is a MALAState, so we just need the following line:\n\n# Overload the `getparams` method for our \"state\" type.\nAbstractMCMC.getparams(state::MALAState) = state.x\n\nOptionally, we can also implement AbstractMCMC.getstats which instead returns a NamedTuple of statistics about the current state. When sampling with Turing, these statistics will be included in the output chain.\n\nAbstractMCMC.getstats(state::MALAState) = (accepted=state.accepted,)\n\nAnd with that, we’re good to go!\n\n\n\n\n\n\nNote\n\n\n\nUp until Turing.jl v0.41, you would have needed to define Turing.Inference.getparams(::MALASample). This has been changed in favour of the AbstractMCMC interface, which means that you no longer need to depend on Turing.jl to implement an external sampler.\n\n\n\nusing Turing\n\n# Our previous model defined as a Turing.jl model.\n@model mvnormal_model() = x ~ MvNormal([-5., 0., 5.], I)\n# Instantiate our model.\nturing_model = mvnormal_model()\n# Call `sample` but now we're passing in a Turing.jl `model` and wrapping\n# our `MALA` sampler in the `externalsampler` to tell Turing.jl that the sampler\n# expects something that implements LogDensityProblems.jl.\nchain = sample(turing_model, externalsampler(sampler), 10_000; progress=false)\n\nChains MCMC chain (10000×7×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 2.68 seconds\nCompute duration = 2.68 seconds\nparameters = x[1], x[2], x[3]\ninternals = accepted, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nPretty neat, eh?\n\nModels with constrained parameters\nOne thing we’ve sort of glossed over in all of the above is that MALA, at least how we’ve implemented it, requires \\(x\\) to live in \\(\\mathbb{R}^d\\) for some \\(d > 0\\). 
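To see where this assumption enters, recall the two places in our implementation where new positions come from, repeated here for reference:\n\n# From the state initialisation: an unconstrained Gaussian draw.\nx = randn(rng, LogDensityProblems.dimension(model))\n# From `leapfrog_step`: an unconstrained additive update.\nx̃ = x + ϵ .* (M \\ p1)\n\nNeither line knows anything about the support of the target density, so nothing prevents a proposal from leaving it. 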
If some of the parameters were in fact constrained, e.g. we were working with a Beta distribution which has support on the interval \\((0, 1)\\), not on \\(\\mathbb{R}^d\\), we could easily end up outside of the valid range \\((0, 1)\\).\n\n@model beta_model() = x ~ Beta(3, 3)\nturing_model = beta_model()\nchain = sample(turing_model, externalsampler(sampler), 10_000; progress=false)\n\nChains MCMC chain (10000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 1.32 seconds\nCompute duration = 1.32 seconds\nparameters = x\ninternals = accepted, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nBy default, Turing.jl avoids such a situation from occurring by transforming the constrained parameters to an unconstrained space before passing them to the externalsampler. If this is undesirable, you can pass the unconstrained keyword argument to externalsampler:\n\nchain_constrained = sample(turing_model, externalsampler(sampler; unconstrained=false), 10_000; progress=false)\n\nChains MCMC chain (10000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 0.39 seconds\nCompute duration = 0.39 seconds\nparameters = x\ninternals = accepted, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nIt turns out that this still sort of works because logpdf doesn’t error when evaluating outside of the support of the distribution, but instead returns -Inf:\n\nlogpdf(Beta(3, 3), 10.0)\n\n-Inf\n\n\nand so the samples that fall outside of the range are always rejected. But do notice how much worse all the diagnostics are, e.g. ess_tail is very poor compared to when we use unconstrained=true:\n\ness(chain)\n\n\nESS\n\n parameters ess ess_per_sec\n Symbol Float64 Float64\n\n x 3900.2271 2943.5676\n\n\n\n\n\n\ness(chain_constrained)\n\n\nESS\n\n parameters ess ess_per_sec\n Symbol Float64 Float64\n\n x 59.0796 151.8755\n\n\n\n\n\nMoreover, in more complex cases this won’t just result in a “nice” -Inf log-density value, but instead will error:\n\n@model function demo()\n σ² ~ truncated(Normal(), lower=0)\n # If we end up with negative values for `σ²`, the `Normal` will error.\n x ~ Normal(0, σ²)\nend\nsample(demo(), externalsampler(sampler; unconstrained=false), 10_000; progress=false)\n\n\nDomainError with -0.9867561686898247:\nNormal: the condition σ >= zero(σ) is not satisfied.\nStacktrace:\n [1] #371\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:37 [inlined]\n [2] check_args\n @ ~/.julia/packages/Distributions/xMnxM/src/utils.jl:89 [inlined]\n [3] #Normal#370\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:37 [inlined]\n [4] Normal\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:36 [inlined]\n [5] Normal\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:42 [inlined]\n [6] macro expansion\n @ ~/.julia/packages/DynamicPPL/oIycL/src/compiler.jl:612 [inlined]\n [7] demo\n @ ~/work/docs/docs/developers/inference/implementing-samplers/index.qmd:491 [inlined]\n [8] _evaluate!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:997 [inlined]\n [9] evaluate!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:983 [inlined]\n [10] init!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:938 [inlined]\n [11] init!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:943 
\n\nMoreover, in more complex cases this won’t just result in a “nice” -Inf log-density value, but instead will error:\n\n@model function demo()\n σ² ~ truncated(Normal(), lower=0)\n # If we end up with negative values for `σ²`, the `Normal` will error.\n x ~ Normal(0, σ²)\nend\nsample(demo(), externalsampler(sampler; unconstrained=false), 10_000; progress=false)\n\n\nDomainError with -0.9867561686898247:\nNormal: the condition σ >= zero(σ) is not satisfied.\nStacktrace (type parameters and logging internals elided for readability):\n [1] #371\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:37 [inlined]\n [2] check_args\n @ ~/.julia/packages/Distributions/xMnxM/src/utils.jl:89 [inlined]\n [3] #Normal#370\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:37 [inlined]\n [4] Normal\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:36 [inlined]\n [5] Normal\n @ ~/.julia/packages/Distributions/xMnxM/src/univariate/continuous/normal.jl:42 [inlined]\n [6] macro expansion\n @ ~/.julia/packages/DynamicPPL/oIycL/src/compiler.jl:612 [inlined]\n [7] demo\n @ ~/work/docs/docs/developers/inference/implementing-samplers/index.qmd:491 [inlined]\n [8] _evaluate!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:997 [inlined]\n [9] evaluate!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:983 [inlined]\n [10] init!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:938 [inlined]\n [11] init!!\n @ ~/.julia/packages/DynamicPPL/oIycL/src/model.jl:943 [inlined]\n [12] DynamicPPL.ParamsWithStats(param_vector::Vector{Float64}, ldf::LogDensityFunction{...}, stats::@NamedTuple{accepted::Bool}; include_colon_eq::Bool, include_log_probs::Bool)\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/chains.jl:153\n [13] DynamicPPL.ParamsWithStats(param_vector::Vector{Float64}, ldf::LogDensityFunction{...}, stats::@NamedTuple{accepted::Bool})\n @ DynamicPPL ~/.julia/packages/DynamicPPL/oIycL/src/chains.jl:131\n [14] step(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), ...}, sampler_wrapper::Turing.Inference.ExternalSampler{false, MALA{Int64, UniformScaling{Bool}}, AutoForwardDiff{nothing, Nothing}}; initial_state::Nothing, initial_params::InitFromPrior, kwargs::@Kwargs{})\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/external_sampler.jl:192\n [15] step\n @ ~/.julia/packages/Turing/gwHwZ/src/mcmc/external_sampler.jl:141 [inlined]\n ⋮\n [22] mcmcsample(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), ...}, sampler::Turing.Inference.ExternalSampler{...}, N::Int64; progress::Bool, progressname::String, callback::Nothing, num_warmup::Int64, discard_initial::Int64, thinning::Int64, chain_type::Type, initial_state::Nothing, kwargs::@Kwargs{initial_params::InitFromPrior})\n @ AbstractMCMC ~/.julia/packages/AbstractMCMC/XBmfQ/src/sample.jl:168\n [23] sample(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), ...}, spl::Turing.Inference.ExternalSampler{...}, N::Int64; initial_params::InitFromPrior, check_model::Bool, chain_type::Type, kwargs::@Kwargs{progress::Bool})\n @ Turing.Inference ~/.julia/packages/Turing/gwHwZ/src/mcmc/abstractmcmc.jl:87\n [24] sample\n @ ~/.julia/packages/Turing/gwHwZ/src/mcmc/abstractmcmc.jl:76 [inlined]\n [25] #sample#1\n @ ~/.julia/packages/Turing/gwHwZ/src/mcmc/abstractmcmc.jl:73 [inlined]\n [26] top-level scope\n @ 
~/work/docs/docs/developers/inference/implementing-samplers/index.qmd:493\n\n\n\nAs expected, we run into a DomainError at some point. This would not have happened if we had set unconstrained=true: letting Turing.jl transform the model to an unconstrained form behind the scenes, everything works as expected:\n\nsample(demo(), externalsampler(sampler; unconstrained=true), 10_000; progress=false)\n\nChains MCMC chain (10000×6×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 1.18 seconds\nCompute duration = 1.18 seconds\nparameters = σ², x\ninternals = accepted, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nIf you have implemented a sampler and you know for sure that your sampler cannot work with unconstrained parameters, you can disable the default behaviour by overloading the following method:\n# This cell isn't actually run; it's just a demonstration.\nAbstractMCMC.requires_unconstrained_space(::MALA) = false\nSimilarly, the automatic differentiation backend to use can be specified through the adtype keyword argument. For example, if we want to use ReverseDiff.jl instead of the default ForwardDiff.jl:\n\nimport ReverseDiff\n# Specify that we want to use `AutoReverseDiff`.\nsample(\n demo(),\n externalsampler(sampler; unconstrained=true, adtype=AutoReverseDiff()),\n 10_000;\n progress=false\n)\n\nChains MCMC chain (10000×6×1 Array{Float64, 3}):\n\nIterations = 1:1:10000\nNumber of chains = 1\nSamples per chain = 10000\nWall duration = 2.04 seconds\nCompute duration = 2.04 seconds\nparameters = σ², x\ninternals = accepted, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#summary", "href": "developers/inference/implementing-samplers/index.html#summary", "title": "Implementing Samplers", "section": "Summary", "text": "Summary\nAt this point it’s worth reminding ourselves of what we did, and why we did it:\n\nWe define our models in the LogDensityProblems.jl interface because it makes the sampler agnostic to how the underlying model is implemented.\nWe implement our sampler in the AbstractMCMC.jl interface, which just means that our sampler is a subtype of AbstractMCMC.AbstractSampler and we implement the MCMC transition in AbstractMCMC.step.\nPoints 1 and 2 make our sampler usable with a wide range of model implementations, among them models implemented in both Turing.jl and Stan. This gives you, the inference implementer, a large collection of models to test your inference method on, in addition to allowing users of Turing.jl and Stan to try out your inference method with minimal effort.", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Implementing Samplers" ] }, { "objectID": "developers/inference/implementing-samplers/index.html#footnotes", "href": "developers/inference/implementing-samplers/index.html#footnotes", "title": "Implementing Samplers", "section": "Footnotes", "text": "Footnotes\n\n\nWe’re going with the leapfrog formulation because in a future version of this tutorial we’ll add a section extending this simple “baseline” MALA sampler to more complex versions. 
See issue #479 for progress on this.↩︎\nThere is no such thing as a proper interface in Julia (at least not officially), and so we use the word “interface” here to mean a few minimal methods that need to be implemented by any type that we treat as a target model.↩︎", "crumbs": [ "Get Started", "Developers", "Inference in Detail", "Implementing Samplers" ] }, { "objectID": "developers/inference/abstractmcmc-turing/index.html", "href": "developers/inference/abstractmcmc-turing/index.html", "title": "How Turing Implements AbstractMCMC", "section": "", "text": "Prerequisite: Interface guide." }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#introduction", "href": "developers/inference/abstractmcmc-turing/index.html#introduction", "title": "How Turing Implements AbstractMCMC", "section": "Introduction", "text": "Introduction\nConsider the following Turing code block:\n\nusing Turing\n\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\nmod = gdemo(1.5, 2)\nalg = IS()\nn_samples = 1000\n\nchn = sample(mod, alg, n_samples, progress=false)\n\nChains MCMC chain (1000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 2.6 seconds\nCompute duration = 2.6 seconds\nparameters = s², m\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nThe function sample is part of the AbstractMCMC interface. As explained in the interface guide, building a sampling method that can be used by sample consists of overloading the structs and functions in AbstractMCMC. The interface guide also gives a standalone example of their implementation, AdvancedMH.jl.\nTuring sampling methods (most of which are written here) also implement AbstractMCMC. Turing defines a particular architecture for AbstractMCMC implementations, which enables working with models defined by the @model macro, and uses DynamicPPL as a backend. The goal of this page is to describe this architecture, and how you would go about implementing your own sampling method in Turing, using Importance Sampling as an example. I don’t go into all the details: for instance, I don’t address selectors or parallelism.\nFirst, we explain how Importance Sampling works in the abstract. Consider the model defined in the first code block. Mathematically, it can be written:\n\[\n\begin{align*}\ns &\sim \text{InverseGamma}(2, 3), \\\nm &\sim \text{Normal}(0, \sqrt{s}), \\\nx &\sim \text{Normal}(m, \sqrt{s}), \\\ny &\sim \text{Normal}(m, \sqrt{s}).\n\end{align*}\n\]\nThe latent variables are \(s\) and \(m\); the observed variables are \(x\) and \(y\). The model joint distribution \(p(s,m,x,y)\) decomposes into the prior \(p(s,m)\) and the likelihood \(p(x,y \mid s,m).\) Since \(x = 1.5\) and \(y = 2\) are observed, the goal is to infer the posterior distribution \(p(s,m \mid x,y).\)\nImportance Sampling produces independent samples \((s_i, m_i)\) from the prior distribution. It also outputs unnormalized weights\n\[\nw_i = \frac {p(x,y,s_i,m_i)} {p(s_i, m_i)} = p(x,y \mid s_i, m_i)\n\]\nsuch that the empirical distribution\n\[\n\sum_{i=1}^N \frac {w_i} {\sum_{j=1}^N w_j} \delta_{(s_i, m_i)}\n\]\nis a good approximation of the posterior.
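\nTo make the estimator concrete, here is a small self-contained sketch of self-normalised importance sampling for this model (our own illustration, independent of Turing’s internals):\n\nusing Distributions\nusing StatsFuns: logsumexp\n\nx_obs, y_obs = 1.5, 2.0\nN = 10_000\n# Draw (sᵢ, mᵢ) from the prior.\ns = rand(InverseGamma(2, 3), N)\nm = [rand(Normal(0, sqrt(sᵢ))) for sᵢ in s]\n# Unnormalised log-weights: log p(x, y | sᵢ, mᵢ).\nlogw = map(s, m) do sᵢ, mᵢ\n    logpdf(Normal(mᵢ, sqrt(sᵢ)), x_obs) + logpdf(Normal(mᵢ, sqrt(sᵢ)), y_obs)\nend\n# Self-normalised weights, and e.g. a posterior-mean estimate for m.\nw = exp.(logw .- logsumexp(logw))\nposterior_mean_m = sum(w .* m)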
" }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#define-a-sampler", "href": "developers/inference/abstractmcmc-turing/index.html#define-a-sampler", "title": "How Turing Implements AbstractMCMC", "section": "1. Define a Sampler", "text": "1. Define a Sampler\nRecall the last line of the above code block:\n\nchn = sample(mod, alg, n_samples, progress=false)\n\nChains MCMC chain (1000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 0.06 seconds\nCompute duration = 0.06 seconds\nparameters = s², m\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nHere sample takes as arguments a model mod, an algorithm alg, and a number of samples n_samples, and returns an instance chn of Chains which can be analysed using the functions in MCMCChains.\n\nModels\nTo define a model, you declare a joint distribution on variables in the @model macro, and specify which variables are observed and which should be inferred, as well as the value of the observed variables. Thus, when implementing Importance Sampling,\n\nmod = gdemo(1.5, 2)\n\nDynamicPPL.Model{typeof(gdemo), (:x, :y), (), (), Tuple{Float64, Int64}, Tuple{}, DynamicPPL.DefaultContext, false}(gdemo, (x = 1.5, y = 2), NamedTuple(), DynamicPPL.DefaultContext())\n\n\ncreates an instance mod of the struct Model, which corresponds to the observations of a value of 1.5 for x, and a value of 2 for y.\nThis is all handled by DynamicPPL, more specifically here. I will return to how models are used to inform sampling algorithms below.\n\n\nAlgorithms\nAn algorithm is just a sampling method: in Turing, it is a subtype of the abstract type InferenceAlgorithm. Defining an algorithm may require specifying a few high-level parameters. For example, “Hamiltonian Monte Carlo” may be too vague, but “Hamiltonian Monte Carlo with 10 leapfrog steps per proposal and a stepsize of 0.01” is an algorithm. “Metropolis-Hastings” may be too vague, but “Metropolis-Hastings with proposal distribution p” is an algorithm. Thus\n\nstepsize = 0.01\nL = 10\nalg = HMC(stepsize, L)\n\nHMC{AutoForwardDiff{nothing, Nothing}, AdvancedHMC.UnitEuclideanMetric}(0.01, 10, AutoForwardDiff())\n\n\ndefines a Hamiltonian Monte Carlo algorithm, an instance of HMC, which is a subtype of InferenceAlgorithm.\nIn the case of Importance Sampling, there is no need to specify additional parameters:\n\nalg = IS()\n\nIS()\n\n\ndefines an Importance Sampling algorithm, an instance of IS, a subtype of InferenceAlgorithm.\nWhen creating your own Turing sampling method, you must, therefore, build a subtype of InferenceAlgorithm corresponding to your method.
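\nIf you are defining a brand-new method, this first step can be as small as the following sketch (the names MyMethod and MyTunedMethod are purely illustrative, not part of Turing; InferenceAlgorithm is the abstract type discussed above):\n\n# A parameter-free algorithm, analogous to `IS`.\nstruct MyMethod <: InferenceAlgorithm end\n\n# An algorithm with high-level parameters, analogous to `HMC`.\nstruct MyTunedMethod <: InferenceAlgorithm\n    stepsize::Float64\n    n_leapfrog::Int\nend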
\n\nSamplers\nSamplers are not the same as algorithms. An algorithm is a generic sampling method; a sampler is an object that stores information about how the algorithm and the model interact during sampling, and is modified as sampling progresses. The Sampler struct is defined in DynamicPPL.\nTuring implements AbstractMCMC’s AbstractSampler with the Sampler struct defined in DynamicPPL. The most important attributes of an instance spl of Sampler are:\n\nspl.alg: the sampling method used, an instance of a subtype of InferenceAlgorithm\nspl.state: information about the sampling process, see below\n\nWhen you call sample(mod, alg, n_samples), Turing first uses model and alg to build an instance spl of Sampler, then calls the native AbstractMCMC function sample(mod, spl, n_samples).\nWhen you define your own Turing sampling method, you must therefore build:\n\na sampler constructor that uses a model and an algorithm to initialise an instance of Sampler. For Importance Sampling:\n\n\nfunction Sampler(alg::IS, model::Model, s::Selector)\n info = Dict{Symbol,Any}()\n state = ISState(model)\n return Sampler(alg, info, s, state)\nend\n\n\na state struct implementing AbstractSamplerState corresponding to your method: we cover this in the following paragraph.\n\n\n\nStates\nThe vi field contains all the important information about sampling: first and foremost, the values of all the samples, but also the distributions from which they are sampled, the names of model parameters, and other metadata. As we will see below, many important steps during sampling correspond to queries or updates to spl.state.vi.\nBy default, you can use SamplerState, a concrete type defined in inference/Inference.jl, which extends AbstractSamplerState and has no field except for vi:\n\nmutable struct SamplerState{VIType<:VarInfo} <: AbstractSamplerState\n vi::VIType\nend\n\nWhen doing Importance Sampling, we care not only about the values of the samples but also their weights. We will see below that the weight of each sample is also added to spl.state.vi. Moreover, the average\n\[\n\frac 1 N \sum_{i=1}^N w_i = \frac 1 N \sum_{i=1}^N p(x,y \mid s_i, m_i)\n\]\nof the sample weights is a particularly important quantity:\n\nit is used to normalise the empirical approximation of the posterior distribution\nits logarithm is the importance sampling estimate of the log evidence \(\log p(x, y)\)\n\nTo avoid having to compute it over and over again, is.jl defines an IS-specific concrete type ISState for sampler states, with an additional field final_logevidence containing\n\[\n\log \frac 1 N \sum_{i=1}^N w_i.\n\]\n\nmutable struct ISState{V<:VarInfo,F<:AbstractFloat} <: AbstractSamplerState\n vi::V\n final_logevidence::F\nend\n\n# additional constructor\nISState(model::Model) = ISState(VarInfo(model), 0.0)
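\n\nIn code, the numerically stable way to compute this quantity from a vector of accumulated log-weights is via logsumexp; here is a short sketch of our own (the logw variable is a stand-in, not the actual is.jl source):\n\nusing StatsFuns: logsumexp\n\nlogw = randn(1000)  # stand-in for the accumulated log-weights log wᵢ\n# log(1/N Σ wᵢ), computed from the log-weights without overflow.\nfinal_logevidence = logsumexp(logw) - log(length(logw))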
\n\nThe following diagram summarizes the hierarchy presented above: an instance spl of Sampler (<: AbstractSampler) has the attributes spl.alg (an algorithm, <: InferenceAlgorithm) and spl.state (a state, <: AbstractSamplerState), and the state in turn holds spl.state.vi (a VarInfo, <: AbstractVarInfo), along with other method-specific fields." }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#overload-the-functions-used-inside-mcmcsample", "href": "developers/inference/abstractmcmc-turing/index.html#overload-the-functions-used-inside-mcmcsample", "title": "How Turing Implements AbstractMCMC", "section": "2. Overload the functions used inside mcmcsample", "text": "2. Overload the functions used inside mcmcsample\nA lot of the things here are method-specific. However, Turing also provides some helper functions that make these easier to implement.\n\nTransitions\nAbstractMCMC stores information corresponding to each individual sample in objects called transitions, but does not specify what the structure of these objects could be. You could decide to implement a type MyTransition for transitions corresponding to the specifics of your methods. However, there are many situations in which the only information you need for each sample is:\n\nits value: \(\theta\)\nlog of the joint probability of the observed data and this sample: lp\n\nInference.jl defines a struct Transition, which corresponds to this default situation:\n\nstruct Transition{T,F<:AbstractFloat}\n θ::T\n lp::F\nend\n\nIt also contains a constructor that builds an instance of Transition from an instance spl of Sampler: \(\theta\) is spl.state.vi converted to a namedtuple, and lp is getlogp(spl.state.vi). is.jl uses this default constructor at the end of the step! function here.\n\n\nHow sample works\nA crude summary, which ignores things like parallelism, is the following:\nsample calls mcmcsample, which calls\n\nsample_init! to set things up\nstep! repeatedly to produce multiple new transitions\nsample_end! to perform operations once all samples have been obtained\nbundle_samples to convert a vector of transitions into a more palatable type, for instance a Chains object.\n\nYou can, of course, implement all of these functions, but both AbstractMCMC and Turing provide default implementations for simple cases. For instance, importance sampling uses the default implementations of sample_init! and bundle_samples, which is why you don’t see code for them inside is.jl." }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#overload-assume-and-observe", "href": "developers/inference/abstractmcmc-turing/index.html#overload-assume-and-observe", "title": "How Turing Implements AbstractMCMC", "section": "3. Overload assume and observe", "text": "3. Overload assume and observe\nThe functions mentioned above, such as sample_init!, step!, etc., must, of course, use information about the model in order to generate samples! In particular, these functions may need samples from distributions defined in the model or to evaluate the density of these distributions at some values of the corresponding parameters or observations.\nFor an example of the former, consider Importance Sampling as defined in is.jl. This implementation of Importance Sampling uses the model prior distribution as a proposal distribution, and therefore requires samples from the prior distribution of the model. Another example is Approximate Bayesian Computation, which requires multiple samples from the model prior and likelihood distributions in order to generate a single sample.\nAn example of the latter is the Metropolis-Hastings algorithm. At every step of sampling from a target posterior\n\[\np(\theta \mid x_{\text{obs}}),\n\]\nin order to compute the acceptance ratio, you need to evaluate the model joint density\n\[\np\left(\theta_{\text{prop}}, x_{\text{obs}}\right)\n\]\nwith \(\theta_{\text{prop}}\) a sample from the proposal and \(x_{\text{obs}}\) the observed data.\nThis raises the question: how can these functions access model information during sampling? Recall that the model is stored as an instance m of Model. One of the attributes of m is the model evaluation function m.f, which is built by compiling the @model macro. 
Executing f runs the tilde statements of the model in order, and adds model information to the sampler (the instance of Sampler that stores information about the ongoing sampling process) at each step (see here for more information about how the @model macro is compiled). The DynamicPPL functions assume and observe determine what kind of information to add to the sampler for every tilde statement.\nConsider an instance m of Model and a sampler spl, with associated VarInfo vi = spl.state.vi. At some point during the sampling process, an AbstractMCMC function such as step! calls m(vi, ...), which calls the model evaluation function m.f(vi, ...).\n\nfor every tilde statement in the @model macro, m.f(vi, ...) returns model-related information (samples, value of the model density, etc.), and adds it to vi. How does it do that?\n\nrecall that the code for m.f(vi, ...) is automatically generated by compilation of the @model macro\nfor every tilde statement in the @model declaration, this code contains a call to assume(vi, ...) if the variable on the LHS of the tilde is a model parameter to infer, and observe(vi, ...) if the variable on the LHS of the tilde is an observation\nin the file corresponding to your sampling method (i.e. in Turing.jl/src/inference/<your_method>.jl), you have overloaded assume and observe, so that they can modify vi to include the information and samples that you care about!\nat a minimum, assume and observe return the log density lp of the sample or observation. The model evaluation function then immediately calls acclogp!!(vi, lp), which adds lp to the value of the log joint density stored in vi.\n\n\nHere’s what assume looks like for Importance Sampling:\n\nfunction DynamicPPL.assume(rng, spl::Sampler{<:IS}, dist::Distribution, vn::VarName, vi)\n r = rand(rng, dist)\n push!(vi, vn, r, dist, spl)\n return r, 0\nend\n\nThe function first generates a sample r from the distribution dist (the right hand side of the tilde statement). It then adds r to vi, and returns r together with a log density of 0. Returning 0 rather than logpdf(dist, r) is deliberate: Importance Sampling proposes from the prior, so the prior density cancels in the weights, and only the likelihood terms accumulated by observe should contribute to the log density stored in vi.\nThe observe function is even simpler:\n\nfunction DynamicPPL.observe(spl::Sampler{<:IS}, dist::Distribution, value, vi)\n return logpdf(dist, value)\nend\n\nIt simply returns the density (in the discrete case, the probability) of the observed value under the distribution dist.
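\nFor contrast, here is a sketch of our own (not actual Turing source; MyMethod is a hypothetical algorithm) of what assume could look like for a method that keeps the current value of a parameter and needs its prior log density, rather than drawing a fresh sample from the prior:\n\nfunction DynamicPPL.assume(rng, spl::Sampler{<:MyMethod}, dist::Distribution, vn::VarName, vi)\n    r = vi[vn]  # re-use the existing value instead of sampling a new one\n    return r, logpdf(dist, r)  # this time, the prior log density is accumulated\nend\n\nThis is why the return value of assume is method-specific: it controls exactly which density terms end up accumulated in vi.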
" }, { "objectID": "developers/inference/abstractmcmc-turing/index.html#summary-importance-sampling-step-by-step", "href": "developers/inference/abstractmcmc-turing/index.html#summary-importance-sampling-step-by-step", "title": "How Turing Implements AbstractMCMC", "section": "4. Summary: Importance Sampling step by step", "text": "4. Summary: Importance Sampling step by step\nWe focus on the AbstractMCMC functions that are overridden in is.jl and executed inside mcmcsample: step!, which is called n_samples times, and sample_end!, which is executed once after those n_samples iterations.\n\nDuring the \(i\)-th iteration, step! does three things:\n\nempty!!(spl.state.vi): remove information about the previous sample from the sampler’s VarInfo\nmodel(rng, spl.state.vi, spl): call the model evaluation function\n\ncalls to assume add the samples from the prior \(s_i\) and \(m_i\) to spl.state.vi\ncalls to assume or observe are followed by the line acclogp!!(vi, lp), where lp is an output of assume and observe\nlp is set to 0 after assume, and to the value of the density at the observation after observe\nWhen all the tilde statements have been covered, spl.state.vi.logp[] is the sum of the lp, i.e., the likelihood \(\log p(x, y \mid s_i, m_i) = \log p(x \mid s_i, m_i) + \log p(y \mid s_i, m_i)\) of the observations given the latent variable samples \(s_i\) and \(m_i\).\n\nreturn Transition(spl): build a transition from the sampler, and return that transition\n\nthe transition’s θ field is spl.state.vi converted to a namedtuple\nthe lp field contains the likelihood spl.state.vi.logp[]\n\n\nWhen the n_samples iterations are completed, sample_end! fills the final_logevidence field of spl.state\n\nIt simply takes the logarithm of the average of the sample weights, using the log weights for numerical stability" }, { "objectID": "developers/contributing/index.html", "href": "developers/contributing/index.html", "title": "Contributing", "section": "", "text": "Turing is an open-source project and is hosted on GitHub. We welcome contributions from the community in all forms, large or small: bug reports, feature implementations, code contributions, or improvements to documentation or infrastructure are all extremely valuable. We would also very much appreciate examples of models written using Turing.\n\nHow to get involved\nOur outstanding issues are tabulated on our issue tracker. Closing one of these may involve implementing new features, fixing bugs, or writing example models.\nYou can also join the #turing channel on the Julia Slack and say hello!\nIf you are new to open source software, please see GitHub’s introduction or Julia’s contribution guide on using version control for collaboration.\n\n\nDocumentation\n\nWhere is Turing’s documentation?\nEach of the packages in the Turing ecosystem (see Libraries) has its own documentation, which is typically found in the docs folder of the corresponding package. For example, the source code for DynamicPPL’s documentation can be found in its repository.\nOn top of the library-specific documentation, we also have a general documentation repository, which is what builds the website you are currently reading! Anything that appears in turinglang.org/docs is built from the docs repository.\nOther sections of the website (anything that isn’t a package, or a tutorial) – for example, the list of libraries – are built from the turinglang.github.io repository.\nIn general, we prefer documentation to be written on the TuringLang/docs repository. This is because it is more easily discoverable for users via the search bar and sidebar. Documentation written on the individual package repositories can be found via the main site’s search bar (due to a GitHub workflow that scrapes all the packages’ contents and indexes them here), but once you navigate to a package’s documentation, you cannot then use the sidebar to come back to the main documentation site.\n\n\nDocumenting unreleased features\nThere are sometimes cases where it is not possible to add docs to the TuringLang/docs repository. 
In particular, because the TuringLang/docs repo builds from a released version of Turing and all its dependencies, new features in unreleased versions cannot be documented here. However, it’s always better to document things as you go along rather than to wait for a release and then play catch-up! In such instances, we recommend first adding documentation to the relevant package’s docs folder (using Documenter.jl as usual), and later copying it over to the main TuringLang/docs repository (adjusting for the Quarto format) once the new version is released. Note that because the TuringLang/docs repository is tied to a specific version of Turing.jl, if you have updated the documentation for a new dependency of Turing (e.g. DynamicPPL or Bijectors), you also need to ensure that there is a version of Turing.jl that is compatible with that new version.\n\n\nEnvironments\nThe TuringLang/docs repository is built from a single Manifest file, which contains a pinned version of Turing.jl. All notebooks are run with the same environment, which ensures consistency across the documentation site.\nIn general, you should not add new packages to this environment which depend on Turing (i.e., reverse dependencies of Turing), or packages that have Turing extensions. This is because such packages will have compatibility bounds on Turing. Thus, we will be unable to update the docs to use the newest Turing version, until and unless those packages also update their compatibility bounds.\n\n\n\nTests\nTuring, like most software libraries, has a test suite. You can run the whole suite by running julia --project=. from the root of the Turing repository, and then running\nimport Pkg; Pkg.test(\"Turing\")\nThe test suite subdivides into files in the test folder, and you can run only some of them using commands like\nimport Pkg; Pkg.test(\"Turing\"; test_args=[\"optim\", \"hmc\", \"--skip\", \"ext\"])\nThis one would run all files with “optim” or “hmc” in their path, such as test/optimisation/Optimisation.jl, but not files with “ext” in their path. Alternatively, you can set these arguments as command line arguments when you run Julia:\njulia --project=. -e 'import Pkg; Pkg.test(; test_args=ARGS)' -- optim hmc --skip ext\nOr you can set the global ARGS variable and call include(\"test/runtests.jl\").\n\n\nPull requests, versions, and releases\nWe merge all code changes through pull requests on GitHub. To make a contribution to one of the Turing packages, fork it on GitHub, start a new branch on your fork, and add commits to it. Once you’re done, open a pull request to the main repository under TuringLang. Someone from the dev team will review your code (if they don’t, ping @TuringLang/maintainers in a comment to get their attention) and check that the continuous integration tests pass (with some allowed exceptions, see below). If all looks good, we’ll merge your PR with gratitude. If not, we’ll help you fix it and then merge it with gratitude.\nEverything in this section about pull requests and branches applies to the Turing.jl and DynamicPPL.jl repositories. Most of it also applies to other repositories under the TuringLang ecosystem, though some do not bother with the main/breaking distinction or with a HISTORY.md. As of August 2025 we are slowly moving towards having all repos do the full process, so a new HISTORY.md in a repo that doesn’t yet have one is always welcome.\n\nBranches\nLike Julia packages generally, Turing.jl follows semantic versioning. 
Because of this, we have two persistently alive branches in our repository: main and breaking. All code that gets released as a new version of Turing gets merged into main, and a release is made from there. However, any breaking changes should first be merged into breaking. breaking will then periodically be merged into main.\nThe idea is that breaking always contains commits that build towards the next breaking release in the semantic versioning sense. That is, if the changes you make might break or change the behaviour of correctly written code that uses Turing.jl, your PR should target the breaking branch, and your code should be merged into breaking. If your changes cause no such breakage for users, your PR should target main. Notably, any bug fixes should merge directly into main.\nThis way we can frequently release new patch versions from main, while developing breaking changes in parallel on breaking. E.g. if the current version is 0.19.3, and someone fixes a bug, we can merge the fix into main and release it as 0.19.4. Meanwhile, breaking changes can be developed and merged into breaking, which is building towards a release of 0.20.0. Multiple breaking changes may be accumulated into breaking, before finally the breaking-to-main merge is done, and 0.20.0 is released. On breaking, the version number should then immediately be bumped to 0.21.\nWe do not generally backport bug fixes, although we may consider doing so in special circumstances.\n\n\nChange history\nWe keep a cumulative changelog in a file called HISTORY.md at the root of the repository. It should have an entry for every new breaking release, explaining everything our users need to know about the changes, such as what may have broken and how to fix things to work with the new version. Any major new features should also be described in HISTORY.md, as may any other changes that are useful for users to know about. Bug fixes generally don’t need an entry in HISTORY.md. Any new breaking release must have an entry in HISTORY.md; entries for non-breaking releases are optional.\n\n\nContinuous integration (CI) tests\nWe generally run the whole test suite of each repository in a GitHub action, typically for a few different versions of Julia, including the earliest supported version and the latest stable release. On some repositories we also run a few other checks in CI, such as code formatting and simple benchmarks. Generally all tests except those run on a prerelease version of Julia (e.g. a release candidate of an upcoming Julia release), and all code formatting checks, should pass before merging a PR. Exceptions can be made if the cause of the failure is known and unrelated to the PR. CI checks other than tests and formatting serve various purposes, and some of them can be allowed to fail. Some examples are:\n\nAnything running on a prerelease of Julia. These inform us of trouble ahead when that prerelease becomes an actual release, but don’t require fixing for a PR to be merged.\nAny code coverage checks. Code coverage numbers can be helpful in catching missing tests or cases where the tests don’t test what they are intended to. However, we do not insist on any particular coverage figures, since they are not a very good metric of a test suite’s extensiveness.\nThe benchmarks on the DynamicPPL repo. These should be investigated to understand why they fail. If the reason is a bug in the PR, an actual test should be added to the test suite to catch it. 
However, sometimes they fail for unrelated reasons.\nOccasionally CI failures are caused by bugs that require upstream fixes (such as for AD backends, or base Julia). Please ping a maintainer if you are unsure if this is the case. A good indicator for this is if the same test is failing on the base branch of your pull request.\nThe CI check in the docs repo for whether the docs are built with the latest Turing.jl release. This test failing is a reminder that we should make a PR to update to the latest version, but does not need fixing when working on a PR that makes unrelated changes to the documentation.\n\nIf you are ever unsure whether some CI check needs to pass, or if the reason why one is failing is mysterious or seems unrelated to the PR, ask a maintainer and they’ll help you out.\n\n\nPlease make mistakes\nGetting pull requests from outside the core developer team is one of the greatest joys of open source maintenance, and Turing’s community of contributors is its greatest asset. If you are thinking of contributing, please do open a pull request, even an imperfect or half-finished one, or an issue to discuss it first if you prefer. You don’t need to nail all of the above details on the first go; the dev team is very happy to help you figure out how to bump version numbers or whether you need to target main or breaking.\n\n\nFor Turing.jl core developers\nIf you are a core developer of TuringLang, two notes, in addition to the above, apply:\n\nYou don’t need to make your own fork of the package you are editing. Just make a new branch on the main repository, usually named your-username/change-you-are-making (we don’t strictly enforce this convention though). You should definitely still make a branch and a PR, and never push directly to main or breaking.\nYou can make a release of the package after your work is merged into main. This is done by leaving a comment on the latest commit on main, saying\n\n@JuliaRegistrator register\n\nRelease notes:\n[YOUR RELEASE NOTES HERE]\nIf you are making a breaking release, your release notes must also contain the string Breaking changes somewhere in them (this is mandated by the @JuliaRegistrator bot, described below).\nThe @JuliaRegistrator bot will handle creating a pull request into the Julia central package repository and tagging a new release in the repository. The release notes should be a copy-paste of the notes written in HISTORY.md if such an entry exists, or otherwise (for a patch release) a short summary of changes.\nEven core devs should always merge all their code through pull requests into main or breaking. All code should generally be reviewed by another core developer and pass continuous integration (CI) checks. Exceptions can be made in some cases though, such as ignoring failing CI checks where the cause is known and not due to the current pull request, or skipping code review when the pull request author is an experienced developer of the package and the changes are trivial.\n\n\nCode Formatting\nTuring uses JuliaFormatter.jl to ensure consistent code style across the codebase. All code must be formatted before submitting a pull request, and ideally with every commit.\n\nInstalling JuliaFormatter\nWe use version 1 of JuliaFormatter. 
Install it in your global Julia environment (not the project environment, as adding it to the Project.toml would make it an invalid dependency of the project):\njulia -e 'using Pkg; Pkg.add(name=\"JuliaFormatter\", version=\"1\"); Pkg.pin(\"JuliaFormatter\")'\n\n\nFormatting Code\nTo format all Julia files in the current directory and subdirectories:\njulia -e 'using JuliaFormatter; format(\".\")'\nRun this command from the root of the repository before committing your changes. This ensures your code follows the project’s formatting standards and maintains consistency across the codebase.\n\n\n\nStyle Guide\nTuring has a style guide, described below. Reviewing it before making a pull request is not strictly necessary, but you may be asked to change portions of your code to conform with the style guide before it is merged.\nMost Turing code follows Blue: a Style Guide for Julia. These conventions were created from a variety of sources including Python’s PEP8, Julia’s Notes for Contributors, and Julia’s Style Guide.\n\nSynopsis\n\nUse 4 spaces per indentation level, no tabs.\nTry to adhere to a 92 character line length limit.\nUse upper camel case convention for modules and types.\nUse lower case with underscores for method names (note: Julia code likes to use lower case without underscores).\nComments are good; try to explain the intentions of the code.\nUse whitespace to make the code more readable.\nNo whitespace at the end of a line (trailing whitespace).\nAvoid padding brackets with spaces, e.g. Int64(value) is preferred over Int64( value ).\n\n\nA Word on Consistency\nWhen adhering to the Blue style, it’s important to realise that these are guidelines, not rules. This is stated best in the PEP8:\n\nA style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important.\n\n\nBut most importantly: know when to be inconsistent – sometimes the style guide just doesn’t apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!\n\n\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Developers", "Contributing" ] }, { "objectID": "developers/transforms/dynamicppl/index.html", "href": "developers/transforms/dynamicppl/index.html", "title": "Variable transformations in DynamicPPL", "section": "", "text": "In the final part of this chapter, we will discuss the higher-level implications of constrained distributions in the Turing.jl framework.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#linked-and-unlinked-varinfos-in-dynamicppl", "href": "developers/transforms/dynamicppl/index.html#linked-and-unlinked-varinfos-in-dynamicppl", "title": "Variable transformations in DynamicPPL", "section": "Linked and unlinked VarInfos in DynamicPPL", "text": "Linked and unlinked VarInfos in DynamicPPL\n\nimport Random\nRandom.seed!(468);\n\n# Turing re-exports the entirety of Distributions\nusing Turing\n\nWhen we are performing Bayesian inference, we are trying to sample from a joint probability distribution, which isn’t usually a single, well-defined distribution like in the rather simplified example above. However, each random variable in the model will have its own distribution, and often some of these will be constrained. 
For example, if b ~ LogNormal() is a random variable in a model, then \\(p(b)\\) will be zero for any \\(b \\leq 0\\). Consequently, any joint probability \\(p(b, c, \\ldots)\\) will also be zero for any combination of parameters where \\(b \\leq 0\\), and so that joint distribution is itself constrained.\nTo get around this, DynamicPPL allows the variables to be transformed in exactly the same way as above. For simplicity, consider the following model:\n\nusing DynamicPPL\n\n@model function demo()\n x ~ LogNormal()\nend\n\nmodel = demo()\nvi = VarInfo(model)\nvn_x = @varname(x)\n# Retrieve the 'internal' representation of x – we'll explain this later\nDynamicPPL.getindex_internal(vi, vn_x)\n\n1-element Vector{Float64}:\n 1.0746648736094493\n\n\nThe call to VarInfo executes the model once and stores the sampled value inside vi. By default, VarInfo itself stores un-transformed values. We can see this by comparing the value of the logpdf stored inside the VarInfo:\n\nDynamicPPL.getlogp(vi)\n\n(logprior = -0.9935400392011169, logjac = 0.0, loglikelihood = 0.0)\n\n\nwith a manual calculation:\n\nlogpdf(LogNormal(), DynamicPPL.getindex_internal(vi, vn_x))\n\n1-element Vector{Float64}:\n -0.9935400392011169\n\n\nIn DynamicPPL, the link function can be used to transform the variables. This function does three things: first, it transforms the variables; secondly, it updates the value of logp (by adding the Jacobian term); and thirdly, it sets a flag on the variables to indicate that it has been transformed. Note that this acts on all variables in the model, including unconstrained ones. (Unconstrained variables just have an identity transformation.)\n\nvi_linked = DynamicPPL.link(vi, model)\nprintln(\"Transformed value: $(DynamicPPL.getindex_internal(vi_linked, vn_x))\")\nprintln(\"Transformed logp: $(DynamicPPL.getlogp(vi_linked))\")\nprintln(\"Transformed flag: $(DynamicPPL.is_transformed(vi_linked, vn_x))\")\n\nTransformed value: [0.07200886749732066]\nTransformed logp: (logprior = -0.9935400392011169, logjac = -0.07200886749732066, loglikelihood = 0.0)\nTransformed flag: true\n\n\nIndeed, we can see that the new logp value matches with\n\nlogpdf(Normal(), DynamicPPL.getindex_internal(vi_linked, vn_x))\n\n1-element Vector{Float64}:\n -0.9215311717037962\n\n\nThe reverse transformation, invlink, reverts all of the above steps:\n\nvi = DynamicPPL.invlink(vi_linked, model) # Same as the previous vi\nprintln(\"Un-transformed value: $(DynamicPPL.getindex_internal(vi, vn_x))\")\nprintln(\"Un-transformed logp: $(DynamicPPL.getlogp(vi))\")\nprintln(\"Un-transformed flag: $(DynamicPPL.is_transformed(vi, vn_x))\")\n\nUn-transformed value: [1.0746648736094493]\nUn-transformed logp: (logprior = -0.9935400392011169, logjac = 0.0, loglikelihood = 0.0)\nUn-transformed flag: false\n\n\n\nModel and internal representations\nIn DynamicPPL, there is a difference between the value of a random variable and its ‘internal’ value. This is most easily seen by first transforming, and then comparing the output of getindex and getindex_internal. The former extracts the regular value, which we call the model representation (because it is consistent with the distribution specified in the model). 
The latter, as the name suggests, gets the internal representation of the variable, which is how it is actually stored in the VarInfo object.\n\nprintln(\" Model representation: $(getindex(vi_linked, vn_x))\")\nprintln(\"Internal representation: $(DynamicPPL.getindex_internal(vi_linked, vn_x))\")\n\n Model representation: 1.0746648736094493\nInternal representation: [0.07200886749732066]\n\n\n\n\n\n\n\nNote\n\n\n\nNote that vi_linked[vn_x] can also be used as shorthand for getindex(vi_linked, vn_x); this usage is common in the DynamicPPL/Turing codebase.\n\n\nWe can see (for this linked varinfo) that there are two differences between these outputs:\n\nThe internal representation has been transformed using the bijector (in this case, the log function). Note that linking does not modify the model representation; the is_transformed() flag which we used above only tells us whether the internal representation has been transformed or not.\nThe internal representation is a vector, whereas the model representation is a scalar. This is because in DynamicPPL, all internal values are vectorised (i.e. converted into some vector), regardless of distribution. On the other hand, since the model specifies a univariate distribution, the model representation is a scalar.\n\nOne might also ask, what is the internal representation for an unlinked varinfo?\n\nprintln(\" Model representation: $(getindex(vi, vn_x))\")\nprintln(\"Internal representation: $(DynamicPPL.getindex_internal(vi, vn_x))\")\n\n Model representation: 1.0746648736094493\nInternal representation: [1.0746648736094493]\n\n\nFor an unlinked VarInfo, the internal representation is vectorised, but not transformed. We call this an unlinked internal representation; conversely, when the VarInfo has been linked, each variable will have a corresponding linked internal representation.\nThis sequence of events is summed up in the following diagram, where f(..., args) indicates that the ... is to be replaced with the object at the beginning of the arrow:\n\n\n\nFunctions related to variable transforms in DynamicPPL\n\n\nIn the final part of this article, we will take a more in-depth look at the internal DynamicPPL machinery that allows us to convert between representations and obtain the correct probability densities. Before that, though, we will take a quick high-level look at how the HMC sampler in Turing.jl uses the functions introduced so far.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#hmc-in-turing.jl", "href": "developers/transforms/dynamicppl/index.html#hmc-in-turing.jl", "title": "Variable transformations in DynamicPPL", "section": "HMC in Turing.jl", "text": "HMC in Turing.jl\nWhile DynamicPPL provides the functionality for transforming variables, the transformation itself happens at an even higher level, i.e. in the sampler itself. The HMC sampler in Turing.jl is in this file.\nIn the first step of sampling, it calls link on the sampler. What this means is that from the perspective of the HMC sampler, it never sees the constrained variable: it always thinks that it is sampling from an unconstrained distribution.\nThe biggest prerequisite for this to work correctly is that the potential energy term in the Hamiltonian—or in other words, the model log density—must be programmed correctly to include the Jacobian term. 
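\nIt may help to see this requirement as a concrete check (our own one-liner, not from the original page): for x ~ LogNormal() and y = log(x), the density the sampler must see in y-space is the LogNormal log-density plus the log-Jacobian of exp, which is exactly the Normal log-density.\n\nusing Distributions\ny = 0.3\nx = exp(y)\n# log|d exp(y)/dy| = y, so adding it converts the x-space density to the y-space density.\nlogpdf(LogNormal(), x) + y ≈ logpdf(Normal(), y)  # true\n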
This is exactly the same as how we had to make sure to define logq(y) correctly in the toy HMC example above.\nWithin Turing.jl, this is correctly handled by calling\nx, inv_logjac = with_logabsdet_jacobian(inverse_transform, y)\nand then passing inv_logjac to DynamicPPL’s LogJacobianAccumulator.
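\nTo see concretely what this call returns, here is a short sketch of our own using the LogNormal bijector from Bijectors.jl, whose inverse is (essentially) the exp transform:\n\nusing Distributions\nusing Bijectors: bijector, inverse, with_logabsdet_jacobian\n\ninverse_transform = inverse(bijector(LogNormal()))  # maps unconstrained y back to x\ny = 0.5  # an unconstrained value\nx, inv_logjac = with_logabsdet_jacobian(inverse_transform, y)\n# x == exp(0.5) ≈ 1.6487, and inv_logjac == 0.5, the log-Jacobian of exp at y.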
", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#a-deeper-dive-into-dynamicppls-internal-machinery", "href": "developers/transforms/dynamicppl/index.html#a-deeper-dive-into-dynamicppls-internal-machinery", "title": "Variable transformations in DynamicPPL", "section": "A deeper dive into DynamicPPL’s internal machinery", "text": "A deeper dive into DynamicPPL’s internal machinery\nAs described above, DynamicPPL stores a (possibly linked) internal representation which is accessible via getindex_internal, but can also provide users with the original, untransformed, model representation via getindex. This abstraction allows the user to obtain samples from constrained distributions without having to perform the transformation themselves.\n\n\n\nMore functions related to variable transforms in DynamicPPL\n\n\nThe conversion between these representations is done using several internal functions in DynamicPPL, as depicted in the above diagram. The following operations are labelled:\n\nThis is linking, i.e. transforming a constrained variable to an unconstrained one.\nThis is vectorisation: for example, converting a scalar value to a 1-element vector.\nThis arrow brings us from the model representation to the linked internal representation. This is the composition of (1) and (2): linking and then vectorising.\nThis arrow brings us from the model representation to the unlinked internal representation. This only requires a single step, vectorisation.\n\nEach of these steps can be accomplished using the following functions.\n\n\n\n\n\n\n\n\n\nTo get the function\nTo get the inverse function\n\n\n\n\n(1)\nlink_transform(dist)\ninvlink_transform(dist)\n\n\n(2)\nto_vec_transform(dist)\nfrom_vec_transform(dist)\n\n\n(3)\nto_linked_internal_transform(vi, vn[, dist])\nfrom_linked_internal_transform(vi, vn[, dist])\n\n\n(4)\nto_internal_transform(vi, vn[, dist])\nfrom_internal_transform(vi, vn[, dist])\n\n\n\nNote that these functions do not perform the actual transformation; rather, they return the transformation function itself. For example, let’s take a look at the VarInfo from the previous section, which contains a single variable x ~ LogNormal().\n\nmodel_repn = vi[vn_x]\n\n1.0746648736094493\n\n\n\n# (1) Get the link function\nf_link = DynamicPPL.link_transform(LogNormal())\n# (2) Get the vectorise function\nf_vec = DynamicPPL.to_vec_transform(LogNormal())\n\n# Apply it to the model representation\nlinked_internal_repn = f_vec(f_link(model_repn))\n\n1-element Vector{Float64}:\n 0.07200886749732066\n\n\nEquivalently, we could have done:\n\n# (3) Get the linked internal transform function\nf_linked_internal = DynamicPPL.to_linked_internal_transform(vi, vn_x, LogNormal())\n\n# Apply it to the model representation\nlinked_internal_repn = f_linked_internal(model_repn)\n\n1-element Vector{Float64}:\n 0.07200886749732066\n\n\nAnd let’s confirm that this is the same as the linked internal representation, using the VarInfo that we linked earlier:\n\nDynamicPPL.getindex_internal(vi_linked, vn_x)\n\n1-element Vector{Float64}:\n 0.07200886749732066\n\n\nThe purpose of all of this machinery is to allow other parts of DynamicPPL, such as the tilde pipeline, to handle transformed variables correctly. The following diagram shows how assume first checks whether the variable is transformed (using is_transformed), and then applies the appropriate transformation function. In outline: the statement x ~ LogNormal() is expanded by @model into a call assume(vn, dist, vi); assume sets f = from_linked_internal_transform(vi, vn, dist) if is_transformed(vi, vn) is true, and f = from_internal_transform(vi, vn, dist) otherwise; it then computes x, logjac = with_logabsdet_jacobian(f, getindex_internal(vi, vn, dist)) and finally returns x, logpdf(dist, x) - logjac, vi.\nHere, with_logabsdet_jacobian is defined in the ChangesOfVariables.jl package, and returns both the effect of the transformation f as well as the log Jacobian term.\nBecause we chose f appropriately, we find here that x is always the model representation; furthermore, if the variable was not linked (i.e. is_transformed was false), the log Jacobian term will be zero. 
However, if it was linked, then the Jacobian term would be appropriately included, making sure that sampling proceeds correctly.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#why-do-we-need-to-do-this-at-runtime", "href": "developers/transforms/dynamicppl/index.html#why-do-we-need-to-do-this-at-runtime", "title": "Variable transformations in DynamicPPL", "section": "Why do we need to do this at runtime?", "text": "Why do we need to do this at runtime?\nGiven that we know whether a VarInfo is linked or not, one might wonder why we need both from_internal_transform and from_linked_internal_transform at the point where the model is evaluated. Could we not, for example, store the required transformation inside the VarInfo when we link it, and simply reuse that each time?\nThat is, why can’t we just have assume always run f = from_internal_transform(varinfo, varname, dist), then x, logjac = with_logabsdet_jacobian(f, getindex_internal(varinfo, varname)), and then return x, logpdf(dist, x) - logjac, varinfo, where from_internal_transform here only looks up a stored transformation function?\nUnfortunately, this is not possible in general, because the transformation function might not be a constant between different model evaluations. Consider, for example, the following model:\n\n@model function demo_dynamic_constraint()\n m ~ Normal()\n x ~ truncated(Normal(); lower=m)\n return (m=m, x=x)\nend\n\ndemo_dynamic_constraint (generic function with 2 methods)\n\n\nHere, m is distributed according to a plain Normal(), whereas the variable x is constrained to be in the domain (m, Inf). Because of this, we expect that any time we sample from the model, we should have that m < x (in terms of their model representations):\n\nmodel = demo_dynamic_constraint()\nvi = VarInfo(model)\nvn_m, vn_x = @varname(m), @varname(x)\n\nvi[vn_m], vi[vn_x]\n\n(-0.0740437565595174, 0.6327762377562545)\n\n\n(Note that vi[vn] is a shorthand for getindex(vi, vn), so this retrieves the model representations of m and x.) So far, so good. Let’s now link this VarInfo so that we end up working in an ‘unconstrained’ space, where both m and x can take on any values in (-Inf, Inf). First, we should check that the model representations are unchanged when linking:\n\nvi_linked = link(vi, model)\nvi_linked[vn_m], vi_linked[vn_x]\n\n(-0.0740437565595174, 0.6327762377562545)\n\n\nBut if we change the value of m to, say, something a bit larger than x:\n\n# Update the model representation for `m` in `vi_linked`.\nvi_linked[vn_m] = vi_linked[vn_x] + 1\nvi_linked[vn_m], vi_linked[vn_x]\n\n(1.6327762377562545, 0.6327762377562545)\n\n\n\n\n\n\n\n\nWarning\n\n\n\nThis is just for demonstration purposes! 
You shouldn’t be directly setting variables in a linked varinfo like this unless you know for a fact that the value will be compatible with the constraints of the model.\n\n\nNow, we see that the constraint m < x is no longer satisfied. Hence, one might expect that if we try to evaluate the model using this VarInfo, we should obtain an error. Here, evaluate!! returns two things: the model’s return value itself (which we defined above to be a NamedTuple), and the resulting VarInfo post-evaluation.\n\nretval, ret_varinfo = DynamicPPL.evaluate!!(model, vi_linked)\ngetlogp(ret_varinfo)\n\n(logprior = -2.9368285247373587, logjac = 0.34697925043108835, loglikelihood = 0.0)\n\n\nBut we don’t get any errors! Indeed, we could even calculate the ‘log probability density’ for this evaluation.\nTo understand this, we need to look at the actual value which was used during the model evaluation. We can glean this from the return value (or from the returned VarInfo, but the former is easier):\n\nretval\n\n(m = 1.6327762377562545, x = 2.3395962320720263)\n\n\nWe can see here that the model evaluation used the value of m that we provided, but the value of x was ‘updated’.\nThe reason for this is that internally in a model evaluation, we construct the transformation function from the internal to the model representation based on the current realizations in the model! That is, we take the dist in a x ~ dist expression at model evaluation time and use that to construct the transformation, thus allowing it to change between model evaluations without invalidating the transformation.\nKnowing that the distribution of x depends on the value of m, we can now understand how the model representation of x got updated. The linked VarInfo does not store the model representation of x, but only its linked internal representation. 
So, what happened during the model evaluation above was that the linked internal representation of x – which was constructed using the original value of m – was transformed back into a new model representation using a different value of m.\nWe can reproduce the ‘new’ value of x by performing these transformations manually:\n\n# Generate a fresh linked VarInfo (without the new / 'corrupted' values)\nvi_linked = link(vi, model)\n# See the linked internal representations\nDynamicPPL.getindex_internal(vi_linked, vn_m), DynamicPPL.getindex_internal(vi_linked, vn_x)\n\n([-0.0740437565595174], [-0.34697925043108835])\n\n\nNow we update the value of m like we did before:\n\nvi_linked[vn_m] = vi_linked[vn_x] + 1\nvi_linked[vn_m]\n\n1.6327762377562545\n\n\nWhen evaluating the model, the distribution of x is now changed, and so is the corresponding inverse bijector:\n\nnew_dist_x = truncated(Normal(); lower=vi_linked[vn_m])\nnew_f_inv = DynamicPPL.invlink_transform(new_dist_x)\n\nBijectors.Inverse{Bijectors.TruncatedBijector{Float64, Float64}}(Bijectors.TruncatedBijector{Float64, Float64}(1.6327762377562545, Inf))\n\n\nand if we apply this to the internal representation of x:\n\nnew_f_inv(DynamicPPL.getindex_internal(vi_linked, vn_x))\n\n1-element Vector{Float64}:\n 2.3395962320720263\n\n\nwhich is the same value as we got above in retval.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/transforms/dynamicppl/index.html#conclusion", "href": "developers/transforms/dynamicppl/index.html#conclusion", "title": "Variable transformations in DynamicPPL", "section": "Conclusion", "text": "Conclusion\nIn this chapter of the Turing docs, we’ve looked at:\n\nwhy variables might need to be transformed;\nhow this is accounted for mathematically with the Jacobian term;\nthe basic API and functionality of Bijectors.jl; and\nthe higher-level usage of transforms in DynamicPPL and Turing.\n\nThis will hopefully have equipped you with a better understanding of how constrained variables are handled in the Turing framework. With this knowledge, you should especially find it easier to navigate DynamicPPL’s VarInfo type, which forms the backbone of model evaluation.", "crumbs": [ "Get Started", "Developers", "Variable Transformations", "Variable transformations in DynamicPPL" ] }, { "objectID": "developers/contexts/submodel-condition/index.html", "href": "developers/contexts/submodel-condition/index.html", "title": "Conditioning and fixing in submodels", "section": "", "text": "This page is a technical explanation of how contexts are managed for submodels.\nA user-facing guide to submodels is available on this page.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#prefixcontext", "href": "developers/contexts/submodel-condition/index.html#prefixcontext", "title": "Conditioning and fixing in submodels", "section": "PrefixContext", "text": "PrefixContext\nSubmodels in DynamicPPL come with the notion of prefixing variables: under the hood, this is implemented by adding a PrefixContext to the context stack.\nPrefixContext is a context that, as the name suggests, prefixes all variables inside a model with a given symbol. 
Thus, for example:\n\nusing DynamicPPL, Distributions\n\n@model function f()\n x ~ Normal()\n return y ~ Normal()\nend\n\n@model function g()\n return a ~ to_submodel(f())\nend\n\ng (generic function with 2 methods)\n\n\ninside the submodel f, the variables x and y become a.x and a.y respectively. This is easiest to observe by running the model:\n\nvi = VarInfo(g())\nkeys(vi)\n\n2-element Vector{VarName{:a}}:\n a.x\n a.y\n\n\n\n\n\n\n\n\nNote\n\n\n\nIn this case, where to_submodel is called without any other arguments, the prefix to be used is automatically inferred from the name of the variable on the left-hand side of the tilde. We will return to the ‘manual prefixing’ case later.\n\n\nThe phrase ‘becoming’ a different variable is a little underspecified: it is useful to pinpoint the exact location where the prefixing occurs, which is tilde_assume. The method responsible for it is tilde_assume(::PrefixContext, right, vn, vi): this attaches the prefix in the context to the VarName argument, before recursively calling tilde_assume with the new prefixed VarName. This means that even though a statement x ~ dist still enters the tilde pipeline at the top level as x, if the model evaluation context contains a PrefixContext, any function after tilde_assume(::PrefixContext, ...) will see a.x instead.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#conditioncontext", "href": "developers/contexts/submodel-condition/index.html#conditioncontext", "title": "Conditioning and fixing in submodels", "section": "ConditionContext", "text": "ConditionContext\nConditionContext is a context which stores values of variables that are to be conditioned on. These values may be stored as a Dict which maps VarNames to values, or alternatively as a NamedTuple. The latter only works correctly if all VarNames are ‘basic’, in that they have an identity optic (i.e., something like a.x or a[1] is forbidden). Because of this limitation, we will only use Dict in this example.\n\n\n\n\n\n\nNote\n\n\n\nIf a ConditionContext with a NamedTuple encounters anything to do with a prefix, its internal NamedTuple is converted to a Dict anyway, so it is quite reasonable to ignore the NamedTuple case in this exposition.\n\n\nOne can inspect the conditioning values with, for example:\n\n@model function d()\n x ~ Normal()\n return y ~ Normal()\nend\n\ncond_model = d() | (@varname(x) => 1.0)\ncond_ctx = cond_model.context\n\nConditionContext(Dict(x => 1.0), DynamicPPL.DefaultContext())\n\n\nThere are several internal functions that are used to determine whether a variable is conditioned, and if so, what its value is.\n\nDynamicPPL.hasconditioned_nested(cond_ctx, @varname(x))\n\ntrue\n\n\n\nDynamicPPL.getconditioned_nested(cond_ctx, @varname(x))\n\n1.0\n\n\nThese functions are in turn used by the function DynamicPPL.contextual_isassumption, which is largely the same as hasconditioned_nested, but also checks whether the value is missing (in which case it isn’t really conditioned).\n\nDynamicPPL.contextual_isassumption(cond_ctx, @varname(x))\n\nfalse\n\n\n\n\n\n\n\n\nNote\n\n\n\nNotice that (neglecting missing values) the return value of contextual_isassumption is the opposite of hasconditioned_nested, i.e. 
for a variable that is conditioned on, contextual_isassumption returns false.\n\n\nIf a variable x is conditioned on, then the effect of this is to set the value of x to the given value (while still including its contribution to the log probability density). Since x is no longer a random variable, if we were to evaluate the model, we would find only one key in the VarInfo:\n\nkeys(VarInfo(cond_model))\n\n1-element Vector{VarName{:y, typeof(identity)}}:\n y", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#joint-behaviour-desiderata-at-the-model-level", "href": "developers/contexts/submodel-condition/index.html#joint-behaviour-desiderata-at-the-model-level", "title": "Conditioning and fixing in submodels", "section": "Joint behaviour: desiderata at the model level", "text": "Joint behaviour: desiderata at the model level\nWhen paired together, these two contexts have the potential to cause substantial confusion: PrefixContext modifies the variable names that are seen, which may cause them to be out of sync with the values contained inside the ConditionContext.\nWe begin by mentioning some high-level desiderata for their joint behaviour. Take these models, for example:\n\n@model function inner()\n println(\"inner context: $(__model__.context)\")\n x ~ Normal()\n return y ~ Normal()\nend\n\n@model function outer()\n println(\"outer context: $(__model__.context)\")\n return a ~ to_submodel(inner())\nend\n\n# 'Outer conditioning'\nwith_outer_cond = outer() | (@varname(a.x) => 1.0)\n\n# 'Inner conditioning'\ninner_cond = inner() | (@varname(x) => 1.0)\n@model function outer2()\n println(\"outer context: $(__model__.context)\")\n return a ~ to_submodel(inner_cond)\nend\nwith_inner_cond = outer2()\n\nModel{typeof(outer2), (), (), (), Tuple{}, Tuple{}, DefaultContext, false}(outer2, NamedTuple(), NamedTuple(), DefaultContext())\n\n\nWe want that:\n\nkeys(VarInfo(outer())) should return [a.x, a.y];\nkeys(VarInfo(with_outer_cond)) should return [a.y];\nkeys(VarInfo(with_inner_cond)) should return [a.y],\n\nIn other words, we can condition submodels either from the outside (point (2)) or from the inside (point (3)), and the variable name we use to specify the conditioning should match the level at which we perform the conditioning.\nThis is an incredibly salient point because it means that submodels can be treated as individual, opaque objects, and we can condition them without needing to know what it will be prefixed with, or the context in which that submodel is being used. For example, this means we can reuse inner_cond in another model with a different prefix, and it will still have its inner x value be conditioned, despite the prefix differing.\n\n\n\n\n\n\nNote\n\n\n\nIn the current version of DynamicPPL, these criteria are all fulfilled. However, this was not the case in the past: in particular, point (3) was not fulfilled, and users had to condition the internal submodel with the prefixes that were used outside. 
(See this GitHub issue for more information; this issue was the direct motivation for this documentation page.)", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#desiderata-at-the-context-level", "href": "developers/contexts/submodel-condition/index.html#desiderata-at-the-context-level", "title": "Conditioning and fixing in submodels", "section": "Desiderata at the context level", "text": "Desiderata at the context level\nThe above section describes how we expect conditioning and prefixing to behave from a user’s perspective. We now turn to the question of how we implement this in terms of DynamicPPL contexts. We do not specify the implementation details here, but we will sketch out something resembling an API that will allow us to achieve the target behaviour.\nPoint (1) does not involve any conditioning, only prefixing; it is therefore already satisfied by virtue of the tilde_assume method shown above.\nPoints (2) and (3) are more tricky. As the reader may surmise, the difference between them is the order in which the contexts are stacked.\nFor the outer conditioning case (point (2)), the ConditionContext will contain a VarName that is already prefixed. When we enter the inner submodel, this ConditionContext has to be passed down and somehow combined with the PrefixContext that is created when we enter the submodel. We make the claim here that the best way to do this is to nest the PrefixContext inside the ConditionContext. This is indeed what happens, as can be demonstrated by running the model.\n\nwith_outer_cond()\n\nouter context: ConditionContext(Dict(a.x => 1.0), DynamicPPL.InitContext{Random.TaskLocalRNG, DynamicPPL.InitFromPrior}(Random.TaskLocalRNG(), DynamicPPL.InitFromPrior()))\ninner context: ConditionContext(Dict(a.x => 1.0), DynamicPPL.PrefixContext{AbstractPPL.VarName{:a, typeof(identity)}, DynamicPPL.InitContext{Random.TaskLocalRNG, DynamicPPL.InitFromPrior}}(a, DynamicPPL.InitContext{Random.TaskLocalRNG, DynamicPPL.InitFromPrior}(Random.TaskLocalRNG(), DynamicPPL.InitFromPrior())))\n\n\n0.15212055397784535\n\n\nFor the inner conditioning case (point (3)), the outer model is not run with any special context. The inner model will itself contain a ConditionContext containing a VarName that is not prefixed. When we run the model, this ConditionContext should then be nested inside a PrefixContext to form the final evaluation context.
Again, we can run the model to see this in action:\n\nwith_inner_cond()\n\nouter context: DynamicPPL.InitContext{Random.TaskLocalRNG, DynamicPPL.InitFromPrior}(Random.TaskLocalRNG(), DynamicPPL.InitFromPrior())\ninner context: DynamicPPL.PrefixContext{AbstractPPL.VarName{:a, typeof(identity)}, DynamicPPL.ConditionContext{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Float64}, DynamicPPL.InitContext{Random.TaskLocalRNG, DynamicPPL.InitFromPrior}}}(a, ConditionContext(Dict(x => 1.0), DynamicPPL.InitContext{Random.TaskLocalRNG, DynamicPPL.InitFromPrior}(Random.TaskLocalRNG(), DynamicPPL.InitFromPrior())))\n\n\n-0.19476676319322261\n\n\nPutting all of the information so far together, this means that if we have these two inner contexts (taken from above):\n\nusing DynamicPPL: PrefixContext, ConditionContext, DefaultContext\n\ninner_ctx_with_outer_cond = ConditionContext(\n Dict(@varname(a.x) => 1.0), PrefixContext(@varname(a))\n)\ninner_ctx_with_inner_cond = PrefixContext(\n @varname(a), ConditionContext(Dict(@varname(x) => 1.0))\n)\n\nPrefixContext{VarName{:a, typeof(identity)}, ConditionContext{Dict{VarName{:x, typeof(identity)}, Float64}, DefaultContext}}(a, ConditionContext(Dict(x => 1.0), DynamicPPL.DefaultContext()))\n\n\nthen we want both of these to be true (and thankfully, they are!):\n\nDynamicPPL.hasconditioned_nested(inner_ctx_with_outer_cond, @varname(a.x))\n\ntrue\n\n\n\nDynamicPPL.hasconditioned_nested(inner_ctx_with_inner_cond, @varname(a.x))\n\ntrue\n\n\nThis allows us to finally specify our task as follows:\n\nGiven the correct arguments, we need to make sure that hasconditioned_nested and getconditioned_nested behave correctly.\nWe need to make sure that both functions are supplied with the correct arguments. In order to do so:\n\n\n(2a) We need to make sure that when evaluating a submodel, the context stack is arranged such that PrefixContext is applied inside the parent model’s context, but outside the submodel’s own context.\n(2b) We also need to make sure that the VarName passed to it is prefixed correctly.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#how-do-we-do-it", "href": "developers/contexts/submodel-condition/index.html#how-do-we-do-it", "title": "Conditioning and fixing in submodels", "section": "How do we do it?", "text": "How do we do it?\n\nhasconditioned_nested and getconditioned_nested accomplish this by first ‘collapsing’ the context stack, i.e. they go through the context stack, remove all PrefixContexts, and apply those prefixes to any conditioned variables below them in the stack. Once the PrefixContexts have been removed, one can then iterate through the context stack and check if any of the ConditionContexts contain the variable, or get the value itself. For more details, the reader is encouraged to read the source code.\n\n(2a) We ensure that the context stack is correctly arranged by relying on the behaviour of make_evaluate_args_and_kwargs. This function is called whenever a model (which itself contains a context) is evaluated with a separate (‘external’) context, and makes sure to arrange both of these contexts such that the model’s context is nested inside the external context.
Thus, as long as prefixing is implemented by applying a PrefixContext on the outermost layer of the inner model context, this will be correctly combined with an external context to give the behaviour seen above.\n(2b) At first glance, it seems like tilde_assume can take care of the VarName prefixing for us (as described in the first section). However, this is not actually the case: contextual_isassumption, which is the function that calls hasconditioned_nested, is much higher in the call stack than tilde_assume is. So, we need to explicitly prefix it before passing it to contextual_isassumption. This is done inside the @model macro, or technically, its subsidiary function isassumption.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#nested-submodels", "href": "developers/contexts/submodel-condition/index.html#nested-submodels", "title": "Conditioning and fixing in submodels", "section": "Nested submodels", "text": "Nested submodels\nJust in case the above wasn’t complicated enough, we need to also be very careful when dealing with nested submodels, which have multiple layers of PrefixContexts which may be interspersed with ConditionContexts. For example, in this series of nested submodels,\n\n@model function charlie()\n x ~ Normal()\n y ~ Normal()\n return z ~ Normal()\nend\n@model function bravo()\n return b ~ to_submodel(charlie() | (@varname(x) => 1.0))\nend\n@model function alpha()\n return a ~ to_submodel(bravo() | (@varname(b.y) => 1.0))\nend\n\nalpha (generic function with 2 methods)\n\n\nwe expect that the only variable to be sampled should be z inside charlie, or rather, a.b.z once it has been through the prefixes.\n\nkeys(VarInfo(alpha()))\n\n1-element Vector{VarName{:a, ComposedFunction{Accessors.PropertyLens{:z}, Accessors.PropertyLens{:b}}}}:\n a.b.z\n\n\nThe general strategy that we adopt is similar to above. Following the principle that PrefixContext should be nested inside the outer context, but outside the inner submodel’s context, we can infer that the correct context inside charlie should be:\n\nbig_ctx = PrefixContext(\n @varname(a),\n ConditionContext(\n Dict(@varname(b.y) => 1.0),\n PrefixContext(@varname(b), ConditionContext(Dict(@varname(x) => 1.0))),\n ),\n)\n\nPrefixContext{VarName{:a, typeof(identity)}, ConditionContext{Dict{VarName{:b, Accessors.PropertyLens{:y}}, Float64}, PrefixContext{VarName{:b, typeof(identity)}, ConditionContext{Dict{VarName{:x, typeof(identity)}, Float64}, DefaultContext}}}}(a, ConditionContext(Dict(b.y => 1.0), DynamicPPL.PrefixContext{AbstractPPL.VarName{:b, typeof(identity)}, DynamicPPL.ConditionContext{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Float64}, DynamicPPL.DefaultContext}}(b, ConditionContext(Dict(x => 1.0), DynamicPPL.DefaultContext()))))\n\n\nWe need several things to work correctly here: we need the VarName prefixing to behave correctly, and then we need to implement hasconditioned_nested and getconditioned_nested on the resulting prefixed VarName. It turns out that the prefixing itself is enough to illustrate the most important point in this section, namely, the need to traverse the context stack in a different direction to what most of DynamicPPL does.\nLet’s work with a function called myprefix(::AbstractContext, ::VarName) (to avoid confusion with any existing DynamicPPL function). We should like myprefix(big_ctx, @varname(x)) to return @varname(a.b.x). 
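Before looking at any implementation, it helps to see the underlying primitive in isolation: AbstractPPL.prefix takes a VarName and a prefix (in that order; this is the same call used in the implementations below) and returns the prefixed VarName. A quick sketch of the expected behaviour:\n\nAbstractPPL.prefix(@varname(x), @varname(a))\n\na.x\n\n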
Consider the following naive implementation, which mirrors a lot of code in the tilde-pipeline:\n\nusing DynamicPPL: childcontext, AbstractContext, AbstractParentContext\nusing AbstractPPL: AbstractPPL\n\nmyprefix(::AbstractContext, vn::VarName) = vn\nmyprefix(ctx::AbstractParentContext, vn::VarName) = myprefix(childcontext(ctx), vn)\nfunction myprefix(ctx::DynamicPPL.PrefixContext, vn::VarName)\n # The functionality to actually manipulate the VarNames is in AbstractPPL\n new_vn = AbstractPPL.prefix(vn, ctx.vn_prefix)\n # Then pass to the child context\n return myprefix(childcontext(ctx), new_vn)\nend\n\nmyprefix(big_ctx, @varname(x))\n\nb.a.x\n\n\nThis implementation clearly is not correct, because it applies the inner PrefixContext before the outer one.\nThe right way to implement myprefix is to, essentially, reverse the order of two lines above:\n\nfunction myprefix(ctx::DynamicPPL.PrefixContext, vn::VarName)\n # Pass to the child context first\n new_vn = myprefix(childcontext(ctx), vn)\n # Then apply this context's prefix\n return AbstractPPL.prefix(new_vn, ctx.vn_prefix)\nend\n\nmyprefix(big_ctx, @varname(x))\n\na.b.x\n\n\nThis is a much better result! The implementation of related functions such as hasconditioned_nested and getconditioned_nested, under the hood, use a similar recursion scheme, so you will find that this is a common pattern when reading the source code of various prefixing-related functions. When editing this code, it is worth being mindful of this as a potential source of incorrectness.\n\n\n\n\n\n\nNote\n\n\n\nIf you have encountered left and right folds, the above discussion illustrates the difference between them: the wrong implementation of myprefix uses a left fold (which collects prefixes in the opposite order from which they are encountered), while the correct implementation uses a right fold.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#loose-ends-1-manual-prefixing", "href": "developers/contexts/submodel-condition/index.html#loose-ends-1-manual-prefixing", "title": "Conditioning and fixing in submodels", "section": "Loose ends 1: Manual prefixing", "text": "Loose ends 1: Manual prefixing\nSometimes users may want to manually prefix a model, for example:\n\n@model function inner_manual()\n x ~ Normal()\n return y ~ Normal()\nend\n\n@model function outer_manual()\n return _unused ~ to_submodel(prefix(inner_manual(), :a), false)\nend\n\nouter_manual (generic function with 2 methods)\n\n\nIn this case, the VarName on the left-hand side of the tilde is not used, and the prefix is instead specified using the prefix function.\nThe way to deal with this follows on from the previous discussion. Specifically, we said that:\n\n[…] as long as prefixing is implemented by applying a PrefixContext on the outermost layer of the inner model context, this will be correctly combined […]\n\nWhen automatic prefixing is used, this application of PrefixContext occurs inside the tilde_assume!! method. In the manual prefixing case, we need to make sure that prefix(submodel::Model, ::Symbol) does the same thing, i.e. it inserts a PrefixContext at the outermost layer of submodel’s context. 
We can see that this is precisely what happens:\n\n@model f() = x ~ Normal()\n\nmodel = f()\nprefixed_model = prefix(model, :a)\n\n(model.context, prefixed_model.context)\n\n(DefaultContext(), PrefixContext{VarName{:a, typeof(identity)}, DefaultContext}(a, DefaultContext()))", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/contexts/submodel-condition/index.html#loose-ends-2-fixedcontext", "href": "developers/contexts/submodel-condition/index.html#loose-ends-2-fixedcontext", "title": "Conditioning and fixing in submodels", "section": "Loose ends 2: FixedContext", "text": "Loose ends 2: FixedContext\nFinally, note that all of the above also applies to the interaction between PrefixContext and FixedContext, except that the functions have different names. (FixedContext behaves the same way as ConditionContext, except that unlike conditioned variables, fixed variables do not contribute to the log probability density.) This generally results in a large amount of code duplication, but the concepts that underlie both contexts are exactly the same.", "crumbs": [ "Get Started", "Developers", "DynamicPPL Contexts", "Conditioning and fixing in submodels" ] }, { "objectID": "developers/compiler/model-manual/index.html", "href": "developers/compiler/model-manual/index.html", "title": "Manually Defining a Model", "section": "", "text": "Traditionally, models in Turing are defined using the @model macro:\n\nusing Turing\n\n@model function gdemo(x)\n # Set priors.\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n\n # Observe each value of x.\n x .~ Normal(m, sqrt(s²))\n\n return nothing\nend\n\nmodel = gdemo([1.5, 2.0])\n\nDynamicPPL.Model{typeof(gdemo), (:x,), (), (), Tuple{Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext, false}(gdemo, (x = [1.5, 2.0],), NamedTuple(), DynamicPPL.DefaultContext())\n\n\nThe @model macro accepts a function definition and rewrites it such that a call to the function generates a Model struct for use by the sampler.\nHowever, models can be constructed by hand without the use of a macro.
Taking the gdemo model above as an example, the macro-based definition can be implemented as well (a bit less generally) with the macro-free version\n\nusing DynamicPPL\n\n# Create the model function.\nfunction gdemo2(model, varinfo, x)\n # Assume s² has an InverseGamma distribution.\n s², varinfo = DynamicPPL.tilde_assume!!(\n model.context, InverseGamma(2, 3), @varname(s²), varinfo\n )\n\n # Assume m has a Normal distribution.\n m, varinfo = DynamicPPL.tilde_assume!!(\n model.context, Normal(0, sqrt(s²)), @varname(m), varinfo\n )\n\n # Observe each value of x[i] according to a Normal distribution.\n for i in eachindex(x)\n _retval, varinfo = DynamicPPL.tilde_observe!!(\n model.context, Normal(m, sqrt(s²)), x[i], @varname(x[i]), varinfo\n )\n end\n\n # The final return statement should comprise both the original return\n # value and the updated varinfo.\n return nothing, varinfo\nend\n\n# The `false` type parameter here indicates that this model does not need\n# threadsafe evaluation (see the threadsafe evaluation page for details)\ngdemo2(x) = DynamicPPL.Model{false}(gdemo2, (; x))\n\n# Instantiate a Model object with our data variables.\nmodel2 = gdemo2([1.5, 2.0])\n\nModel{typeof(gdemo2), (:x,), (), (), Tuple{Vector{Float64}}, Tuple{}, DefaultContext, false}(gdemo2, (x = [1.5, 2.0],), NamedTuple(), DefaultContext())\n\n\nWe can sample from this model in the same way:\n\nchain = sample(model2, NUTS(), 1000; progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.8\n\n\n\n\nChains MCMC chain (1000×16×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 6.62 seconds\nCompute duration = 6.62 seconds\nparameters = s², m\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nThe subsequent pages in this section will show how the @model macro does this behind-the-scenes.\n\n\n\n Back to top", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "Manually Defining a Model" ] }, { "objectID": "developers/compiler/minituring-contexts/index.html", "href": "developers/compiler/minituring-contexts/index.html", "title": "A Mini Turing Implementation II: Contexts", "section": "", "text": "In the Mini Turing tutorial we developed a miniature version of the Turing language, to illustrate its core design. A passing mention was made of contexts. In this tutorial we develop that aspect of our mini Turing language further to demonstrate how and why contexts are an important part of Turing’s design.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "A Mini Turing Implementation II: Contexts" ] }, { "objectID": "developers/compiler/minituring-contexts/index.html#contexts-within-contexts", "href": "developers/compiler/minituring-contexts/index.html#contexts-within-contexts", "title": "A Mini Turing Implementation II: Contexts", "section": "Contexts within contexts", "text": "Contexts within contexts\nLet’s use the above two contexts to provide a slightly more general definition of the SamplingContext and the Metropolis-Hastings sampler we wrote in the mini Turing tutorial.\n\nstruct SamplingContext{S<:AbstractMCMC.AbstractSampler,R<:Random.AbstractRNG}\n rng::R\n sampler::S\n subcontext::Union{PriorContext, JointContext}\nend\n\nThe new aspect here is the subcontext field. 
Note that this is a context within a context! The idea is that we don’t need to hard code how the MCMC sampler evaluates the log probability, but rather can pass that work onto the subcontext. This way the same sampler can be used to sample from either the joint or the prior distribution.\nThe methods for SamplingContext are largely as in our earlier mini Turing case, except that they now pass some of the work onto the subcontext:\n\nfunction observe(context::SamplingContext, args...)\n # Sampling doesn't affect the observed values, so nothing to do here other than pass to\n # the subcontext.\n return observe(context.subcontext, args...)\nend\n\nstruct PriorSampler <: AbstractMCMC.AbstractSampler end\n\nfunction assume(context::SamplingContext{PriorSampler}, varinfo, dist, var_id)\n sample = Random.rand(context.rng, dist)\n varinfo[var_id] = (sample, NaN)\n # Once the value has been sampled, let the subcontext handle evaluating the log\n # probability.\n return assume(context.subcontext, varinfo, dist, var_id)\nend;\n\n# The subcontext field of the MHSampler determines which distribution this sampler\n# samples from.\nstruct MHSampler{D, T<:Real} <: AbstractMCMC.AbstractSampler\n sigma::T\n subcontext::D\nend\n\nMHSampler(subcontext) = MHSampler(1, subcontext)\n\nfunction assume(context::SamplingContext{<:MHSampler}, varinfo, dist, var_id)\n sampler = context.sampler\n old_value = varinfo.values[var_id]\n\n # Propose a random-walk step, i.e., add a random value sampled from a\n # Normal distribution centred at 0 to the current value.\n value = rand(context.rng, Normal(old_value, sampler.sigma))\n varinfo[var_id] = (value, NaN)\n # Once the value has been sampled, let the subcontext handle evaluating the log\n # probability.\n return assume(context.subcontext, varinfo, dist, var_id)\nend;\n\n# The following three methods are identical to before, except for passing\n# `sampler.subcontext` to the SamplingContext.\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG, model::MiniModel, sampler::MHSampler; kwargs...\n)\n vi = VarInfo()\n ctx = SamplingContext(rng, PriorSampler(), sampler.subcontext)\n model.f(vi, ctx, values(model.data)...)\n return vi, vi\nend\n\nfunction AbstractMCMC.step(\n rng::Random.AbstractRNG,\n model::MiniModel,\n sampler::MHSampler,\n prev_state::VarInfo; # is just the old trace\n kwargs...,\n)\n vi = prev_state\n new_vi = deepcopy(vi)\n ctx = SamplingContext(rng, sampler, sampler.subcontext)\n model.f(new_vi, ctx, values(model.data)...)\n\n # Compute the log acceptance probability.\n # Since the proposal is symmetric the computation can be simplified.\n logα = sum(values(new_vi.logps)) - sum(values(vi.logps))\n\n # Accept the proposal with the computed acceptance probability.\n if -Random.randexp(rng) < logα\n return new_vi, new_vi\n else\n return prev_state, prev_state\n end\nend;\n\nfunction AbstractMCMC.bundle_samples(\n samples, model::MiniModel, ::MHSampler, ::Any, ::Type{Chains}; kwargs...\n)\n # We get a vector of traces.\n values = [sample.values for sample in samples]\n params = [key for key in keys(values[1]) if key ∉ keys(model.data)]\n vals = reduce(hcat, [value[p] for value in values] for p in params)\n # Compose the `Chains` data structure, for which analysis infrastructure is provided.\n chains = Chains(vals, params)\n return chains\nend;\n\nWe can use this to sample from the joint distribution just like before:\n\nsample(MiniModel(m, (x=3.0,)), MHSampler(JointContext()), 1_000_000; chain_type=Chains, progress=false)\n\nChains MCMC chain (1000000×2×1 
Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nparameters = a, b\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nor we can choose to sample from the prior instead:\n\nsample(MiniModel(m, (x=3.0,)), MHSampler(PriorContext()), 1_000_000; chain_type=Chains, progress=false)\n\nChains MCMC chain (1000000×2×1 Array{Float64, 3}):\n\nIterations = 1:1:1000000\nNumber of chains = 1\nSamples per chain = 1000000\nparameters = a, b\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nOf course, using an MCMC algorithm to sample from the prior is unnecessary and silly (PriorSampler exists, after all), but the point is to illustrate the flexibility of the context system. We could, for instance, use the same setup to implement an Approximate Bayesian Computation (ABC) algorithm.\nThe use of contexts also goes far beyond just evaluating log probabilities and sampling. Some examples from Turing are:\n\nFixedContext, which fixes some variables to given values and removes them completely from the evaluation of any log probabilities. It powers the Turing.fix and Turing.unfix functions.\nConditionContext, which conditions the model on fixed values for some parameters. It is used by Turing.condition and Turing.decondition, i.e. the model | (parameter=value,) syntax. The difference between fix and condition is whether the log probability for the corresponding variable is included in the overall log density.\nPriorExtractorContext collects information about what the prior distribution of each variable is.\nPrefixContext adds prefixes to variable names, allowing models to be used within other models without variable name collisions.\nPointwiseLikelihoodContext records the log likelihood of each individual variable.\nDebugContext collects useful debugging information while executing the model.\n\nAll of the above are what Turing calls parent contexts, which is to say that they all keep a subcontext just like our above SamplingContext did. Their implementations of assume and observe call the implementation of the subcontext once they are done doing their own work of fixing/conditioning/prefixing/etc. Contexts are often chained, so that e.g. a DebugContext may wrap within it a PrefixContext, which may in turn wrap a ConditionContext, etc. The only contexts that don’t have a subcontext in Turing are the ones for evaluating the prior, likelihood, and joint distributions.
These are called leaf contexts.\nThe above version of mini Turing is still much simpler than the full Turing language, but the principles of how contexts are used are the same.", "crumbs": [ "Get Started", "Developers", "DynamicPPL's Compiler", "A Mini Turing Implementation II: Contexts" ] }, { "objectID": "uri/initial-parameters.html", "href": "uri/initial-parameters.html", "title": "Troubleshooting - Initial parameters", "section": "", "text": "If you are not redirected, please click here.\n\nNo matching items\n Back to top" }, { "objectID": "getting-started/index.html", "href": "getting-started/index.html", "title": "Getting Started", "section": "", "text": "Installation\nTo use Turing, you need to install Julia first and then install Turing.\nYou will need to install Julia 1.10.8 or greater, which you can get from the official Julia website.\nTuring is officially registered in the Julia General package registry, which means that you can install a stable version of Turing by running the following in the Julia REPL:\n\nusing Pkg\nPkg.add(\"Turing\")\n\n\n\nSupported versions and platforms\nFormally, we only run continuous integration tests on: (1) the minimum supported minor version (typically an LTS release), and (2) the latest minor version of Julia. We test on Linux (x64), macOS (Apple Silicon), and Windows (x64). The Turing developer team will prioritise fixing issues on these platforms and versions.\nIf you run into a problem on a different version (e.g. older patch releases) or platforms (e.g. 32-bit), please do feel free to post an issue! If we are able to help, we will try to fix it, but we cannot guarantee support for untested versions.\n\n\nExample usage\nFirst, we load the Turing and StatsPlots modules. The latter is required for visualising the results.\n\nusing Turing\nusing StatsPlots\n\nWe then specify our model, which is a simple Gaussian model with unknown mean and variance. Models are defined as ordinary Julia functions, prefixed with the @model macro. Each statement inside closely resembles how the model would be defined with mathematical notation. Here, both x and y are observed values, and are therefore passed as function parameters. m and s² are the parameters to be inferred.\n\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n y ~ Normal(m, sqrt(s²))\nend\n\ngdemo (generic function with 2 methods)\n\n\nSuppose we observe x = 1.5 and y = 2, and want to infer the mean and variance. We can pass these data as arguments to the gdemo function, and run a sampler to collect the results. 
Here, we collect 1000 samples using the No U-Turn Sampler (NUTS) algorithm.\n\nchain = sample(gdemo(1.5, 2), NUTS(), 1000, progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.8\n\n\n\n\nChains MCMC chain (1000×16×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 6.56 seconds\nCompute duration = 6.56 seconds\nparameters = s², m\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nWe can plot the results:\n\nplot(chain)\n\n\n\n\nand obtain summary statistics by indexing the chain:\n\nmean(chain[:m]), mean(chain[:s²])\n\n(1.1629111921009405, 1.9036830525826844)\n\n\n\n\nWhere to go next\n\n\n\n\n\n\nNoteNote on prerequisites\n\n\n\nFamiliarity with Julia is assumed throughout the Turing documentation. If you are new to Julia, Learning Julia is a good starting point.\nThe underlying theory of Bayesian machine learning is not explained in detail in this documentation. A thorough introduction to the field is Pattern Recognition and Machine Learning (Bishop, 2006); an online version is available here (PDF, 18.1 MB).\n\n\nThe next page on Turing’s core functionality explains the basic features of the Turing language. From there, you can either look at worked examples of how different models are implemented in Turing, or specific tips and tricks that can help you get the most out of Turing.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Getting Started" ] }, { "objectID": "tutorials/hidden-markov-models/index.html", "href": "tutorials/hidden-markov-models/index.html", "title": "Hidden Markov Models", "section": "", "text": "This tutorial illustrates training Bayesian hidden Markov models (HMMs) using Turing. The main goals are learning the transition matrix, emission parameter, and hidden states. For a more rigorous academic overview of hidden Markov models, see An Introduction to Hidden Markov Models and Bayesian Networks (Ghahramani, 2001).\nIn this tutorial, we assume there are \\(k\\) discrete hidden states; the observations are continuous and normally distributed - centred around the hidden states. This assumption reduces the number of parameters to be estimated in the emission matrix.\nLet’s load the libraries we’ll need, and set a random seed for reproducibility.\n# Load libraries.\nusing Turing, StatsPlots, Random, Bijectors\n\n# Set a random seed\nRandom.seed!(12345678);", "crumbs": [ "Get Started", "Tutorials", "Hidden Markov Models" ] }, { "objectID": "tutorials/hidden-markov-models/index.html#simple-state-detection", "href": "tutorials/hidden-markov-models/index.html#simple-state-detection", "title": "Hidden Markov Models", "section": "Simple State Detection", "text": "Simple State Detection\nIn this example, we’ll use something where the states and emission parameters are straightforward.\n\n# Define the emission parameter.\ny = [fill(1.0, 6)..., fill(2.0, 6)..., fill(3.0, 7)...,\n fill(2.0, 4)..., fill(1.0, 7)...]\nN = length(y);\nK = 3;\n\n# Plot the data we just made.\nplot(y; xlim=(0, 30), ylim=(-1, 5), size=(500, 250), legend = false)\nscatter!(y, color = :blue; xlim=(0, 30), ylim=(-1, 5), size=(500, 250), legend = false)\n\n\n\n\nWe can see that we have three states, one for each height of the plot (1, 2, 3). 
This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on.\nUltimately, we would like to understand three major parameters:\n\nThe transition matrix. This is a matrix that assigns a probability of switching from one state to any other state, including the state that we are already in.\nThe emission parameters, which describes a typical value emitted by some state. In the plot above, the emission parameter for state one is simply one.\nThe state sequence is our understanding of what state we were actually in when we observed some data. This is very important in more sophisticated HMMs, where the emission value does not equal our state.\n\nWith this in mind, let’s set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter.\n\\[\nm_i \\sim \\mathrm{Normal}(i, 0.5) \\quad \\text{where} \\quad m = \\{1,2,3\\}\n\\]\nSimply put, this says that we expect state one to emit values in a Normally distributed manner, where the mean of each state’s emissions is that state’s value. The variance of 0.5 helps the model converge more quickly — consider the case where we have a variance of 1 or 2. In this case, the likelihood of observing a 2 when we are in state 1 is actually quite high, as it is within a standard deviation of the true emission value. Applying the prior that we are likely to be tightly centred around the mean prevents our model from being too confused about the state that is generating our observations.\nThe priors on our transition matrix are noninformative, using T[i] ~ Dirichlet(ones(K)/K). The Dirichlet prior used in this way assumes that the state is likely to change to any other state with equal probability. As we’ll see, this transition matrix prior will be overwritten as we observe data.\n\n# Turing model definition.\n@model function BayesHmm(y, K)\n # Get observation length.\n N = length(y)\n\n # State sequence.\n s = zeros(Int, N)\n\n # Emission matrix.\n m = Vector(undef, K)\n\n # Transition matrix.\n T = Vector{Vector}(undef, K)\n\n # Assign distributions to each element\n # of the transition matrix and the\n # emission matrix.\n for i in 1:K\n T[i] ~ Dirichlet(ones(K) / K)\n m[i] ~ Normal(i, 0.5)\n end\n\n # Observe each point of the input.\n s[1] ~ Categorical(K)\n y[1] ~ Normal(m[s[1]], 0.1)\n\n for i in 2:N\n s[i] ~ Categorical(vec(T[s[i - 1]]))\n y[i] ~ Normal(m[s[i]], 0.1)\n end\nend;\n\nWe will use a combination of two samplers (HMC and Particle Gibbs) by passing them to the Gibbs sampler. The Gibbs sampler allows for compositional inference, where different samplers can be applied to different parameters based on their properties. (For API details of these samplers, please see Turing.jl’s API documentation.)\nIn this case, we use HMC for m and T, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for s, the state sequence. You may wonder why it is that we are not assigning s to the HMC sampler, and why it is that we need compositional Gibbs sampling at all.\nThe parameter s is not a continuous variable. It is a vector of integers, and thus Hamiltonian methods like HMC and NUTS won’t work correctly. Gibbs allows us to apply the right tools to the best effect. 
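To see concretely why gradient-based samplers are ruled out for s, note that draws from a Categorical are plain integers, so there is no continuous density for HMC to differentiate through. A one-line check (this assumes Distributions is loaded, which Turing re-exports; the result shown is what we would expect on a typical 64-bit session):\n\ntypeof(rand(Categorical(3)))\n\nInt64\n\n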
If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use different automatic differentiation backends for each parameter space.\nTime to run our sampler.\n\ng = Gibbs((:m, :T) => HMC(0.01, 50), :s => PG(120))\nchn = sample(BayesHmm(y, 3), g, 1000)\n\nChains MCMC chain (1000×45×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 1145.25 seconds\nCompute duration = 1145.25 seconds\nparameters = T[1][1], T[1][2], T[1][3], m[1], T[2][1], T[2][2], T[2][3], m[2], T[3][1], T[3][2], T[3][3], m[3], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8], s[9], s[10], s[11], s[12], s[13], s[14], s[15], s[16], s[17], s[18], s[19], s[20], s[21], s[22], s[23], s[24], s[25], s[26], s[27], s[28], s[29], s[30]\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nLet’s see how well our chain performed. Ordinarily, using display(chn) would be a good first step, but we have generated a lot of parameters here (s[1], s[2], m[1], and so on). It’s a bit easier to show how our model performed graphically.\nThe code below generates an animation showing the graph of the data above, and the data our model generates in each sample.\n\n# Extract our m and s parameters from the chain.\nm_set = MCMCChains.group(chn, :m).value\ns_set = MCMCChains.group(chn, :s).value\n\n# Iterate through the MCMC samples.\nNs = 1:length(chn)\n\n# Make an animation.\nanimation = @gif for i in Ns\n m = m_set[i, :]\n s = Int.(s_set[i, :])\n emissions = m[s]\n\n p = plot(\n y;\n color=:red,\n size=(500, 250),\n xlabel=\"Time\",\n ylabel=\"State\",\n legend=:topright,\n label=\"True data\",\n xlim=(0, 30),\n ylim=(-1, 5),\n )\n plot!(emissions; color=:blue, label=\"Sample $i\")\nend every 3\n\n\n[ Info: Saved animation to /tmp/jl_adalmhGBQh.gif\n\n\n\n\n\n\nLooks like our model did a pretty good job, but we should also check to make sure our chain converges. A quick check is to examine whether the diagonal (representing the probability of remaining in the current state) of the transition matrix appears to be stationary. The code below extracts the diagonal and shows a traceplot of each persistence probability.\n\n# Index the chain with the persistence probabilities.\nsubchain = chn[[\"T[1][1]\", \"T[2][2]\", \"T[3][3]\"]]\n\nplot(subchain; seriestype=:traceplot, title=\"Persistence Probability\", legend=false)\n\n\n\n\nA cursory examination of the traceplot above indicates that all three traces converged to something resembling stationarity.
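As a quick numerical complement to the visual check, we can also inspect effective sample sizes and R̂ values for the same parameters (a sketch; it assumes the ess and rhat utilities that MCMCChains re-exports from MCMCDiagnosticTools):\n\n# Effective sample size and R-hat for the diagonal of the transition matrix;\n# R-hat values close to 1 are consistent with convergence.\ness(subchain)\nrhat(subchain)\n\n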
We can use the diagnostic functions provided by MCMCChains to engage in some more formal tests, like the Heidelberg and Welch diagnostic:\n\nheideldiag(MCMCChains.group(chn, :T))[1]\n\n\nHeidelberger and Welch diagnostic - Chain 1\n\n parameters burnin stationarity pvalue mean halfwidth tes ⋯\n Symbol Int64 Bool Float64 Float64 Float64 Boo ⋯\n\n T[1][1] 0.0000 1.0000 0.0528 0.8866 0.0182 1.000 ⋯\n T[1][2] 200.0000 1.0000 0.1035 0.0538 0.0203 0.000 ⋯\n T[1][3] 500.0000 0.0000 0.0000 0.0486 0.0294 0.000 ⋯\n T[2][1] 400.0000 1.0000 0.0839 0.0949 0.0512 0.000 ⋯\n T[2][2] 500.0000 0.0000 0.0001 0.7064 0.1004 0.000 ⋯\n T[2][3] 500.0000 0.0000 0.0000 0.1987 0.1553 0.000 ⋯\n T[3][1] 500.0000 0.0000 0.0032 0.0777 0.1131 0.000 ⋯\n T[3][2] 500.0000 0.0000 0.0000 0.3093 0.1709 0.000 ⋯\n T[3][3] 500.0000 0.0000 0.0000 0.6130 0.2720 0.000 ⋯\n\n 1 column omitted\n\n\n\n\nThe p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be reasonably confident that our transition matrix has converged to something reasonable.", "crumbs": [ "Get Started", "Tutorials", "Hidden Markov Models" ] }, { "objectID": "tutorials/hidden-markov-models/index.html#efficient-inference-with-the-forward-algorithm", "href": "tutorials/hidden-markov-models/index.html#efficient-inference-with-the-forward-algorithm", "title": "Hidden Markov Models", "section": "Efficient Inference With The Forward Algorithm", "text": "Efficient Inference With The Forward Algorithm\nWhile the above method works well for the simple example in this tutorial, some users may desire a more efficient method, especially when their model is more complicated. One simple way to improve inference is to marginalize out the hidden states of the model with an appropriate algorithm, calculating only the posterior over the continuous random variables. Not only does this allow more efficient inference via Rao-Blackwellization, but now we can sample our model with NUTS() alone, which is usually a much more performant MCMC kernel.\nThankfully, HiddenMarkovModels.jl provides an extremely efficient implementation of many algorithms related to hidden Markov models. This allows us to rewrite our model as:\n\nusing HiddenMarkovModels\nusing FillArrays\nusing LinearAlgebra\nusing LogExpFunctions\n\n\n@model function BayesHmm2(y, K)\n m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I))\n T ~ filldist(Dirichlet(fill(1/K, K)), K)\n\n hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K])\n @addlogprob! 
logdensityof(hmm, y)\nend\n\nchn2 = sample(BayesHmm2(y, 3), NUTS(), 1000)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.025\n\n\n\n\nChains MCMC chain (1000×26×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 7.46 seconds\nCompute duration = 7.46 seconds\nparameters = m[1], m[2], m[3], T[1, 1], T[2, 1], T[3, 1], T[1, 2], T[2, 2], T[3, 2], T[1, 3], T[2, 3], T[3, 3]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nWe can compare the chains of these two models, confirming the posterior estimate is similar (modulo label switching concerns with the Gibbs model):\n\n\nPlotting Chains\nplot(chn[\"m[1]\"], label = \"m[1], Model 1, Gibbs\", color = :lightblue)\nplot!(chn2[\"m[1]\"], label = \"m[1], Model 2, NUTS\", color = :blue)\nplot!(chn[\"m[2]\"], label = \"m[2], Model 1, Gibbs\", color = :pink)\nplot!(chn2[\"m[2]\"], label = \"m[2], Model 2, NUTS\", color = :red)\nplot!(chn[\"m[3]\"], label = \"m[3], Model 1, Gibbs\", color = :yellow)\nplot!(chn2[\"m[3]\"], label = \"m[3], Model 2, NUTS\", color = :orange)\n\n\n\n\n\n\nRecovering Marginalized Trajectories\nWe can use the viterbi() algorithm, also from the HiddenMarkovModels package, to recover the most probable state for each parameter set in our posterior sample:\n\n@model function BayesHmmRecover(y, K, IncludeGenerated = false)\n m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I))\n T ~ filldist(Dirichlet(fill(1/K, K)), K)\n\n hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K])\n @addlogprob! 
logdensityof(hmm, y)\n\n # Conditional generation of the hidden states.\n if IncludeGenerated\n seq, _ = viterbi(hmm, y)\n s := [m[s] for s in seq]\n end\nend\n\nchn_recover = sample(BayesHmmRecover(y, 3, true), NUTS(), 1000)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.009761810302734378\n\n\n\n\nChains MCMC chain (1000×56×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 3.7 seconds\nCompute duration = 3.7 seconds\nparameters = m[1], m[2], m[3], T[1, 1], T[2, 1], T[3, 1], T[1, 2], T[2, 2], T[3, 2], T[1, 3], T[2, 3], T[3, 3], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8], s[9], s[10], s[11], s[12], s[13], s[14], s[15], s[16], s[17], s[18], s[19], s[20], s[21], s[22], s[23], s[24], s[25], s[26], s[27], s[28], s[29], s[30]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nPlotting the estimated states, we can see that the results align well with our expectations:\n\np = plot(xlim=(0, 30), ylim=(-1, 5), size=(500, 250))\nfor i in 1:100\n ind = rand(DiscreteUniform(1, 1000))\n plot!(MCMCChains.group(chn_recover, :s).value[ind,:], color = :grey, opacity = 0.1, legend = :false)\nend\nscatter!(y, color = :blue)\n\np", "crumbs": [ "Get Started", "Tutorials", "Hidden Markov Models" ] }, { "objectID": "tutorials/gaussian-processes-introduction/index.html", "href": "tutorials/gaussian-processes-introduction/index.html", "title": "Gaussian Processes: Introduction", "section": "", "text": "JuliaGPs packages integrate well with Turing.jl because they implement the Distributions.jl interface. This tutorial assumes basic knowledge of Gaussian processes (i.e., a general understanding of what they are); for a comprehensive introduction, see Rasmussen and Williams (2006). For a more in-depth understanding of the JuliaGPs functionality used here, please consult the JuliaGPs docs.\nIn this tutorial, we will model the putting dataset discussed in Chapter 21 of Bayesian Data Analysis. The dataset comprises the result of measuring how often a golfer successfully gets the ball in the hole, depending on how far away from it they are. The goal of inference is to estimate the probability of any given shot being successful at a given distance.\n\nLet’s download the data and take a look at it:\n\nusing CSV, DataFrames\n\ndf = CSV.read(\"golf.dat\", DataFrame; delim=' ', ignorerepeated=true)\ndf[1:5, :]\n\n5×3 DataFrame\n\n\n\nRow\ndistance\nn\ny\n\n\n\nInt64\nInt64\nInt64\n\n\n\n\n1\n2\n1443\n1346\n\n\n2\n3\n694\n577\n\n\n3\n4\n455\n337\n\n\n4\n5\n353\n208\n\n\n5\n6\n272\n149\n\n\n\n\n\n\nThese are the first 5 rows of the dataset (which comprises only 19 rows in total). Observe it has three columns:\n\ndistance – how far away from the hole. We will refer to distance as d throughout the rest of this tutorial\nn – how many shots were taken from a given distance\ny – how many shots were successful from a given distance\n\nWe will use a Binomial model for the data, whose success probability is parametrised by a transformation of a GP. 
Something along the lines of:\n\\[\n\\begin{aligned}\nf & \\sim \\operatorname{GP}(0, k) \\\\\ny_j \\mid f(d_j) & \\sim \\operatorname{Binomial}(n_j, g(f(d_j))) \\\\\ng(x) & := \\frac{1}{1 + e^{-x}}\n\\end{aligned}\n\\]\nTo do this, let’s define our Turing.jl model:\n\nusing AbstractGPs, LogExpFunctions, Turing\n\n@model function putting_model(d, n; jitter=1e-4)\n v ~ Gamma(2, 1)\n l ~ Gamma(4, 1)\n f = GP(v * with_lengthscale(SEKernel(), l))\n f_latent ~ f(d, jitter)\n y ~ product_distribution(Binomial.(n, logistic.(f_latent)))\n return (fx=f(d, jitter), f_latent=f_latent, y=y)\nend\n\nputting_model (generic function with 2 methods)\n\n\nWe first define an AbstractGPs.GP, which represents a distribution over functions, and is entirely separate from Turing.jl. We place a prior over its variance v and length-scale l. f(d, jitter) constructs the multivariate Gaussian comprising the random variables in f whose indices are in d (plus a bit of independent Gaussian noise with variance jitter – see the docs for more details). f(d, jitter) has the type AbstractMvNormal, and is the bit of AbstractGPs.jl that implements the Distributions.jl interface, so it’s legal to put it on the right-hand side of a ~. From this you should deduce that f_latent is distributed according to a multivariate Gaussian. The remaining lines comprise standard Turing.jl code that is encountered in other tutorials and Turing documentation.\nBefore performing inference, we might want to inspect the prior that our model places over the data, to see whether there is anything obviously wrong. These kinds of prior predictive checks are straightforward to perform using Turing.jl, since it is possible to sample from the prior easily by just calling the model:\n\nm = putting_model(Float64.(df.distance), df.n)\nm().y\n\n19-element Vector{Int64}:\n 639\n 322\n 232\n 186\n 153\n 151\n 167\n 161\n 144\n 181\n 153\n 151\n 139\n 141\n 162\n 147\n 144\n 127\n 126\n\n\nWe make use of this to see what kinds of datasets we simulate from the prior:\n\nusing Plots\n\nfunction plot_data(d, n, y, xticks, yticks)\n ylims = (0, round(maximum(n), RoundUp; sigdigits=2))\n margin = -0.5 * Plots.mm\n plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false)\n bar!(plt, d, n; color=:red, label=\"\", alpha=0.5)\n bar!(plt, d, y; label=\"\", color=:blue, alpha=0.7)\n return plt\nend\n\n# Construct model and run some prior predictive checks.\nm = putting_model(Float64.(df.distance), df.n)\nhists = map(1:20) do j\n xticks = j > 15 ? :auto : nothing\n yticks = rem(j, 5) == 1 ? :auto : nothing\n return plot_data(df.distance, df.n, m().y, xticks, yticks)\nend\nplot(hists...; layout=(4, 5))\n\n\n\n\nIn this case, the only prior knowledge we have is that the proportion of successful shots ought to decrease monotonically as the distance from the hole increases, which should show up in the plots as the blue lines generally going down as we move from left to right on each graph. Unfortunately, there is not a simple way to enforce monotonicity in the samples from a GP, and we can see this in some of the plots above, so we must hope that we have enough data to ensure that this relationship holds approximately under the posterior. In any case, you can judge for yourself whether you think this is the most useful visualisation that we can perform; if you think there is something better to look at, please let us know!\nMoving on, we generate samples from the posterior using the default NUTS sampler.
We’ll make use of ReverseDiff.jl, as it has better performance than ForwardDiff.jl on this example. See the automatic differentiation docs for more info.\n\nusing Random, ReverseDiff\n\nm_post = m | (y=df.y,)\nchn = sample(Xoshiro(123456), m_post, NUTS(; adtype=AutoReverseDiff()), 1_000, progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n\n\n\n\nChains MCMC chain (1000×35×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 148.83 seconds\nCompute duration = 148.83 seconds\nparameters = v, l, f_latent[1], f_latent[2], f_latent[3], f_latent[4], f_latent[5], f_latent[6], f_latent[7], f_latent[8], f_latent[9], f_latent[10], f_latent[11], f_latent[12], f_latent[13], f_latent[14], f_latent[15], f_latent[16], f_latent[17], f_latent[18], f_latent[19]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nWe can use these samples and the posterior function from AbstractGPs to sample from the posterior probability of success at any distance we choose:\n\nd_pred = 1:0.2:21\nsamples = map(returned(m_post, chn)[1:10:end]) do x\n return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))\nend\np = plot()\nplot!(d_pred, reduce(hcat, samples); label=\"\", color=:blue, alpha=0.2)\nscatter!(df.distance, df.y ./ df.n; label=\"\", color=:red)\n\n\n\n\nWe can see that the general trend is indeed down as the distance from the hole increases, and that if we move away from the data, the posterior uncertainty quickly inflates. This suggests that the model is probably going to do a reasonable job of interpolating between observed data, but a less good job of extrapolating to larger distances.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Gaussian Processes: Introduction" ] }, { "objectID": "tutorials/probabilistic-pca/index.html", "href": "tutorials/probabilistic-pca/index.html", "title": "Probabilistic PCA", "section": "", "text": "Principal component analysis (PCA) is a fundamental technique to analyse and visualise data. It is an unsupervised learning method mainly used for dimensionality reduction.\nFor example, we have a data matrix \(\mathbf{X} \in \mathbb{R}^{N \times D}\), and we would like to extract \(k \ll D\) principal components which capture most of the information from the original matrix. The goal is to understand \(\mathbf{X}\) through a lower dimensional subspace (e.g. two-dimensional subspace for visualisation convenience) spanned by the principal components.\nIn order to project the original data matrix into low dimensions, we need to find the principal directions along which most of the variation of \(\mathbf{X}\) lies. Traditionally, this is implemented via singular value decomposition (SVD), which provides a robust and accurate computational framework for decomposing a matrix into a product of rotation-scaling-rotation matrices, particularly for large datasets (see an illustration here):\n\[\n\mathbf{X}_{N \times D} = \mathbf{U}_{N \times r} \times \boldsymbol{\Sigma}_{r \times r} \times \mathbf{V}^T_{r \times D}\n\]\nwhere \(\Sigma_{r \times r}\) contains only \(r := \operatorname{rank} \mathbf{X} \leq \min\{N,D\}\) non-zero singular values of \(\mathbf{X}\). 
If we pad \(\Sigma\) with zeros and add arbitrary orthonormal columns to \(\mathbf{U}\) and \(\mathbf{V}\), we obtain the full form:1\n\[\n\mathbf{X}_{N \times D} = \mathbf{U}_{N \times N} \mathbf{\Sigma}_{N \times D} \mathbf{V}_{D \times D}^T\n\]\nwhere \(\mathbf{U}\) and \(\mathbf{V}\) are unitary matrices (i.e. with orthonormal columns). Such a decomposition always exists for any matrix. Columns of \(\mathbf{V}\) are the principal directions/axes. The percentage of variations explained can be calculated using the ratios of singular values.2\nHere we take a probabilistic perspective. For more details and a mathematical derivation, we recommend Bishop’s textbook (Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006). The idea of probabilistic PCA is to find a latent variable \(z\) that can be used to describe the hidden structure in a dataset.3 Consider a data set \(\mathbf{X}_{D \times N}=\{x_i\}\) with \(i=1,2,...,N\) data points, where each data point \(x_i\) is \(D\)-dimensional (i.e. \(x_i \in \mathbb{R}^D\)). Note that here we use the flipped version of the data matrix. We aim to represent each original \(D\)-dimensional vector using a lower dimensional latent variable \(z_i \in \mathbb{R}^k\).\nWe first assume that each latent variable \(z_i\) is normally distributed:\n\[\nz_i \sim \mathcal{N}(0, I)\n\]\nand the corresponding data point is generated via projection:\n\[\nx_i | z_i \sim \mathcal{N}(\mathbf{W} z_i + \boldsymbol{μ}, \sigma^2 \mathbf{I})\n\]\nwhere the projection matrix \(\mathbf{W}_{D \times k}\) accommodates the principal axes. The above formula expresses \(x_i\) as a linear combination of the basis columns in the projection matrix W, where the combination coefficients sit in z_i (they are the coordinates of x_i in the new \(k\)-dimensional space). We can also express the above formula in matrix form: \(\mathbf{X}_{D \times N} \approx \mathbf{W}_{D \times k} \mathbf{Z}_{k \times N}\). We are interested in inferring \(\mathbf{W}\), \(μ\) and \(\sigma\).\nClassical PCA is the limiting case of probabilistic PCA when the noise variance approaches zero, i.e. \(\sigma^2 \to 0\). Probabilistic PCA generalises classical PCA; this connection can be seen by marginalising out the latent variable.4", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#overview-of-pca", "href": "tutorials/probabilistic-pca/index.html#overview-of-pca", "title": "Probabilistic PCA", "section": "", "text": "Principal component analysis (PCA) is a fundamental technique to analyse and visualise data. It is an unsupervised learning method mainly used for dimensionality reduction.\nFor example, we have a data matrix \(\mathbf{X} \in \mathbb{R}^{N \times D}\), and we would like to extract \(k \ll D\) principal components which capture most of the information from the original matrix. The goal is to understand \(\mathbf{X}\) through a lower dimensional subspace (e.g. two-dimensional subspace for visualisation convenience) spanned by the principal components.\nIn order to project the original data matrix into low dimensions, we need to find the principal directions along which most of the variation of \(\mathbf{X}\) lies. 
Traditionally, this is implemented via singular value decomposition (SVD), which provides a robust and accurate computational framework for decomposing a matrix into a product of rotation-scaling-rotation matrices, particularly for large datasets (see an illustration here):\n\[\n\mathbf{X}_{N \times D} = \mathbf{U}_{N \times r} \times \boldsymbol{\Sigma}_{r \times r} \times \mathbf{V}^T_{r \times D}\n\]\nwhere \(\Sigma_{r \times r}\) contains only \(r := \operatorname{rank} \mathbf{X} \leq \min\{N,D\}\) non-zero singular values of \(\mathbf{X}\). If we pad \(\Sigma\) with zeros and add arbitrary orthonormal columns to \(\mathbf{U}\) and \(\mathbf{V}\), we obtain the full form:1\n\[\n\mathbf{X}_{N \times D} = \mathbf{U}_{N \times N} \mathbf{\Sigma}_{N \times D} \mathbf{V}_{D \times D}^T\n\]\nwhere \(\mathbf{U}\) and \(\mathbf{V}\) are unitary matrices (i.e. with orthonormal columns). Such a decomposition always exists for any matrix. Columns of \(\mathbf{V}\) are the principal directions/axes. The percentage of variations explained can be calculated using the ratios of singular values.2\nHere we take a probabilistic perspective. For more details and a mathematical derivation, we recommend Bishop’s textbook (Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006). The idea of probabilistic PCA is to find a latent variable \(z\) that can be used to describe the hidden structure in a dataset.3 Consider a data set \(\mathbf{X}_{D \times N}=\{x_i\}\) with \(i=1,2,...,N\) data points, where each data point \(x_i\) is \(D\)-dimensional (i.e. \(x_i \in \mathbb{R}^D\)). Note that here we use the flipped version of the data matrix. We aim to represent each original \(D\)-dimensional vector using a lower dimensional latent variable \(z_i \in \mathbb{R}^k\).\nWe first assume that each latent variable \(z_i\) is normally distributed:\n\[\nz_i \sim \mathcal{N}(0, I)\n\]\nand the corresponding data point is generated via projection:\n\[\nx_i | z_i \sim \mathcal{N}(\mathbf{W} z_i + \boldsymbol{μ}, \sigma^2 \mathbf{I})\n\]\nwhere the projection matrix \(\mathbf{W}_{D \times k}\) accommodates the principal axes. The above formula expresses \(x_i\) as a linear combination of the basis columns in the projection matrix W, where the combination coefficients sit in z_i (they are the coordinates of x_i in the new \(k\)-dimensional space). We can also express the above formula in matrix form: \(\mathbf{X}_{D \times N} \approx \mathbf{W}_{D \times k} \mathbf{Z}_{k \times N}\). We are interested in inferring \(\mathbf{W}\), \(μ\) and \(\sigma\).\nClassical PCA is the limiting case of probabilistic PCA when the noise variance approaches zero, i.e. \(\sigma^2 \to 0\). Probabilistic PCA generalises classical PCA; this connection can be seen by marginalising out the latent variable.4", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#the-gene-expression-example", "href": "tutorials/probabilistic-pca/index.html#the-gene-expression-example", "title": "Probabilistic PCA", "section": "The gene expression example", "text": "The gene expression example\nIn the first example, we illustrate:\n\nhow to specify the probabilistic model and\nhow to perform inference on \(\mathbf{W}\), \(\boldsymbol{\mu}\) and \(\sigma\) using MCMC.\n\nWe use simulated genome data to demonstrate these. 
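To make the generative story concrete before we turn to the data, here is a minimal, self-contained sketch of simulating from the p-PCA model \(x_i \mid z_i \sim \mathcal{N}(\mathbf{W} z_i + \boldsymbol{\mu}, \sigma^2 \mathbf{I})\) (illustrative sizes and variable names, not part of the original tutorial):\n\n# Illustrative sizes: D observed dimensions, k latent dimensions, N data points.\nD, k, N = 5, 2, 100\nW, μ, σ = randn(D, k), randn(D), 0.1\nZ = randn(k, N) # z_i ~ N(0, I), stored column-wise\nX = W * Z .+ μ .+ σ .* randn(D, N) # each column is x_i = W z_i + μ + noise\n\n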
The simulation is inspired by biological measurement of expression of genes in cells, and each cell is characterised by different gene features. While the human genome is (mostly) identical between all the cells in the body, there exist interesting differences in gene expression in different human tissues and disease conditions. One way to investigate certain diseases is to look at differences in gene expression in cells from patients and healthy controls (usually from the same tissue).\nUsually, we can assume that the changes in gene expression only affect a subset of all genes (and these can be linked to diseases in some way). One of the challenges for this kind of data is to explore the underlying structure, e.g. to make the connection between a certain state (healthy/disease) and gene expression. This becomes difficult when the dimension is very large (up to 20000 genes across 1000s of cells). So in order to find structure in this data, it is useful to project the data into a lower dimensional space.\nRegardless of the biological background, the more abstract problem formulation is to project the data living in high-dimensional space onto a representation in lower-dimensional space where most of the variation is concentrated in the first few dimensions. We use PCA to explore underlying structure or pattern which may not necessarily be obvious from looking at the raw data itself.\n\nStep 1: configuration of dependencies\nFirst, we load the dependencies used.\n\nusing Turing\nusing Mooncake\nusing LinearAlgebra, FillArrays\n\n# Packages for visualisation\nusing DataFrames, StatsPlots, Measures\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(1789);\n\nAll packages used in this tutorial are listed here. You can install them via using Pkg; Pkg.add(\"package_name\").\n\n\n\n\n\n\nCaution: package usages\n\n\n\nWe use DataFrames for instantiating matrices; LinearAlgebra and FillArrays to perform matrix operations; Turing for model specification and MCMC sampling; Mooncake for automatic differentiation when sampling; StatsPlots for visualising the results; and Measures for setting plot margin units. As all examples involve sampling, for reproducibility we set a fixed seed using the Random standard library.\n\n\n\n\nStep 2: Data generation\nHere, we simulate the biological gene expression problem described earlier. We simulate 60 cells, each with 9 gene features. This is a simplified problem with only a few cells and genes for demonstration purposes, which is not comparable to the complexity of real-life data (e.g. thousands of features for each individual). Even so, spotting the structures or patterns in a 9-feature space would be a challenging task; it would be nice to reduce the dimensionality using p-PCA.\nBy design, we manually divide the 60 cells into two groups: the first 3 gene features of the first 30 cells have mean 10, while the last 3 gene features of the last 30 cells have mean 10. 
These two groups of cells differ in the expression of genes.\n\nn_genes = 9 # D\nn_cells = 60 # N\n\n# create a block-diagonal-like expression matrix, with some non-informative genes\n# (not all features/genes are informative; some might just not differ very much between cells)\nmat_exp = randn(n_genes, n_cells)\nmat_exp[1:(n_genes ÷ 3), 1:(n_cells ÷ 2)] .+= 10\nmat_exp[(2 * (n_genes ÷ 3) + 1):end, (n_cells ÷ 2 + 1):end] .+= 10\n\n3×30 view(::Matrix{Float64}, 7:9, 31:60) with eltype Float64:\n 11.7413 10.5735 11.3817 8.50923 … 7.38716 8.7734 11.4717\n 9.28533 11.1225 9.43421 10.8904 11.6846 10.7264 9.64063\n 9.92113 8.42122 9.59885 9.90799 9.40715 8.40956 10.2522\n\n\nTo visualise the \((D=9) \times (N=60)\) data matrix mat_exp, we use the heatmap plot.\n\nheatmap(\n mat_exp;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\nNote that:\n\nWe have made distinct feature differences between these two groups of cells (it is fairly obvious from looking at the raw data); in practice, with large enough data sets, it is often impossible to spot the differences from the raw data alone.\nIf you have some patience and compute resources you can increase the size of the dataset, or play around with the noise levels to make the problem harder.\n\n\n\nStep 3: Create the pPCA model\nHere we construct the probabilistic model pPCA(). As per the p-PCA formula, we think of each row (i.e. each gene feature) following an \(N=60\) dimensional multivariate normal distribution centred around the corresponding row of \(\mathbf{W}_{D \times k} \times \mathbf{Z}_{k \times N} + \boldsymbol{\mu}_{D \times N}\).\n\n@model function pPCA(X::AbstractMatrix{<:Real}, k::Int)\n # retrieve the dimensions of the input matrix X.\n N, D = size(X)\n\n # weights/loadings W\n W ~ filldist(Normal(), D, k)\n\n # latent variable z\n Z ~ filldist(Normal(), k, N)\n\n # mean offset\n μ ~ MvNormal(Eye(D))\n genes_mean = W * Z .+ reshape(μ, D, 1)\n return X ~ arraydist([MvNormal(m, Eye(N)) for m in eachcol(genes_mean')])\nend;\n\nThe function pPCA() accepts:\n\na data array \(\mathbf{X}\) (no. of instances × no. of features; NB: it is the transpose of the original data matrix);\nan integer \(k\) which indicates the dimension of the latent space (the space the original feature matrix is projected onto).\n\nSpecifically:\n\nit first extracts the number of instances \(N\) and the dimension \(D\) of the input matrix;\nit draws each entry of the projection matrix \(\mathbf{W}\) from a standard normal;\nit draws the latent variable \(\mathbf{Z}_{k \times N}\) from a multivariate normal distribution (MND);\nit draws the offset \(\boldsymbol{\mu}\) from an MND, assuming a uniform offset for all instances;\nfinally, it iterates through each gene dimension in \(\mathbf{X}\) and defines an MND for the sampling distribution (i.e. the likelihood).\n\n\n\nStep 4: Sampling-based inference of the pPCA model\nHere we aim to perform MCMC sampling to infer the projection matrix \(\mathbf{W}_{D \times k}\), the latent variable matrix \(\mathbf{Z}_{k \times N}\), and the offsets \(\boldsymbol{\mu}_{D \times 1}\).\nWe run the inference using the NUTS sampler. By default, sample samples a single chain (in this case with 500 samples). You can also use different samplers if you wish.\n\nsetprogress!(false)\n\n\nk = 2 # k is the dimension of the projected space, i.e. 
the number of principal components/axes of choice\nppca = pPCA(mat_exp', k) # instantiate the probabilistic model\nchain_ppca = sample(ppca, NUTS(; adtype=AutoMooncake()), 500);\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\n\n\nThe samples are saved in chain_ppca, which is an MCMCChains.Chains object. We can check its shape:\n\nsize(chain_ppca) # (no. of iterations, no. of vars, no. of chains) = (500, 161, 1)\n\n(500, 161, 1)\n\n\nSampling statistics such as R-hat, ESS, mean estimates, and so on can also be obtained from this:\n\ndescribe(chain_ppca)\n\n\nChains MCMC chain (500×161×1 Array{Float64, 3}):\n\nIterations = 251:1:750\nNumber of chains = 1\nSamples per chain = 500\nWall duration = 145.44 seconds\nCompute duration = 145.44 seconds\nparameters = W[1, 1], W[2, 1], W[3, 1], W[4, 1], W[5, 1], W[6, 1], W[7, 1], W[8, 1], W[9, 1], W[1, 2], W[2, 2], W[3, 2], W[4, 2], W[5, 2], W[6, 2], W[7, 2], W[8, 2], W[9, 2], Z[1, 1], Z[2, 1], Z[1, 2], Z[2, 2], Z[1, 3], Z[2, 3], Z[1, 4], Z[2, 4], Z[1, 5], Z[2, 5], Z[1, 6], Z[2, 6], Z[1, 7], Z[2, 7], Z[1, 8], Z[2, 8], Z[1, 9], Z[2, 9], Z[1, 10], Z[2, 10], Z[1, 11], Z[2, 11], Z[1, 12], Z[2, 12], Z[1, 13], Z[2, 13], Z[1, 14], Z[2, 14], Z[1, 15], Z[2, 15], Z[1, 16], Z[2, 16], Z[1, 17], Z[2, 17], Z[1, 18], Z[2, 18], Z[1, 19], Z[2, 19], Z[1, 20], Z[2, 20], Z[1, 21], Z[2, 21], Z[1, 22], Z[2, 22], Z[1, 23], Z[2, 23], Z[1, 24], Z[2, 24], Z[1, 25], Z[2, 25], Z[1, 26], Z[2, 26], Z[1, 27], Z[2, 27], Z[1, 28], Z[2, 28], Z[1, 29], Z[2, 29], Z[1, 30], Z[2, 30], Z[1, 31], Z[2, 31], Z[1, 32], Z[2, 32], Z[1, 33], Z[2, 33], Z[1, 34], Z[2, 34], Z[1, 35], Z[2, 35], Z[1, 36], Z[2, 36], Z[1, 37], Z[2, 37], Z[1, 38], Z[2, 38], Z[1, 39], Z[2, 39], Z[1, 40], Z[2, 40], Z[1, 41], Z[2, 41], Z[1, 42], Z[2, 42], Z[1, 43], Z[2, 43], Z[1, 44], Z[2, 44], Z[1, 45], Z[2, 45], Z[1, 46], Z[2, 46], Z[1, 47], Z[2, 47], Z[1, 48], Z[2, 48], Z[1, 49], Z[2, 49], Z[1, 50], Z[2, 50], Z[1, 51], Z[2, 51], Z[1, 52], Z[2, 52], Z[1, 53], Z[2, 53], Z[1, 54], Z[2, 54], Z[1, 55], Z[2, 55], Z[1, 56], Z[2, 56], Z[1, 57], Z[2, 57], Z[1, 58], Z[2, 58], Z[1, 59], Z[2, 59], Z[1, 60], Z[2, 60], μ[1], μ[2], μ[3], μ[4], μ[5], μ[6], μ[7], μ[8], μ[9]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nSummary Statistics\n\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n W[1, 1] -1.9251 1.8708 0.6037 7.2451 40.6447 1.0587 ⋯\n W[2, 1] -2.0109 1.9409 0.6299 8.1406 39.3717 1.0236 ⋯\n W[3, 1] -1.8695 1.8340 0.5923 6.5452 40.9573 1.0899 ⋯\n W[4, 1] -0.0163 0.2580 0.1967 1.7235 19.2569 1.5884 ⋯\n W[5, 1] -0.0959 0.4030 0.3194 1.5578 14.0343 1.7341 ⋯\n W[6, 1] -0.0390 0.1615 0.0109 301.6725 160.1371 1.0028 ⋯\n W[7, 1] 1.9819 1.9059 0.6386 8.7858 25.4058 1.0281 ⋯\n W[8, 1] 1.9469 1.9128 0.6119 6.5543 41.1891 1.0926 ⋯\n W[9, 1] 2.0062 1.9262 0.6231 7.7689 40.2406 1.0394 ⋯\n W[1, 2] 0.5162 2.1197 1.7752 1.4773 18.1857 1.8307 ⋯\n W[2, 2] 0.3824 2.1661 1.8203 1.4850 19.7338 1.8192 ⋯\n W[3, 2] 0.6604 2.1235 1.7604 1.4723 17.6845 1.8380 ⋯\n W[4, 2] -0.2079 0.2630 0.0745 13.8777 41.1502 1.0170 ⋯\n W[5, 2] -0.3258 0.3625 0.1027 12.9779 42.0360 1.0193 ⋯\n W[6, 2] 0.0209 0.1846 0.0153 148.5289 234.9666 1.0270 ⋯\n ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱\n\n 1 column and 132 rows omitted\n\nQuantiles\n\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n W[1, 1] -3.7043 -3.2527 -2.7943 -1.0277 2.9479\n W[2, 1] -3.7707 -3.3511 -2.9091 -1.2026 3.0032\n W[3, 1] -3.6346 -3.2250 -2.6170 -0.9398 2.9965\n W[4, 1] -0.5113 -0.1813 -0.0137 0.1477 0.5035\n W[5, 1] -0.7646 -0.3767 -0.1488 0.1618 0.7650\n W[6, 1] -0.3381 -0.1507 -0.0444 0.0680 0.2754\n W[7, 1] -2.9064 1.3107 2.8466 3.2825 3.6952\n W[8, 1] -3.1397 0.9817 2.7474 3.3339 3.7901\n W[9, 1] -2.9964 1.1223 2.8992 3.3724 3.8124\n W[1, 2] -3.4655 -1.1495 0.6408 2.3105 3.5224\n W[2, 2] -3.6344 -1.3491 0.5320 2.2120 3.6353\n W[3, 2] -3.4329 -0.9057 0.9862 2.4670 3.5747\n W[4, 2] -0.6371 -0.3778 -0.2531 -0.0641 0.4396\n W[5, 2] -0.8475 -0.5770 -0.4193 -0.1246 0.5472\n W[6, 2] -0.3612 -0.1029 0.0100 0.1495 0.3607\n ⋮ ⋮ ⋮ ⋮ ⋮ ⋮\n\n 132 rows omitted\n\n\n\n\n\nStep 5: posterior predictive checks\nWe try to reconstruct the input data using the posterior mean as parameter estimates. We first retrieve the samples for the projection matrix W from chain_ppca. This can be done using the Julia group(chain, parameter_name) function. Then we calculate the mean value for each element in \(W\), averaging over the whole chain of samples.\n\n# Extract parameter estimates for predicting x - mean of posterior\nW = reshape(mean(group(chain_ppca, :W))[:, 2], (n_genes, k))\nZ = reshape(mean(group(chain_ppca, :Z))[:, 2], (k, n_cells))\nμ = mean(group(chain_ppca, :μ))[:, 2]\n\nmat_rec = W * Z .+ repeat(μ; inner=(1, n_cells))\n\n9×60 Matrix{Float64}:\n 6.44097 6.53263 6.75585 … 3.24598 3.35023 3.10137\n 6.76423 6.83596 7.12461 3.63275 3.59412 3.23961\n 6.42376 6.53713 6.69766 3.11209 3.36292 3.21709\n -0.159498 -0.191214 -0.114262 0.092049 -0.106665 -0.236692\n 0.0897361 0.0404874 0.175911 0.392096 0.0667245 -0.157516\n -0.0101966 -0.00674884 -0.00589838 … -0.0885817 -0.0767007 -0.075484\n 3.43481 3.39128 3.04403 6.28822 6.49317 6.94952\n 3.35041 3.23698 3.05911 6.75939 6.52662 6.69669\n 3.18825 3.1074 2.84088 6.39205 6.37354 6.69064\n\n\n\nheatmap(\n mat_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\nWe can quantitatively check the absolute magnitudes of the column averages of the gap between mat_exp and mat_rec:\n\ndiff_matrix = mat_exp .- mat_rec\nfor col in 4:6\n @assert abs(mean(diff_matrix[:, col])) <= 0.5\nend\n\nWe observe that, using the posterior mean, the recovered data matrix mat_rec has values that align with the original data matrix - in particular, the pattern in the first and last 3 gene features is captured, which implies the inference and p-PCA decomposition are successful. This is satisfying as we have just projected the original 9-dimensional space onto a 2-dimensional space - some info has been cut off in the projection process, but we haven’t lost any important info, e.g. the key differences between the two groups. This is the desirable property of PCA: it picks up the principal axes along which most of the (original) data variations cluster, and removes those that are less relevant. If we chose the reduced space dimension \(k\) to be exactly \(D\) (the original data dimension), we would recover exactly the same original data matrix mat_exp, i.e. all information would be preserved.\nNow we have represented the original high-dimensional data in two dimensions, without losing the key information about the two groups of cells in the input data. Finally, a key benefit of performing PCA is that we can analyse and visualise the dimension-reduced data in the projected, low-dimensional space. 
We save the dimension-reduced matrix \(\mathbf{Z}\) as a DataFrame, rename the columns and visualise the first two dimensions.\n\ndf_pca = DataFrame(Z', :auto)\nrename!(df_pca, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pca[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\n\nscatter(df_pca[:, :z1], df_pca[:, :z2]; xlabel=\"z1\", ylabel=\"z2\", group=df_pca[:, :type])\n\n\n\n\nWe see the two groups are well separated in this 2-D space. As an unsupervised learning method, performing PCA on this dataset gives us the group membership of each cell instance. Another way to put it: 2 dimensions is enough to capture the main structure of the data.\n\n\nFurther extension: automatic choice of the number of principal components with ARD\nA direct question that arises from the above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data? This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent the data in the latent space. In the case of PCA, there exist a lot of heuristics to make that choice. For example, we can tune the number of principal components using empirical methods such as cross-validation based on some criteria, such as the MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix, or the percentage of variation explained.5\nFor p-PCA, this can be done in an elegant and principled way, using a technique called Automatic Relevance Determination (ARD). ARD can help pick the correct number of principal directions by regularising the solution space using a parameterised, data-dependent prior distribution that effectively prunes away redundant or superfluous features.6 Essentially, we are using a specific prior over the factor loadings \(\mathbf{W}\) that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter \(\alpha\). Here, smaller values of \(\alpha\) correspond to more important components. You can find more details about this in, for example, Bishop (2006).7\n\n@model function pPCA_ARD(X)\n # Dimensionality of the problem.\n N, D = size(X)\n\n # latent variable Z\n Z ~ filldist(Normal(), D, N)\n\n # weights/loadings w with Automatic Relevance Determination part\n α ~ filldist(Gamma(1.0, 1.0), D)\n W ~ filldist(MvNormal(zeros(D), 1.0 ./ sqrt.(α)), D)\n\n mu = (W' * Z)'\n\n tau ~ Gamma(1.0, 1.0)\n return X ~ arraydist([MvNormal(m, 1.0 / sqrt(tau)) for m in eachcol(mu)])\nend;\n\nInstead of drawing samples of each entry in \(\mathbf{W}\) from a standard normal, this time we repeatedly draw \(D\) samples from the \(D\)-dimensional MND, forming a \(D \times D\) matrix \(\mathbf{W}\). This matrix is a function of \(\alpha\) as the samples are drawn from the MND parameterised by \(\alpha\). We also introduce a hyper-parameter \(\tau\) which is the precision in the sampling distribution. We also re-parametrise the sampling distribution, i.e. each dimension across all instances is a 60-dimensional multivariate normal distribution. Re-parametrisation can sometimes accelerate the sampling process. 
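To see the pruning mechanism concretely: under this prior the standard deviation of each loading dimension is \(1/\sqrt{\alpha}\), so large precisions shrink the corresponding loadings towards zero. A small standalone sketch (illustrative values, not part of the original tutorial):\n\n# Small α ⇒ large prior scale, so the dimension stays switched on;\n# large α ⇒ the corresponding loadings shrink towards zero.\nα_demo = [0.05, 0.05, 20.0] # illustrative precisions\nscales = 1.0 ./ sqrt.(α_demo) # ≈ [4.47, 4.47, 0.22]\n\n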
We instantiate the model and ask Turing to sample from it using the NUTS sampler. The sample trajectories of \(\alpha\) are plotted using the plot function from the package StatsPlots.\n\nppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model\nchain_ppcaARD = sample(ppca_ARD, NUTS(; adtype=AutoMooncake()), 500) # sampling\nplot(group(chain_ppcaARD, :α); margin=6.0mm)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.0125\n\n\n\n\n\n\n\nAgain, we do some inference diagnostics. Here we look at the convergence of the chains for the \(α\) parameter. This parameter determines the relevance of individual components. We see that the chains have converged and the posterior of the \(\alpha\) parameters is centred around much smaller values in two instances. In the following, we will use the mean of the small values to select the relevant dimensions (remember that smaller values of \(\alpha\) correspond to more important components). We can clearly see from the values of \(\alpha\) that there should be two dimensions (corresponding to \(\bar{\alpha}_3=\bar{\alpha}_5≈0.05\)) for this dataset.\n\n# Extract parameter mean estimates of the posterior\nW = permutedims(reshape(mean(group(chain_ppcaARD, :W))[:, 2], (n_genes, n_genes)))\nZ = permutedims(reshape(mean(group(chain_ppcaARD, :Z))[:, 2], (n_genes, n_cells)))'\nα = mean(group(chain_ppcaARD, :α))[:, 2]\nplot(α; label=\"α\")\n\n\n\n\nWe can inspect α to see which elements are small (i.e. of high relevance). To do this, we first sort α using sortperm() (in ascending order by default), and record the indices of the two smallest values (among the \(D=9\) \(\alpha\) values). After picking the desired principal directions, we extract the corresponding subset of loading vectors from \(\mathbf{W}\), and the corresponding dimensions of \(\mathbf{Z}\). We obtain a posterior predicted matrix \(\mathbf{X} \in \mathbb{R}^{9 \times 60}\) as the product of the two sub-matrices, and compare the recovered info with the original matrix.\n\nα_indices = sortperm(α)[1:2]\nk = size(α_indices)[1]\nX_rec = W[:, α_indices] * Z[α_indices, :]\n\ndf_rec = DataFrame(X_rec', :auto)\nheatmap(\n X_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\nWe observe that the data in the original space is recovered with its key information preserved: the distinct feature values in the first and last three genes for the two cell groups. We can also examine the data in the dimension-reduced space, i.e. the selected components (rows) in \(\mathbf{Z}\).\n\ndf_pro = DataFrame(Z[α_indices, :]', :auto)\nrename!(df_pro, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pro[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\nscatter(\n df_pro[:, 1], df_pro[:, 2]; xlabel=\"z1\", ylabel=\"z2\", color=df_pro[:, \"type\"], label=\"\"\n)\n\n\n\n\nThis plot is very similar to the low-dimensional plot above, with the relevant dimensions chosen based on the values of \(α\) via ARD. When you are in doubt about the number of dimensions to project onto, ARD might provide an answer to that question.", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#final-comments.", "href": "tutorials/probabilistic-pca/index.html#final-comments.", "title": "Probabilistic PCA", "section": "Final comments.", "text": "Final comments.\np-PCA is a linear map which linearly transforms the data between the original and projected spaces. 
It can also be thought of as a matrix factorisation method, in which \(\mathbf{X}=(\mathbf{W} \times \mathbf{Z})^T\). The projection matrix can be understood as a new basis in the projected space, and \(\mathbf{Z}\) are the new coordinates.", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/probabilistic-pca/index.html#footnotes", "href": "tutorials/probabilistic-pca/index.html#footnotes", "title": "Probabilistic PCA", "section": "Footnotes", "text": "Footnotes\n\n\nGilbert Strang, Introduction to Linear Algebra, 5th Ed., Wellesley-Cambridge Press, 2016.↩︎\nGareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎\nGareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎\nDavid Wipf, Srikantan Nagarajan, A New View of Automatic Relevance Determination, NIPS 2007.↩︎\nChristopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.↩︎", "crumbs": [ "Get Started", "Tutorials", "Probabilistic PCA" ] }, { "objectID": "tutorials/gaussian-process-latent-variable-models/index.html", "href": "tutorials/gaussian-process-latent-variable-models/index.html", "title": "Gaussian Process Latent Variable Models", "section": "", "text": "In a previous tutorial, we have discussed latent variable models, in particular probabilistic principal component analysis (pPCA). Here, we show how we can extend the mapping provided by pPCA to non-linear mappings between input and output. For more details about the Gaussian Process Latent Variable Model (GPLVM), we refer the reader to the original publication and a further extension.\nIn short, the GPLVM is a dimensionality reduction technique that allows us to embed a high-dimensional dataset in a lower-dimensional space. Importantly, it provides the advantage that the mappings from the embedded space, which are linear in pPCA, can be made non-linear through the use of Gaussian Processes.\n\nLet’s start by loading some dependencies.\n\nusing Turing\nusing AbstractGPs\nusing FillArrays\nusing LaTeXStrings\nusing Plots\nusing RDatasets\nusing ReverseDiff\nusing StatsBase\n\nusing LinearAlgebra\nusing Random\n\nRandom.seed!(1789);\n\nWe demonstrate the GPLVM with a very small dataset: Fisher’s Iris data set. This is mostly for reasons of run time, so that the tutorial can be run quickly. As you will see, one of the major drawbacks of using GPs is their speed, although this is an active area of research. We will briefly touch on some ways to speed things up at the end of this tutorial. 
We transform the original data with non-linear operations in order to demonstrate the power of GPs to work on non-linear relationships, while keeping the problem reasonably small.\n\ndata = dataset(\"datasets\", \"iris\")\nspecies = data[!, \"Species\"]\nindex = shuffle(1:150)\n# we extract the four measured quantities,\n# so the dimension of the data is only d=4 for this toy example\ndat = Matrix(data[index, 1:4])\nlabels = data[index, \"Species\"]\n\n# non-linearize data to demonstrate ability of GPs to deal with non-linearity\ndat[:, 1] = 0.5 * dat[:, 1] .^ 2 + 0.1 * dat[:, 1] .^ 3\ndat[:, 2] = dat[:, 2] .^ 3 + 0.2 * dat[:, 2] .^ 4\ndat[:, 3] = 0.1 * exp.(dat[:, 3]) - 0.2 * dat[:, 3] .^ 2\ndat[:, 4] = 0.5 * log.(dat[:, 4]) .^ 2 + 0.01 * dat[:, 3] .^ 5\n\n# normalise data\ndt = fit(ZScoreTransform, dat; dims=1);\nStatsBase.transform!(dt, dat);\n\nWe will start out by demonstrating the basic similarity between pPCA (see the tutorial on this topic) and the GPLVM model. Indeed, pPCA is basically equivalent to running the GPLVM model with an automatic relevance determination (ARD) linear kernel.\nFirst, we re-introduce the pPCA model (see the tutorial on pPCA for details)\n\n@model function pPCA(x)\n # Dimensionality of the problem.\n N, D = size(x)\n # latent variable z\n z ~ filldist(Normal(), D, N)\n # weights/loadings W\n w ~ filldist(Normal(), D, D)\n mu = (w * z)'\n for d in 1:D\n x[:, d] ~ MvNormal(mu[:, d], I)\n end\n return nothing\nend;\n\nWe define two different kernels, a simple linear kernel with an Automatic Relevance Determination transform and a squared exponential kernel.\n\nlinear_kernel(α) = LinearKernel() ∘ ARDTransform(α)\nsekernel(α, σ) = σ * SqExponentialKernel() ∘ ARDTransform(α);\n\nAnd here is the GPLVM model. We create separate models for the two types of kernel.\n\n@model function GPLVM_linear(Y, K)\n # Dimensionality of the problem.\n N, D = size(Y)\n # K is the dimension of the latent space\n @assert K <= D\n noise = 1e-3\n\n # Priors\n α ~ MvLogNormal(MvNormal(Zeros(K), I))\n Z ~ filldist(Normal(), K, N)\n mu ~ filldist(Normal(), N)\n\n gp = GP(linear_kernel(α))\n gpz = gp(ColVecs(Z), noise)\n Y ~ filldist(MvNormal(mu, cov(gpz)), D)\n\n return nothing\nend;\n\n@model function GPLVM(Y, K)\n # Dimensionality of the problem.\n N, D = size(Y)\n # K is the dimension of the latent space\n @assert K <= D\n noise = 1e-3\n\n # Priors\n α ~ MvLogNormal(MvNormal(Zeros(K), I))\n σ ~ LogNormal(0.0, 1.0)\n Z ~ filldist(Normal(), K, N)\n mu ~ filldist(Normal(), N)\n\n gp = GP(sekernel(α, σ))\n gpz = gp(ColVecs(Z), noise)\n Y ~ filldist(MvNormal(mu, cov(gpz)), D)\n\n return nothing\nend;\n\n\n# Standard GPs don't scale very well in n, so we use a small subsample for the purpose of this tutorial\nn_data = 40\n# number of features to use from dataset\nn_features = 4\n# latent dimension for GP case\nndim = 4;\n\n\nppca = pPCA(dat[1:n_data, 1:n_features])\nchain_ppca = sample(ppca, NUTS(; adtype=AutoReverseDiff()), 1000);\n\n\n# we extract the posterior mean estimates of the parameters from the chain\nz_mean = reshape(mean(group(chain_ppca, :z))[:, 2], (n_features, n_data))\nscatter(z_mean[1, :], z_mean[2, :]; group=labels[1:n_data], xlabel=L\"z_1\", ylabel=L\"z_2\")\n\nWe can see that the pPCA fails to distinguish the groups. In particular, the setosa species is not clearly separated from versicolor and virginica. This is due to the non-linearities that we introduced, as without them the two groups can be clearly distinguished using pPCA (see the pPCA tutorial). 
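Before trying it, it may help to see what the composed ARD linear kernel defined earlier computes. A quick sketch with illustrative inputs (assuming the kernel functionality re-exported by AbstractGPs and dot from LinearAlgebra, both loaded above):\n\nv = [2.0, 0.5] # illustrative ARD scales\nk_demo = linear_kernel(v)\nx1, x2 = [1.0, 1.0], [1.0, 2.0]\n# The ARD transform rescales each input dimension before the base kernel applies.\nk_demo(x1, x2) ≈ dot(v .* x1, v .* x2) # true\n\n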
Let’s try the same with our linear kernel GPLVM model.\n\ngplvm_linear = GPLVM_linear(dat[1:n_data, 1:n_features], ndim)\nchain_linear = sample(gplvm_linear, NUTS(; adtype=AutoReverseDiff()), 500);\n\n\n# we extract the posterior mean estimates of the parameters from the chain\nz_mean = reshape(mean(group(chain_linear, :Z))[:, 2], (n_features, n_data))\nalpha_mean = mean(group(chain_linear, :α))[:, 2]\n\nalpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true)\nscatter(\n z_mean[alpha1, :],\n z_mean[alpha2, :];\n group=labels[1:n_data],\n xlabel=L\"z_{\mathrm{ard}_1}\",\n ylabel=L\"z_{\mathrm{ard}_2}\",\n)\n\nWe can see that, similar to the pPCA case, the linear kernel GPLVM fails to distinguish between the two groups (setosa on the one hand, and virginica and versicolor on the other).\nFinally, we demonstrate that by changing the kernel to a non-linear function, we are able to separate the data again.\n\ngplvm = GPLVM(dat[1:n_data, 1:n_features], ndim)\nchain_gplvm = sample(gplvm, NUTS(; adtype=AutoReverseDiff()), 500);\n\n\n# we extract the posterior mean estimates of the parameters from the chain\nz_mean = reshape(mean(group(chain_gplvm, :Z))[:, 2], (ndim, n_data))\nalpha_mean = mean(group(chain_gplvm, :α))[:, 2]\n\nalpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true)\nscatter(\n z_mean[alpha1, :],\n z_mean[alpha2, :];\n group=labels[1:n_data],\n xlabel=L\"z_{\mathrm{ard}_1}\",\n ylabel=L\"z_{\mathrm{ard}_2}\",\n)\n\n\nlet\n @assert abs(\n mean(z_mean[alpha1, labels[1:n_data] .== \"setosa\"]) -\n mean(z_mean[alpha1, labels[1:n_data] .!= \"setosa\"]),\n ) > 1\nend\n\nNow, the split between the two groups is visible again.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Gaussian Process Latent Variable Models" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html", "href": "tutorials/bayesian-linear-regression/index.html", "title": "Bayesian Linear Regression", "section": "", "text": "Turing is powerful when applied to complex hierarchical models, but it can also be applied to common statistical procedures, like linear regression. This tutorial covers how to implement a linear regression model in Turing.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html#set-up", "href": "tutorials/bayesian-linear-regression/index.html#set-up", "title": "Bayesian Linear Regression", "section": "Set Up", "text": "Set Up\nWe begin by importing all the necessary libraries.\n\n# Import Turing.\nusing Turing\n\n# Package for loading the data set.\nusing RDatasets\n\n# Package for visualisation.\nusing StatsPlots\n\n# Functionality for splitting the data.\nusing MLUtils: splitobs\n\n# Functionality for constructing arrays with identical elements efficiently.\nusing FillArrays\n\n# Functionality for normalising the data and evaluating the model predictions.\nusing StatsBase\n\n# Functionality for working with scaled identity matrices.\nusing LinearAlgebra\n\n# For ensuring reproducibility.\nusing StableRNGs: StableRNG\n\n\nsetprogress!(false)\n\nWe will use the mtcars dataset from the RDatasets package. 
mtcars contains a variety of statistics on different car models, including their miles per gallon, number of cylinders, and horsepower, among others.\nWe want to know if we can construct a Bayesian linear regression model to predict the miles per gallon of a car, given the other statistics it has. Let us take a look at the data we have.\n\n# Load the dataset.\ndata = RDatasets.dataset(\"datasets\", \"mtcars\")\n\n# Show the first six rows of the dataset.\nfirst(data, 6)\n\n6×12 DataFrame\n Row │ Model MPG Cyl Disp HP DRat WT QSec VS AM Gear Carb\n │ String31 Float64 Int64 Float64 Int64 Float64 Float64 Float64 Int64 Int64 Int64 Int64\n 1 │ Mazda RX4 21.0 6 160.0 110 3.9 2.62 16.46 0 1 4 4\n 2 │ Mazda RX4 Wag 21.0 6 160.0 110 3.9 2.875 17.02 0 1 4 4\n 3 │ Datsun 710 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1\n 4 │ Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1\n 5 │ Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2\n 6 │ Valiant 18.1 6 225.0 105 2.76 3.46 20.22 1 0 3 1\n\n\nsize(data)\n\n(32, 12)\n\n\nThe next step is to get our data ready for testing. We’ll split the mtcars dataset into two subsets, one for training our model and one for evaluating our model. Then, we separate the targets we want to learn (MPG, in this case) and standardise the datasets by subtracting each column’s mean and dividing by the standard deviation of that column. This standardisation ensures all features have similar scales (mean 0, standard deviation 1), which helps the sampler explore the parameter space more efficiently.\n\n# Remove the model column.\nselect!(data, Not(:Model))\n\n# Split our dataset 70%/30% into training/test sets.\ntrainset, testset = map(DataFrame, splitobs(StableRNG(468), data; at=0.7, shuffle=true))\n\n# Turing requires data in matrix form.\ntarget = :MPG\ntrain = Matrix(select(trainset, Not(target)))\ntest = Matrix(select(testset, Not(target)))\ntrain_target = trainset[:, target]\ntest_target = testset[:, target]\n\n# Standardise the features.\ndt_features = fit(ZScoreTransform, train; dims=1)\nStatsBase.transform!(dt_features, train)\nStatsBase.transform!(dt_features, test)\n\n# Standardise the targets.\ndt_targets = fit(ZScoreTransform, train_target)\nStatsBase.transform!(dt_targets, train_target)\nStatsBase.transform!(dt_targets, test_target);", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html#model-specification", "href": "tutorials/bayesian-linear-regression/index.html#model-specification", "title": "Bayesian Linear Regression", "section": "Model Specification", "text": "Model Specification\nIn a traditional frequentist model using OLS, our model might look like:\n\[\n\mathrm{MPG}_i = \alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}\n\]\nwhere \(\boldsymbol{\beta}\) is a vector of coefficients and \(\boldsymbol{X_i}\) is a vector of inputs for observation \(i\). 
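As a point of reference, the frequentist coefficients can be computed in closed form via the normal equations, \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\). A minimal self-contained sketch on synthetic data (X_demo and y_demo are illustrative names, not the tutorial’s variables):\n\n# Design matrix with an intercept column, plus synthetic targets.\nX_demo = [ones(5) randn(5, 2)]\ny_demo = X_demo * [1.0, 2.0, -0.5] .+ 0.1 .* randn(5)\n\n# The backslash operator solves the least-squares problem stably via QR.\nβ̂ = X_demo \\ y_demo\n\n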
The Bayesian model we are more concerned with is the following:\n\\[\n\\mathrm{MPG}_i \\sim \\mathcal{N}(\\alpha + \\boldsymbol{\\beta}^\\mathsf{T}\\boldsymbol{X_i}, \\sigma^2)\n\\]\nwhere \\(\\alpha\\) is an intercept term common to all observations, \\(\\boldsymbol{\\beta}\\) is a coefficient vector, \\(\\boldsymbol{X_i}\\) is the observed data for car \\(i\\), and \\(\\sigma^2\\) is a common variance term.\nFor \\(\\sigma^2\\), we assign a prior of truncated(Normal(0, 100); lower=0). This is consistent with Andrew Gelman’s recommendations on noninformative priors for variance. The intercept term (\\(\\alpha\\)) is assumed to be normally distributed with a mean of zero and a variance of three. This represents our assumptions that miles per gallon can be explained mostly by our various variables, but a high variance term indicates our uncertainty about that. Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10. We do not know that our coefficients are different from zero, and we do not know which ones are likely to be the most important, so the variance term is quite high. Lastly, each observation \\(y_i\\) is distributed according to the calculated mu term given by \\(\\alpha + \\boldsymbol{\\beta}^\\mathsf{T}\\boldsymbol{X_i}\\).\n\n# Bayesian linear regression.\n@model function linear_regression(x, y)\n # Set variance prior.\n σ² ~ truncated(Normal(0, 100); lower=0)\n\n # Set intercept prior.\n intercept ~ Normal(0, sqrt(3))\n\n # Set the priors on our coefficients.\n nfeatures = size(x, 2)\n coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I)\n\n # Calculate all the mu terms.\n mu = intercept .+ x * coefficients\n return y ~ MvNormal(mu, σ² * I)\nend\n\nlinear_regression (generic function with 2 methods)\n\n\nWith our model specified, we can call the sampler. We will use the No U-Turn Sampler (NUTS) here.\n\nmodel = linear_regression(train, train_target)\nchain = sample(StableRNG(468), model, NUTS(), 20_000)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\nChains MCMC chain (20000×26×1 Array{Float64, 3}):\n\nIterations = 1001:1:21000\nNumber of chains = 1\nSamples per chain = 20000\nWall duration = 11.65 seconds\nCompute duration = 11.65 seconds\nparameters = σ², intercept, coefficients[1], coefficients[2], coefficients[3], coefficients[4], coefficients[5], coefficients[6], coefficients[7], coefficients[8], coefficients[9], coefficients[10]\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nWe can also check the densities and traces of the parameters visually using the plot functionality.\n\nplot(chain)\n\n\n\n\nIt looks like all parameters have converged.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/bayesian-linear-regression/index.html#comparing-to-ols", "href": "tutorials/bayesian-linear-regression/index.html#comparing-to-ols", "title": "Bayesian Linear Regression", "section": "Comparing to OLS", "text": "Comparing to OLS\nA satisfactory test of our model is to evaluate how well it predicts. Importantly, we want to compare our model to existing tools like OLS. 
The code below uses the GLM.jl package to generate a traditional OLS multiple regression model on the same data as our probabilistic model.\n\n# Import the GLM package.\nusing GLM\n\n# Perform multiple regression OLS.\ntrain_with_intercept = hcat(ones(size(train, 1)), train)\nols = lm(train_with_intercept, train_target)\n\n# Compute predictions on the training data set and unstandardise them.\ntrain_prediction_ols = GLM.predict(ols)\nStatsBase.reconstruct!(dt_targets, train_prediction_ols)\n\n# Compute predictions on the test data set and unstandardise them.\ntest_with_intercept = hcat(ones(size(test, 1)), test)\ntest_prediction_ols = GLM.predict(ols, test_with_intercept)\nStatsBase.reconstruct!(dt_targets, test_prediction_ols);\n\nThe function below accepts a chain and an input matrix and calculates predictions. We use the samples from sample 200 onwards, discarding the initial samples as burn-in to allow the sampler to reach the typical set.\n\n# Make a prediction given an input vector.\nfunction prediction(chain, x)\n p = get_params(chain[200:end, :, :])\n targets = p.intercept' .+ x * reduce(hcat, p.coefficients)'\n return vec(mean(targets; dims=2))\nend\n\nprediction (generic function with 1 method)\n\n\nWhen we make predictions, we unstandardise them so they are more understandable.\n\n# Calculate the predictions for the training and testing sets and unstandardise them.\ntrain_prediction_bayes = prediction(chain, train)\nStatsBase.reconstruct!(dt_targets, train_prediction_bayes)\ntest_prediction_bayes = prediction(chain, test)\nStatsBase.reconstruct!(dt_targets, test_prediction_bayes)\n\n# Show the predictions on the test data set.\nDataFrame(; MPG=testset[!, target], Bayes=test_prediction_bayes, OLS=test_prediction_ols)\n\n10×3 DataFrame\n Row │ MPG Bayes OLS\n │ Float64 Float64 Float64\n 1 │ 30.4 26.9667 26.9093\n 2 │ 19.2 16.3201 16.0834\n 3 │ 14.3 12.0142 11.9393\n 4 │ 22.8 26.7045 26.5984\n 5 │ 22.8 31.473 32.153\n 6 │ 13.3 9.133 8.83012\n 7 │ 18.7 17.0732 17.1669\n 8 │ 10.4 13.6681 13.821\n 9 │ 19.7 20.0141 20.0243\n 10 │ 15.2 16.9798 17.0774\n\n\nNow let’s evaluate the loss for each method, and each prediction set. We will use the mean squared error to evaluate loss, given by \[\n\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^n {(y_i - \hat{y_i})^2}\n\] where \(y_i\) is the actual value (true MPG) and \(\hat{y_i}\) is the predicted value using either OLS or Bayesian linear regression. A lower MSE indicates a closer fit to the data.\n\nprintln(\n \"Training set:\",\n \"\\n\\tBayes loss: \",\n msd(train_prediction_bayes, trainset[!, target]),\n \"\\n\\tOLS loss: \",\n msd(train_prediction_ols, trainset[!, target]),\n)\n\nprintln(\n \"Test set:\",\n \"\\n\\tBayes loss: \",\n msd(test_prediction_bayes, testset[!, target]),\n \"\\n\\tOLS loss: \",\n msd(test_prediction_ols, testset[!, target]),\n)\n\nTraining set:\n Bayes loss: 4.9158288763013545\n OLS loss: 4.909437479783827\nTest set:\n Bayes loss: 14.97282908418693\n OLS loss: 16.70400759383217\n\n\n
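As an aside, msd from StatsBase computes exactly this mean squared deviation; a hand-rolled equivalent (a sketch; mse_check is an illustrative name) is:\n\n# Mean squared error: the average of the squared residuals, matching StatsBase.msd.\nmse_check(y, ŷ) = sum(abs2, y .- ŷ) / length(y)\n\n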
We can see from this that both linear regression techniques perform fairly similarly. The Bayesian linear regression approach performs worse on the training set, but better on the test set. This indicates that the Bayesian approach is better able to generalise to unseen data, i.e., it is not overfitting the training data as much.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Linear Regression" ] }, { "objectID": "tutorials/coin-flipping/index.html", "href": "tutorials/coin-flipping/index.html", "title": "Introduction: Coin Flipping", "section": "", "text": "This is the first of a series of guided tutorials on the Turing language. In this tutorial, we will use Bayesian inference to estimate the probability that a coin flip will result in heads, given a series of observations.\n\nSetup\nFirst, let us load some packages that we need to simulate a coin flip:\n\nusing Distributions\n\nusing Random\nRandom.seed!(12); # Set seed for reproducibility\n\nand to visualise our results.\n\nusing StatsPlots\n\nNote that Turing is not loaded here — we do not use it in this example. Next, we configure the data generating model. Let us set the true probability that a coin flip turns up heads\n\np_true = 0.5;\n\nand set the number of coin flips we will show our model.\n\nN = 100;\n\nWe simulate N coin flips by drawing N random samples from the Bernoulli distribution with success probability p_true. The draws are collected in a variable called data:\n\ndata = rand(Bernoulli(p_true), N);\n\nHere are the first five coin flips:\n\ndata[1:5]\n\n5-element Vector{Bool}:\n 1\n 0\n 0\n 0\n 1\n\n\n\n\nCoin Flipping Without Turing\nThe following example illustrates the effect of updating our beliefs with every piece of new evidence we observe.\nAssume that we are unsure about the probability of heads in a coin flip. To get an intuitive understanding of what “updating our beliefs” is, we will visualise the probability of heads in a coin flip after each observation.\nWe begin by specifying a prior belief about the distribution of heads and tails in a coin toss. Here we choose a Beta distribution as the prior distribution for the probability of heads. Before any coin flip is observed, we assume a uniform distribution \(\operatorname{U}(0, 1) = \operatorname{Beta}(1, 1)\) of the probability of heads. That is, every probability is initially equally likely.\n\nprior_belief = Beta(1, 1);\n\nWith our priors set and our data at hand, we can perform Bayesian inference.\nThis is a fairly simple process. We expose one additional coin flip to our model every iteration, such that the first run only sees the first coin flip, while the last iteration sees all the coin flips. In each iteration we update our belief to an updated version of the original Beta distribution that accounts for the new proportion of heads and tails. The update is particularly simple since our prior distribution is a conjugate prior, which allows for analytical posterior computation. 
Note that such closed-form expressions (as implemented in the updated_belief function below) are not available for most models, which is why we need sampling methods like MCMC.\n\nfunction updated_belief(prior_belief::Beta, data::AbstractArray{Bool})\n # Count the number of heads and tails.\n heads = sum(data)\n tails = length(data) - heads\n\n # Update our prior belief in closed form (this is possible because we use a conjugate prior).\n return Beta(prior_belief.α + heads, prior_belief.β + tails)\nend\n\n# Show updated belief for increasing number of observations\n@gif for n in 0:N\n plot(\n updated_belief(prior_belief, data[1:n]);\n size=(500, 250),\n title=\"Updated belief after $n observations\",\n xlabel=\"probability of heads\",\n ylabel=\"\",\n legend=nothing,\n xlim=(0, 1),\n fill=0,\n α=0.3,\n w=3,\n )\n vline!([p_true])\nend\n\n\n[ Info: Saved animation to /tmp/jl_un032VcJuD.gif\n\n\n\n\n\n\nThe animation above shows that with increasing evidence our belief about the probability of heads in a coin flip slowly adjusts towards the true value. The orange line in the animation represents the true probability of seeing heads on a single coin flip, while the mode of the distribution shows what the model believes the probability of a heads is given the evidence it has seen.\nFor the mathematically inclined, the \(\operatorname{Beta}\) distribution is updated by adding each coin flip to the parameters \(\alpha\) and \(\beta\) of the distribution. Initially, the parameters are defined as \(\alpha = 1\) and \(\beta = 1\). Over time, with more and more coin flips, \(\alpha\) and \(\beta\) will be approximately equal to each other as we are equally likely to flip a heads or a tails.\nThe mean of the \(\operatorname{Beta}(\alpha, \beta)\) distribution is\n\[\operatorname{E}[X] = \dfrac{\alpha}{\alpha+\beta}.\]\nThis implies that the plot of the distribution will become centred around 0.5 for a large enough number of coin flips, as we expect \(\alpha \approx \beta\).\nThe variance of the \(\operatorname{Beta}(\alpha, \beta)\) distribution is\n\[\operatorname{var}[X] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.\]\nThus the variance of the distribution will approach 0 with more and more samples, as the denominator will grow faster than the numerator. More samples means less variance. This implies that the distribution will reflect less uncertainty about the probability of receiving a heads and the plot will become more tightly centred around 0.5 for a large enough number of coin flips.\n\n\n
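These limiting statements are easy to check numerically with Distributions (a small sketch with illustrative counts, not part of the original tutorial):\n\n# With α ≈ β the mean stays at 0.5, and the variance shrinks as flips accumulate.\nsmall, large = Beta(1 + 5, 1 + 5), Beta(1 + 500, 1 + 500)\nmean(small), mean(large) # both are 0.5\nvar(small), var(large) # the second is far smaller\n\n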
Coin Flipping With Turing\nWe now move away from the closed-form expression above. We use Turing to specify the same model and to approximate the posterior distribution with samples. To do so, we first need to load Turing.\n\nusing Turing\n\nAdditionally, we load MCMCChains, a library for analysing and visualising the samples with which we approximate the posterior distribution.\n\nusing MCMCChains\n\nFirst, we define the coin-flip model using Turing.\n\n# Unconditioned coinflip model with `N` observations.\n@model function coinflip(; N::Int)\n # Our prior belief about the probability of heads in a coin toss.\n p ~ Beta(1, 1)\n\n # Heads or tails of a coin are drawn from `N` independent and identically\n # distributed Bernoulli distributions with success rate `p`.\n y ~ filldist(Bernoulli(p), N)\n\n return y\nend;\n\nIn the Turing model the prior distribution of the variable p, the probability of heads in a coin toss, and the distribution of the observations y are specified on the right-hand side of the ~ expressions. The @model macro modifies the body of the Julia function coinflip and, e.g., replaces the ~ statements with internal function calls that are used for sampling.\nHere we defined a model that is not conditioned on any specific observations, as this allows us to easily obtain samples of both p and y with\n\nrand(coinflip(; N))\n\n(p = 0.9520583115441003, y = Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])\n\n\nThe model can be conditioned on observations using the | operator, which fixes certain variables to observed values. See the documentation of the condition syntax in DynamicPPL.jl for more details. In the conditioned model below, the observations y are fixed to data.\n\ncoinflip(y::AbstractVector{<:Real}) = coinflip(; N=length(y)) | (; y)\n\nmodel = coinflip(data);\n\nAfter defining the model, we can approximate the posterior distribution by drawing samples from the distribution. In this example, we use a Hamiltonian Monte Carlo sampler to draw these samples. Other tutorials give more information on the samplers available in Turing and discuss their use for different models.\n\nsampler = NUTS();\n\nWe approximate the posterior distribution with 2000 samples:\n\nchain = sample(model, sampler, 2_000, progress=false);\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.8\n\n\n\n\nThe sample function and common keyword arguments are explained more extensively in the documentation of AbstractMCMC.jl.\nAfter finishing the sampling process, we can visually compare the closed-form posterior distribution with the approximation obtained with Turing.\n\nhistogram(chain)\n\n\n\n\nNow we can build our plot:\n\n# Visualise a blue density plot of the approximate posterior distribution using HMC (see Chain 1 in the legend).\ndensity(chain; xlim=(0, 1), legend=:best, w=2, c=:blue)\n\n# Visualise a green density plot of the posterior distribution in closed-form.\nplot!(\n 0:0.01:1,\n pdf.(updated_belief(prior_belief, data), 0:0.01:1);\n xlabel=\"probability of heads\",\n ylabel=\"\",\n title=\"\",\n xlim=(0, 1),\n label=\"Closed-form\",\n fill=0,\n α=0.3,\n w=3,\n c=:lightgreen,\n)\n\n# Visualise the true probability of heads in red.\nvline!([p_true]; label=\"True probability\", c=:red)\n\n\n\n\nAs we can see, the samples obtained with Turing closely approximate the true posterior distribution. Hopefully this tutorial has provided an easy-to-follow, yet informative introduction to Turing’s simpler applications. 
More advanced usage is demonstrated in other tutorials.\n\n\n\n\n Back to top", "crumbs": [ "Get Started", "Tutorials", "Introduction: Coin Flipping" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html", "href": "tutorials/bayesian-differential-equations/index.html", "title": "Bayesian Differential Equations", "section": "", "text": "A basic scientific problem is to mathematically model a system of interest, then compare this model to the observable reality around us. Such models often involve dynamical systems of differential equations. In practice, these equations often have unknown parameters we would like to estimate. The “forward problem” of simulation consists of solving the differential equations for a given set of parameters, while the “inverse problem” of parameter estimation uses observed data to infer the unknown model parameters. Bayesian inference provides a robust approach to parameter estimation with quantified uncertainty.\nusing Turing\nusing DifferentialEquations\n# Load StatsPlots for visualizations and diagnostics.\nusing StatsPlots\nusing LinearAlgebra\nusing Distributions\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(14);", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#the-lotkavolterra-model", "href": "tutorials/bayesian-differential-equations/index.html#the-lotkavolterra-model", "title": "Bayesian Differential Equations", "section": "The Lotka–Volterra Model", "text": "The Lotka–Volterra Model\nThe Lotka–Volterra equations, also known as the predator–prey equations, are a pair of first-order nonlinear differential equations. These differential equations are frequently used to describe the dynamics of biological systems in which two species interact, one as a predator and the other as prey. The populations change through time according to the pair of equations\n\\[\n\\begin{aligned}\n\\frac{\\mathrm{d}x}{\\mathrm{d}t} &= (\\alpha - \\beta y(t))x(t), \\\\\n\\frac{\\mathrm{d}y}{\\mathrm{d}t} &= (\\delta x(t) - \\gamma)y(t),\n\\end{aligned}\n\\]\nwhere \\(x(t)\\) and \\(y(t)\\) denote the populations of prey and predator at time \\(t\\), respectively, and \\(\\alpha, \\beta, \\gamma, \\delta\\) are positive parameters. In the absence of predators, the prey population \\(x\\) would increase exponentially at rate \\(\\alpha\\) (with dimensions of time⁻¹). However, the predators kill some prey at a rate \\(\\beta\\) (prey predator⁻¹ time⁻¹), which enables the predator population to increase at rate \\(\\delta\\) (predators prey⁻¹ time⁻¹). Finally, predators are removed by natural mortality at rate \\(\\gamma\\) (time⁻¹).\nWe implement the Lotka–Volterra model and simulate it with parameters \\(\\alpha = 1.5\\), \\(\\beta = 1\\), \\(\\gamma = 3\\), and \\(\\delta = 1\\) and initial conditions \\(x(0) = y(0) = 1\\).\n\n# Define Lotka–Volterra model.\nfunction lotka_volterra(du, u, p, t)\n # Model parameters.\n α, β, γ, δ = p\n # Current state.\n x, y = u\n\n # Evaluate differential equations.\n du[1] = (α - β * y) * x # prey\n du[2] = (δ * x - γ) * y # predator\n\n return nothing\nend\n\n# Define initial-value problem.\nu0 = [1.0, 1.0]\np = [1.5, 1.0, 3.0, 1.0]\ntspan = (0.0, 10.0)\nprob = ODEProblem(lotka_volterra, u0, tspan, p)\n\n# Plot simulation.\nplot(solve(prob, Tsit5()))\n\n\n\n\nWe generate noisy observations to use for the parameter estimation tasks in this tutorial. 
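Before doing so, we can sanity-check the implementation by evaluating the right-hand side at the initial state by hand: with \\(x = y = 1\\), the equations give \\(\\mathrm{d}x/\\mathrm{d}t = \\alpha - \\beta = 0.5\\) and \\(\\mathrm{d}y/\\mathrm{d}t = \\delta - \\gamma = -2\\). The following is a minimal sketch reusing the lotka_volterra, u0, and p objects defined above:\n\n# Evaluate the in-place right-hand side at the initial state u0 and time 0.\ndu = similar(u0)\nlotka_volterra(du, u0, p, 0.0)\ndu # should be [0.5, -2.0]\n\nHaving checked the implementation, we return to generating the observations. 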
With the saveat argument to the differential equation solver, we specify that the solution is stored at intervals of 0.1 time units.\nTo make the example more realistic, we generate data as random Poisson counts based on the “true” population densities of predator and prey from the simulation. Poisson-distributed data are common in ecology (for instance, counts of animals detected by a camera trap). We assume the Poisson rate parameter \\(\\lambda\\) is proportional to the underlying animal densities, with proportionality constant \\(q = 1.7\\) (representing observation efficiency).\n\nsol = solve(prob, Tsit5(); saveat=0.1)\nq = 1.7\nodedata = rand.(Poisson.(q * Array(sol)))\n\n# Plot simulation and noisy observations.\nplot(sol, label=[\"Prey\" \"Predator\"])\nscatter!(sol.t, odedata'; color=[1 2], label=\"\")\n\n\n\n\nAn even more realistic example could be fitted to the famous hare-and-lynx system using the long-term trapping records of the Hudson’s Bay Company. A Stan implementation of this problem with slightly different priors can be found here. For this tutorial, though, we will stick with simulated data.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#direct-handling-of-bayesian-estimation-with-turing", "href": "tutorials/bayesian-differential-equations/index.html#direct-handling-of-bayesian-estimation-with-turing", "title": "Bayesian Differential Equations", "section": "Direct Handling of Bayesian Estimation with Turing", "text": "Direct Handling of Bayesian Estimation with Turing\nDifferentialEquations.jl is the main Julia package for numerically solving differential equations. Its functionality is completely interoperable with Turing.jl, which means that we can directly simulate differential equations inside a Turing @model.\nFor the purposes of this tutorial, we choose priors for the parameters that are quite close to the ground truth. As justification, we can imagine we have preexisting estimates for the biological rates. Practically, this helps us to illustrate the results without needing to run overly long MCMC chains.\nNote that we also have to take special care with the ODE solver. For certain parameter combinations, the numerical solver may predict animal densities that are just barely below zero. This causes errors with the Poisson distribution, which needs a non-negative mean \\(\\lambda\\). To avoid this, we tell the solver to aim for small absolute and relative errors (abstol=1e-6, reltol=1e-6). We also add a fudge factor ϵ = 1e-5 to the predicted data. Since ϵ is greater than the solver’s tolerance, it should overcome any remaining numerical error, making sure all predicted values are positive. At the same time, it is so small compared to the data that it should have a negligible effect on inference. If this approach doesn’t work, there are some more ideas to try here. In the case of continuous observations (e.g. 
data derived from modelling chemical reactions), it is sufficient to use a normal likelihood whose mean is the predicted value, with an appropriately chosen variance (which can itself also be a parameter with a prior distribution).\n\n@model function fitlv(data, prob)\n # Prior distributions.\n α ~ truncated(Normal(1.5, 0.2); lower=0.5, upper=2.5)\n β ~ truncated(Normal(1.1, 0.2); lower=0, upper=2)\n γ ~ truncated(Normal(3.0, 0.2); lower=1, upper=4)\n δ ~ truncated(Normal(1.0, 0.2); lower=0, upper=2)\n q ~ truncated(Normal(1.7, 0.2); lower=0, upper=3)\n\n # Simulate Lotka–Volterra model.\n p = [α, β, γ, δ]\n predicted = solve(prob, Tsit5(); p=p, saveat=0.1, abstol=1e-6, reltol=1e-6)\n ϵ = 1e-5\n\n # Observations.\n for i in eachindex(predicted)\n data[:, i] ~ arraydist(Poisson.(q .* predicted[i] .+ ϵ))\n end\n\n return nothing\nend\n\nmodel = fitlv(odedata, prob)\n\n# Sample 3 independent chains with forward-mode automatic differentiation (the default).\nchain = sample(model, NUTS(), MCMCSerial(), 1000, 3; progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.05\n┌ Info: Found initial step size\n└ ϵ = 0.05\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\nChains MCMC chain (1000×19×3 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 3\nSamples per chain = 1000\nWall duration = 49.38 seconds\nCompute duration = 44.92 seconds\nparameters = α, β, γ, δ, q\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nThe estimated parameters are close to the true parameter values used to generate the observations. We can also check visually that the chains have converged.\n\nplot(chain)\n\n\n\n\n\nData retrodiction\nIn Bayesian analysis it is often useful to retrodict the data, i.e. generate simulated data using samples from the posterior distribution, and compare them to the original data (see, for instance, Section 3.3.2 on model checking in McElreath’s book “Statistical Rethinking”). Here, we solve the ODE for 300 randomly picked posterior samples in the chain. We plot the ensemble of solutions to check whether they resemble the data. 
The 300 retrodicted time courses from the posterior are plotted in gray, the noisy observations are shown as blue and red dots, and the green and purple lines are the ODE solution that was used to generate the data.\n\nplot(; legend=false)\nposterior_samples = sample(chain[[:α, :β, :γ, :δ]], 300; replace=false)\nfor p in eachrow(Array(posterior_samples))\n sol_p = solve(prob, Tsit5(); p=p, saveat=0.1)\n plot!(sol_p; alpha=0.1, color=\"#BBBBBB\")\nend\n\n# Plot simulation and noisy observations.\nplot!(sol; color=[1 2], linewidth=1)\nscatter!(sol.t, odedata'; color=[1 2])\n\n\n\n\nWe can see that, even though we added quite a bit of noise to the data, the posterior distribution reproduces the “true” ODE solution quite accurately.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#lotkavolterra-model-without-data-of-prey", "href": "tutorials/bayesian-differential-equations/index.html#lotkavolterra-model-without-data-of-prey", "title": "Bayesian Differential Equations", "section": "Lotka–Volterra model without data of prey", "text": "Lotka–Volterra model without data of prey\nOne can also perform parameter inference for a Lotka–Volterra model with incomplete data. For instance, let us suppose we have only observations of the predators but not of the prey. We can fit the model only to the \\(y\\) variable of the system without providing any data for \\(x\\):\n\n@model function fitlv2(data::AbstractVector, prob)\n # Prior distributions.\n α ~ truncated(Normal(1.5, 0.2); lower=0.5, upper=2.5)\n β ~ truncated(Normal(1.1, 0.2); lower=0, upper=2)\n γ ~ truncated(Normal(3.0, 0.2); lower=1, upper=4)\n δ ~ truncated(Normal(1.0, 0.2); lower=0, upper=2)\n q ~ truncated(Normal(1.7, 0.2); lower=0, upper=3)\n\n # Simulate Lotka–Volterra model but save only the second state of the system (predators).\n p = [α, β, γ, δ]\n predicted = solve(prob, Tsit5(); p=p, saveat=0.1, save_idxs=2, abstol=1e-6, reltol=1e-6)\n ϵ = 1e-5\n\n # Observations of the predators.\n data ~ arraydist(Poisson.(q .* predicted.u .+ ϵ))\n\n return nothing\nend\n\nmodel2 = fitlv2(odedata[2, :], prob)\n\n# Sample 3 independent chains.\nchain2 = sample(model2, NUTS(0.45), MCMCSerial(), 5000, 3; progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n┌ Info: Found initial step size\n└ ϵ = 0.2\n┌ Info: Found initial step size\n└ ϵ = 0.025\n\n\n\n\nChains MCMC chain (5000×19×3 Array{Float64, 3}):\n\nIterations = 1001:1:6000\nNumber of chains = 3\nSamples per chain = 5000\nWall duration = 22.94 seconds\nCompute duration = 22.53 seconds\nparameters = α, β, γ, δ, q\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nAgain we inspect the trajectories of 300 randomly selected posterior samples.\n\nplot(; legend=false)\nposterior_samples = sample(chain2[[:α, :β, :γ, :δ]], 300; replace=false)\nfor p in eachrow(Array(posterior_samples))\n sol_p = solve(prob, Tsit5(); p=p, saveat=0.1)\n plot!(sol_p; alpha=0.1, color=\"#BBBBBB\")\nend\n\n# Plot simulation and noisy observations.\nplot!(sol; color=[1 2], linewidth=1)\nscatter!(sol.t, odedata'; color=[1 2])\n\n\n\n\nNote that here the observations of the prey (blue dots) were not used in the parameter estimation! 
Yet, the model can predict the values of \\(x\\) relatively accurately, albeit with a wider distribution of solutions, reflecting the greater uncertainty in the prediction of the \\(x\\) values.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#inference-of-delay-differential-equations", "href": "tutorials/bayesian-differential-equations/index.html#inference-of-delay-differential-equations", "title": "Bayesian Differential Equations", "section": "Inference of Delay Differential Equations", "text": "Inference of Delay Differential Equations\nHere we show an example of inference with another type of differential equation: a delay differential equation (DDE). DDEs are differential equations where derivatives are functions of values at an earlier point in time. This is useful for modelling delayed effects, such as the incubation time of a virus.\nHere is a delayed version of the Lotka–Volterra system:\n\\[\n\\begin{aligned}\n\\frac{\\mathrm{d}x}{\\mathrm{d}t} &= \\alpha x(t-\\tau) - \\beta y(t) x(t),\\\\\n\\frac{\\mathrm{d}y}{\\mathrm{d}t} &= - \\gamma y(t) + \\delta x(t) y(t),\n\\end{aligned}\n\\]\nwhere \\(\\tau\\) is a (positive) delay and \\(x(t-\\tau)\\) is the variable \\(x\\) at an earlier time point \\(t - \\tau\\).\nThe initial-value problem of the delayed system can be implemented as a DDEProblem. As described in the DDE example, here the function h is the history function that can be used to obtain a state at an earlier time point. Again we use parameters \\(\\alpha = 1.5\\), \\(\\beta = 1\\), \\(\\gamma = 3\\), and \\(\\delta = 1\\), together with a delay of \\(\\tau = 1\\), and initial conditions \\(x(0) = y(0) = 1\\). Moreover, we assume \\(x(t) = 1\\) for \\(t < 0\\).\n\nfunction delay_lotka_volterra(du, u, h, p, t)\n # Model parameters.\n α, β, γ, δ = p\n\n # Current state.\n x, y = u\n # Evaluate differential equations.\n du[1] = α * h(p, t - 1; idxs=1) - β * x * y\n du[2] = -γ * y + δ * x * y\n\n return nothing\nend\n\n# Define initial-value problem.\np = (1.5, 1.0, 3.0, 1.0)\nu0 = [1.0; 1.0]\ntspan = (0.0, 10.0)\nh(p, t; idxs::Int) = 1.0\nprob_dde = DDEProblem(delay_lotka_volterra, u0, h, tspan, p);\n\nWe generate observations by sampling from the corresponding Poisson distributions derived from the simulation results:\n\nsol_dde = solve(prob_dde; saveat=0.1)\nddedata = rand.(Poisson.(q .* Array(sol_dde)))\n\n# Plot simulation and noisy observations.\nplot(sol_dde)\nscatter!(sol_dde.t, ddedata'; color=[1 2], label=\"\")\n\n\n\n\nNow we define the Turing model for the Lotka–Volterra model with a delay, and sample 3 independent chains.\n\n@model function fitlv_dde(data, prob)\n # Prior distributions.\n α ~ truncated(Normal(1.5, 0.2); lower=0.5, upper=2.5)\n β ~ truncated(Normal(1.1, 0.2); lower=0, upper=2)\n γ ~ truncated(Normal(3.0, 0.2); lower=1, upper=4)\n δ ~ truncated(Normal(1.0, 0.2); lower=0, upper=2)\n q ~ truncated(Normal(1.7, 0.2); lower=0, upper=3)\n\n # Simulate Lotka–Volterra model.\n p = [α, β, γ, δ]\n predicted = solve(prob, MethodOfSteps(Tsit5()); p=p, saveat=0.1, abstol=1e-6, reltol=1e-6)\n ϵ = 1e-5\n\n # Observations.\n for i in eachindex(predicted)\n data[:, i] ~ arraydist(Poisson.(q .* predicted[i] .+ ϵ))\n end\nend\n\nmodel_dde = fitlv_dde(ddedata, prob_dde)\n\nchain_dde = sample(model_dde, NUTS(), MCMCSerial(), 300, 3; progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.05\n┌ Info: Found initial step size\n└ ϵ = 0.2\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\nChains MCMC chain (300×19×3 
Array{Float64, 3}):\n\nIterations = 151:1:450\nNumber of chains = 3\nSamples per chain = 300\nWall duration = 12.57 seconds\nCompute duration = 12.42 seconds\nparameters = α, β, γ, δ, q\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\nplot(chain_dde)\n\n\n\n\nFinally, we plot trajectories of 300 randomly selected samples from the posterior. Again, the dots indicate our observations, the coloured lines are the “true” simulations without noise, and the gray lines are trajectories from the posterior samples.\n\nplot(; legend=false)\nposterior_samples = sample(chain_dde[[:α, :β, :γ, :δ]], 300; replace=false)\nfor p in eachrow(Array(posterior_samples))\n sol_p = solve(prob_dde, MethodOfSteps(Tsit5()); p=p, saveat=0.1)\n plot!(sol_p; alpha=0.1, color=\"#BBBBBB\")\nend\n\n# Plot simulation and noisy observations.\nplot!(sol_dde; color=[1 2], linewidth=1)\nscatter!(sol_dde.t, ddedata'; color=[1 2])\n\n\n\n\nThe fit is pretty good even though the data were quite noisy to start with.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "tutorials/bayesian-differential-equations/index.html#scaling-to-large-models-adjoint-sensitivities", "href": "tutorials/bayesian-differential-equations/index.html#scaling-to-large-models-adjoint-sensitivities", "title": "Bayesian Differential Equations", "section": "Scaling to Large Models: Adjoint Sensitivities", "text": "Scaling to Large Models: Adjoint Sensitivities\nTuring’s gradient-based MCMC algorithms, such as NUTS, use ForwardDiff by default. This works well for small models, but for larger models with many parameters, reverse-mode automatic differentiation is often more efficient (see the automatic differentiation page for more information).\nTo use reverse-mode AD with differential equations, you first need to load the SciMLSensitivity.jl package, which forms part of SciML’s differential equation suite. Here, ‘sensitivity’ refers to the derivative of the solution of a differential equation with respect to its parameters. More details on the mathematical theory that underpins these methods can be found in the SciMLSensitivity documentation.\nOnce SciMLSensitivity has been loaded, you can use one of the AD backends that are compatible with SciMLSensitivity.jl. 
For example, if we wanted to use Mooncake.jl, we could run:\n\nimport Mooncake\nimport SciMLSensitivity\n\n# Define the AD backend to use\nadtype = AutoMooncake()\n\n# Sample a single chain with 1000 samples using Mooncake\nsample(model, NUTS(; adtype=adtype), 1000; progress=false)\n\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\n\n\nChains MCMC chain (1000×19×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 318.98 seconds\nCompute duration = 318.98 seconds\nparameters = α, β, γ, δ, q\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n(If SciMLSensitivity is not loaded, the call to sample will error.)\nSciMLSensitivity has a number of sensitivity analysis algorithms: in this case it will automatically choose a default for you. You can also manually specify an algorithm by providing the sensealg keyword argument to the solve function; the existing algorithms are covered on this page of the SciMLSensitivity docs.\nFor more examples of adjoint usage on large parameter models, consult the DiffEqFlux documentation.", "crumbs": [ "Get Started", "Tutorials", "Bayesian Differential Equations" ] }, { "objectID": "core-functionality/index.html", "href": "core-functionality/index.html", "title": "Core Functionality", "section": "", "text": "This article provides an overview of the core functionality in Turing.jl that is likely to be used across a wide range of models.", "crumbs": [ "Get Started", "Core Functionality" ] }, { "objectID": "core-functionality/index.html#basics", "href": "core-functionality/index.html#basics", "title": "Core Functionality", "section": "Basics", "text": "Basics\n\nIntroduction\nA probabilistic program is a Julia function wrapped in a @model macro. In this function, arbitrary Julia code can be used, but to ensure correctness of inference it should not have external effects or modify global state.\nTo specify distributions of random variables, Turing models use ~ notation: x ~ distr where x is an identifier. This resembles the notation used in statistical models. For example, the model:\n\\[\\begin{align}\na &\\sim \\text{Normal}(0, 1) \\\\\nx &\\sim \\text{Normal}(a, 1)\n\\end{align}\\]\nis written in Turing as:\nusing Turing\n\n@model function mymodel()\n a ~ Normal(0, 1)\n x ~ Normal(a, 1)\nend\n\n\nTilde-statements\nIndexing and field access are supported, so that x[i] ~ distr and x.field ~ distr are valid statements. However, in these cases, x must be defined in the scope of the model function. distr is typically either a distribution from Distributions.jl (see this page for implementing custom distributions), or another Turing model wrapped in to_submodel() (see this page for submodels).\nThere are two classes of tilde-statements: observe statements, where the left-hand side contains an observed value, and assume statements, where the left-hand side is not observed. These respectively correspond to likelihood and prior terms.\nIt is easier to start by explaining when a variable is treated as an observed value. 
This can happen in one of two ways:\n\nThe variable is passed as one of the arguments to the model function; or\nThe value of the variable in the model is explicitly conditioned or fixed.\n\n\n\n\n\n\n\nCaution\n\n\n\nNote that it is not enough for the variable to be defined in the current scope. For example, in\n@model function mymodel(x)\n y = x + 1\n y ~ Normal(0, 1)\nend\ny is not treated as an observed value.\n\n\nIf either of the above holds, x is considered to be an observed value, assumed to have been drawn from the distribution distr. The likelihood (if needed) is computed using loglikelihood(distr, x).\nOn the other hand, if neither of the above is true, then this is treated as an assume-statement: inside the probabilistic program, this samples a new variable called x, distributed according to distr, and places it in the current scope.\n\n\nSimple Gaussian Demo\nBelow is a simple Gaussian demo illustrating the basic usage of Turing.jl.\n\n# Import packages.\nusing Turing\nusing StatsPlots\n\n# Define a simple Normal model with unknown mean and variance.\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\ngdemo (generic function with 2 methods)\n\n\nIn Turing.jl, MCMC sampling is performed using the sample() function, which (at its most basic) takes a model, a sampler, and the number of samples to draw.\nFor this model, the prior expectation of s² is mean(InverseGamma(2, 3)) = 3/(2 - 1) = 3, and the prior expectation of m is 0. We can check this using the Prior sampler:\n\nusing Random\nRandom.seed!(468)\nsetprogress!(false)\n\n\np1 = sample(gdemo(missing, missing), Prior(), 100000)\n\nChains MCMC chain (100000×7×1 Array{Float64, 3}):\n\nIterations = 1:1:100000\nNumber of chains = 1\nSamples per chain = 100000\nWall duration = 0.84 seconds\nCompute duration = 0.84 seconds\nparameters = s², m, x, y\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nTo perform inference, we simply need to specify the sampling algorithm we want to use.\n\n# Run sampler, collect results.\nc1 = sample(gdemo(1.5, 2), SMC(), 1000)\nc2 = sample(gdemo(1.5, 2), PG(10), 1000)\nc3 = sample(gdemo(1.5, 2), HMC(0.1, 5), 1000)\nc4 = sample(gdemo(1.5, 2), Gibbs(:m => PG(10), :s² => HMC(0.1, 5)), 1000)\nc5 = sample(gdemo(1.5, 2), NUTS(0.65), 1000)\n\n\n┌ Info: Found initial step size\n└ ϵ = 3.2\n\n\n\n\nChains MCMC chain (1000×16×1 Array{Float64, 3}):\n\nIterations = 501:1:1500\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 1.87 seconds\nCompute duration = 1.87 seconds\nparameters = s², m\ninternals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nThe arguments for each sampler are:\n\nSMC: number of particles.\nPG: number of particles, number of iterations.\nHMC: leapfrog step size, leapfrog step numbers.\nGibbs: component sampler 1, component sampler 2, …\nNUTS: number of adaptation steps (optional), target accept ratio.\n\nMore information about each sampler can be found in Turing.jl’s API docs.\nThe MCMCChains module (which is re-exported by Turing) provides plotting tools for the Chains objects returned by the sample function. 
See the MCMCChains repository for more information on the suite of tools available for diagnosing MCMC chains.\n# Summarise results\ndescribe(c3)\n\n# Plot results\nplot(c3)\nsavefig(\"gdemo-plot.png\")\n\n\nConditioning on data\nUsing this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model on data. Subsequently, the sample function can be used to generate samples from the posterior distribution.\nIn the following example, the defined model is conditioned on the data (arg_1 = 1, arg_2 = 2) by passing the arguments 1 and 2 to the model function.\n@model function model_name(arg_1, arg_2)\n arg_1 ~ ...\n arg_2 ~ ...\nend\nThe conditioned model can then be passed to the sample function to run posterior inference.\nmodel = model_name(1, 2)\nchn = sample(model, HMC(0.5, 20), 1000) # Sample with HMC.\nAlternatively, one can also use the conditioning operator | to condition the model on data. In this case, the model does not need to be defined with arg_1 and arg_2 as parameters.\n@model function model_name()\n arg_1 ~ ...\n arg_2 ~ ...\nend\n\n# Condition the model on data.\nmodel = model_name() | (arg_1 = 1, arg_2 = 2) \n\n\nAnalysing MCMC chains\nThe returned chain contains samples of the variables in the model.\nvar_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.\nThe key (:var_1) can be either a Symbol or a String. For example, to fetch x[1], one can use chn[Symbol(\"x[1]\")] or chn[\"x[1]\"]. If you want to retrieve all parameters associated with a specific symbol, you can use group. As an example, if you have the parameters \"x[1]\", \"x[2]\", and \"x[3]\", calling group(chn, :x) or group(chn, \"x\") will return a new chain with only \"x[1]\", \"x[2]\", and \"x[3]\".\n\n\nTilde-statement ordering\nTuring does not have a declarative form. Thus, the ordering of tilde-statements in a Turing model is important: random variables cannot be used until they have first been declared in a tilde-statement. For example, the following model works:\n# A model where `s` is declared before it is used.\n@model function model_function(y)\n s ~ Poisson(1)\n y ~ Normal(s, 1)\n return y\nend\n\nsample(model_function(10), SMC(), 100)\nBut if we switch the s ~ Poisson(1) and y ~ Normal(s, 1) lines, the model will no longer sample correctly:\n# A model where `s` is used before it is declared.\n@model function model_function(y)\n y ~ Normal(s, 1)\n s ~ Poisson(1)\n return y\nend\n\nsample(model_function(10), SMC(), 100)\n\n\nSampling Multiple Chains\nTuring supports distributed and threaded parallel sampling. To do so, call sample(model, sampler, parallel_type, n, n_chains), where parallel_type can be either MCMCThreads() or MCMCDistributed() for threaded and distributed sampling, respectively.\nHaving multiple chains in the same object is valuable for evaluating convergence. Some diagnostic functions like gelmandiag require multiple chains.\nIf you want to sample multiple chains without using parallelism, you can use MCMCSerial():\n# Sample 3 chains in a serial fashion.\nchains = sample(model, sampler, MCMCSerial(), 1000, 3)\nThe chains variable now contains a Chains object which can be indexed by chain. To pull out the first chain from the chains object, use chains[:,:,1]. 
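For instance, the following sketch (reusing the chains object from the MCMCSerial example above) pulls out the first chain and discards an initial stretch of every chain; the indexing convention is chains[iterations, parameters, chains]:\n\nfirst_chain = chains[:, :, 1] # chain 1 as its own Chains object\nkept = chains[501:1000, :, :] # keep only iterations 501 to 1000 of each chain\n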
The method is the same if you use either of the below parallel sampling methods.\n\nMultithreaded sampling\nIf you wish to perform multithreaded sampling, you can call sample with the following signature:\n# Sample four chains using multiple threads, each with 1000 samples.\nsample(model, sampler, MCMCThreads(), 1000, 4)\nBe aware that Turing cannot add threads for you – you must have started your Julia instance with multiple threads to experience any kind of parallelism. See the Julia documentation for details on how to achieve this.\n\n\nDistributed sampling\nTo perform distributed sampling (using multiple processes), you must first import Distributed.\nProcess parallel sampling can be done like so:\n\n# Load Distributed to add processes and the @everywhere macro.\nusing Distributed\n\n# Load Turing.\nusing Turing\n\n# Add four processes to use for sampling.\naddprocs(4; exeflags=\"--project=$(Base.active_project())\")\n\n# Initialise everything on all the processes.\n# Note: Make sure to do this after you've already loaded Turing,\n# so each process does not have to precompile.\n# Parallel sampling may fail silently if you do not do this.\n@everywhere using Turing\n\n# Define a model on all processes.\n@everywhere @model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\n# Declare the model instance everywhere.\n@everywhere model = gdemo([1.5, 2.0])\n\n# Sample four chains using multiple processes, each with 1000 samples.\nsample(model, NUTS(), MCMCDistributed(), 1000, 4)\n\n\n\n\nSampling from an Unconditional Distribution (The Prior)\nTuring allows you to sample from a declared model’s prior. If you wish to draw a chain from the prior to inspect your prior distributions, you can run\n\nchain = sample(model, Prior(), n_samples)\n\nYou can also run your model (as if it were a function) from the prior distribution, by calling the model without specifying inputs or a sampler. In the below example, we specify a gdemo model which returns two variables, x and y. Here, including the return statement is necessary to retrieve the sampled x and y values.\n\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n y ~ Normal(m, sqrt(s²))\n return x, y\nend\n\ngdemo (generic function with 2 methods)\n\n\nTo produce a sample from the prior distribution, we instantiate the model with missing inputs:\n\n# Samples from p(x,y)\ng_prior_sample = gdemo(missing, missing)\ng_prior_sample()\n\n(-2.5578872742615753, 0.2701583499660189)\n\n\n\n\nSampling from a Conditional Distribution (The Posterior)\n\nTreating observations as random variables\nInputs to the model that have a value missing are treated as parameters, aka random variables, to be estimated/sampled. This can be useful if you want to simulate draws for that parameter, or if you are sampling from a conditional distribution. 
Turing supports the following syntax:\n\n@model function gdemo(x, ::Type{T}=Float64) where {T}\n if x === missing\n # Initialise `x` if missing\n x = Vector{T}(undef, 2)\n end\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\n# Construct a model with x = missing\nmodel = gdemo(missing)\nc = sample(model, HMC(0.05, 20), 500)\n\nChains MCMC chain (500×16×1 Array{Union{Missing, Float64}, 3}):\n\nIterations = 1:1:500\nNumber of chains = 1\nSamples per chain = 500\nWall duration = 2.33 seconds\nCompute duration = 2.33 seconds\nparameters = s², m, x[1], x[2]\ninternals = logprior, loglikelihood, logjoint, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nNote that we need to initialise x when it is missing, since we iterate over its elements later in the model. The generated values for x can be extracted from the Chains object using c[:x].\nTuring also supports mixed missing and non-missing values in x, where the missing ones will be treated as random variables to be sampled while the others are treated as observations. For example:\n\n@model function gdemo(x)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n for i in eachindex(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\nend\n\n# x[1] is a parameter, but x[2] is an observation\nmodel = gdemo([missing, 2.4])\nc = sample(model, HMC(0.01, 5), 500)\n\nChains MCMC chain (500×15×1 Array{Union{Missing, Float64}, 3}):\n\nIterations = 1:1:500\nNumber of chains = 1\nSamples per chain = 500\nWall duration = 1.49 seconds\nCompute duration = 1.49 seconds\nparameters = s², m, x[1]\ninternals = logprior, loglikelihood, logjoint, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\nDefault Values\nArguments to Turing models can have default values much like default values in normal Julia functions. For instance, the following will assign missing to x and treat it as a random variable. 
If the default value is not missing, x will be assigned that value and will be treated as an observation instead.\n\nusing Turing\n\n@model function generative(x=missing, ::Type{T}=Float64) where {T<:Real}\n if x === missing\n # Initialise x when missing\n x = Vector{T}(undef, 10)\n end\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n for i in 1:length(x)\n x[i] ~ Normal(m, sqrt(s²))\n end\n return s², m\nend\n\nm = generative()\nchain = sample(m, HMC(0.01, 5), 1000)\n\nChains MCMC chain (1000×24×1 Array{Union{Missing, Float64}, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 2.57 seconds\nCompute duration = 2.57 seconds\nparameters = s², m, x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], x[9], x[10]\ninternals = logprior, loglikelihood, logjoint, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\n\nAccess Values inside Chain\nYou can access the values inside a chain in several ways:\n\nTurn them into a DataFrame object\nUse their raw AxisArray form\nCreate a three-dimensional Array object\n\nFor example, let c be a Chains object:\n\nDataFrame(c) converts c to a DataFrame,\nc.value retrieves the values inside c as an AxisArray, and\nc.value.data retrieves the values inside c as a 3D Array.\n\n\n\nVariable Types and Type Parameters\nThe element type of a vector (or matrix) of random variables should match the eltype of its prior distribution, i.e., <: Integer for discrete distributions and <: AbstractFloat for continuous distributions.\nSome automatic differentiation backends (used in conjunction with Hamiltonian samplers such as HMC or NUTS) further require that the vector’s element type be either:\n\nReal, to enable auto-differentiation through the model, which uses special number types that are subtypes of Real; or\nSome type parameter T defined in the model header using the type parameter syntax, e.g. function gdemo(x, ::Type{T} = Float64) where {T}.\n\nSimilarly, when using a particle sampler, the Julia variable used should be either:\n\nAn Array, or\nAn instance of some type parameter T defined in the model header using the type parameter syntax, e.g. 
function gdemo(x, ::Type{T} = Vector{Float64}) where {T}.\n\n\n\nQuerying Probabilities from Model or Chain\nTuring offers three functions, loglikelihood, logprior, and logjoint, to query the log-likelihood, log-prior, and log-joint probabilities of a model, respectively.\nLet’s look at a simple model called gdemo0:\n\n@model function gdemo0()\n s ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s))\n return x ~ Normal(m, sqrt(s))\nend\n\ngdemo0 (generic function with 2 methods)\n\n\nIf we observe x to be 1.0, we can condition the model on this datum using the condition syntax:\n\nmodel = gdemo0() | (x=1.0,)\n\nDynamicPPL.Model{typeof(gdemo0), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{x::Float64}, DynamicPPL.DefaultContext}, false}(gdemo0, NamedTuple(), NamedTuple(), ConditionContext((x = 1.0,), DynamicPPL.DefaultContext()))\n\n\nNow, let’s compute the log-likelihood of the observation given specific values of the model parameters, s and m:\n\nloglikelihood(model, (s=1.0, m=1.0))\n\n-0.9189385332046728\n\n\nWe can easily verify that value in this case:\n\nlogpdf(Normal(1.0, 1.0), 1.0)\n\n-0.9189385332046728\n\n\nWe can also compute the log-prior probability of the model for the same values of s and m:\n\nlogprior(model, (s=1.0, m=1.0))\n\n-2.221713955868453\n\n\n\nlogpdf(InverseGamma(2, 3), 1.0) + logpdf(Normal(0, sqrt(1.0)), 1.0)\n\n-2.221713955868453\n\n\nFinally, we can compute the log-joint probability of the model parameters and data:\n\nlogjoint(model, (s=1.0, m=1.0))\n\n-3.1406524890731258\n\n\n\nlogpdf(Normal(1.0, 1.0), 1.0) +\nlogpdf(InverseGamma(2, 3), 1.0) +\nlogpdf(Normal(0, sqrt(1.0)), 1.0)\n\n-3.1406524890731258\n\n\nQuerying with a Chains object is easy as well:\n\nchn = sample(model, Prior(), 10)\n\nChains MCMC chain (10×5×1 Array{Float64, 3}):\n\nIterations = 1:1:10\nNumber of chains = 1\nSamples per chain = 10\nWall duration = 0.13 seconds\nCompute duration = 0.13 seconds\nparameters = s, m\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\n\nloglikelihood(model, chn)\n\n10×1 Matrix{Float64}:\n -1.4055024008761068\n -1.9993751246713904\n -1.1856946600391214\n -0.8806518673557528\n -1.6871887813672473\n -3.4502046054562925\n -1.1999971830410205\n -1.7795194347771714\n -2.605364158545942\n -1.4333729435353304\n\n\n\n\nMaximum likelihood and maximum a posteriori estimates\nTuring also has functions for estimating the maximum a posteriori and maximum likelihood parameters of a model. This can be done with\n\n@model function normal_model()\n x ~ Normal()\n y ~ Normal(x)\nend\n\nnmodel = normal_model() | (; y = 2.0,)\n\nmaximum_likelihood(nmodel)\n\nModeResult with maximized lp of -0.92\n[2.0]\n\n\nor\n\nmaximum_a_posteriori(nmodel)\n\nModeResult with maximized lp of -2.84\n[0.9999999999999999]\n\n\nFor more details, see the mode estimation page.", "crumbs": [ "Get Started", "Core Functionality" ] }, { "objectID": "core-functionality/index.html#beyond-the-basics", "href": "core-functionality/index.html#beyond-the-basics", "title": "Core Functionality", "section": "Beyond the Basics", "text": "Beyond the Basics\n\nCompositional Sampling Using Gibbs\nTuring.jl provides a Gibbs interface to combine different samplers. 
For example, one can combine an HMC sampler with a PG sampler to run inference for different parameters in a single model as below.\n\n@model function simple_choice(xs)\n p ~ Beta(2, 2)\n z ~ Bernoulli(p)\n for i in 1:length(xs)\n if z == 1\n xs[i] ~ Normal(0, 1)\n else\n xs[i] ~ Normal(2, 1)\n end\n end\nend\n\nsimple_choice_f = simple_choice([1.5, 2.0, 0.3])\n\nchn = sample(simple_choice_f, Gibbs(:p => HMC(0.2, 3), :z => PG(20)), 1000)\n\nChains MCMC chain (1000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 25.67 seconds\nCompute duration = 25.67 seconds\nparameters = p, z\ninternals = logprior, loglikelihood, logjoint\n\nUse `describe(chains)` for summary statistics and quantiles.\n\n\nThe Gibbs sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the Automatic Differentiation page for more information.\nFor more details on compositional sampling in Turing.jl, please see the corresponding paper.\n\n\nWorking with filldist and arraydist\nTuring provides filldist(dist::Distribution, n::Int) and arraydist(dists::AbstractVector{<:Distribution}) as a simplified interface to construct product distributions, e.g., to model a set of variables that share the same structure but vary by group.\n\nConstructing product distributions with filldist\nThe function filldist provides a general interface to construct product distributions over distributions of the same type and parameterisation. Note that, in contrast to the product distribution interface provided by Distributions.jl (Product), filldist supports product distributions over univariate or multivariate distributions.\nExample usage:\n\n@model function demo(x, g)\n k = length(unique(g))\n a ~ filldist(Exponential(), k) # = Product(fill(Exponential(), k))\n mu = a[g]\n for i in eachindex(x)\n x[i] ~ Normal(mu[i])\n end\n return mu\nend\n\ndemo (generic function with 2 methods)\n\n\n\n\nConstructing product distributions with arraydist\nThe function arraydist provides a general interface to construct product distributions over distributions of varying type and parameterisation. Note that in contrast to the product distribution interface provided by Distributions.jl (Product), arraydist supports product distributions over univariate or multivariate distributions.\nExample usage:\n\n@model function demo(x, g)\n k = length(unique(g))\n a ~ arraydist([Exponential(i) for i in 1:k])\n mu = a[g]\n for i in eachindex(x)\n x[i] ~ Normal(mu[i])\n end\n return mu\nend\n\ndemo (generic function with 2 methods)\n\n\n\n\n\nWorking with MCMCChains.jl\nTuring.jl wraps its samples using MCMCChains.Chains so that all the functions that work on MCMCChains.Chains objects can be re-used in Turing.jl. Two typical functions are MCMCChains.describe and MCMCChains.plot, which can be used as follows for an obtained chain chn. 
describe(chn) # Lists statistics of the samples.\nplot(chn) # Plots statistics of the samples.\n\n\nChains MCMC chain (1000×5×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 25.67 seconds\nCompute duration = 25.67 seconds\nparameters = p, z\ninternals = logprior, loglikelihood, logjoint\n\nSummary Statistics\n\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n p 0.4360 0.2181 0.0240 77.7761 92.5265 1.0194 ⋯\n z 0.1720 0.3776 0.0171 488.3215 NaN 1.0081 ⋯\n\n 1 column omitted\n\nQuantiles\n\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n p 0.0556 0.2772 0.4255 0.5911 0.8525\n z 0.0000 0.0000 0.0000 0.0000 1.0000\n\n\n\n\n\n\n\nThere are numerous functions in addition to describe and plot in the MCMCChains package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.\n\n\nChanging Default Settings\nSome of Turing.jl’s default settings can be adjusted to suit your needs.\n\nAD Backend\nTuring is thoroughly tested with three automatic differentiation (AD) backend packages. The default AD backend is ForwardDiff, which uses forward-mode AD. Two reverse-mode AD backends are also supported, namely Mooncake and ReverseDiff. Mooncake and ReverseDiff also require the user to explicitly load them using import Mooncake or import ReverseDiff alongside using Turing.\nFor more information on Turing’s automatic differentiation backend, please see the Automatic Differentiation article as well as the ADTests website, where a number of AD backends (not just those above) are tested against Turing.jl.\n\n\nProgress Logging\nTuring.jl uses ProgressLogging.jl to log the sampling progress. Progress logging is enabled by default but might slow down inference. It can be turned on or off by setting the keyword argument progress of sample to true or false. Moreover, you can enable or disable progress logging globally by calling setprogress!(true) or setprogress!(false), respectively.\nTuring uses heuristics to select an appropriate visualisation backend. If you use Jupyter notebooks, the default backend is ConsoleProgressMonitor.jl. In all other cases, progress logs are displayed with TerminalLoggers.jl. Alternatively, if you provide a custom visualisation backend, Turing uses it instead of the default backend.", "crumbs": [ "Get Started", "Core Functionality" ] } ]