--- title: "The Beta distribution of the third kind (or generalised Beta prime)" author: "Stéphane Laurent" date: '2019-07-22' tags: maths, statistics, R, special-functions rbloggers: yes output: md_document: variant: markdown preserve_yaml: true toc: yes html_document: highlight: kate keep_md: no toc: yes highlighter: pandoc-solarized --- {r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, fig.path = "figures/Beta3-", attr.source = ".numberLines")  We present the family of so-called *Beta distributions of the third kind*. In the context of Bayesian statistics, it is a conjugate family of prior distributions on the odds parameter of the binomial model. This distribution is known, but nobody provided a way to sample from it. We show how one can sample from this distribution in R. # Preliminaries: the (scaled) Beta prime distribution The Beta distribution of the third kind generalizes the Beta distribution of the second kind, also known under the name *Beta prime distribution*. The *Beta prime distribution*$\mathcal{B}'(c,d,\lambda)$is the distribution of the random variable$\lambda\frac{U}{1-U}$where$U \sim \mathcal{B}(c,d)$. Its density function at$x \geqslant 0$is $$\frac{1}{\lambda^c B(c,d)} \frac{x^{c-1}}{\left(1+\frac{x}{\lambda}\right)^{c+d}}.$$ Usually the definition does not include the scale parameter$\lambda$(that is, it is usually defined for$\lambda=1$only). It is easy to implement a sampler for this distribution, the density function and the cumulative density function: {r} rbetaprime <- function(n, c, d, lambda = 1){ stopifnot(lambda > 0) u <- rbeta(n, c, d) lambda * u/(1-u) } dbetaprime <- function(x, c, d, lambda = 1){ stopifnot(lambda > 0) lambda/(lambda+x)^2 * dbeta(x/lambda/(1+x/lambda), c, d) } pbetaprime <- function(q, c, d, lambda){ stopifnot(lambda > 0) pbeta(q/lambda/(1+q/lambda), c, d) }  # Beta distribution of the third kind The *Beta distribution of the third kind*$\mathcal{B}_3$was firstly introduced (as far as I know) in the paper *Some Poisson mixtures with a hyperscale parameter*, written by myself. For parameters$c>0$,$d>0$,$\kappa \in \mathbb{R}$,$\tau>0$, the density function of$\mathcal{B}_3(c,d,\kappa,\tau)$is $$f(\phi) = \frac{1} {B(c,d){}_2\!F_1\left(c, c+d-\kappa, c+d; 1 - \frac{1}{\tau}\right)} \frac{\phi^{c-1}(1+\phi)^{-\kappa}} {\left(1+\frac{\phi}{\tau}\right)^{c+d-\kappa}}, \quad \phi \geqslant 0.$$ Thus, for$\kappa=0$, the$\mathcal{B}_3(c,d,\kappa,\tau)$distribution equals$\mathcal{B}'(c,d,\tau)$, and for$\kappa = c+d$or$\tau=1$, it equals$\mathcal{B}'(c,d,1)$. Note that in general,$\tau$is not a scale parameter. Let's write a R function computing this density: {r} library(gsl) Gauss2F1 <- function(a,b,c,x){ if(x>=0 && x<1){ # hyperg_2F1 works fine in this situation hyperg_2F1(a, b, c, x) }else{ # transform to come down to the first situation hyperg_2F1(c-a, b, c, 1-1/(1-x)) / (1-x)^b } } dB3 <- function(x, c, d, kappa, tau){ stopifnot(c > 0, d > 0, tau > 0) if(kappa == 0){ dbetaprime(x, c, d, tau) }else if(kappa == c+d){ dbetaprime(x, c, d, 1) }else{ 1/beta(c,d) * x^(c-1)*(1+x)^(-kappa)/(1+x/tau)^(c+d-kappa) / Gauss2F1(c, c+d-kappa, c+d, 1-1/tau) } }  This distribution is related to the *four-parameter generalized Beta distribution* introduced by Chen & Novick in the paper *Bayesian analysis for binomial models with generalized beta prior distributions* (1984); this distribution takes its value in$[0,1]$. They are related by an elementary transformation: $$\Theta \sim GB4(c, d, \kappa, \tau) \quad \iff\quad \frac{\Theta}{1-\Theta} \sim \mathcal{B}_3\left(c, d, c+d-\kappa, \frac{1}{\tau}\right).$$ # Update 2019-09-05: generalised Beta distribution I've just discovered that the$GB4$distribution appears in the paper *On Kummer’s distributions of type two and generalized Beta distributions* written by Hamza & Vallois. It is named *generalised Beta distribution* in this paper, it is denoted by$\beta_\delta(a,b,c)$and its density function at$u \in [0,1]$is given by $$\frac{1}{\beta(a,b){}_2\!F_1(-c,a;a+b;1-\delta)} u^{a-1}(1-u)^{b-1}\bigl(1+(\delta-1)u\bigr)^c$$ for$a,b,\delta>0$and$c \in \mathbb{R}$. We have the following relation: if$\Phi \sim \mathcal{B}_3(c, d, \kappa, \tau)$, then $$\frac{\Phi}{1+\Phi} \sim \beta_{\frac{1}{\tau}}(c, d, \kappa-c-d).$$ So, maybe a better name for$\mathcal{B}_3$would be *generalised Beta prime distribution*. # Cumulative distribution function The cumulative distribution function of$\mathcal{B}_3$involves the *Appell hypergeometric function*$F_1$. A Fortran implementation of this function is available in the R package appell. This package has been removed from CRAN, but you can still install it. If$\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)$, then, for$q \geqslant 0$, $$\Pr(\Phi \leqslant q) = \frac{q^c F_1\left(c; \kappa, c+d-\kappa; c+1; -q, -\frac{q}{\tau}\right)} {cB(c,d){}_2\!F_1\left(c, c+d-\kappa, c+d; 1 - \frac{1}{\tau}\right)}.$$ Here is a R implementation. I found that it works well with the option userflag = 0 of the appellf1 function. {r} pB3 <- function(q, c, d, kappa, tau, userflag = 0){ stopifnot(c > 0, d > 0, tau > 0) if(kappa == 0){ pbetaprime(q, c, d, tau) }else if(kappa == c+d){ pbetaprime(q, c, d, 1) }else{ C <- beta(c,d) * Gauss2F1(c, c+d-kappa, c+d, 1-1/tau) Appell <- appell::appellf1(c, kappa, c+d-kappa, c+1, -q, -q/tau, userflag = userflag) q^c/c * Re(Appell$val) / C } }  # Sampling the Beta distribution of the third kind It is not very easy to sample the $\mathcal{B}_3$ distribution. In her [master thesis](https://savoirs.usherbrooke.ca/bitstream/handle/11143/9640/Chabot_Myriam_MSc_2016.pdf), Myriam Chabot proved that it can be represented as a discrete mixture of $\mathcal{B}_2$ distributions, and we will use this result. This result is the following one. - For $\tau < 1$, let $K$ be a random variable on $\mathbb{N}$ whose probability mass at $k\in\mathbb{N}$ is given by $$\frac{1}{{}_2\!F_1(d, c+d-\kappa, c+d, 1-\tau)} \frac{(1-\tau)^k}{k!} \frac{{(c+d-\kappa)}_k{(d)}_k}{{(c+d)}_k}$$ and let $\Phi$ be a random variable such that $$(\Phi \mid K=k) \sim \mathcal{B}'(c, d+k, 1).$$ Then $\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)$. - For $\tau > 1$, let $K$ be a random variable on $\mathbb{N}$ whose probability mass at $k\in\mathbb{N}$ is given by $$\frac{1}{{}_2\!F_1\left(c, c+d-\kappa, c+d, 1-\frac{1}{\tau}\right)} \frac{\left(1-\frac{1}{\tau}\right)^k}{k!} \frac{{(c+d-\kappa)}_k{(c)}_k}{{(c+d)}_k}$$ and let $\Phi$ be a random variable such that $$(\Phi \mid K=k) \sim \mathcal{B}'(c+k, d, 1).$$ Then $\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)$. So we can sample $\mathcal{B}_3(c,d,\kappa,\tau)$ if we are able to sample these discrete distributions. To do so, we use the Runuran package. {r, message=FALSE} library(Runuran) pmf_unnormalized <- function(k, c, d, kappa, tau){ out <- numeric(length(k)) positive <- k >= 0 k <- k[positive] out[positive] <- if(tau < 1){ exp(k*log(1-tau) - lfactorial(k) + lnpoch(c+d-kappa,k) + lnpoch(d,k) - lnpoch(c+d,k)) }else{ exp(k*log(1-1/tau) - lfactorial(k) + lnpoch(c+d-kappa,k) + lnpoch(c,k) - lnpoch(c+d,k)) } out } NormalizingConstant <- function(c, d, kappa, tau){ if(tau < 1){ hyperg_2F1(d, c+d-kappa, c+d, 1-tau) }else{ hyperg_2F1(c, c+d-kappa, c+d, 1-1/tau) } } Ksampler <- function(n, c, d, kappa, tau){ dist <- unuran.discr.new( pmf = function(k) pmf_unnormalized(k, c, d, kappa, tau), lb = 0, ub= Inf, sum = NormalizingConstant(c, d, kappa, tau) ) unuran <- unuran.new(dist, method="dgt") ur(unuran, n) } rB3 <- function(n, c, d, kappa, tau){ stopifnot(c > 0, d > 0, tau > 0) if(tau == 1 || kappa == c+d){ rbetaprime(n, c, d, 1) }else if(kappa == 0){ rbetaprime(n, c, d, tau) }else{ K <- Ksampler(n, c, d, kappa, tau) if(tau < 1){ rbetaprime(n, c, d + K, 1) }else{ rbetaprime(n, c + K, d, 1) } } }  Let's check. The density: {r density} c <- 2; d <- 3; kappa <- 4; tau <- 5 nsims <- 1000000 sims <- rB3(nsims, c, d, kappa, tau) plot(density(sims, from = 0, to = quantile(sims, 0.95))) curve(dB3(x, c, d, kappa, tau), add = TRUE, col = "red", lty = "dashed", lwd = 2)  The cumulative distribution function: {r cdf} q <- seq(0.1, 4, length.out = 10)[-6] cdfValues <- sapply(q, function(x) pB3(x, c, d, kappa, tau)) empirical_cdf <- ecdf(sims) plot(empirical_cdf, xlim = c(0,4)) points(q, cdfValues, pch=19)  I've removed the sixth value of the vector q because there is a crash of appellf1 for this value: {r, error=TRUE} q <- seq(0.1, 4, length.out = 10) pB3(q[6], c, d, kappa, tau)  It works with the option userflag = 1: {r} pB3(q[6], c, d, kappa, tau, userflag = 1)  Finally perhaps the option userflag = 1 is a better default value: {r cdfbis} cdfValues <- sapply(q, function(x){ pB3(x, c, d, kappa, tau, userflag = 1) }) plot(empirical_cdf, xlim = c(0,4)) points(q, cdfValues, pch=19)  # Application to the Bayesian binomial model Consider the binomial statistical model parameterized by the odds ratio $\phi$: $$L(\phi \mid x) \propto \frac{\phi^x}{(1+\phi)^n}.$$ Take a $\mathcal{B}_3(c,d,\kappa,\tau)$ prior distribution on $\phi$: $$\pi(\phi) \propto \frac{\phi^{c-1}(1+\phi)^{-\kappa}} {\left(1+\frac{\phi}{\tau}\right)^{c+d-\kappa}}$$ Then the posterior distribution on $\phi$ is $\mathcal{B}_3(c+x,d+n-x,\kappa+n,\tau)$. # Application to the Bayesian "two Poisson samples" model Consider the statistical model given by two independent observations $$x \sim \mathcal{P}(\lambda S), \quad y\sim\mathcal{P}(\mu T)$$ where $S$ and $T$ are known "sample sizes". We parametrize the model by $\mu$ and $\phi := \frac{\lambda}{\mu}$. Assigning a Gamma prior distribution $\mathcal{G}{B}(a,b)$ on $\mu$, it is not difficult to get $$(\mu \mid \phi,x,y) \sim\mathcal{G}(a+x+y, b + \phi S + T).$$ In their paper *A Bayesian framework for the ratio of two Poisson rates in the context of vaccine efficacy trials* (2011), Laurent & Legrand derived the marginal likelihood on $\phi$ in the case of the semi-conjuguate family of prior distributions. Their result holds as long as $\mu$ and $\phi$ are independent under the prior distribution, and this result is $$\widetilde{L}(\phi \mid x,y) \propto \frac{\phi^x}{{(1+\rho\phi)}^{a+x+y}}$$ where $\rho = \frac{S}{T+b}$. Now, let's assign a $\mathcal{B}'(c,d,\tau)$ prior distribution on $\phi$. Then the posterior distribution on $\phi$ is given by $$\pi(\phi \mid x,y) \propto \pi(\phi) \widetilde{L}(\phi \mid x,y) \propto \frac{\phi^{c+x-1}(1+\rho\phi)^{-(a+x+y)}} {\left(1+\frac{\phi}{\tau}\right)^{c+d}},$$ and by noting that $\frac{\phi}{\tau} = \frac{\rho\phi}{\rho\tau}$, we recognize the scaled $\mathcal{B}_3$ distribution $$(\phi \mid x,y) \sim \frac{1}{\rho} \times \mathcal{B}_3(c+x, a+d+y, a+x+y, \rho\tau).$$ In particular, if $\tau = \frac{1}{\rho} = \frac{T+b}{S}$, then we find $$(\phi \mid x,y) \sim \mathcal{B}_2(c+x, a+d+y, \tau),$$ and this is the situation of the semi-conjugate family studied by Laurent & Legrand.