---
title: "Poisson-Beta, Gamma-Beta, and Poisson-Gamma-Beta distributions"
author: "Stéphane Laurent"
date: '2019-07-19'
tags: maths, statistics, R, special-functions
output:
  md_document:
    variant: markdown
    preserve_yaml: true
  html_document:
    highlight: kate
    keep_md: no
prettify: yes
linenums: yes
prettifycss: minimal
highlighter: pandoc-solarized
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, 
                      fig.path = "figures/PGGBPGB-")
```

# Poisson-Beta distribution

The *Poisson-Beta distribution* with parameters $a,b,\theta>0$ is defined 
as the distribution of a random variable $N$ such that
$$
(N \mid U=u) \sim \mathcal{P}(\theta u), \quad U \sim \mathcal{B}(a,b).
$$
It is easy to simulate:

```{r}
rPB <- function(n, a, b, theta = 1){
  stopifnot(a>0, b>0, theta>0)
  rpois(n, theta * rbeta(n, a, b))
}
```

Thanks to the first integral representation of the 
Kummer function ${}_1\!F_1$ given 
[here](http://functions.wolfram.com/HypergeometricFunctions/Hypergeometric1F1/07/01/01/), 
its probability mass at $x \in \mathbb{N}$ is
$$
\frac{\theta^x}{x!}\frac{B(a+x,b)}{B(a,b)}{}_1\!F_1(a+x, a+b+x; -\theta).
$$
The Kummer function ${}_1\!F_1$ is available in the R package `gsl`. 
Here is a R implementation of the probabiity mass function of 
$\mathcal{P}\mathcal{B}(a,b,\theta)$, vectorized in `x`:

```{r}
library(gsl)
dPB <- function(x, a, b, theta = 1){
  stopifnot(a>0, b>0, theta>0)
  out <- numeric(length(x))
  positive <- x >= 0
  x <- x[positive]
  logC <- lbeta(a+x,b) - lbeta(a,b) - lfactorial(x) + x*log(theta)
  H <- hyperg_1F1(a+x, a+b+x, -theta)
  out[positive] <- exp(logC)*H
  out
}
```

Let's check:

```{r}
a <- 10; b <- 5; theta <- 2
nsims <- 200000
sims <- rPB(nsims, a, b, theta)
sum(sims <= 3) / nsims
sum(dPB(0:3, a, b, theta))
```


# Gamma-Beta distribution

The *Gamma-Beta distribution* with parameters $\nu, a, b, \theta > 0$ 
is defined as the distribution of a random variable $X$ such that
$$
(X \mid U=u) \sim \mathcal{G}(\nu, \theta u), 
\quad U \sim \mathcal{B}(a,b),
$$
where the second parameter of the Gamma distribution $\mathcal{G}$ is 
its *rate* parameter, not the *scale* parameter. 

Let's write a sampler for this distribution:

```{r}
rGB <- function(n, nu, a, b, theta=1){
  stopifnot(nu>0, a>0, b>0, theta>0)
  rgamma(n, shape = nu, rate = theta*rbeta(n, a, b)) 
}
```

Equivalently, the $\mathcal{G}{B}(\nu,a,b,\theta)$ distribution is 
the distribution of the quotient $Y/U$ where 
$Y \sim \mathcal{G}(\nu,\theta)$ is independent of 
$U \sim \mathcal{B}(a,b)$. 
Thus this is also the distribution of $\frac{Y/U}{\theta}$ 
where $Y \sim \mathcal{G}(\nu,1)$ is independent of 
$U \sim \mathcal{B}(a,b)$, and then this is a scaled 
*confluent hypergeometric function kind one distribution* 
(the distribution of $Y/U$). As far as I know, this name firstly occurs 
in the book *Matrix variate distributions* by Gupta and Nagar, 
who studied the matrix-valued confluent hypergeometric function 
kind one distribution. 
In the paper 
*Properties of Matrix Variate Confluent Hypergeometric Function Distribution*, 
Gupta & al. more deeply study this distribution. Moreover they 
introduce a scale parameter in this distribution. With their 
notations, 
$$
\mathcal{G}\mathcal{B}(\nu, a, b, \theta) = 
CH_1(\nu, a+\nu, a+b+\nu, 1/\theta),
$$
that is, 
$$
CH_1(\nu, \alpha, \beta, \theta) = 
\mathcal{G}\mathcal{B}(\nu, \alpha-\nu, \beta-\alpha, 1/\theta).
$$
The constraints on the parameters of 
$CH_1(\nu, \alpha, \beta, \theta)$ are $\nu>0$, 
$\alpha>\nu$, $\beta \geqslant \alpha$ and $\theta > 0$. 
Thus it is not more general than the 
$\mathcal{G}\mathcal{B}$ distribution, except for the case 
$\beta = \alpha$. We will see that it is a Gamma distribution in 
this case. 
The condition $\beta \geqslant \alpha$ is not mentionned in the 
literature; Gupta & Nagar, Gupta & al., wrote $\beta > \nu$. 
However the density function they give is a positive quantity multiplied 
by ${}_1F_1\left(\alpha, \beta; -\frac{x}{\theta}\right)$, which 
can take negative values when $\beta < \alpha$.

The density function of $\mathcal{G}{B}(\nu,a,b,\theta)$ at 
$x \geqslant 0$ is, thanks to 
[this integral representation](https://en.wikipedia.org/wiki/Confluent_hypergeometric_function#Integral_representations) 
of the Kummer function ${}_1\!F_1$, 
$$
\theta^\nu \frac{\Gamma(a+b)}{B(a,\nu)\Gamma(a+b+\nu)}x^{\nu-1}
{}_1\!F_1(a+\nu, a+b+\nu; -\theta x).
$$
Indeed, this expression makes sense for $b = 0$. 
One has ${}_1\!F_1(\alpha, 0; -\theta x) = \exp(-\theta x)$, 
and thus we get the density of $\mathcal{G}(\nu, \theta)$ when $b=0$. 
This is not suprising because $\mathcal{B}(a,b) \to \delta_1$ when 
$b \to 0^+$. 

Thanks to the identity
$$
{}_1\!F_1(a+\nu, a+b+\nu; -\theta x) = 
\exp(-\theta x) {}_1\!F_1(b, a+b+\nu; \theta x),
$$
we can easily see that the density function is indeed positive.

Now let's implement this density in R:

```{r}
dGB <- function(x, nu, a, b, theta = 1){
  stopifnot(nu > 0, a > 0, b >= 0, theta > 0)
  if(b == 0){
    return(dgamma(x, shape = nu, rate = theta))
  }
  out <- numeric(length(x))
  positive <- x >= 0
  x <- x[positive]
  nu_eq_1 <- rep(nu == 1, length(x))
  log_x_power_nu_minus_one <- 
    ifelse(nu_eq_1, 0, (nu-1)*log(x))
  C <- exp(-lbeta(a,nu) - lnpoch(a+b,nu) + 
             log_x_power_nu_minus_one + nu*log(theta))
  d <- ifelse(C == Inf, # occurs when x=0 and nu < 1
              Inf, C*hyperg_1F1(a+nu, a+b+nu, -x*theta))
  out[positive] <- d
  out
}
```

Let's check:

```{r}
nu <- 5
sims <- rGB(nsims, nu, a, b, theta)
plot(density(sims, from = 0, to = quantile(sims, 0.99)))
curve(dGB(x, nu, a, b, theta), add = TRUE, 
      col = "red", lwd = 2, lty = "dashed")
```


Now, let's derive the cumulative distribution function of 
$\mathcal{G}\mathcal{B}(\nu, a, b, \theta)$. We have to derive 
the integral
$$
I = \int_0^q x^{\nu-1}
{}_1\!F_1(a+\nu, a+b+\nu; -\theta x) \, \mathrm{d}x.
$$
It is 
[known](http://functions.wolfram.com/HypergeometricFunctions/Hypergeometric1F1/21/01/02/01/) 
that
$$
\int_0^q x^{\nu-1} {}_1\!F_1(a, b; cx) \,\mathrm{d}x = 
\frac{q^\nu}{\nu} {}_2\!F_2(a, \nu; b, \nu+1; cq).
$$
Therefore, 
$$
I = \frac{q^\nu}{\nu} {}_2\!F_2(a+\nu, \nu; a+b+\nu, \nu+1; -\theta q).
$$

There is no solid implementation of the generalized hypergeometric function 
${}_2\!F_2$ in R. To evaluate it, we use its 
[double integral representation](http://functions.wolfram.com/HypergeometricFunctions/Hypergeometric2F2/07/03/)
$$
{}_2\!F_2(a_1, a_2; b_1, b_2; z) = 
\frac{\Gamma(b_1)}{\Gamma(a_1)\Gamma(b_1-a_1)} 
\frac{\Gamma(b_2)}{\Gamma(a_2)\Gamma(b_2-a_2)} 
\times ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{}\\
\int_0^1\int_0^1 
u^{a_1-1}(1-u)^{b_1-a_1-1} v^{a_2-1}(1-v)^{b_2-a_2-1}
\exp(zuv)\,\mathrm{d}u\mathrm{d}v, 
$$
valid for $b_1>a_1>0$ and $b_2>a_2>0$.

Thus, 
$$
I = \frac{q^\nu}{\nu}
\frac{\Gamma(a+b+\nu)}{\Gamma(a+\nu)\Gamma(b)} 
\frac{\Gamma(\nu+1)}{\Gamma(\nu)\Gamma(1)} 
\times ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{}\\
\int_0^1\int_0^1 
u^{a+\nu-1}(1-u)^{b-1} v^{\nu-1}
\exp(-\theta quv)\,\mathrm{d}u\mathrm{d}v.
$$
We get
$$
\Pr(X \leq q) = C \times \int_0^1\int_0^1 
u^{a+\nu-1}(1-u)^{b-1} v^{\nu-1}
\exp(-\theta quv)\,\mathrm{d}u\mathrm{d}v
$$
where 
$$
C = (\theta q)^\nu \frac{\Gamma(a+b)}{B(a,\nu)\Gamma(a+\nu)\Gamma(b)} =
 \frac{(\theta q)^\nu}{B(a,b)\Gamma(\nu)}.
$$

We implement this double integral with the `cubature` package:

```{r}
library(cubature)
pGBintegrand <- function(uv, theta, q){
  u <- uv[1]; v <- uv[2]
  u^(a+nu-1)*(1-u)^(b-1)*v^(nu-1)*exp(-q*u*v*theta)
}
pGB <- function(q, nu, a, b, theta = 1){
  stopifnot(nu > 0, a > 0, b >= 0, theta > 0)
  if(b == 0){
    return(pgamma(q, shape = nu, rate = theta))
  }
  if(q <= 0) return(0)
  if(q == Inf) return(1)
  integral_ <- 
    pcubature(pGBintegrand, c(0,0), c(1,1), theta = theta, q = q)
  integral <- integral_$integral
  exp(nu*log(theta*q) - lgamma(nu) - lbeta(a,b)) * integral
}
pGB(3, nu, a, b, theta)
sum(sims < 3)/nsims
```


# Poisson-Gamma-Beta distribution

The *Poisson-Gamma-Beta distribution* with parameters $\nu,a,b,\theta>0$ is 
defined as the distribution of a random variable $N$ such that
$$
(N \mid X=x) \sim \mathcal{P}(x), 
\quad X \sim \mathcal{G}\mathcal{B}(\nu,a,b,\theta).
$$
Equivalently,
$$
(N \mid X=x) \sim \mathcal{P}\left(\frac{x}{\theta}\right), 
\quad X \sim \mathcal{G}\mathcal{B}(\nu,a,b,1).
$$
The sampler is easy to implement:

```{r}
rPGB <- function(n, nu, a, b, theta = 1){
  stopifnot(nu>0, a>0, b>0, theta>0)
  rpois(n, rgamma(n, shape = nu, rate = theta*rbeta(n, a, b)))
}
```

The probability mass of 
$\mathcal{P}\mathcal{G}\mathcal{B}(\nu,a,b,\theta)$ at 
$k \in \mathbb{N}$ is 
$$
\theta^\nu\frac{{(\nu)}_k}{k!}\frac{B(a+\nu,b)}{B(a,b)}
{}_2\!F_1(a+\nu, \nu+k; a+b+\nu; -\theta),
$$
which is derived from 
[this identity](http://functions.wolfram.com/HypergeometricFunctions/Hypergeometric1F1/21/02/02/).

Let's write a R implementation. The `gsl` package provides the function 
`hyperg_2F1` which evaluates the Gauss hypergeometric function ${}_2\!F_1$. 
It works flawlessly when the variable $x$ is in the interval $[0,1)$. 
When $x<0$, we can come down to the case $x \in (0,1)$ by the 
[Pfaff transformation formula](https://en.wikipedia.org/wiki/Hypergeometric_function#Fractional_linear_transformations). 
Our `Gauss2F1` function below is an implementation of this procedure. 

```{r}
Gauss2F1 <- function(a,b,c,x){
  if(x>=0 && x<1){
    hyperg_2F1(a, b, c, x)
  }else{
    hyperg_2F1(c-a, b, c, 1-1/(1-x)) / (1-x)^b
  }
}
dPGB <- function(x, nu, a, b, theta=1, log.p = FALSE){
  stopifnot(nu > 0, a > 0, b > 0, theta > 0)
  out <- numeric(length(x))
  positive <- x >= 0
  x <- as.integer(x[positive])
  logC <- lnpoch(nu,x) - lfactorial(x) - lbeta(a,b) + 
    lbeta(a+nu,b) + nu*log(theta)
  H <- Gauss2F1(a+nu, nu+x, a+b+nu, -theta)
  out[positive] <- if(log.p) logC + log(H) else exp(logC) * H
  out
}
```

And let's check:

```{r}
sims <- rPGB(nsims, nu, a, b, theta)
sum(sims <= 5) / nsims
sum(dPGB(0:5, nu, a, b, theta))
```

We can write a R function for the cumulative distribution more efficient 
than taking the sum of the probability masses:

```{r}
pPGB <- function(q, nu, a, b, theta, lower.tail = TRUE, log.p = FALSE){
  if(q < 0){
    p <- if(log.p) -Inf else 0
    return(if(lower.tail) p else 1-p)
  }
  stopifnot(nu > 0, a > 0, b > 0, theta > 0)
  n <- c(0L:as.integer(q))
  logC <- lnpoch(nu,n) - lfactorial(n) - lbeta(a,b) + 
    lbeta(a+nu,b) + nu*log(theta)
  H <- Gauss2F1(a+nu, nu+n, a+b+nu, -theta)
  p <- sum(exp(logC)*H)
  p <- if(lower.tail) p else 1-p
  if(log.p){
    log(p)
  }else{
    p
  }
}
pPGB(5, nu, a, b, theta)
```