{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Compact Python wrapper library for commonly used _R-style_ functions\n", "

\n", " Basic functional programming nature of R provides users with extremely simple and compact interface for quick calculations of probabilities and essential descriptive/inferential statistics for a data analysis problem. On the other hand, Python scripting ability allows the analyst to use those statistics in a wide variety of analytics pipeline with limitless sophistication and creativity. To combine the advantage of both worlds, one needs a simple Python-based wrapper library which contains some basic functions pertaining to probability distributions and descriptive statistics defined in R-style so that users can call those functions fast without having to go to the proper Python statistical libraries and figure out the whole list of methods and arguments.\n", "

\n", "

\n", " Goal of this library is to provide simple Python sub-routines mimicing R-style statistical functions for quickly calculating density/point estimates, cumulative distributions, quantiles, and generating random variates for various important probability distributions. To maintain the spirit of R styling, no class hiararchy was used and just raw functions are defined in this file so that user can import this one Python script and use all the functions whenever he/she needs them with a single name call.\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic descriptive stats" ] }, { "cell_type": "code", "execution_count": 426, "metadata": {}, "outputs": [], "source": [ "def mean(array):\n", " \"\"\"\n", " Calculates the mean of an array/vector\n", " \"\"\"\n", " import numpy as np\n", " array=np.array(array)\n", " result= np.mean(array)\n", " return result" ] }, { "cell_type": "code", "execution_count": 427, "metadata": {}, "outputs": [], "source": [ "def sd(array):\n", " \"\"\"\n", " Calculates the standard deviation of an array/vector\n", " \"\"\"\n", " import numpy as np\n", " array=np.array(array)\n", " result= np.std(array)\n", " return result" ] }, { "cell_type": "code", "execution_count": 428, "metadata": {}, "outputs": [], "source": [ "def median(array):\n", " \"\"\"\n", " Calculates the median of an array/vector\n", " \"\"\"\n", " import numpy as np\n", " array=np.array(array)\n", " result= np.median(array)\n", " return result" ] }, { "cell_type": "code", "execution_count": 429, "metadata": {}, "outputs": [], "source": [ "def var(array):\n", " \"\"\"\n", " Calculates the variance of an array/vector\n", " \"\"\"\n", " import numpy as np\n", " array=np.array(array)\n", " result= np.var(array)\n", " return result" ] }, { "cell_type": "code", "execution_count": 430, "metadata": {}, "outputs": [], "source": [ "def cov(x,y=None):\n", " \"\"\"\n", " Calculates the covariance between two arrays/vectors or of a single matrix\n", " \"\"\"\n", " import numpy as np\n", " array1=np.array(x)\n", " if y!=None:\n", " array2=np.array(y)\n", " if array1.shape!=array2.shape:\n", " print(\"Error: incompatible dimensions\")\n", " return None\n", " covmat=np.cov(array1,array2)\n", " result=covmat[0][1]\n", " elif len(array1.shape)==1:\n", " result=float(np.cov(array1))\n", " else:\n", " result=np.cov(array1)\n", " return result " ] }, { "cell_type": "code", "execution_count": 431, "metadata": {}, "outputs": [], "source": [ "def fivenum(array):\n", " \"\"\"\n", " Calculates the Tuckey Five-number (min/median/max/1st quartile/3rd quartile) of an array/vector\n", " \"\"\"\n", " import numpy as np\n", " array=np.array(array)\n", " result=[0]*5\n", " result[0]=np.min(array)\n", " result[1]=np.percentile(array,25)\n", " result[2]=np.median(array)\n", " result[3]=np.percentile(array,75)\n", " result[4]=np.max(array)\n", " result=np.array(result)\n", " \n", " return result" ] }, { "cell_type": "code", "execution_count": 432, "metadata": {}, "outputs": [], "source": [ "def IQR(array):\n", " \"\"\"\n", " Calculates the inter-quartile range of an array/vector\n", " \"\"\"\n", " import numpy as np\n", " array=np.array(array)\n", " result = np.percentile(array,75)-np.percentile(array,25)\n", " \n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Probability distributions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Uniform distribution" ] }, { "cell_type": "code", "execution_count": 433, "metadata": {}, "outputs": [], "source": [ "def dunif(x, minimum=0,maximum=1):\n", " \"\"\"\n", " Calculates the point estimate of the uniform distribution\n", " \"\"\"\n", " from scipy.stats import uniform\n", " result=uniform.pdf(x=x,loc=minimum,scale=maximum-minimum)\n", " return result" ] }, { "cell_type": "code", "execution_count": 434, "metadata": {}, "outputs": [], "source": [ "def punif(q, minimum=0,maximum=1):\n", " \"\"\"\n", " Calculates the cumulative of the uniform distribution\n", " \"\"\"\n", " from scipy.stats import uniform\n", " result=uniform.cdf(x=q,loc=minimum,scale=maximum-minimum)\n", " return result" ] }, { "cell_type": "code", "execution_count": 435, "metadata": {}, "outputs": [], "source": [ "def qunif(p, minimum=0,maximum=1):\n", " \"\"\"\n", " Calculates the quantile function of the uniform distribution\n", " \"\"\"\n", " from scipy.stats import uniform\n", " result=uniform.ppf(q=p,loc=minimum,scale=maximum-minimum)\n", " return result" ] }, { "cell_type": "code", "execution_count": 436, "metadata": {}, "outputs": [], "source": [ "def runif(n, minimum=0,maximum=1):\n", " \"\"\"\n", " Generates random variables from the uniform distribution\n", " \"\"\"\n", " from scipy.stats import uniform\n", " result=uniform.rvs(size=n,loc=minimum,scale=maximum-minimum)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binomial distribution" ] }, { "cell_type": "code", "execution_count": 437, "metadata": {}, "outputs": [], "source": [ "def dbinom(x,size,prob=0.5):\n", " \"\"\"\n", " Calculates the point estimate of the binomial distribution\n", " \"\"\"\n", " from scipy.stats import binom\n", " result=binom.pmf(k=x,n=size,p=prob,loc=0)\n", " return result" ] }, { "cell_type": "code", "execution_count": 438, "metadata": {}, "outputs": [], "source": [ "def pbinom(q,size,prob=0.5):\n", " \"\"\"\n", " Calculates the cumulative of the binomial distribution\n", " \"\"\"\n", " from scipy.stats import binom\n", " result=binom.cdf(k=q,n=size,p=prob,loc=0)\n", " return result" ] }, { "cell_type": "code", "execution_count": 439, "metadata": {}, "outputs": [], "source": [ "def qbinom(p, size, prob=0.5):\n", " \"\"\"\n", " Calculates the quantile function from the binomial distribution\n", " \"\"\"\n", " from scipy.stats import binom\n", " result=binom.ppf(q=p,n=size,p=prob,loc=0)\n", " return result" ] }, { "cell_type": "code", "execution_count": 440, "metadata": {}, "outputs": [], "source": [ "def rbinom(n,size,prob=0.5):\n", " \"\"\"\n", " Generates random variables from the binomial distribution\n", " \"\"\"\n", " from scipy.stats import binom\n", " result=binom.rvs(n=size,p=prob,size=n)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Normal distribution" ] }, { "cell_type": "code", "execution_count": 441, "metadata": {}, "outputs": [], "source": [ "def dnorm(x,mean=0,sd =1):\n", " \"\"\"\n", " Calculates the density of the Normal distribution\n", " \"\"\"\n", " from scipy.stats import norm\n", " result=norm.pdf(x,loc=mean,scale=sd)\n", " return result" ] }, { "cell_type": "code", "execution_count": 442, "metadata": {}, "outputs": [], "source": [ "def pnorm(q,mean=0,sd=1):\n", " \"\"\"\n", " Calculates the cumulative of the normal distribution\n", " \"\"\"\n", " from scipy.stats import norm\n", " result=norm.cdf(x=q,loc=mean,scale=sd)\n", " return result" ] }, { "cell_type": "code", "execution_count": 443, "metadata": {}, "outputs": [], "source": [ "def qnorm(p,mean=0,sd=1):\n", " \"\"\"\n", " Calculates the quantile function of the normal distribution\n", " \"\"\"\n", " from scipy.stats import norm\n", " result=norm.ppf(q=p,loc=mean,scale=sd)\n", " return result" ] }, { "cell_type": "code", "execution_count": 444, "metadata": {}, "outputs": [], "source": [ "def rnorm(n,mean=0,sd=1):\n", " \"\"\"\n", " Generates random variables from the normal distribution\n", " \"\"\"\n", " from scipy.stats import norm\n", " result=norm.rvs(size=n,loc=mean,scale=sd)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Poisson distribution" ] }, { "cell_type": "code", "execution_count": 445, "metadata": {}, "outputs": [], "source": [ "def dpois(x,mu):\n", " \"\"\"\n", " Calculates the density/point estimate of the Poisson distribution\n", " \"\"\"\n", " from scipy.stats import poisson\n", " result=poisson.pmf(k=x,mu=mu)\n", " return result" ] }, { "cell_type": "code", "execution_count": 446, "metadata": {}, "outputs": [], "source": [ "def ppois(q,mu):\n", " \"\"\"\n", " Calculates the cumulative of the Poisson distribution\n", " \"\"\"\n", " from scipy.stats import poisson\n", " result=poisson.cdf(k=q,mu=mu)\n", " return result" ] }, { "cell_type": "code", "execution_count": 447, "metadata": {}, "outputs": [], "source": [ "def qpois(p,mu):\n", " \"\"\"\n", " Calculates the quantile function of the Poisson distribution\n", " \"\"\"\n", " from scipy.stats import poisson\n", " result=poisson.ppf(q=p,mu=mu)\n", " return result" ] }, { "cell_type": "code", "execution_count": 448, "metadata": {}, "outputs": [], "source": [ "def rpois(n,mu):\n", " \"\"\"\n", " Generates random variables from the Poisson distribution\n", " \"\"\"\n", " from scipy.stats import poisson\n", " result=poisson.rvs(size=n,mu=mu)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### $\\chi^2-$ distribution" ] }, { "cell_type": "code", "execution_count": 449, "metadata": {}, "outputs": [], "source": [ "def dchisq(x,df,ncp=0):\n", " \"\"\"\n", " Calculates the density/point estimate of the chi-square distribution\n", " \"\"\"\n", " from scipy.stats import chi2,ncx2\n", " if ncp==0:\n", " result=chi2.pdf(x=x,df=df,loc=0,scale=1)\n", " else:\n", " result=ncx2.pdf(x=x,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 450, "metadata": {}, "outputs": [], "source": [ "def pchisq(q,df,ncp=0):\n", " \"\"\"\n", " Calculates the cumulative of the chi-square distribution\n", " \"\"\"\n", " from scipy.stats import chi2,ncx2\n", " if ncp==0:\n", " result=chi2.cdf(x=q,df=df,loc=0,scale=1)\n", " else:\n", " result=ncx2.cdf(x=q,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 451, "metadata": {}, "outputs": [], "source": [ "def qchisq(p,df,ncp=0):\n", " \"\"\"\n", " Calculates the quantile function of the chi-square distribution\n", " \"\"\"\n", " from scipy.stats import chi2,ncx2\n", " if ncp==0:\n", " result=chi2.ppf(q=p,df=df,loc=0,scale=1)\n", " else:\n", " result=ncx2.ppf(q=p,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 452, "metadata": {}, "outputs": [], "source": [ "def rchisq(n,df,ncp=0):\n", " \"\"\"\n", " Generates random variables from the chi-square distribution\n", " \"\"\"\n", " from scipy.stats import chi2,ncx2\n", " if ncp==0:\n", " result=chi2.rvs(size=n,df=df,loc=0,scale=1)\n", " else:\n", " result=ncx2.rvs(size=n,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Student's t-distribution" ] }, { "cell_type": "code", "execution_count": 453, "metadata": {}, "outputs": [], "source": [ "def dt(x,df,ncp=0):\n", " \"\"\"\n", " Calculates the density/point estimate of the t-distribution\n", " \"\"\"\n", " from scipy.stats import t,nct\n", " if ncp==0:\n", " result=t.pdf(x=x,df=df,loc=0,scale=1)\n", " else:\n", " result=nct.pdf(x=x,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 454, "metadata": {}, "outputs": [], "source": [ "def pt(q,df,ncp=0):\n", " \"\"\"\n", " Calculates the cumulative of the t-distribution\n", " \"\"\"\n", " from scipy.stats import t,nct\n", " if ncp==0:\n", " result=t.cdf(x=q,df=df,loc=0,scale=1)\n", " else:\n", " result=nct.cdf(x=q,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 455, "metadata": {}, "outputs": [], "source": [ "def qt(p,df,ncp=0):\n", " \"\"\"\n", " Calculates the quantile function of the t-distribution\n", " \"\"\"\n", " from scipy.stats import t,nct\n", " if ncp==0:\n", " result=t.ppf(q=p,df=df,loc=0,scale=1)\n", " else:\n", " result=nct.ppf(q=p,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 456, "metadata": {}, "outputs": [], "source": [ "def rt(n,df,ncp=0):\n", " \"\"\"\n", " Generates random variables from the t-distribution\n", " \"\"\"\n", " from scipy.stats import t,nct\n", " if ncp==0:\n", " result=t.rvs(size=n,df=df,loc=0,scale=1)\n", " else:\n", " result=nct.rvs(size=n,df=df,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### F-distribution" ] }, { "cell_type": "code", "execution_count": 457, "metadata": {}, "outputs": [], "source": [ "def df(x,df1,df2,ncp=0):\n", " \"\"\"\n", " Calculates the density/point estimate of the F-distribution\n", " \"\"\"\n", " from scipy.stats import f,ncf\n", " if ncp==0:\n", " result=f.pdf(x=x,dfn=df1,dfd=df2,loc=0,scale=1)\n", " else:\n", " result=ncf.pdf(x=x,dfn=df1,dfd=df2,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 458, "metadata": {}, "outputs": [], "source": [ "def pf(q,df1,df2,ncp=0):\n", " \"\"\"\n", " Calculates the cumulative of the F-distribution\n", " \"\"\"\n", " from scipy.stats import f,ncf\n", " if ncp==0:\n", " result=f.cdf(x=q,dfn=df1,dfd=df2,loc=0,scale=1)\n", " else:\n", " result=ncf.cdf(x=q,dfn=df1,dfd=df2,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 459, "metadata": {}, "outputs": [], "source": [ "def qf(p,df1,df2,ncp=0):\n", " \"\"\"\n", " Calculates the quantile function of the F-distribution\n", " \"\"\"\n", " from scipy.stats import f,ncf\n", " if ncp==0:\n", " result=f.ppf(q=p,dfn=df1,dfd=df2,loc=0,scale=1)\n", " else:\n", " result=ncf.ppf(q=p,dfn=df1,dfd=df2,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 460, "metadata": {}, "outputs": [], "source": [ "def rf(n,df1,df2,ncp=0):\n", " \"\"\"\n", " Calculates the quantile function of the F-distribution\n", " \"\"\"\n", " from scipy.stats import f,ncf\n", " if ncp==0:\n", " result=f.rvs(size=n,dfn=df1,dfd=df2,loc=0,scale=1)\n", " else:\n", " result=ncf.rvs(size=n,dfn=df1,dfd=df2,nc=ncp,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Beta distribution" ] }, { "cell_type": "code", "execution_count": 461, "metadata": {}, "outputs": [], "source": [ "def dbeta(x,shape1,shape2):\n", " \"\"\"\n", " Calculates the density/point estimate of the Beta-distribution\n", " \"\"\"\n", " from scipy.stats import beta\n", " result=beta.pdf(x=x,a=shape1,b=shape2,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 462, "metadata": {}, "outputs": [], "source": [ "def pbeta(q,shape1,shape2):\n", " \"\"\"\n", " Calculates the cumulative of the Beta-distribution\n", " \"\"\"\n", " from scipy.stats import beta\n", " result=beta.cdf(x=q,a=shape1,b=shape2,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 463, "metadata": {}, "outputs": [], "source": [ "def qbeta(p,shape1,shape2):\n", " \"\"\"\n", " Calculates the cumulative of the Beta-distribution\n", " \"\"\"\n", " from scipy.stats import beta\n", " result=beta.ppf(q=p,a=shape1,b=shape2,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 464, "metadata": {}, "outputs": [], "source": [ "def rbeta(n,shape1,shape2):\n", " \"\"\"\n", " Calculates the cumulative of the Beta-distribution\n", " \"\"\"\n", " from scipy.stats import beta\n", " result=beta.rvs(size=n,a=shape1,b=shape2,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gamma distribution" ] }, { "cell_type": "code", "execution_count": 465, "metadata": {}, "outputs": [], "source": [ "def dgamma(x,shape,rate=1):\n", " \"\"\"\n", " Calculates the density/point estimate of the Gamma-distribution\n", " \"\"\"\n", " from scipy.stats import gamma\n", " result=rate*gamma.pdf(x=rate*x,a=shape,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 466, "metadata": {}, "outputs": [], "source": [ "def pgamma(q,shape,rate=1):\n", " \"\"\"\n", " Calculates the cumulative of the Gamma-distribution\n", " \"\"\"\n", " from scipy.stats import gamma\n", " result=gamma.cdf(x=rate*q,a=shape,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 467, "metadata": {}, "outputs": [], "source": [ "def qgamma(p,shape,rate=1):\n", " \"\"\"\n", " Calculates the cumulative of the Gamma-distribution\n", " \"\"\"\n", " from scipy.stats import gamma\n", " result=(1/rate)*gamma.ppf(q=p,a=shape,loc=0,scale=1)\n", " return result" ] }, { "cell_type": "code", "execution_count": 468, "metadata": {}, "outputs": [], "source": [ "def rgamma(n,shape,rate=1):\n", " \"\"\"\n", " Calculates the cumulative of the Gamma-distribution\n", " \"\"\"\n", " from scipy.stats import gamma\n", " result=gamma.rvs(size=n,a=shape,loc=0,scale=1)\n", " return result" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }