{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.7.7 64-bit ('blog': conda)", "display_name": "Python 3.7.7 64-bit ('blog': conda)", "metadata": { "interpreter": { "hash": "a791f4b9424e3975ed64217961a14e58af61de2cbd0b16cd6f0b3a8e9eaf4389" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# Beta distribution" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "When doing Bayesian inference to estimate a probability, we can use the Beta distribution as a prior. The steps below compute its density.\n", "\n", "From the [Data Science from Scratch book](https://www.oreilly.com/library/view/data-science-from/9781492041122/)." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import math as m" ] }, { "source": [ "$ f(x, \\alpha, \\beta) = x^{\\alpha - 1} (1 - x)^{\\beta - 1} \\frac {1} {\\text{B}(\\alpha, \\beta)} $\n", "\n", "where\n", "\n", "$ \\text{B}(\\alpha, \\beta) = \\Gamma(\\alpha) \\Gamma(\\beta) \\frac{1} {\\Gamma(\\alpha + \\beta)} $\n", "\n", "where\n", "\n", "$\\Gamma(n)$ is the Gamma function which, for positive integers, is\n", "\n", "$ \\Gamma(n) = (n - 1)!$" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def B(alpha: float, beta: float) -> float:\n", "    \"Normalizing constant: makes the Beta density integrate to 1\"\n", "    return m.gamma(alpha) * m.gamma(beta) / m.gamma(alpha + beta)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def beta_pdf(x: float, alpha: float, beta: float) -> float:\n", "    if x <= 0 or x >= 1:  # no density outside (0, 1)\n", "        return 0\n", "\n", "    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / 
B(alpha, beta)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import altair as alt" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame()\n", "\n", "Beta_combinations = [(1, 1), (10, 10), (4, 16), (16, 4), (30, 30)]\n", "\n", "# Evaluate the density on a grid in (0, 1) for each (alpha, beta) pair\n", "for alpha, beta in Beta_combinations:\n", "    df_B = pd.DataFrame()\n", "    df_B['x'] = pd.Series(np.arange(0.01, 1, .01))\n", "    df_B['y'] = df_B['x'].apply(lambda x: beta_pdf(x, alpha, beta))\n", "    df_B['Beta'] = f'({alpha}, {beta})'\n", "\n", "    df = pd.concat([df, df_B])\n" ] }, { "source": [ "The distribution is centered at its mean, $ \\frac{\\alpha} {\\alpha + \\beta} $\n", "\n", "- Beta(1, 1) is the uniform distribution on [0, 1]\n", "- When alpha is greater than beta, most of the mass lies near 1, so the distribution is skewed to the left (and the reverse holds when beta is greater than alpha)\n", "- The larger alpha and beta are, the 'tighter' the distribution is" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/html": "\n
\n", "text/plain": [ "alt.Chart(...)" ] }, "metadata": {}, "execution_count": 6 } ], "source": [ "alt.Chart(df).mark_line().encode(alt.X('x:Q'), alt.Y('y:Q'), alt.Color('Beta'), tooltip=['x', 'y', 'Beta'], strokeDash='Beta').properties(width=600)" ] } ] }