{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.8.5 64-bit ('streamlit': conda)", "display_name": "Python 3.8.5 64-bit ('streamlit': conda)", "metadata": { "interpreter": { "hash": "d5b47a323bfc67f94a37e7cb8fd52db1ad06cc55d227a5b8c375f2f5af6e26dc" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# Probability interval boundaries with Python" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "In the previous post we calculated [proabilities]({% post_url 2020-10-07-probability-test %}) and [z values]({% post_url 2020-10-09-inverse-normal-cdf %}) with Python. Here we continue with finding the interval boundaries belonging to a particular tail or two-sided probabiltiy." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Libraries and helper functions" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import math as m\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def to_string(number: float, column_width: int = 20) -> str:\n", " # return str(number).ljust(column_width)\n", " return f\"{str(number): <{column_width}}\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:\n", " return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def calc_inverse_normal_cdf(p: float, mu:float = 0, sigma: float = 1, tolerance: float = 1E-5, show_steps=False) -> float:\n", "\n", " if p == 0: return -np.inf\n", " if p == 1: return np.inf\n", "\n", " # In case it is not a standard normal distribution, calculate the standard normal first and then rescale\n", " if mu != 0 or sigma != 1:\n", " return mu + sigma * calc_inverse_normal_cdf(p, tolerance=tolerance)\n", "\n", " low_z = -10\n", " hi_z = 10\n", "\n", " if show_steps: print(f\"{'':<19}\".join(['low_z', 'mid_z', 'hi_z']), \"\\n\")\n", "\n", " while hi_z - low_z > tolerance:\n", " mid_z = (low_z + hi_z) / 2\n", " mid_p = calc_normal_cdf(mid_z)\n", "\n", " if mid_p < p:\n", " low_z = mid_z\n", " else:\n", " hi_z = mid_z\n", "\n", " if show_steps: print(\"\\t\".join(map(to_string, [low_z, mid_z, hi_z])))\n", "\n", " return mid_z\n" ] }, { "source": [ "## Upper bound" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def normal_upper_bound(probabilty: float, mu: float = 0, sigma: float = 1) -> float:\n", " \"\"\"Return z for which P(Z <= z) = probability\"\"\"\n", " return calc_inverse_normal_cdf(probabilty, mu, sigma)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def normal_lower_bound(probabilty: float, mu: float = 0, sigma: float = 1) -> float:\n", " \"\"\"Return z for which P(Z >= z) = probability\"\"\"\n", " return calc_inverse_normal_cdf(1 - probabilty, mu, sigma)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "0.0: -inf, inf\n0.1: -1.2816, 1.2816\n0.2: -0.8416, 0.8416\n0.3: -0.5244, 0.5244\n0.4: -0.2533, 0.2533\n0.5: -0.0000, -0.0000\n0.6: 0.2533, -0.2533\n0.7: 0.5244, -0.5244\n0.8: 0.8416, -0.8416\n0.9: 1.2816, -1.2816\n1.0: inf, -inf\n" ] } ], "source": [ "for i in np.arange(0, 1.1, .1):\n", " print(f\"{i:.1f}: {normal_upper_bound(i): >10.4f}, {normal_lower_bound(i): >10.4f}\")" ] }, { "source": [ "## Two sided bounds\n", "\n", "Two variations" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def normal_two_sided_bounds(probability: float, mu: float = 0, sigma: float = 1) -> float:\n", " if probability == 0: return 0, 0\n", "\n", " tail_probability = (1 - probability) / 2\n", " \n", " lower_bound = normal_upper_bound(tail_probability, mu, sigma)\n", " upper_bound = normal_lower_bound(tail_probability, mu, sigma)\n", "\n", " return lower_bound, upper_bound" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def normal_two_sided_bounds2(probability: float, mu: float = 0, sigma: float = 1) -> float:\n", " if probability == 0: return 0, 0\n", "\n", " if mu != 0:\n", " raise ValueError(\"Requires standard normal distribution\")\n", "\n", " tail_probability = (1 - probability) / 2\n", " z = normal_upper_bound(tail_probability, mu, sigma)\n", " \n", " return -z, z" ] }, { "source": [ "By default they do not produce the same results at the extremes because of how the inverse cdf function defines the theoretical uppern and lower boundaries (as in these cases it should produce +/- infinity).\n", "\n", "Accordingly, it is useful to define extreme values (i.e. probablities of 0 and 1)" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "0.0: True, (0, 0), (0, 0)\n0.1: False, (-0.12566566467285156, 0.12566566467285156), (0.12566566467285156, -0.12566566467285156)\n0.2: False, (-0.2533435821533203, 0.2533435821533203), (0.2533435821533203, -0.2533435821533203)\n0.3: False, (-0.3853130340576172, 0.3853130340576172), (0.3853130340576172, -0.3853130340576172)\n0.4: False, (-0.5243968963623047, 0.5243968963623047), (0.5243968963623047, -0.5243968963623047)\n0.5: False, (-0.6744861602783203, 0.6744861602783203), (0.6744861602783203, -0.6744861602783203)\n0.6: False, (-0.8416271209716797, 0.8416271209716797), (0.8416271209716797, -0.8416271209716797)\n0.7: False, (-1.0364246368408203, 1.0364246368408203), (1.0364246368408203, -1.0364246368408203)\n0.8: False, (-1.2815570831298828, 1.2815570831298828), (1.2815570831298828, -1.2815570831298828)\n0.9: False, (-1.6448497772216797, 1.6448497772216797), (1.6448497772216797, -1.6448497772216797)\n1.0: False, (-inf, inf), (inf, -inf)\n" ] } ], "source": [ "for i in np.arange(0, 1.1, .1):\n", " print(f\"{i:.1f}: {normal_two_sided_bounds(i) == normal_two_sided_bounds2(i)}, {normal_two_sided_bounds(i)}, {normal_two_sided_bounds2(i)}\")" ] } ] }