{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bootstrapping\n", "\n", "Parametric inference can be non-robust:\n", "\n", "- inaccurate if parametric assumptions are violated\n", "- if we rely on asymptotic results, we may not achieve an acceptable level of accuracy\n", "\n", "Parametric inference can be difficult:\n", "\n", "- derivation of sampling distribution may not be possible\n", "\n", "An alternative is to estimate the sampling distribution of a statistic **empirically** without making assumptions about the form of the population.\n", "\n", "The bootstrap is a simulation tool for assessing accuracy. It is related to cross-validation, which we will encounter later in the course when we talk about tuning machine learning models.\n", "\n", "This approach is most commonly used as a non-parametric method for calculating standard errors and confidence intervals. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set_context('notebook')\n", "\n", "np.random.seed(42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Jackknife\n", "\n", "A simple precursor to bootstrapping is the jackknife (Quenouille 1949), which facilitates the estimation of the bias and variance of an estimator. Recall that the bias of an estimator $\\widehat{\\theta}$ of $\\theta$ is:\n", "\n", "$$Bias(\\widehat{\\theta}) = E(\\widehat{\\theta}) - \\theta$$\n", "\n", "Consider calculating an estimate using $n-1$ values from the dataset $\\widehat{\\theta}_{(-i)}$ (that is, the $i^{th}$ value is removed). 
If this is repeated for each observation $i=1,\\ldots,n$, we can average them to obtain:\n", "\n", "$$\\bar{\\theta}_{(n)} = n^{-1} \\sum_i \\widehat{\\theta}_{(-i)}$$\n", "\n", "The **jackknife bias estimate** is:\n", "\n", "$$b_{jack} = (n-1)(\\bar{\\theta}_{(n)} - \\widehat{\\theta})$$\n", "\n", "It can be shown that $b_{jack}$ estimates the bias up to order $O(n^{-2})$.\n", "\n", "Thus, a bias-corrected estimator is:\n", "\n", "$$\\widehat{\\theta}_{jack} = \\widehat{\\theta} - b_{jack}$$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def jackknife_bias(data, func, **kwargs):\n", " theta_hat = func(data, **kwargs)\n", " n = data.shape[0]\n", " idx = np.arange(n)\n", " theta_jack = sum(func(data[idx!=i], **kwargs) for i in range(n))/n\n", " return (n-1) * (theta_jack - theta_hat)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "x = np.random.normal(0, 2, 100)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.8072323532892591" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.std(x)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.82071996604174" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.std(x) - jackknife_bias(x, np.std)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way of expressing $\\widehat{\\theta}_{jack}$ is:\n", "\n", "$$\\widehat{\\theta}_{jack} = n^{-1} \\sum_i \\widetilde{\\theta}_i$$\n", "\n", "where $\\widetilde{\\theta}_i$ are known as **pseudovalues** and are defined as:\n", "\n", "$$\\widetilde{\\theta}_i = n \\widehat{\\theta} - (n-1) \\widehat{\\theta}_{(-i)}$$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def jackknife(data, func, **kwargs):\n", " theta_hat = 
func(data, **kwargs)\n", " n = data.shape[0]\n", " idx = np.arange(n)\n", " return sum(n*theta_hat - (n-1)*func(data[idx!=i], **kwargs) for i in range(n))/n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.8207199660417088" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jackknife(x, np.std)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Correspondingly, the **jackknife variance estimate** of the estimator $\\widehat{\\theta}$ is:\n", "\n", "$$v_{jack} = \\frac{\\widetilde{s}^2}{n}$$\n", "\n", "where $\\widetilde{s}^2$ is the sample variance of the pseudo-values:\n", "\n", "$$\\widetilde{s}^2 = \\frac{\\sum_i (\\widetilde{\\theta}_i - n^{-1}\\sum_i \\widetilde{\\theta}_i)^2}{n-1}$$\n", "\n", "Under certain regularity conditions, $v_{jack}$ is a consistent estimator of $V(\\widehat{\\theta})$. It can be inconsistent for some statistics, such as the median, that are not smooth." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Write a function that implements a jackknife variance estimator." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# write your answer here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Bootstrap\n", "\n", "The bootstrap is a resampling method discovered by [Brad Efron](http://www.jstor.org/discover/10.2307/2958830?uid=3739568&uid=2&uid=4&uid=3739256&sid=21102342537691) that allows one to approximate the true sampling distribution of a dataset, and thereby obtain estimates of the mean and variance of some function of the distribution.\n", "\n", "In general, consider a statistic $T_n = g(X_1, \\ldots, X_n)$, which has variance $V_F(T_n)$. Note this implies that the variance is a function of the underlying $F$, which is unknown. 
\n", "\n", "One approach is to estimate $V_F(T_n)$ with some $V_{F_n}(T_n)$, which is a plug-in estimator of the variance. But $V_{F_n}(T_n)$ may be difficult to compute, so we can attempt to approximate it with a **simulation estimate** $V_{boot}$.\n", "\n", "Here is the algorithm:\n", "\n", "1. Draw $X_{1}^*, \\ldots, X_{n}^* \\sim \\widehat{F}_n$\n", "2. For each $r$ in $R$ iterations: \n", " Calculate statistic $T^*_{rn} = g(X^*_{r1}, \\ldots, X^*_{rn})$\n", "3. Calculate variance: $V_{boot} = \\frac{1}{R} \\sum_r \\left(T^*_{rn} - \\bar{T}^*_{.n} \\right)^2$\n", "\n", "Rather than finding a way of drawing $n$ points at random from $\\widehat{F}_n$, we can instead draw a sample of size $n$ **with replacement from the original data**. So, step 1 can be modified to:\n", "\n", "> 1. Draw $X_{1}^*, \\ldots, X_{n}^*$ with replacement from $X_{1}, \\ldots, X_{n}$\n", "\n", "\n", "These are called **bootstrap samples**:\n", "\n", "
\n", "$$S_1^* = \\{X_{11}^*, X_{12}^*, \\ldots, X_{1n}^*\\}$$\n", "
\n", "\n", "We regard the sample $S$ as an \"estimate\" of the population $P$:\n", "\n", "> population : sample :: sample : bootstrap sample" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To generate samples with replacement in Python, there are a handful of approaches. We can use NumPy to generate random integers (np.random.randint), and use these to index DataFrame rows with iloc." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lcavol lweight age lbph svi lcp gleason pgg45 lpsa \\\n", "1 -0.579818 2.769459 50 -1.386294 0 -1.386294 6 0 -0.430783 \n", "2 -0.994252 3.319626 58 -1.386294 0 -1.386294 6 0 -0.162519 \n", "3 -0.510826 2.691243 74 -1.386294 0 -1.386294 7 20 -0.162519 \n", "4 -1.203973 3.282789 58 -1.386294 0 -1.386294 6 0 -0.162519 \n", "5 0.751416 3.432373 62 -1.386294 0 -1.386294 6 0 0.371564 \n", "\n", " train \n", "1 T \n", "2 T \n", "3 T \n", "4 T \n", "5 T " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prostate_data = pd.read_table('../data/prostate.data.txt', index_col=0)\n", "prostate_data.head()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lcavol lweight age lbph svi lcp gleason pgg45 \\\n", "9 -0.776529 3.539509 47 -1.386294 0 -1.386294 6 0 \n", "62 1.997418 3.719651 63 1.619388 1 1.909542 7 40 \n", "37 1.423108 3.657131 73 -0.579819 0 1.658228 8 15 \n", "97 3.471966 3.974998 68 0.438255 1 2.904165 7 20 \n", "51 1.091923 3.993603 68 -1.386294 0 -1.386294 7 50 \n", "\n", " lpsa train \n", "9 1.047319 F \n", "62 2.853592 F \n", "37 2.157559 T \n", "97 5.582932 F \n", "51 2.656757 T " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random_ind = np.random.randint(0, prostate_data.shape[0], 5)\n", "prostate_data.iloc[random_ind]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy's random.choice function makes this even easier, and allows for custom sampling probabilities when non-uniform sampling is desired." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([44, 24, 79, 59, 32])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.choice(prostate_data.index, 5)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lcavol lweight age lbph svi lcp gleason pgg45 \\\n", "96 2.882564 3.773910 68 1.558145 1 1.558145 7 80 \n", "88 1.731656 3.369018 62 -1.386294 1 0.300105 7 30 \n", "52 1.660131 4.234831 64 2.073172 0 -1.386294 6 0 \n", "62 1.997418 3.719651 63 1.619388 1 1.909542 7 40 \n", "58 0.463734 3.764682 49 1.423108 0 -1.386294 6 0 \n", "\n", " lpsa train \n", "96 5.477509 T \n", "88 3.712352 T \n", "52 2.677591 T \n", "62 2.853592 F \n", "58 2.794228 T " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prostate_data.loc[np.random.choice(prostate_data.index, 5)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Generate 20 samples with replacement from the prostate dataset, weighting samples inversely by age." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Write your answer here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As of pandas version 0.16.1, resampling data from a DataFrame became even easier with the addition of the sample method." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lcavol lweight age lbph svi lcp gleason pgg45 \\\n", "52 1.660131 4.234831 64 2.073172 0 -1.386294 6 0 \n", "12 -1.347074 3.598681 63 1.266948 0 -1.386294 6 0 \n", "39 2.660959 4.085136 68 1.373716 1 1.832581 7 35 \n", "2 -0.994252 3.319626 58 -1.386294 0 -1.386294 6 0 \n", "3 -0.510826 2.691243 74 -1.386294 0 -1.386294 7 20 \n", "56 1.266948 4.280132 66 2.122262 0 -1.386294 7 15 \n", "81 1.467874 3.070376 66 0.559616 0 0.223144 7 40 \n", "59 0.542324 4.178226 70 0.438255 0 -1.386294 7 20 \n", "2 -0.994252 3.319626 58 -1.386294 0 -1.386294 6 0 \n", "2 -0.994252 3.319626 58 -1.386294 0 -1.386294 6 0 \n", "92 2.532903 3.677566 61 1.348073 1 -1.386294 7 15 \n", "54 2.127041 4.121473 68 1.766442 0 1.446919 7 40 \n", "87 2.024193 3.731699 58 1.638997 0 -1.386294 6 0 \n", "96 2.882564 3.773910 68 1.558145 1 1.558145 7 80 \n", "97 3.471966 3.974998 68 0.438255 1 2.904165 7 20 \n", "1 -0.579818 2.769459 50 -1.386294 0 -1.386294 6 0 \n", "19 -0.562119 3.267666 41 -1.386294 0 -1.386294 6 0 \n", "2 -0.994252 3.319626 58 -1.386294 0 -1.386294 6 0 \n", "53 0.512824 3.633631 64 1.492904 0 0.048790 7 70 \n", "44 1.771557 3.896909 61 -1.386294 0 0.810930 7 6 \n", "\n", " lpsa train \n", "52 2.677591 T \n", "12 1.266948 T \n", "39 2.213754 T \n", "2 -0.162519 T \n", "3 -0.162519 T \n", "56 2.718001 T \n", "81 3.516013 T \n", "59 2.806386 T \n", "2 -0.162519 T \n", "2 -0.162519 T \n", "92 4.129551 T \n", "54 2.691243 F \n", "87 3.680091 T \n", "96 5.477509 T \n", "97 5.582932 F \n", "1 -0.430783 T \n", "19 1.558145 T \n", "2 -0.162519 T \n", "53 2.684440 F \n", "44 2.374906 F " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prostate_data.sample(20, replace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bootstrap Estimates\n", "\n", "From our bootstrapped samples, we can extract *estimates* of the expectation and its variance:\n", "\n", "$$\\bar{T}^* = \\hat{E}(T^*) = \\frac{\\sum_r 
T_r^*}{R}$$\n", "\n", "$$\\hat{\\text{Var}}(T^*) = \\frac{\\sum_r (T_r^* - \\bar{T}^*)^2}{R-1}$$\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Under appropriate regularity conditions, $\\frac{s^2}{\\hat{\\text{Var}}(T^*)} \\rightarrow 1$ as $n \\rightarrow \\infty$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bootstrap Confidence Intervals\n", "\n", "There are a handful of ways of constructing confidence intervals from bootstrap samples, varying in ease of calculation and accuracy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bootstrap Normal Interval\n", "\n", "Perhaps the simplest bootstrap interval is the normal interval:\n", "\n", "$$T_n \\pm z_{\\alpha/2} \\widehat{SE}_{boot}$$\n", "\n", "where $\\widehat{SE}_{boot}$ is an estimate of the standard error using the bootstrap sample. Of course, this interval is not accurate unless the distribution of $T_n$ is close to Gaussian." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bootstrap Pivotal Intervals\n", "\n", "We first define a **pivotal interval**. Let $\\theta = T(F)$ and $\\widehat{\\theta}_n = T(\\widehat{F}_n)$, and further define the pivot $P_n = \\widehat{\\theta}_n - \\theta$.\n", "\n", "Let $H(p)$ denote the CDF of the pivot:\n", "\n", "$$H(p) = Pr_F(P_n \\le p)$$\n", "\n", "Now define the interval $C_n = (a, b)$ where:\n", "\n", "\\begin{aligned}\n", "a &= \\widehat{\\theta}_n - H^{-1}(1-\\alpha/2) \\\\\n", "b &= \\widehat{\\theta}_n - H^{-1}(\\alpha/2).\n", "\\end{aligned}\n", "\n", "It can be shown that $C_n$ is an exact $1 - \\alpha$ confidence interval for $\\theta$. But clearly, $a$ and $b$ depend on the unknown $H$. 
\n", "\n", "A pivot confidence interval uses a bootstrap estimate of $H$ as an approximation:\n", "\n", "$$\\widehat{H}(p) = \\frac{1}{R} \\sum_r I\\left[(\\widehat{\\theta}^*_{nr} - \\widehat{\\theta}_{n}) \\le p\\right]$$\n", "\n", "where $\\widehat{\\theta}^*_{nr}$ is the estimate computed from the $r^{th}$ bootstrap sample. From this, an approximate $1-\\alpha$ confidence interval is $C_n = (\\widehat{a}, \\widehat{b})$:\n", "\n", "\\begin{aligned}\n", "\\widehat{a} &= \\widehat{\\theta}_n - \\widehat{H}^{-1}(1-\\alpha/2) = 2\\widehat{\\theta}_n - \\theta^*_{1 - \\alpha/2} \\\\\n", "\\widehat{b} &= \\widehat{\\theta}_n - \\widehat{H}^{-1}(\\alpha/2) = 2\\widehat{\\theta}_n - \\theta^*_{\\alpha/2}.\n", "\\end{aligned}\n", "\n", "where $\\theta^*_{q}$ denotes the $q$ sample quantile of the bootstrap estimates $\\widehat{\\theta}^*_{n1}, \\ldots, \\widehat{\\theta}^*_{nR}$.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's calculate the pivot interval for the standard deviation of the log-weights in the prostate dataset." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.42619720436063674" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights = prostate_data.lweight.copy()\n", "n = weights.shape[0]\n", "weights.std(ddof=0)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1000, 97)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "R = 1000\n", "samples = np.array([weights.sample(n, replace=True) for r in range(R)])\n", "samples.shape" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "theta_hat = weights.std(ddof=0)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "estimates = samples.std(axis=1)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.36307114, 0.49457513])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2*theta_hat - np.percentile(estimates, [97.5, 2.5])" ] }, { 
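"cell_type": "markdown", "metadata": {}, "source": [ "The pivot-interval computation above can be wrapped in a reusable helper. This is a minimal sketch: `pivot_interval` and its arguments are names introduced here, and the resampling uses pandas `sample` as in the cells above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def pivot_interval(data, func, R=1000, alpha=0.05):\n", "    # Pivot interval: 2*theta_hat minus the upper/lower bootstrap quantiles\n", "    theta_hat = func(data)\n", "    n = data.shape[0]\n", "    boot = np.array([func(data.sample(n, replace=True)) for _ in range(R)])\n", "    return 2*theta_hat - np.percentile(boot, [100*(1 - alpha/2), 100*(alpha/2)])\n", "\n", "pivot_interval(weights, lambda s: s.std(ddof=0))" ] }, {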
"cell_type": "markdown", "metadata": {}, "source": [ "An alternative approach for calculating a pivotal interval is to define:\n", "\n", "$$Z_n = \\frac{T_n - \\theta}{\\widehat{SE}_{boot}}$$\n", "\n", "which can be approximated with:\n", "\n", "$$Z^*_{nr} = \\frac{T^*_{nr} - T_n}{\\widehat{SE}^*_{boot}}$$\n", "\n", "where $\\widehat{SE}^*_{boot}$ is the standard error of $T^*_{nr}$.\n", "\n", "Here, the quantiles of $Z^*_{n1}, \\ldots, Z^*_{nR}$ should approximate the true quantiles of the distribution of $Z_n$. We can then calculate the following interval:\n", "\n", "$$C_n = (T_n - z^*_{1-\\alpha/2}\\widehat{SE}_{boot}, T_n - z^*_{\\alpha/2}\\widehat{SE}_{boot})$$\n", "\n", "where $z^*_{a}$ is the $a$ sample quantile of $Z^*_{n1}, \\ldots, Z^*_{nR}$.\n", "\n", "This is a **studentized pivotal interval**.\n", "\n", "This interval is more computationally intensive because $\\widehat{SE}^*_{boot}$ has to be calculated for each bootstrap sample (typically via a second, inner round of resampling within each bootstrap sample), but it has better accuracy than the non-studentized interval." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Write a function to calculate studentized pivotal intervals for arbitrary functions." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Write your answer here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bootstrap Percentile Intervals\n", "\n", "An even simpler interval involves using the empirical quantiles of the bootstrapped statistics. Consider a monotone transformation $U = m(T)$ such that $U \\sim N(m(\\theta), c^2)$. Importantly, we don't need to know what $m$ is, only that it exists. If we let $U^*_r = m(T^*_r)$, then $U^*_{(R\\alpha/2)} = m(T^*_{(R\\alpha/2)})$ because the monotone transformation preserves the quantiles.\n", "\n", "Since $U \\sim N(m(\\theta), c^2)$, the $\\alpha/2$ quantile of $U$ is $m(\\theta) - z_{\\alpha/2} c$. 
From this, it can be shown that:\n", "\n", "$$Pr(T^*_{(R\\alpha/2)} \\le \\theta \\le T^*_{(R(1 - \\alpha/2))}) = Pr\\left( -z_{\\alpha/2} \\le \\frac{U - m(\\theta)}{c} \\le z_{\\alpha/2} \\right) = 1 - \\alpha$$\n", "\n", "This employs the *ordered* bootstrap replicates:\n", "\n", "$$T_{(1)}^*, T_{(2)}^*, \\ldots, T_{(R)}^*$$\n", "\n", "Simply extract the $100(\\alpha/2)$ and $100(1-\\alpha/2)$ percentiles:\n", "\n", "$$T_{[(R+1)\\alpha/2]}^* \\lt \\theta \\lt T_{[(R+1)(1-\\alpha/2)]}^*$$" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lcavol lweight age lbph svi lcp gleason pgg45 lpsa \\\n", "1 -0.579818 2.769459 50 -1.386294 0 -1.386294 6 0 -0.430783 \n", "2 -0.994252 3.319626 58 -1.386294 0 -1.386294 6 0 -0.162519 \n", "3 -0.510826 2.691243 74 -1.386294 0 -1.386294 7 20 -0.162519 \n", "4 -1.203973 3.282789 58 -1.386294 0 -1.386294 6 0 -0.162519 \n", "5 0.751416 3.432373 62 -1.386294 0 -1.386294 6 0 0.371564 \n", "\n", " train \n", "1 T \n", "2 T \n", "3 T \n", "4 T \n", "5 T " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prostate_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract a subset of variables, and take bootstrap samples." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "data_subset = prostate_data[['lcavol', 'lweight', 'lbph', 'lcp', 'lpsa']]\n", "\n", "bootstrap_data = (data_subset.sample(data_subset.shape[0], replace=True) for _ in range(1000))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use scikit-learn's LinearRegression model to fit a regression model to each bootstrap sample, and store the resulting coefficients."
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "regmod = LinearRegression()\n", "\n", "# Empty array to store estimates\n", "coefs = np.empty((1000, 4))\n", "\n", "for i,X in enumerate(bootstrap_data):\n", " y = X.pop('lpsa')\n", " regmod.fit(X, y)\n", " coefs[i] = regmod.coef_" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.58320937, 0.58820544, 0.05492929, 0.09324581])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coefs.mean(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sort coefficients, and extract percentiles:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "coefs.sort(axis=0)\n", "boot_se = coefs[[25, 975], :].T" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX4AAAEBCAYAAAB/rs7oAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAEzVJREFUeJzt3X2QXXV9x/H3lyCBAJoZIA6wCaGVoA7SDEIpahAL1JKB0ZanxEjFqghM46yFaSsPhYrAdMR26xAkZCyxkEYUpjhCqBbaYJRgGWF5srDykCegEpjGhwbTMXz7xz2LN3c32bvZs7m7+3u/Zu7cc36/3z37vSebz/7uOWfPRmYiSSrHbp0uQJK0axn8klQYg1+SCmPwS1JhDH5JKozBL0mFMfglqTAGvyQVxuCXpMIY/JJUGINfkgqze6cLAIiIycAxwEvA1g6XI0njxSTgQOChzNzS7ovGRPDTCP1VnS5CksapOcD32x08VoL/JYBVq1bR1dXV6VokaVzYsGEDc+bMgSpD2zVWgn8rQFdXFzNnzuxwKZI07gzrELkndyWpMAa/JBXG4Jekwhj8klQYg1+SCmPwS1JhDH51Tnd34yFplxor1/GrRL29na5AKpIzfkkqzJDBHxHXRcTzEZERccR2xkyKiEUR8WxEPBMRn6y/VElSHdqZ8d8JHA+s3cGYBcDbgMOA44ArI2LmSIuTJNVvyGP8mfl9gIjY0bCzgSWZ+TqwMSLuBM4Evtg6MCKmAlNbmr0zmyTtInWd3J3Btp8I1gHTtzO2G7iipq8rSRqmTlzV0wMsbWnrwvvxS9IuUVfwrwMOAR6q1ls/AbwhMzcBm5rbhjiMJEmqUV2Xc34T+FRE7BYRBwAfBu6oaduSpBq1cznnlyNiA43DMfdGxJNV+4qIOLoadgvwHPAT4EHg85n53CjVLEkagXau6vkM8JlB2uc2LW8FLqi3NEnSaPA3dyWpMAa/JBXG4Jekwhj8klQYg1+SCmPwS1JhDH5JKozBL0mFMfglqTAGvyQVxuCXpMIY/JJUGINfkgpj8EtSYQx+SSqMwS9JhTH4JakwBr8kFcbgl6TCGPySVBiDX5IKY/BLUmEMfkkqjMEvSYUx+CWpMAa/JBXG4JekwrQV/BExKyJWR0Rf9XzYIGOmRcTdEfFYRDwVETdExO71lyxJGol2Z/w3AosycxawCFg8yJhLgP/KzCOBdwHvBv64liolSbUZMvgjYhpwFLC8aloOHBURB7QMTWDfiNgNmAzsAbxQY62SpBq0cyhmOvBCZm4FyMytEfFi1b6xadxVwB3AS8DewPWZ+YPWjUXEVGBqS3PXTtQuSdoJdZ7cPRN4DDgQOBg4PiLOGGRcN/B8y2NVjXVIknagneBfDxwcEZMAqueDqvZmC4Flmfl6Zv4M+BbwgUG21wMc2vKYs3PlS5KGa8jgz8yXgV5gftU0H3gkMze2DH0e+EOAiNgDOAl4YpDtbcrMNc0PYMPOvwVJ0nC0e6jnfGBhRPTRmNmfDxARKyLi6GpMNzAnIh6n8YOiD1hSc72SpBFq6zr7zHwKOHaQ9rlNy88CJ9dXmiRpNPibu5JUGINfkgpj8EtSYQx+SSqMwS9JhTH4h6O7u/GQpHHM2yYPR29vpyuQpBFzxi9JhTH4JakwBr8kFcbgl6TCGPySVBiDX5IKY/BLUmEMfkkqjMEvSYUx+CWpMAa/JBXG4Jekwhj8klQYg1+SCmPwS1JhDH5JKozBL0mFMfglqTAGvyQVxuCXpMIY/JJUmLaCPyJmRcTqiOirng/bzrizIuLxiHiien5rveVKkkaq3Rn/jcCizJwFLAIWtw6IiKOBK4GTM/MI4H3Az2qqU5JUk92HGhAR04CjgJOrpuXA9RFxQGZubBr6WeC6zPxvgMwcNPQjYiowtaW5a7iFS5J2Tjsz/unAC5m5FaB6frFqb/ZO4Lci4nsR8XBEXBYRMcj2uoHnWx6rdvY
NaJxatgwefBDuvx9mzmysS9ol6jy5uztwJI1PBu8HTgHOGWRcD3Boy2NOjXWMDoOqPsuWwXnnwZYtjfW1axvr7lNpl2gn+NcDB0fEJIDq+aCqvdla4PbM3JKZvwC+Bfxu68Yyc1Nmrml+ABtG8iZGnUFVr0svhc2bt23bvLnRLmnUDRn8mfky0AvMr5rmA4+0HN8H+GfgD6LhTcCJwKN1FtsxBlW91q0bXrukWrV7qOd8YGFE9AELq3UiYkV1NQ/A14GXgR/T+EHxJPDVesvtEIOqXjNmDK9dUq2GvKoHIDOfAo4dpH1u0/LrwJ9Xj4llxozG4Z3B2jV8V1/dOFTW/ClqypRGu6RR52/utuPqqxvB1Myg2nkLFsBNN8HkyY31Qw5prC9Y0Nm6pEK0NeMvXn8gfeITjRO8hxzSCH2DauctWABLljSWV67saClSaQz+dhlUkiYID/VIUmEMfkkqjMEvSYUx+CWpMAa/JBXG4Jekwhj8klQYg1+SCmPwS1JhDH5JKozBL0mFMfglqTAGvyQVxuCXpMIY/JJUGINfkgpj8EtSYQx+SSqMwS9JhTH4JakwBr8kFWb3Thcwrsye3ekKJGnEDP7h6OnpdAWSNGIe6pGkwrQV/BExKyJWR0Rf9XzYDsYeHhGbI+K6+sqUJNWl3Rn/jcCizJwFLAIWDzYoIiZVfXfWU54kqW5DBn9ETAOOApZXTcuBoyLigEGG/xVwF9BXW4WSpFq1c3J3OvBCZm4FyMytEfFi1b6xf1BEHAl8EPgAcPn2NhYRU4GpLc1dw6xbkrSTarmqJyLeBCwBPl79YNjR8G7gijq+riRp+NoJ/vXAwRExqQr1ScBBVXu/A4HfBlZUoT8ViIh4c2ae17K9HmBpS1sXsGon6pckDdOQwZ+ZL0dELzAfuLV6fiQzNzaNWQfs378eEVcC+2TmxYNsbxOwqbltiE8IkqQatXtVz/nAwojoAxZW60TEiog4erSKkyTVr61j/Jn5FHDsIO1ztzP+ypGVJUkaLf7mriQVxuCXpMIY/JJUGINfkgpj8EtSYQx+SSqMwS9JhTH4JakwBr8kFcbgl6TCGPySVBiDX5IKY/BLUmEMfkkqjMEvSYUx+CWpMAa/JBXG4Jekwhj8klQYg1+SCmPwS1JhDH5JKozBL0mtursbjwlq904XIEljTm9vpysYVc74JakwBr8kFcbgl6TCGPySVJi2gj8iZkXE6ojoq54PG2TM5RHxZEQ8GhE/iogP1l+uJGmk2p3x3wgsysxZwCJg8SBj/hM4JjN/B/hT4LaI2KueMiVJdRnycs6ImAYcBZxcNS0Hro+IAzJzY/+4zPxO08seAwLYD9jQsr2pwNSWL9M1/NIlSTujnev4pwMvZOZWgMzcGhEvVu0bt/OaPwGezcwNg/R1A1fsTLGSpJGr/Re4IuL9wFX85hNCqx5gaUtbF7Cq7lokSQO1E/zrgYMjYlI1258EHFS1byMijgNuBT6UmU8PtrHM3ARsanndsAuXJO2cIU/uZubLQC8wv2qaDzzSfHwfICKOAW4DzsjMh+suVJJUj3av6jkfWBgRfcDCap2IWBERR1djbgD2AhZHRG/1eFftFUsaaILfVEz1ausYf2Y+BRw7SPvcpuVjaqxL0nBM8JuKqV7enVOdM3t2pyuQimTwq3N6ejpdgVQk79UjSYUx+CWpMAa/JBXG4Jekwhj8klQYg1+SCmPwS1JhDH5JKozBL0mFMfglqdmyZfDgg3D//TBzZmN9gjH4JanfsmVw3nmwZUtjfe3axvoEC3+DX5L6XXopbN68bdvmzY32CcTgl6R+69YNr32cMvglqd+MGcNrH6cMfknqd/XVMGXKtm1TpjTaJxCDX5L6LVgAN90Ekyc31g85pLG+YEFn66qZf4hFGu/6Lz/csqVx+eHVV0+4oNqlFiyAJUsayytXdrSU0eKMXxrPCrn8UPUy+KXxrJDLD1Uvg18azwq5/FD1Mvil8ayQyw9VL4N
fGs8KufxQ9TL4pfGskMsPVS8v55TGuwIuP1S9nPFLUmHaCv6ImBURqyOir3o+bJAxkyJiUUQ8GxHPRMQn6y9XkjRS7c74bwQWZeYsYBGweJAxC4C3AYcBxwFXRsTMGmqUJNVoyOCPiGnAUcDyqmk5cFREHNAy9GxgSWa+npkbgTuBM+ssVpI0cu2c3J0OvJCZWwEyc2tEvFi1b2waNwNY27S+rhqzjYiYCkxtae4CmDdvHnvuuecbjWeddRYXXnghmzdvZu7cuQMKO/fcczn33HN55ZVXOOOMMwb0X3DBBZx99tmsX7+ec845Z0D/RRddxGmnncbTTz/Npz/96QH9l112GSeddBK9vb10d3cP6L/mmmt4z3vewwMPPMAll1wyoL+np4fZs2dz77338oUvfGFA/+LFizn88MP59re/zZe+9KUB/bfccgvTp0/ntttu4ytf+cqA/ttvv53999+fpUuXsnTp0gH9K1asYMqUKdxwww184xvfGNC/sjoReN1113HXXXdt07fXXntxzz33AHDVVVdx3333bdO/3377cccddwDwuc99jtWrV2/T39XVxa233gpAd3c3vb292/TPmjWLm266CYDzzjuPvr6+bfpnz55NT08PAB/96EfZsGHDNv3HHXcc1157LQCnn346r7766jb9J554IpdffjkAp5xyCq+99to2/aeeeioXX3wxACeccAKtxtv3Xk+1f7ur9+L33gi/96rlsf69N2/evAH97ejEVT3dwBUd+LqSJCAyc8cDGod6+oD9qtn+JOBV4LDqkE7/uLuBmzPz9mr9emBtZn6xZXvbm/Gvev7555k5c+YI35JUoP6Zo5dz1mOc7M81a9Zw6KGHAhyamWvafd2Qx/gz82WgF5hfNc0HHmkO/co3gU9FxG7V8f8PA3cMsr1Nmbmm+QFsaB0nSRod7V7Vcz6wMCL6gIXVOhGxIiKOrsbcAjwH/AR4EPh8Zj5Xc72SpBFq6xh/Zj4FHDtI+9ym5a3ABfWVJkkaDf7mriQVxuCXpMJ4kzZJajV7dqcrGFUGvyS16ukZesw4ZvBLE8EEn6GqXga/NBFM8Bmq6uXJXUkqjMEvSYUx+CWpMAa/JBXG4Jekwhj8klQYg1+SCjNWruOfBAz4E2eSpO1rysxJw3ndkH+Ba1eIiPcBqzpdhySNU3My8/vtDh4rwT8ZOAZ4Cdja4XJ2pIvGD6g5+FfD6uD+rI/7sl7jZX9OAg4EHsrMLe2+aEwc6qkKbvunVadERP/ihuH8fUsNzv1ZH/dlvcbZ/nx2uC/w5K4kFcbgl6TCGPySVBiDf3g2AX9TPWvk3J/1cV/Wa0LvzzFxVY8kaddxxi9JhTH4JakwRQZ/RGRE7LOLv+bKiDh1V37N0VDnvouI3ojYq41xayLiiO30dUfEtDrqGYua9/eO9sMQ29ip15WgE1kwFhQZ/BobMnN2Zr42ws10AxM2+KXRUHzwR8Q7IuK7EfFYRDweER+r2i+KiIci4pGIWB0Rs6v2yyPi75tev19EvBoRe0fEPhFxc0Q8UT3+slPva7RFxAcj4u5qeVo1czqzWv+LiLimWj48Iu6p9uWjEfHxpm00z2bnVPv/sYj4h4hY2zJLPav6d1gTEX9WveZS4CDg9urTwzt30dvvpAUR8b2IeKZ/P8Abs/prB+urDNh/2tYOsmBlRPRUz8/0f2+Pa5lZ3ANIYB8at6zoA85s6tuvej6gqe0k4MFqeQaNewrtXq0vBP6xWv5b4GtAAG8GngROqfpWAqd2+r3XuO+mAK8CbwLmAw8AN1ZjvgOcWO3fHwFvr9r3BZ5uWu/f1mQa90OZU7X/UdV3RLW+BriuWp4J/BLYp6nviE7vl9He303vtf977a3Ai8CRbfYNuv9Kf7SZBSuB71Zj9gEeH+//l0uf8R9OI8C/2d+Qma9Wi++uZk9PAH8HzK761wE/BuZW484Fbq6WTwKWZMPPgeVV24STmZtp/GA7lsZ7/Dzw3ojYAzg
a+AEwC3gH8PWI6KVx06vJVVuzw4HXMnNVte1/YeD101+v+tYA/0PjJlol+ipAZv4UuBs4oc0+99+O7SgLAL6Wmb/OzF/S2Je/v6sLrNOYuElbB8WgjY3wuh04PjMfjoiDgBeahiwFPhYRzwFv6Q+sanutvxgxkX9R4j4aM/vfAy4Afgp8BHg0M38VjTtdvZKZs4fYzmD7rdWvmpa34vcu7Hi/tfa5/3Zs0CzYwdhx/f+69Bn/U8Cv+49NQ+OYPbAnjf8Y66vmC1tedwdwPHAxjR8C/f4N+GQ07AvMA+4dndLHhPuAjwPrM/P/qvUrq2doHNbZHBHn9L8gIt4eEW9u2c5TwN4R8d5qzIeAqW3W8HPgLTv9DsafcwEi4gDgFBqHIdrp045tLwv6nRMRu0fE3sCZwH/s6gLrVHTwZ+avgQ8B51cncx4F5laHaf4aeCgivgf8b8vrNgPfAs4B/qmp6yoas4HHgdXALZn5r6P/Tjrmh8D+/Cbo7wMOAf4d3ti/pwHzqhNmTwI3AHs0byQbt+X+CHBjRPyQxieInwI/a6OGLwM3F3Ryd11ErKLx/XVtZj7eZp92YHtZ0DTkYRqTuF7g7sy8qwNl1sZbNmhMiIh9M/MX1fIHaJwkn5mZr3e2svEhItbQOOH4RKdrmWgiYiWNk+PjOuybeZxPY8XpEfFZGp9CfwXMN/Sl0eGMX5IKU/QxfkkqkcEvSYUx+CWpMAa/JBXG4Jekwhj8klSY/wegrCNTBmH+RwAAAABJRU5ErkJggg==\n", "text/plain": [ "