{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Maximum Likelihood Estimation\n", "\n", "**Functions**\n", "\n", "`np.log`, `scipy.special.gamma`, `scipy.special.gammaln`, `scipy.stats.norm.cdf`, `scipy.optimize.minimize`,\n", "`scipy.stats.t`, `np.var`, `np.std`, `scipy.stats.norm.pdf`\n", "\n", "### Exercise 20\n", "\n", "Simulate a set of i.i.d. Student's t random variables with degree of freedom\n", "parameter $\\nu=10$. Standardize the residuals so that they have unit variance\n", "using the fact that $V\\left[x \\right]=\\frac{\\nu}{\\nu-2}$. Use these to estimate\n", "the degree of freedom using maximum likelihood. Note that the likelihood of\n", "a standardized Student's t is\n", "\n", "$$f(x;\\nu,\\mu,\\sigma^{2})=\\frac{\\Gamma\\left(\\frac{\\nu+1}{2}\\right)}{\\Gamma\\left(\\frac{\\nu}{2}\\right)}\\,\\frac{1}{\\sqrt{\\pi(\\nu-2)}}\\,\\frac{1}{\\sigma}\\,\\frac{1}{\\left(1+\\frac{\\left(x-\\mu\\right)^{2}}{\\sigma^{2}(\\nu-2)}\\right)^{\\frac{\\nu+1}{2}}}$$\n", "\n", "where $\\Gamma\\left(\\right)$ is known as the gamma function." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:07.404380Z", "iopub.status.busy": "2021-09-22T10:07:07.404380Z", "iopub.status.idle": "2021-09-22T10:07:07.490383Z", "shell.execute_reply": "2021-09-22T10:07:07.491380Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The variance is 1.0435286272711175\n" ] } ], "source": [ "import numpy as np\n", "\n", "rs = np.random.RandomState(19991231)\n", "var = 10 / (10 - 2)\n", "rvs = rs.standard_t(10, size=1000) / np.sqrt(var)\n", "print(f\"The variance is {rvs.var()}\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:07.495381Z", "iopub.status.busy": "2021-09-22T10:07:07.495381Z", "iopub.status.idle": "2021-09-22T10:07:07.554380Z", "shell.execute_reply": "2021-09-22T10:07:07.554380Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The log like at the population nu is 1435.8970961055188\n" ] } ], "source": [ "from scipy.special import gammaln\n", "\n", "\n", "def std_t_loglik(nu, x):\n", " # These are fixed for now\n", " mu = 0\n", " sigma2 = 1\n", " sigma = np.sqrt(sigma2)\n", "\n", " a = gammaln((nu + 1) / 2)\n", " b = gammaln(nu / 2)\n", " c = np.sqrt(np.pi * (nu - 2))\n", " d = (nu + 1) / 2\n", " e = (x - mu) ** 2\n", " f = sigma2 * (nu - 2)\n", "\n", " loglik = a - b - np.log(c) - np.log(sigma) - d * np.log(1 + e / f)\n", " return -(loglik.sum())\n", "\n", "\n", "print(f\"The log like at the population nu is {std_t_loglik(10, rvs)}\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:07.557379Z", "iopub.status.busy": "2021-09-22T10:07:07.557379Z", "iopub.status.idle": "2021-09-22T10:07:07.648380Z", "shell.execute_reply": "2021-09-22T10:07:07.648380Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The MLE is 12.651889299251675 and the optimized (negative) LLF is 1435.6002952027727\n" ] } ], "source": [ "from scipy.optimize import minimize\n", "\n", "starting_val = np.array([10])\n", "opt = minimize(\n", " std_t_loglik,\n", " starting_val,\n", " args=(rvs,),\n", " bounds=[(2.05, 100)],\n", " options={\"disp\": True},\n", ")\n", "print(f\"The MLE is {opt.x[0]} and the optimized (negative) LLF is {opt.fun}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 21\n", "\n", "Repeat the previous exercise using daily, weekly and monthly S&P 500 and Hang Seng data.\n", "Note that it is necessary to remove the mean and standardize by the standard deviation\n", "error before estimating the degree of freedom. What happens over longer horizons?" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:07.652380Z", "iopub.status.busy": "2021-09-22T10:07:07.652380Z", "iopub.status.idle": "2021-09-22T10:07:08.267380Z", "shell.execute_reply": "2021-09-22T10:07:08.267380Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The MLE is 3.3436193963160705 and the optimized (negative) LLF is 22731.853179025922\n" ] } ], "source": [ "import pandas as pd\n", "\n", "prices = pd.read_hdf(\"data/equity-indices.h5\", \"sp500\")\n", "rets = 100 * prices.Close.pct_change().dropna()\n", "\n", "std_rets = (rets - rets.mean()) / rets.std()\n", "starting_val = np.array([10])\n", "opt = minimize(\n", " std_t_loglik,\n", " starting_val,\n", " args=(std_rets,),\n", " bounds=[(2.05, 100)],\n", " options={\"disp\": True},\n", ")\n", "print(f\"The MLE is {opt.x[0]} and the optimized (negative) LLF is {opt.fun}\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:08.271380Z", "iopub.status.busy": "2021-09-22T10:07:08.271380Z", "iopub.status.idle": "2021-09-22T10:07:08.424257Z", "shell.execute_reply": "2021-09-22T10:07:08.424257Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For sp500, the MLE is 3.3436193963160705 and the optimized (negative) LLF is 22731.853179025922\n", "For weekly_sp500, the MLE is 4.8770758658148905 and the optimized (negative) LLF is 4948.088538563661\n", "For monthly_sp500, the MLE is 7.19581209620269 and the optimized (negative) LLF is 1169.4687637811749\n", "For hsi, the MLE is 3.3436193963160705 and the optimized (negative) LLF is 22731.853179025922\n", "For weekly_hsi, the MLE is 3.877025093756984 and the optimized (negative) LLF is 2270.538188606181\n", "For monthly_hsi, the MLE is 4.2998595733485825 and the optimized (negative) LLF is 531.4943348984161\n" ] } ], "source": [ "keys = [\"sp500\", \"weekly_sp500\", \"monthly_sp500\", \"hsi\", \"weekly_hsi\", \"monthly_hsi\"]\n", "for key in keys:\n", " prices = pd.read_hdf(\"data/equity-indices.h5\", key)\n", " rets = 100 * prices.Close.pct_change().dropna()\n", "\n", " std_rets = (rets - rets.mean()) / rets.std()\n", " starting_val = np.array([10])\n", " opt = minimize(\n", " std_t_loglik,\n", " starting_val,\n", " args=(std_rets,),\n", " bounds=[(2.05, 100)],\n", " options={\"disp\": True},\n", " )\n", " print(\n", " f\"For {key}, the MLE is {opt.x[0]} and the optimized (negative) LLF is {opt.fun}\"\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 22\n", "\n", "Repeat the previous problem by estimating the mean and variance simultaneously with\n", "the degree of freedom parameter." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:08.429252Z", "iopub.status.busy": "2021-09-22T10:07:08.428253Z", "iopub.status.idle": "2021-09-22T10:07:08.439273Z", "shell.execute_reply": "2021-09-22T10:07:08.439273Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [], "source": [ "def full_std_t_loglik(parameters, x):\n", " # These are fixed for now\n", " mu = parameters[0]\n", " sigma2 = parameters[1]\n", " nu = parameters[2]\n", " sigma = np.sqrt(sigma2)\n", "\n", " a = gammaln((nu + 1) / 2)\n", " b = gammaln(nu / 2)\n", " c = np.sqrt(np.pi * (nu - 2))\n", " d = (nu + 1) / 2\n", " e = (x - mu) ** 2\n", " f = sigma2 * (nu - 2)\n", "\n", " loglik = a - b - np.log(c) - np.log(sigma) - d * np.log(1 + e / f)\n", " return -(loglik.sum())" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:08.443273Z", "iopub.status.busy": "2021-09-22T10:07:08.442272Z", "iopub.status.idle": "2021-09-22T10:07:08.470141Z", "shell.execute_reply": "2021-09-22T10:07:08.470141Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "22751.09755414748" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prices = pd.read_hdf(\"data/equity-indices.h5\", \"sp500\")\n", "rets = 100 * prices.Close.pct_change().dropna()\n", "\n", "mean = rets.mean()\n", "var = rets.var()\n", "starting_val = np.array([mean, var, 10])\n", "full_std_t_loglik(starting_val, rets)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:08.474140Z", "iopub.status.busy": "2021-09-22T10:07:08.474140Z", "iopub.status.idle": "2021-09-22T10:07:08.549142Z", "shell.execute_reply": "2021-09-22T10:07:08.549142Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The MLE is [0.04885897 1.01908722 3.11732298] and the optimized (negative) LLF is 22034.20166113898\n" ] } ], "source": [ "bounds = [(-10 * np.abs(mean), 10 * np.abs(mean)), (var / 1000, 100 * var), (2.05, 100)]\n", "opt = minimize(\n", " full_std_t_loglik, starting_val, args=(rets,), bounds=bounds, options={\"disp\": True}\n", ")\n", "print(f\"The MLE is {opt.x} and the optimized (negative) LLF is {opt.fun}\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:08.554143Z", "iopub.status.busy": "2021-09-22T10:07:08.554143Z", "iopub.status.idle": "2021-09-22T10:07:08.925141Z", "shell.execute_reply": "2021-09-22T10:07:08.925141Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For sp500, the MLE is [0.04885897 1.01908722 3.11732298] and the optimized (negative) LLF is 22034.20166113898\n", "For weekly_sp500, the MLE is [0.22792198 4.3108678 4.96478601] and the optimized (negative) LLF is 7646.416875194603\n", "For monthly_sp500, the MLE is [ 0.83033087 16.92290315 6.996814 ] and the optimized (negative) LLF is 2351.2108045919867\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "For hsi, the MLE is [0.04885897 1.01908722 3.11732298] and the optimized (negative) LLF is 22034.20166113898\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "For weekly_hsi, the MLE is [ 0.29490847 13.18079203 3.81003795] and the optimized (negative) LLF is 4459.737446190451\n", "For monthly_hsi, the MLE is [ 1.00200081 55.00076483 4.23246132] and the optimized (negative) LLF is 1316.7986085377956\n" ] } ], "source": [ "keys = [\"sp500\", \"weekly_sp500\", \"monthly_sp500\", \"hsi\", \"weekly_hsi\", \"monthly_hsi\"]\n", "for key in keys:\n", " prices = pd.read_hdf(\"data/equity-indices.h5\", key)\n", " rets = 100 * prices.Close.pct_change().dropna()\n", "\n", " mean = rets.mean()\n", " var = rets.var()\n", " starting_val = np.array([mean, var, 10])\n", "\n", " bounds = [\n", " (-10 * np.abs(mean), 10 * np.abs(mean)),\n", " (var / 1000, 100 * var),\n", " (2.05, 100),\n", " ]\n", " opt = minimize(\n", " full_std_t_loglik,\n", " starting_val,\n", " args=(rets,),\n", " bounds=bounds,\n", " options={\"disp\": True},\n", " )\n", " print(\n", " f\"For {key}, the MLE is {opt.x} and the optimized (negative) LLF is {opt.fun}\"\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 23\n", "\n", "Simulate a set of Bernoulli random variables $y_{i}$ where\n", "\n", "$$p_{i}=\\Phi\\left(x_{i}\\right)$$\n", "\n", "where $X_{i}\\sim N\\left(0,1\\right)$. (Note: $p_{i}$ is the probability of\n", "success and $\\Phi\\left(\\right)$ is the standard Normal CDF). Use this simulated data to\n", "estimate the Probit model where $p_{i}=\\Phi\\left(\\alpha_{0}+\\alpha_{1}x_{i}\\right)$ \n", "using maximum likelihood. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:08.928140Z", "iopub.status.busy": "2021-09-22T10:07:08.928140Z", "iopub.status.idle": "2021-09-22T10:07:09.049141Z", "shell.execute_reply": "2021-09-22T10:07:09.049141Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [], "source": [ "from scipy import stats\n", "\n", "nobs = 1000\n", "\n", "x = rs.standard_normal(size=nobs)\n", "p = stats.norm.cdf(x)\n", "y = 1.0 * rs.random_sample(size=nobs) < p" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:09.053141Z", "iopub.status.busy": "2021-09-22T10:07:09.053141Z", "iopub.status.idle": "2021-09-22T10:07:09.065141Z", "shell.execute_reply": "2021-09-22T10:07:09.065141Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [], "source": [ "def probit_llf(params, y, x):\n", " a0 = params[0]\n", " a1 = params[1]\n", " xhat = a0 + a1 * x\n", " phat = stats.norm.cdf(xhat)\n", "\n", " loglik = y * np.log(phat) + (1 - y) * np.log(1 - phat)\n", " return -(loglik.sum())" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:09.069141Z", "iopub.status.busy": "2021-09-22T10:07:09.068141Z", "iopub.status.idle": "2021-09-22T10:07:09.082168Z", "shell.execute_reply": "2021-09-22T10:07:09.081171Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "506.6527515404093" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "probit_llf([0, 1], y, x)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:09.085167Z", "iopub.status.busy": "2021-09-22T10:07:09.085167Z", "iopub.status.idle": "2021-09-22T10:07:09.097193Z", "shell.execute_reply": "2021-09-22T10:07:09.097193Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The MLE is [-0.04581815 0.99027212]\n" ] } ], "source": [ "starting_val = np.array([0.0, 1.0])\n", "opt = minimize(probit_llf, starting_val, args=(y, x))\n", "print(f\"The MLE is {opt.x}\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Exercise 24\n", "\n", "Estimate the asymptotic covariance of the estimated parameters in the previous.\n", "\n", "The derivative of the log-likelihood for a single observation is\n", "\n", "$$ \\frac{\\partial \\left\\{y_i \\ln\\left(\\Phi\\left(\\alpha_0+\\alpha_1 x_i \\right)\\right) + \\left(1-y_i\\right) \\ln\\left(1-\\Phi\\left(\\alpha_0+\\alpha_1 x_i \\right)\\right)\\right\\}}{\\partial \\alpha_j} $$\n", "\n", "which is\n", "\n", "$$ y_i \\frac{\\phi\\left(\\alpha_0+\\alpha_1 x_i \\right)}{\\Phi\\left(\\alpha_0+\\alpha_1 x_i \\right)} - \\left(1-y_i\\right)\\frac{\\phi\\left(\\alpha_0+\\alpha_1 x_i \\right)}{1-\\Phi\\left(\\alpha_0+\\alpha_1 x_i \\right)} $$\n", "\n", "for $\\alpha_0$ and \n", "\n", "$$ y_i x_i \\frac{\\phi\\left(\\alpha_0+\\alpha_1 x_i \\right)}{\\Phi\\left(\\alpha_0+\\alpha_1 x_i \\right)} - \\left(1-y_i\\right)x_i\\frac{\\phi\\left(\\alpha_0+\\alpha_1 x_i \\right)}{1-\\Phi\\left(\\alpha_0+\\alpha_1 x_i \\right)} $$\n", "\n", "for $\\alpha_1$ where $\\Phi(\\cdot)$ is the cdf of a standard Normal random variable and\n", "$\\phi(\\cdot)$ is the pdf of a standard Normal random variable.\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:09.101192Z", "iopub.status.busy": "2021-09-22T10:07:09.101192Z", "iopub.status.idle": "2021-09-22T10:07:09.113197Z", "shell.execute_reply": "2021-09-22T10:07:09.113197Z" }, "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
a0a1
a00.260670-0.081185
a1-0.0811850.130076
\n", "
" ], "text/plain": [ " a0 a1\n", "a0 0.260670 -0.081185\n", "a1 -0.081185 0.130076" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a0 = opt.x[0]\n", "a1 = opt.x[1]\n", "\n", "fitted = a0 + a1 * x\n", "pdf = stats.norm.pdf(fitted)\n", "cdf = stats.norm.cdf(fitted)\n", "\n", "error = y * pdf / cdf + (1 - y) * pdf * (1 - cdf)\n", "score = np.array([error, error * x]).T\n", "\n", "nobs = y.shape[0]\n", "\n", "asymptotic_cov = score.T @ score / nobs\n", "\n", "param_names = [\"a0\", \"a1\"]\n", "asymptotic_cov = pd.DataFrame(asymptotic_cov, columns=param_names, index=param_names)\n", "asymptotic_cov" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2021-09-22T10:07:09.117191Z", "iopub.status.busy": "2021-09-22T10:07:09.117191Z", "iopub.status.idle": "2021-09-22T10:07:09.129089Z", "shell.execute_reply": "2021-09-22T10:07:09.129089Z" }, "pycharm": { "is_executing": false, "name": "#%% \n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "a0 -0.175771\n", "a1 7.613041\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t_stats = pd.Series(opt.x / np.diag(asymptotic_cov), index=param_names)\n", "t_stats" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }