{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try generalizing our model now. For survival analysis, it is very useful to think about cumulative hazards - that is, modelling the cumulative hazard is typically easiest. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Note: we are shifting the data 4 units to the right.\n", "T = (np.random.exponential(size=1000)/1.5) ** 2.3 + 4\n", "E = np.random.binomial(1, 0.95, size=1000)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from autograd import numpy as np\n", "\n", "def cumulative_hazard(params, t):\n", " lambda_, rho_, mu = params\n", " return ((t - mu) / lambda_) ** rho_\n", "\n", "def log_hazard(params, t):\n", " lambda_, rho_, mu = params\n", " return ... # this could get arbitrarly complicated." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from autograd import elementwise_grad\n", "hazard = elementwise_grad(cumulative_hazard, argnum=1)\n", "\n", "def log_hazard(params, t):\n", " return np.log(hazard(params, t))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.69314718, 0. , 0.40546511])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "log_hazard(np.array([2., 2. , 0.]), np.array([1,2,3]))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " fun: 341.1505998865307\n", " hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>\n", " jac: array([ 8.74288685e-06, -7.51244260e-04, -4.29427489e+04])\n", " message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n", " nfev: 30\n", " nit: 20\n", " status: 0\n", " success: True\n", " x: array([0.49171791, 0.48311375, 3.99900001])\n" ] } ], "source": [ "# same as previous\n", "\n", "from autograd import value_and_grad\n", "from scipy.optimize import minimize\n", "\n", "def log_likelihood(params, t, e):\n", " return np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))\n", "\n", "def negative_log_likelihood(params, t, e):\n", " return -log_likelihood(params, t, e)\n", "\n", "results = minimize(\n", " value_and_grad(negative_log_likelihood), \n", " x0 = np.array([1.0, 1.0, 0.0]),\n", " method=None, \n", " args=(T, E),\n", " jac=True,\n", " bounds=((0.00001, None), (0.00001, None), (None, np.min(T)-0.001)))\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So long as you are good at \"parameter accounting\", then you could create arbitrarly complicated survival models simply by specifying the hazard. \n", "\n", "Extending to including covariates is straightforward too, with some modifications to the code. Here's a simple model of the cumulative hazard:\n", "\n", "$$H(t; x) = \\left(\\frac{t}{\\lambda(x)}\\right)^\\rho $$\n", "$$ \\lambda(x) = \\beta_1 x_1 + \\beta_2 x_2 $$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from scipy.stats import weibull_min\n", "\n", "# create some regression data. 
\n", "N = 10000\n", "X = 0.1 * np.random.normal(size=(N, 2))\n", "T = np.exp(2 * X[:, 0] + -2 * X[:, 1]) * weibull_min.rvs(1, scale=1, loc=0, size=N)\n", "E = np.random.binomial(1, 0.99, size=N)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " fun: 10012.505418738723\n", " hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>\n", " jac: array([ 5.64816887e-06, -4.95439586e-05, 2.98904267e-03])\n", " message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n", " nfev: 10\n", " nit: 8\n", " status: 0\n", " success: True\n", " x: array([ 2.0380495 , -2.14856494, 0.99908182])\n" ] } ], "source": [ "def cumulative_hazard(params, t, X):\n", " log_lambda_coefs_, rho_ = params[:2], params[-1]\n", " lambda_ = np.exp(np.dot(X, log_lambda_coefs_))\n", " return (t / lambda_) ** rho_\n", "\n", "# these functions are almost identical to above, \n", "# expect they have an additional argument, X\n", "hazard = elementwise_grad(cumulative_hazard, argnum=1)\n", "\n", "def log_hazard(params, t, X):\n", " return np.log(hazard(params, t, X))\n", "\n", "def log_likelihood(params, t, e, X):\n", " return np.sum(e * log_hazard(params, t, X)) - np.sum(cumulative_hazard(params, t, X))\n", "\n", "def negative_log_likelihood(params, t, e, X):\n", " return -log_likelihood(params, t, e, X)\n", "\n", "results = minimize(\n", " value_and_grad(negative_log_likelihood), \n", " x0 = np.array([0.0, 0.0, 1.0]),\n", " method=None, \n", " args=(T, E, X),\n", " jac=True,\n", " bounds=((None, None), (None, None), (0.00001, None)))\n", "\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! Let's move onto part 5, how to estimate the _variances_ of the parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }