{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Introducing censoring\n", "\n", "![img](https://lifelines.readthedocs.io/en/latest/_images/survival_analysis_intro_censoring.png)\n", "\n", "\n", "![img2](https://lifelines.readthedocs.io/en/latest/_images/survival_analysis_intro_censoring_revealed.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All we know is that actual lifetime is greater than some threshold. Mathematically, we know $P(T \\ge t) = 1 - F(t) := S(t)$. We can use this in our log likelihood:\n", "\n", "No censoring cases:\n", "\n", "$$l(\\theta, t) = \\sum_{\\text{observed}} \\log{\\text{pdf}(\\theta, t)} $$\n", "\n", "With censoring cases:\n", "$$ \n", "\\begin{align}\n", "l(\\theta, t) & = \\sum_{\\text{observed}} \\log{\\text{pdf}(t, \\theta)} + \\sum_{\\text{censored}} \\log{\\text{S}(t, \\theta)} \\\\\n", "& = \\sum_{\\text{observed}} \\log{\\text{pdf}(t, \\theta)} \\frac{S(t)}{S(t)} + \\sum_{\\text{censored}} \\log{\\text{S}(t, \\theta)} \\\\\n", "& = \\sum_{\\text{observed}} (\\log{\\frac{\\text{pdf}(t, \\theta)}{S(t)}} + \\log{S(t)}) + \\sum_{\\text{censored}} \\log{\\text{S}(t, \\theta)} \\\\\n", "& = \\sum_{\\text{observed}} \\log{\\frac{\\text{pdf}(t, \\theta)}{S(t)}} + \\sum_{\\text{observed}} \\log{S(t)} + \\sum_{\\text{censored}} \\log{\\text{S}(t, \\theta)} \\\\\n", "& = \\sum_{\\text{observed}} \\log{\\frac{\\text{pdf}(t, \\theta)}{S(t)}} + \\sum \\log{S(t)} \n", "\\end{align}\n", "$$\n", "\n", "\n", "\n", "The $-\\log{S(t)}$ is known as the _cumulative hazard_, denoted $H(t)$. \n", "\n", "$$l(\\theta, t) = \\sum_{\\text{observed}} \\log{\\frac{\\text{pdf}(t, \\theta)}{S(t)}} - \\sum H(t, \\theta) $$\n", "\n", "Also, $\\frac{dH}{dt} = \\frac{\\text{pdf}(t, \\theta)}{S(t)}$. Denote that $h(t)$. \n", "\n", "$$l(\\theta, t) = \\sum_{\\text{observed}} \\log{h(t, \\theta}) - \\sum H(t, \\theta) $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Phew! Now, instead of working in probability space, we will work in hazard space! Here's a link to all the relatioships: https://lifelines.readthedocs.io/en/latest/Survival%20Analysis%20intro.html#hazard-function \n", "\n", "\n", "## Take away: the likelihood function can be used to \"add\" information about the system (think about how penalizers are used...)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the hazard and cumulative hazard for Weibull are much simplier to implement 😗👌\n", "\n", "def cumulative_hazard(params, t):\n", " lambda_, rho_ = params\n", " return (t / lambda_) ** rho_\n", "\n", "def hazard(params, t):\n", " # diff of cumulative hazard w.r.t. t\n", " lambda_, rho_ = params\n", " return rho_ / lambda_ * (t / lambda_) ** (rho_ - 1)\n", "\n", "def log_hazard(params, t):\n", " return np.log(hazard(params, t))\n", "\n", "def log_likelihood(params, t, e):\n", " return np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "T = (np.random.exponential(size=1000)/1.5) ** 2.3\n", "E = np.random.binomial(1, 0.95, size=1000)\n", "\n", "from scipy.optimize import minimize\n", "\n", "results = minimize(log_likelihood, \n", " x0 = np.array([1.0, 1.0]),\n", " method=None, \n", " args=(T, E),\n", " bounds=((0.00001, None), (0.00001, None)))\n", "\n", "print(results)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# why did this fail? Note the -inf." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def negative_log_likelihood(params, t, e):\n", " return -log_likelihood(params, t, e)\n", "\n", "results = minimize( # what goes here?, \n", " x0 = np.array([1.0, 1.0]),\n", " method=None, \n", " args=(T, E),\n", " bounds=((0.00001, None), (0.00001, None)))\n", "\n", "print(results)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = np.linspace(0, 10, 100)\n", "plt.plot(t, np.exp(-cumulative_hazard(results.x, t)))\n", "plt.ylim(0, 1)\n", "plt.title(\"Estimated survival function\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's move to part 3 - automatic differentiation! " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }