{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Kaplan-Meier Estimation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Run this notebook on Colab](https://colab.research.google.com/github/AllenDowney/SurvivalAnalysisPython/blob/master/02_kaplan_meier.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook introduces Kaplan-Meier estimation, a way to estimate a hazard function when the dataset includes both complete and incomplete cases.\n",
"To demonstrate, I'll use a small set of hypothetical data. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dog adoption data\n",
"\n",
"Suppose you are investigating the time it takes for dogs to get adopted from a shelter. You visit a shelter every week for 10 weeks, and record the arrival time for each dog and the adoption time for each dog that was adopted.\n",
"\n",
"Here's what the data might look like. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
start
\n",
"
end
\n",
"
status
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
0
\n",
"
5
\n",
"
1
\n",
"
\n",
"
\n",
"
1
\n",
"
1
\n",
"
2
\n",
"
1
\n",
"
\n",
"
\n",
"
2
\n",
"
2
\n",
"
6
\n",
"
1
\n",
"
\n",
"
\n",
"
3
\n",
"
2
\n",
"
9
\n",
"
0
\n",
"
\n",
"
\n",
"
4
\n",
"
4
\n",
"
9
\n",
"
0
\n",
"
\n",
"
\n",
"
5
\n",
"
6
\n",
"
8
\n",
"
1
\n",
"
\n",
"
\n",
"
6
\n",
"
7
\n",
"
9
\n",
"
0
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" start end status\n",
"0 0 5 1\n",
"1 1 2 1\n",
"2 2 6 1\n",
"3 2 9 0\n",
"4 4 9 0\n",
"5 6 8 1\n",
"6 7 9 0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"obs = pd.DataFrame()\n",
"\n",
"obs['start'] = 0,1,2,2,4,6,7\n",
"obs['end'] = 5,2,6,9,9,8,9\n",
"obs['status'] = 1,1,1,0,0,1,0\n",
"\n",
"obs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This `DataFrame` contains one row for each dog and three columns:\n",
"\n",
"* `start`: arrival time, in weeks since the beginning of the study\n",
"\n",
"* `end`: adoption date, for dogs that were adopted, or `9` for dogs that had not been adopted at the end of the study\n",
"\n",
"* `status`: `1` for dogs that were adopted; `0` for dogs that were not."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting lifelines\n",
"\n",
"The following function visualizes the data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def plot_lifelines(obs):\n",
" \"\"\"Plot a line for each observation.\n",
" \n",
" obs: DataFrame\n",
" \"\"\"\n",
" for y, row in obs.iterrows():\n",
" start = row['start']\n",
" end = row['end']\n",
" status = row['status']\n",
" \n",
" if status == 0:\n",
" # ongoing\n",
" plt.hlines(y, start, end, color='C0')\n",
" else:\n",
" # complete\n",
" plt.hlines(y, start, end, color='C1')\n",
" plt.plot(end, y, marker='o', color='C1')\n",
" \n",
" plt.xlabel('Time (weeks)')\n",
" plt.ylabel('Dog index')\n",
" plt.gca().invert_yaxis()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are the results:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAATf0lEQVR4nO3df7BndX3f8efLZZWV8CMVdCyLrlrF2s0iejXSTRwBY6gEnURdtRgrmclqJwoYMxrtFBk66UwSEyKpOm4VNGrVLUojYhBqNGtay3AXcfmxkFCCcoOWS1sXTHdggXf/+J4b7m7u7n65e889937u8zFz5/s9n3Pu+bz3O7uv+ezne87npKqQJLXnCUMXIEnqhwEvSY0y4CWpUQa8JDXKgJekRh02dAGzHXvssbVu3bqhy5CkZWP79u33VdVxc+1bUgG/bt06Jicnhy5DkpaNJN/f3z6naCSpUQa8JDXKgJekRhnwktQoA16SGtVrwCc5I8ntSe5I8tt99iUtWTu2wsXr4cJjRq87tg5dkVaI3i6TTLIK+AjwC8AUcH2Sr1TVrX31KS05O7bClefCnt2j7V13j7YBNmwari6tCH1eB/9S4I6quhMgyReA1wL9BPxlZ/ZyWumQTF0Pjzy4d9ue3fCNiwx49a7PKZrjgbtnbU91bXtJsjnJZJLJ6enpHsuRBrBvuM/YNbW4dWhF6nMEnzna/sHTRapqC7AFYGJiYv5PHznnqnn/qtSbi9ePpmX2dfTaxa9FK06fI/gp4IRZ22uBe3rsT1p6Tr8AVq/Zu231mlG71LM+A/564LlJnpXkicCbgK/02J+09GzYBGddAkefAGT0etYlzr9rUfQ2RVNVDyd5J/B1YBVwaVXd0ld/0pK1YZOBrkH0uppkVX0N+FqffUiS5uadrJLUKANekhplwEtSowx4SWqUAS9JjTLgJalRBrwkNcqAl6RGGfCS1CgDXpIaZcBLUqMMeElqlAEvSY0y4CWpUQa8JDXKgJekRhnwktQoA16SGmXAS1KjDHhJapQBL0mNMuAlqVEGvCQ1yoCXpEb1FvBJLk1yb5Kb++pD+9ixFS5eDxceM3rdsXXoiiQNqM8R/KeAM3o8v2bbsRWuPBd23Q3U6PXKcw15aQU7rK8TV9W2JOv6Ov+SddmZw/Q7dT088uDebXt2wzcugg2bhqlJ0qAGn4NPsjnJZJLJ6enpoctZvvYN9xm7pha3DklLRm8j+HFV1RZgC8DExEQNXM6hO+eqYfq9eH03PbOPo9cufi2SloTBR/BaIKdfAKvX7N22es2oXdKKZMC3YsMmOOsSOPoEIKPXsy5x/l1awXqboknyeeAVwLFJpoAPVtUn++pPjMLcQJfU6fMqmjf3dW5J0sE5RSNJjTLgJalRBrwkNcqAl6RGGfCS1CgDXpIaZcBLUqMMeElqlAEvSY0y4CWpUQa8JDXKgJekRhnwktQoA16SGmXAS1KjDHhJapQBL0mNMuAlqVEGvCQ1yoCXpEYZ8JLUKANekhplwEtSowx4SWpUbwGf5IQk30yyM8ktSc7rqy9JY9ixFS5eDxceM3rdsXXoitSzw3o898PAe6rqhiRHAtuTXFtVt/bYp6S57NgKV54Le3aPtnfdPdoG2LBpuLrUq94Cvqp+CPywe/9Akp3A8YAB37fLzhy6Ai01U9fDIw/u3bZnN3zjIgO+YYsyB59kHXAycN0c+zYnmUwyOT09vRjlSCvPvuE+Y9fU4tahRdXnFA0ASX4K+BJwflXdv+/+qtoCbAGYmJiovutZEc65augKtNRcvH40LbOvo9cufi1aNL2O4JOsZhTun6uqL/fZl6QDOP0CWL1m77bVa0btalafV9EE+CSws6r+sK9+JI1hwyY46xI4+gQgo9ezLnH+vXF9TtFsBH4VuCnJjV3bB6rqaz32KWl/Nmwy0FeYPq+i+UsgfZ1fknRg3skqSY0y4CWpUQa8JDXqoAGf5KlztJ3YTzmSpIUyzgj+20n+/qv3JO8BruivJEnSQhjnKppXAFuSvAF4GrATeGmfRUmSDt1BR/DdomFXA6cA64A/qaqf9FyXJOkQHXQEn+RaRqtCrgfWApcm2VZVv9V3cZKk+RtnDv4jVfXWqvpxVd3MaCS/q+e6JEmHaJwpmv+S5OeSnNM1/TTw2X7LkiQdqnEuk/wg8D7g/V3TEzHgJWnJG2eK5peB1wB/B1BV9wBH9lmUJOnQjRPwD1VVAQWQ5Ih+S5IkLYRxAn5rko8DxyT5deC/Av+x37IkSYfqoJdJVtWHkvwCcD9wInBBVV3be2WSpEMy1nrwXaAb6pK0jOw34JM8QDfvPpeqOqqXiiRJC2K/AV9VRwIkuQj4EfAZRk9oOhuvopGkJW+cL1l/sao+WlUPVNX9VfUx4HV9FyZJOjTjBPwjSc5OsirJE5KcDTzSd2GSpEMzTsD/S2AT8L+6nzd0bZKkJWycyyTvAl7bfymSpIU0znLBxwG/zmgt+L8/vqp+rb+yJEmHapzr4P8U+DajO1ide5ekZWKcgH9yVb3v8Z44yeHANuBJXT+XV9UHH+95JEnzM86XrF9N8up5nPtB4LSqOgl4IXBGkpfN4zySpHkYZwR/HvCBJA8Cexjd7FQHu5O1W4Fy5tmtq7uf/d4Zq4Xzxo9/Z+gSJD0OX3z7Kb2cd5wnOh1ZVU+oqjVVdVS3PdYyBd218zcC9wLXVtV1cxyzOclkksnp6enH/QeQJM0to4H2HDuS51fVbUleNNf+qrph7E6SY4ArgHd1z3Wd08TERE1OTo57Wkla8ZJsr6qJufYdaIrmN4HNwB/Msa+A08YtoKp+nORbwBnAfgNekrRwDrTY2Obu9dT5nLi7fn5PF+5rgFcCvzuvKiVJj9tY68HP09OBTydZxWiuf2tVfbXH/iRJs/QW8FW1Azi5r/NLkg5snOvgJUnL0Dhr0cx1Fc0u4PtV9fDClyRJWgjjTNF8FHgRsIPRTU7ru/dPSfKOqrqmx/okSfM0zhTNXcDJVTVRVS9mNK9+M6OrYn6vx9okSYdgnIB/flXdMrNRVbcyCvw7+ytLknSoxpmiuT3Jx4AvdNtvBP4qyZMYrU0jSVqCxhnBvw24AzgfeDdwZ9e2B5jXTVCSpP6N88i+3Un+GLiG0RIFt1fVzMj9J/v/TUnSkMa5TPIVwKcZfdka4IQk/6qqtvVamSTpkIwzB/8HwKuq6naAJM8DPg+8uM/CJEmHZpw5+NUz4Q5QVX/F6OEdkqQlbJwR/GSSTwKf6bbPBrb3V5IkaSGME/D/GvgN4FxGc/DbGN3dKklawsa5iubBJJ8BPlNVPlNPkpaJ/c7BZ+TCJPcBtzG64Wk6yQWLV54kab4O9CXr+cBG4CVV9ZSq+kfAzwIbk7x7MYqTJM3fgQL+rcCbq+pvZhq69Wfe0u2TJC1hBwr41VV1376N3Ty8l0lK0hJ3oIB/aJ77JElLwIGuojkpyf1ztAc4vKd6JEkLZL8BX1WrFrMQSdLC8qHbktQoA16SGmXAS1Kjeg/4JKuSfDfJV/vuS5L0mMUYwZ8H7FyEfiRJs4yzmuS8JVkLnAn8DvCbffYl7c8bP/6doUuQDuiLbz+ll/P2PYL/I+C9wKP7OyDJ5iSTSSanp12sUpIWSm8j+CS/BNxbVdu757rOqaq2AFsAJiYmqq96tHL1NTqSlro+R/AbgdckuQv4AnBaks/22J8kaZbeAr6q3l9Va6tqHfAm4M+r6i199SdJ2pvXwUtSo3q9imZGVX0L+NZi9CVJGnEEL0mNMuAlqVEGvCQ1yoCXpEYZ8JLUKANekhplwEtSowx4SWqUAS9JjTLgJalRBrwkNcqAl6RGGfCS1CgDXpIaZcBLUqMMeElqlAEvSY0y4CWpUQa8JDXKgJekRhnwktQoA16SGmXAS1KjDHhJatRhfZ48yV3AA8AjwMNVNdFnf5I0lh1b4RsXwa4pOHotnH4BbNg0dFULrteA75xaVfctQj+SdHA7tsKV58Ke3aPtXXePtqG5kF+MgJe0lFx25tAVDGvqenjkwb3b9uwejegbC/i+5+ALuCbJ9iSb5zogyeYkk0kmp6eney5H0oq3b7jP2DW1uHUsgr5H8Bur6p4kTwWuTXJbVW2bfUBVbQG2AExMTFTP9Ug656qhKxjWxetH0zL7Onrt4tfSs15H8FV1T/d6L3AF8NI++5Okgzr9Ali9Zu+21WtG7Y3pLeCTHJHkyJn3wKuAm/vqT5LGsmETnHUJHH0CkNHrWZc0N/8O/U7RPA24IslMP/+pqq7usT9JGs+GTU0G+r56C/iquhM4qa/zS5IOzDtZJalRBrwkNcqAl6RGGfCS1CgDXpIaZcBLUqMMeElqlAEvSY0y4CWpUQa8JDXKgJekRhnwktQoA16SGmXAS1KjDHhJapQBL0mNMuAlqVEGvCQ1yoCXpEYZ8JLUKANekhplwEtSowx4SWqUAS9Jjeo14JMck+TyJLcl2ZnklD77kyQ95rCez/9h4Oqqen2SJwJP7rk/SVKnt4BPchTwcuBtAFX1EPBQX/1JWl7e+PHvDF3CkvHFt/czudHnFM2zgWngsiTfTfKJJEfse1CSzUkmk0xOT0/3WI4krSypqn5OnEwA/wPYWFXXJfkwcH9V/dv9/c7ExERNTk72Uo8ktSjJ9qqamGtfnyP4KWCqqq7rti8HXtRjf5KkWXoL+Kr6EXB3khO7ptOBW/vqT5K0t76vonkX8LnuCpo7gXN67k+S1Ok14KvqRmDOuSFJUr+8k1WSGmXAS1KjDHhJapQBL0mN6u1Gp/lIMg18f56/fixw3wKWs5z5WezNz2Nvfh6PaeGzeGZVHTfXjiUV8IciyeT+7uZaafws9ubnsTc/j8e0/lk4RSNJjTLgJalRLQX8lqELWEL8LPbm57E3P4/HNP1ZNDMHL0naW0sjeEnSLAa8JDVq2Qd8kjOS3J7kjiS/PXQ9Q0pyQpJvdg84vyXJeUPXNLQkq7onin116FqGluSYJJcnua37O9LPc+KWiSTv7v6d3Jzk80kOH7qmhbasAz7JKuAjwL8AXgC8OckLhq1qUA8D76mqfwq8DPiNFf55AJwH7By6iCXiw8DVVfV84CRW8OeS5HjgXGCiqtYDq4A3DVvVwlvWAQ+8FLijqu7sHur9BeC1A9c0mKr6YVXd0L1/gNE/4OOHrWo4SdYCZwKfGLqWoSU5Cng58EmAqnqoqn48aFHDOwxYk+Qw4MnAPQPXs+CWe8AfD9w9a3uKFRxosyVZB5wMXHeQQ1v2R8B7gUcHrmMpeDYwDVzWTVl9IskRQxc1lKr6W+BDwA+AHwK7quqaYataeMs94DNH24q/7jPJTwFfAs6vqvuHrmcISX4JuLeqtg9dyxJxGKNnIn+sqk4G/g5Ysd9ZJflpRv/bfxbwj4Ejkrxl2KoW3nIP+CnghFnba2nwv1mPR5LVjML9c1X15aHrGdBG4DVJ7mI0dXdaks8OW9KgpoCpqpr5H93ljAJ/pXol8DdVNV1Ve4AvA/984JoW3HIP+OuB5yZ5Vvfc1zcBXxm4psEkCaM51p1V9YdD1zOkqnp/Va2tqnWM/l78eVU1N0IbV1X9CLg7yYld0+nArQOWNLQfAC9L8uTu383pNPilc98P3e5VVT2c5J3A1xl9C35pVd0ycFlD2gj8KnBTkhu7tg9U1deGK0lLyLuAz3WDoTuBcwauZzBVdV2Sy4EbGF199l0aXLbApQokqVHLfYpGkrQfBrwkNcqAl6RGGfCS1CgDXpIaZcBryUvylCQ3dj8/SvK33fufJPloT32en+StPZ37bUn+w5jH/kyST/VRh9q3rK+D18pQVf8beCFAkguBn1TVh/rqr1t86tdYAnd6VtVNSdYmeUZV/WDoerS8OILXspXkFTPrvCe5MMmnk1yT5K4kv5Lk95LclOTqbgkHkrw4yV8k2Z7k60mePsepTwNu6G6ke2qS7d3vnpSkkjyj2/6f3Z2QxyX5UpLru5+N3f4jklzatX03yT9Y6TTJmUm+k+TYJG/o1ib/XpJtsw67kgaXslX/DHi15DmMlgd+LfBZ4JtV9TPAbuDMLuT/GHh9Vb0YuBT4nTnOsxHYDlBV9wKHd8vt/jwwCfx8kmcyWszs/zFaZ/3iqnoJ8DoeW5743zBaIuElwKnA789ewTHJLzNa8OvVVXUfcAHwi1V1EvCaWfVMdn1Lj4tTNGrJn1XVniQ3MVq64uqu/SZgHXAisB64drT8CKsYLRW7r6ez97ok/51R6L8c+PfAGYxWMv12t/+VwAu6cwIcleRI4FWMFjz7ra79cOAZ3ftTgQngVbNW/PxvwKeSbGW0+NWMexmteCg9Lga8WvIgQFU9mmRPPbYOx6OM/q4HuKWqDvaout2MwnjGtxmNoJ8J/CnwPkbLUs88BvAJwClVtXv2SbpFrF5XVbfv0/6zjNaCeTbwPEYjdKrqHd2+M4Ebk7yw+/7h8K4m6XFxikYrye3AcTPPIk2yOsk/m+O4ncA/mbW9DXgL8NdV9Sjwf4BXMxpxA1wDvHPm4CQv7N5+HXhXF/QkOXnWOb8P/ArwJzM1JHlOVV1XVRcA9/HYUtjPA26e159YK5oBrxWje6zj64HfTfI94EbmXgP8zxhNx8z83l3d25kvPv8S+HFV/d9u+1xgIsmOJLcC7+ja/x2wGtiR5OZue3Y9twNnA/85yXMYzdHf1B27Dfhed+ipwFXz+kNrRXM1SWkOSa4A3ltVfz1wHU8C/gL4uap6eMhatPwY8NIcugdjPK2qth304H7reC5wfFV9a8g6tDwZ8JLUKOfgJalRBrwkNcqAl6RGGfCS1CgDXpIa9f8BpGIEi1s0kOYAAAAASUVORK5CYII=\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_lifelines(obs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each line represents the time a dog spends at the shelter. Each dot represents an adoption.\n",
"We can see, for example:\n",
"\n",
"* The dog with index 0 arrived during week 0, and was adopted during week 5.\n",
"\n",
"* The dog with index 3 arrived during week 2, and had not been adopted at the end of week 9.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Estimating survival\n",
"\n",
"Now suppose we want to know the distribution of \"survival time\" from arrival to adoption.\n",
"For the dogs that were adopted, we have all the data we need. \n",
"For the others, we have only partial information: if a dog hasn't been adopted yet, we don't know when it will be, but we can put a lower bound on it.\n",
"\n",
"When we have a mixture of complete and incomplete observations -- adopted and unadopted dogs -- we can't compute the Survival function directly.\n",
"Instead, we have to work backwards: we estimate the hazard function first, then use it to compute the survival function, CDF, and PMF.\n",
"\n",
"Specifically, we'll use Kaplan-Meier estimation, which is based on two key ideas.\n",
"\n",
"The first idea is that we can ignore the arrival time in the observed data, and consider only the durations. In effect, we can take the actual lifelines and shift them so they all start at 0, like this:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": [
"fill-in"
]
},
"outputs": [],
"source": [
"duration = obs['end'] - obs['start']"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": [
"fill-in"
]
},
"outputs": [],
"source": [
"shifted = obs.copy()\n",
"shifted['start'] = 0\n",
"shifted['end'] = duration"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": [
"fill-in"
]
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVWElEQVR4nO3df7RdZX3n8feHECVSSGZEXYyhE239MZ1MBHvBoWktP6qlgxFn1UYZrCOdGqdLRWu7rDpr0GFW23E6SmWs1siPWsQfGSytKFWplQm6lOEGmfAj6GIoDhlQLmvGAF0RQvzOH3tfuYn33pzc3H3POZv3a62sc/dzznmeL+GuT57znL2fnapCktQ/hw27AElSNwx4SeopA16SesqAl6SeMuAlqacOH3YBMx1zzDG1Zs2aYZchSWNj27ZtD1TV02Z7bqQCfs2aNUxOTg67DEkaG0m+M9dzLtFIUk8Z8JLUUwa8JPWUAS9JPWXAS1JPdRrwSc5I8q0kdyZ5R5djSerI9i1w4Vp4z6rmcfuWYVekAXV2mmSSZcCfAC8BdgI3JvlsVd3e1ZiSFtn2LXD1ebBnd3O8657mGGDdxuHVpYF0eR78ScCdVXUXQJJPAWcB3QT8ZWd20q30hLbzRtj7yL5te3bDly8w4MdAl0s0zwTumXG8s23bR5JNSSaTTE5NTXVYjqSDtn+4T9u1c2nr0IJ0OYPPLG0/dneRqtoMbAaYmJhY+N1Hzv38gt8qaQ4Xrm2WZfa3cvXS16KD1uUMfidw3Izj1cC9HY4nabGdfj4sX7Fv2/IVTbtGXpcBfyPwnCTPSvIk4NXAZzscT9JiW7cRNlwEK48D0jxuuMj19zHR2RJNVT2W5E3AF4FlwKVVdVtX40nqyLqNBvqY6nQ3yaq6BrimyzEkSbPzSlZJ6ikDXpJ6yoCXpJ4y4CWppwx4SeopA16SesqAl6SeMuAlqacMeEnqKQNeknrKgJeknjLgJamnDHhJ6ikDXpJ6yoCXpJ4y4CWppwx4SeopA16SesqAl6SeMuAlqacMeEnqKQNeknrKgJeknjLgJamnOgv4JJcmuT/JrV2NMXa2b4EL18J7VjWP27cMuyJJPdblDP7PgDM67H+8bN8CV58Hu+4Bqnm8+jxDXlJnDu+q46rammRNV/3/mMvOXLKhFmTnjbD3kX3b9uyGL18A6zYOpyZJvTb0Nfgkm5JMJpmcmpoadjnd2T/cp+3aubR1SHrC6GwGP6iq2gxsBpiYmKgFd3Tu5xerpG5cuLZdntnPytVLX4ukJ4Shz+CfME4/H5av2Ldt+YqmXZI6YMAvlXUbYcNFsPI4IM3jhotcf5fUmc6WaJJ8EjgFOCbJTuDdVXVJV+ONhXUbDXRJS6bLs2jO7qpvSdKBuUQjST1lwEtSTxnwktRTBrwk9ZQBL0k9ZcBLUk8Z8JLUUwa8JPWUAS9JPWXAS1JPGfCS1FMGvCT1lAEvST1lwEtSTxnwktRTBrwk9ZQBL0k9ZcBLUk8Z8JLUUwa8JPWUAS9JPWXAS1JPGfCS1FMGvCT1VGcBn+S4JF9JsiPJbUne0tVY0tjZvgUuXAvvWdU8bt8y7IrUQ4d32PdjwO9U1U1JjgK2Jbm2qm7vcExp9G3fAlefB3t2N8e77mmOAdZtHF5d6p3OAr6q7gPua39+KMkO4JlANwF/2ZmddCstup03wt5H9m3bsxu+fIEBr0W1JGvwSdYAJwA3zPLcpiSTSSanpqaWohxpuPYP92m7di5tHeq9LpdoAEjyE8BngLdW1YP7P19Vm4HNABMTE7Xggc79/ILfKi2pC9c2yzL7W7l66WtRr3U6g0+ynCbcr6iqv+hyLGlsnH4+LF+xb9vyFU27tIi6PIsmwCXAjqp6f1fjSGNn3UbYcBGsPA5I87jhItfftei6XKJZD/w6cEuSm9u2d1XVNR2OKY2HdRsNdHWuy7Novgqkq/4lSfPzSlZJ6ikDXpJ6yoCXpJ46YMAnefosbc/rphxJ0mIZZAZ/fZIffd2f5HeAq7orSZK0GAY5i+YUYHOSXwOeAewATuqyKEnSoTvgDL7dNOwLwMnAGuDPq+rhjuuSJB2iA87gk1xLsyvkWmA1cGmSrVX1u10XJ0lauEHW4P+kql5bVd+vqltpZvK7Oq5LknSIBlmi+cskP5/k3LbpHwAf77YsSdKhGuQ0yXcDvwe8s216Ega8JI28QZZo/iXwcuDvAarqXuCoLouSJB26QQL+0aoqoACSHNltSZKkxTBIwG9J8hFgVZLXA38DfLTbsiRJh+qAp0lW1X9J8hLgQeB5wPlVdW3nlUmSDslA+8G3gW6oS9IYmTPgkzxEu+4+m6o6upOKJEmLYs6Ar6qjAJJcAHwXuJzmDk3n4Fk0kjTyBvmS9Zer6kNV9VBVPVhVHwZ+tevCJEmHZpCA35vknCTLkhyW5Bxgb9eFSZIOzSAB/6+AjcD32j+/1rZJkkbYIKdJ3g2c1X0pkqTFNMh2wU8DXk+zF/yPXl9Vv9FdWZKkQzXIefB/BVxPcwWra++SNCYGCfinVNXvHWzHSY4AtgJPbse5sqrefbD9SJIWZpAvWT+X5F8soO9HgNOq6gXA8cAZSf75AvqRJC3AIDP4twDvSvIIsIfmYqc60JWs7Q6U0/duXd7+mfPK2EP1qo98vauuJalTn37DyZ30O8gdnY6qqsOqakVVHd0eD7RNQXvu/M3A/cC1VXXDLK/ZlGQyyeTU1NRB/wdIkmaXZqI9yxPJ86vqjiQvnO35qrpp4EGSVcBVwJvb+7rOamJioiYnJwftVpKe8JJsq6qJ2Z6bb4nmbcAm4H2zPFfAaYMWUFXfT3IdcAYwZ8BLkhbPfJuNbWofT11Ix+3583vacF8B/BLw3gVVKUk6aAPtB79AxwIfS7KMZq1/S1V9rsPxJEkzdBbwVbUdOKGr/iVJ8xvkPHhJ0hgaZC+a2c6i2QV8p6oeW/ySJEmLYZAlmg8BLwS201zktLb9+alJ/m1VfanD+iRJCzTIEs3dwAlVNVFVP0uzrn4rzVkx/7nD2iRJh2CQgH9+Vd02fVBVt9ME/l3dlSVJOlSDLNF8K8mHgU+1x68Cvp3kyTR700iSRtAgM/jXAXcCbwV+G7irbdsDLOgiKElS9wa5Zd/uJP8V+BLNFgXfqqrpmfvDc79TkjRMg5wmeQrwMZovWwMcl+RfV9XWTiuTJB2SQdbg3we8tKq+BZDkucAngZ/tsjBJ0qEZZA1++XS4A1TVt2lu3iFJGmGDzOAnk1wCXN4enwNs664kSdJiGCTgfwt4I3AezRr8VpqrWyVJI2yQs2geSXI5cHlVeU89SRoTc67Bp/GeJA8Ad9Bc8DSV5PylK0+StFDzfcn6VmA9cGJVPbWq/iHwImB9kt9eiuIkSQs3X8C/Fji7qv5uuqHdf+Y17XOSpBE2X8Avr6oH9m9s1+E9TVKSRtx8Af/oAp+TJI2A+c6ieUGSB2dpD3BER/VIkhbJnAFfVcuWshBJ0uLyptuS1FMGvCT1lAEvST3VecAnWZbkm0k+1/VYkqTHLcUM/i3AjiUYR5I0wyC7SS5YktXAmcDvA2/rcqxXfeTrXXYvPeF9+g0nD7sEHaSuZ/B/DLwd+OFcL0iyKclkksmpKTerlKTF0tkMPsnLgPuralt7X9dZVdVmYDPAxMRELXQ8ZxeStK8uZ/DrgZcnuRv4FHBako93OJ4kaYbOAr6q3llVq6tqDfBq4G+r6jVdjSdJ2pfnwUtST3V6Fs20qroOuG4pxpIkNZzBS1JPGfCS1FMGvCT1lAEvST1lwEtSTxnwktRTBrwk9ZQBL0k9ZcBLUk8Z8JLUUwa8JPWUAS9JPWXAS1JPGfCS1FMGvCT1lAEvST1lwEtSTxnwktRTBrwk9ZQBL0k9ZcBLUk8Z8JLUUwa8JPWUAS9JPXV4l50nuRt4CNgLPFZVE12Op0W2fQt8+QLYtRNWrobTz4d1G4ddlaQBdRrwrVOr6oElGEeLafsWuPo82LO7Od51T3MMhrw0JpYi4JfGZWcOu4J+2Xkj7H1k37Y9u5sZvQEvjYWu1+AL+FKSbUk2zfaCJJuSTCaZnJqa6rgcDWz/cJ+2a+fS1iFpwbqewa+vqnuTPB24NskdVbV15guqajOwGWBiYqIWPNK5nz+kQrWfC9c2yzL7W7l66WuRtCCdzuCr6t728X7gKuCkLsfTIjr9fFi+Yt+25SuadkljobOAT3JkkqOmfwZeCtza1XhaZOs2woaLYOVxQJrHDRe5/i6NkS6XaJ4BXJVkepxPVNUXOhxPi23dRgNdGmOdBXxV3QW8oKv+JUnz80pWSeopA16SesqAl6SeMuAlqacMeEnqKQNeknrKgJeknjLgJamnDHhJ6ikDXpJ6yoCXpJ4y4CWppwx4SeopA16SesqAl6SeMuAlqacMeEnqKQNeknrKgJeknjLgJamnDHhJ6ikDXpJ6yoCXpJ4y4CWppzoN+CSrklyZ5I4kO5Kc3OV4kqTHHd5x/x8AvlBVr0zyJOApHY8nSWp1FvBJjgZeDLwOoKoeBR7tarxXfeTrXXX9hPfpN/jBSxpHXS7RPBuYAi5L8s0kFyc5cv8XJdmUZDLJ5NTUVIflSNITS6qqm46TCeAbwPqquiHJB4AHq+rfz/WeiYmJmpyc7KQeSeqjJNuqamK257qcwe8EdlbVDe3xlcALOxxPkjRDZwFfVd8F7knyvLbpdOD2rsaTJO2r67No3gxc0Z5BcxdwbsfjSZJanQZ8Vd0MzLo2JEnqlleySlJPGfCS1FMGvCT1lAEvST3V2YVOC5FkCvjOAt9+DPDAIpbTpXGqFcar3nGqFcar3nGqFcar3kOp9R9X1dNme2KkAv5QJJmc62quUTNOtcJ41TtOtcJ41TtOtcJ41dtVrS7RSFJPGfCS1FN9CvjNwy7gIIxTrTBe9Y5TrTBe9Y5TrTBe9XZSa2/W4CVJ++rTDF6SNIMBL0k9NfYBn+SMJN9KcmeSdwy7nvkkuTTJ/UluHXYtB5LkuCRfaW+WfluStwy7pvkkOSLJ/0jyP9t6/8OwazqQJMvau519bti1HEiSu5PckuTmJCN9V54kq5JcmeSO9vd3ZO85meR57d/p9J8Hk7x10fof5zX4JMuAbwMvobnByI3A2VU1kvvOJ3kx8DDw51W1dtj1zCfJscCxVXVTkqOAbcArRvjvNsCRVfVwkuXAV4G3VNU3hlzanJK8jWa31aOr6mXDrmc+Se4GJqpq5C8cSvIx4PqqurjdqvwpVfX9IZd1QG2e/R/gRVW10As+9zHuM/iTgDur6q72pt6fAs4ack1zqqqtwP8ddh2DqKr7quqm9ueHgB3AM4db1dyq8XB7uLz9M7KzlySrgTOBi4ddS58kORp4MXAJQFU9Og7h3jod+F+LFe4w/gH/TOCeGcc7GeEQGldJ1gAnADcc4KVD1S553AzcD1w743aRo+iPgbcDPxxyHYMq4EtJtiXZNOxi5vFsYAq4rF3+ujjJkcMuakCvBj65mB2Oe8BnlraRnbWNoyQ/AXwGeGtVPTjseuZTVXur6nhgNXBSkpFcBkvyMuD+qto27FoOwvqqeiHwK8Ab2+XGUXQ4zb2fP1xVJwB/D4z0d3MA7VLSy4H/tpj9jnvA7wSOm3G8Grh3SLX0TruW/Rngiqr6i2HXM6j2I/l1wBnDrWRO64GXt+vanwJOS/Lx4ZY0v6q6t328H7iKZnl0FO0Eds749HYlTeCPul8Bbqqq7y1mp+Me8DcCz0nyrPZfwFcDnx1yTb3Qfml5CbCjqt4/7HoOJMnTkqxqf14B/BJwx1CLmkNVvbOqVlfVGprf2b+tqtcMuaw5JTmy/aKddrnjpcBInglWVd8F7knyvLbpdGAkTwzYz9ks8vIMdH/T7U5V1WNJ3gR8EVgGXFpVtw25rDkl+SRwCnBMkp3Au6vqkuFWNaf1wK8Dt7Tr2gDvqqprhlfSvI4FPtaeiXAYsKWqRv70wzHxDOCq5t98Dgc+UVVfGG5J83ozcEU76bsLOHfI9cwryVNozgR8w6L3Pc6nSUqS5jbuSzSSpDkY8JLUUwa8JPWUAS9JPWXAS1JPGfAaqiR72130bmt3gnxbkkX7vUzyuiT/aMbxxUl+ZpH6fkWS8xejr1n6PmXQXSbbawBG+bRFDclYnwevXtjdbi9AkqcDnwBWAu8etIMky6pq7xxPv47mopzpKzF/81CK3c/baS4vH6qqmkpyX5L1VfW1Ydej0eEMXiOjvQx+E/CmNF6X5IPTzyf5XJJT2p8fTnJBkhuAk5Ocn+TGJLcm2dy+/5U02/Fe0X5KWJHkuiQTbR9nt3uc35rkvTPGeTjJ77efKL6R5Bn715rkucAjVfVAu8nZXe2Yq5L8cHqvliTXJ/np9mrQS9sav5nkrPb5ZUn+qG3fnuTHLnZJcmL7nmcn+cU8vnf4N6evMAX+EjhnEf43qEcMeI2UqrqL5vfy6Qd46ZHArVX1oqr6KvDBqjqx3Wd/BfCyqroSmATOqarjq2r39JvbZZv3AqcBxwMnJnnFjL6/UVUvALYCr59l/PXA9HbKe2nuS/AzwM/T7J3/C0meDKyuqjuBf0ezJcGJwKnAH7WX/f8bYFfbfiLw+iTPmlHnzwF/CpzV/t38LvDG9lPPLwDT/02T7bH0Iwa8RtFsu4Tuby/NRmjTTk1yQ5JbaEL7nx7g/ScC11XVVFU9BlxBs484wKPA9Pr3NmDNLO8/lmZb2mnXt+9/MfCHNEF/Is1+SdDs3/KOdtuH64AjgJ9s21/btt8APBV4TvuefwJsBjZU1f9u274GvD/JecCqtnZotkj+0XcNEhjwGjFJnk0T3vcDj7Hv7+gRM37+wfS6e5IjgA8Br6yqfwZ8dL/XzjrUPM/tqcf38NjL7N9V7d5vjOtpZtAnAdcAq2j2Hdo6Y7xfbT9JHF9VP1lVO9r2N89of1ZVfal9z33AD2j24gegqv4T8Js0n1K+keT57VNH8PhsXgIMeI2QJE+jWY74YBuwdwPHJzksyXHMvUXtdNA+kGb/+lfOeO4h4Kgffws3AL+Y5Jh2g7Kzgf9+EOXuAH56v/5+DvhhVf0AuJlm86jr2+e/CLy53aWTJCfMaP+tNFszk+S5efwGFd+nuevTH8z47uGnquqWqnovzbLMdMA/lxHd4VHD41k0GrYV7fLEcpoZ++XA9PbEXwP+DriFJrxumq2Dqvp+ko+2r7ubx5dFAP4M+NMku4GTZ7znviTvBL5CM4u+pqr+6iDq3gq8L0na2wU+kuQeYPoesNfT/KNxS3v8H2nu4rS9Dfm7gZfR3LJvDXBT2z4FvGJGnd9LsgH46yS/Abwmyak0nyxuB/66fempwOcPon49AbibpLRAST4AXF1VfzMCtWyl+SL2/w27Fo0Ol2ikhfsD4CnDLqJd2nq/4a79OYOXpJ5yBi9JPWXAS1JPGfCS1FMGvCT1lAEvST31/wGK9tuQ8I2UkgAAAABJRU5ErkJggg==\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_lifelines(shifted)\n",
"plt.xlabel('Duration (weeks)');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the x-axis in this figure is duration, not time.\n",
"\n",
"The second key idea is that we can estimate the hazard function by considering:\n",
"\n",
"* The number of dogs adopted at each duration, divided by\n",
"\n",
"* The number of dogs \"at risk\" at each duration, where \"at risk\" means that they *could* be adopted.\n",
"\n",
"For example:\n",
"\n",
"* At duration 1, there is 1 adoption out of 7 dogs at risk, so the hazard rate is `1/7`.\n",
"\n",
"* At duration 2, there is 1 adoption out of 6 dogs at risk, so the hazard rate is `1/6`.\n",
"\n",
"* At duration 4, there is 1 adoption out of 4 dogs at risk, so the hazard rate is `1/4`.\n",
"\n",
"And so on. Now let's see how that works computationally."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing \"at risk\"\n",
"\n",
"For each observed duration, we would like to compute the number of dogs that were at risk.\n",
"Here are the unique durations, in order:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": [
"fill-in"
]
},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 4, 5, 7])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ts = duration.unique()\n",
"ts.sort()\n",
"ts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compute the number of dogs at risk, we can loop through `ts` and count the number of dogs where `t` is less than or equal to `end`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": true,
"tags": [
"fill-in"
]
},
"outputs": [
{
"data": {
"text/plain": [
"1 7\n",
"2 6\n",
"4 4\n",
"5 3\n",
"7 1\n",
"dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"at_risk = pd.Series(0, index=ts)\n",
"\n",
"for t in ts:\n",
" k = (t <= shifted['end'])\n",
" at_risk[t] = k.sum()\n",
" \n",
"at_risk"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't like mixing for loops with array operations, we can do the same computation using mesh grids."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(5, 7)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"E, T = np.meshgrid(shifted['end'], ts)\n",
"T.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The results are arrays with one row for each value of `t` and one column for each dog.\n",
"Now we can use comparison operators to compare all values of `t` to all values of `end` at the same time."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([7, 6, 4, 3, 1])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"at_risk = (T <= E).sum(axis=1)\n",
"at_risk"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is an array with the number of dogs at risk for each value of `t`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Estimating the hazard function\n",
"\n",
"Now, to compute the hazard function, we need to know the number of dogs adopted at each value of `t`."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"tags": [
"fill-in"
]
},
"outputs": [
{
"data": {
"text/plain": [
"1 1\n",
"2 1\n",
"4 1\n",
"5 1\n",
"7 0\n",
"dtype: int64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adopted = pd.Series(0, index=ts)\n",
"\n",
"for t in ts:\n",
" k = (shifted['status'] == 1) & (t == shifted['end'])\n",
" adopted[t] = k.sum()\n",
" \n",
"adopted"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or here's the same computation with array operations:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 5., 1., 4., nan, nan, 2., nan])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adopt_times = np.where(shifted['status'], shifted['end'], np.nan)\n",
"adopt_times"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(5, 7)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A, T = np.meshgrid(adopt_times, ts)\n",
"T.shape"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 1, 1, 1, 0])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adopted = (T == A).sum(axis=1)\n",
"adopted"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the next step, it will be easier to see what we're doing if we put the results in a table."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"tags": [
"fill-in"
]
},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" adopted at_risk hazard\n",
"1 1 7 0.142857\n",
"2 1 6 0.166667\n",
"4 1 4 0.250000\n",
"5 1 3 0.333333\n",
"7 0 1 0.000000"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['hazard'] = df['adopted'] / df['at_risk']\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working backwards\n",
"\n",
"Given the hazard function, we can work backwards to compute the survival curve.\n",
"\n",
"The hazard function is the probability of being adopted at each duration, so its complement is the probability of *not* being adopted. \n",
"\n",
"In order to survive past `t`, a dog has to *not* be adopted at all durations up to and including `t`.\n",
"\n",
"So the survival function is the cumulative product of the complement of the hazard function."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"tags": [
"fill-in"
]
},
"outputs": [
{
"data": {
"text/html": [
"