{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Survival Analysis in Python\n", "\n", "\n", "Allen B. Downey\n", "\n", "[MIT License](https://en.wikipedia.org/wiki/MIT_License)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "This notebook introduces Kaplan-Meier estimation, a way to estimate a hazard function when the dataset includes both complete and incomplete cases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# If we're running in Colab, set up the environment\n", "\n", "import sys\n", "IN_COLAB = 'google.colab' in sys.modules\n", "\n", "if IN_COLAB:\n", " !pip install empiricaldist\n", " !pip install lifelines" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "from empiricaldist import Pmf, Cdf, Surv, Hazard" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def decorate(**options):\n", " \"\"\"Decorate the current axes.\n", " Call decorate with keyword arguments like\n", " decorate(title='Title',\n", " xlabel='x',\n", " ylabel='y')\n", " The keyword arguments can be any of the axis properties\n", " https://matplotlib.org/api/axes_api.html\n", " \"\"\"\n", " plt.gca().set(**options)\n", " plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hypothetical data\n", "\n", "To demonstrate the basics of survival analysis, I'll use a small set of hypothetical data. Suppose you are investigating the time it takes for dogs to get adopted from a shelter. You visit a shelter every week for 10 weeks, and record the arrival time for each dog and the adoption time for each dog that was adopted.\n", "\n", "Here's what the data might look like. It contains one row for each dog and three columns:\n", "\n", "* `start`: arrival time, in weeks since the beginning of the study\n", "\n", "* `end`: adoption date, for dogs that were adopted, or `9` for dogs that had not been adopted at the end of the study\n", "\n", "* `status`: `1` for dogs that were adopted; `0` for dogs that were not." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
startendstatus
0051
1121
2261
3290
4490
5681
6790
\n", "
" ], "text/plain": [ " start end status\n", "0 0 5 1\n", "1 1 2 1\n", "2 2 6 1\n", "3 2 9 0\n", "4 4 9 0\n", "5 6 8 1\n", "6 7 9 0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obs = pd.DataFrame()\n", "\n", "obs['start'] = 0,1,2,2,4,6,7\n", "obs['end'] = 5,2,6,9,9,8,9\n", "obs['status'] = 1,1,1,0,0,1,0\n", "\n", "obs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting lifelines\n", "\n", "The following function visualizes the data so we can get a better sense of what it looks like." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def plot_lifelines(obs):\n", " \"\"\"Plot a line for each observation.\n", " \n", " obs: DataFrame\n", " \"\"\"\n", " for y, row in obs.iterrows():\n", " start = row['start']\n", " end = row['end']\n", " status = row['status']\n", " \n", " if status == 0:\n", " # ongoing\n", " plt.hlines(y, start, end, color='C0')\n", " else:\n", " # complete\n", " plt.hlines(y, start, end, color='C1')\n", " plt.plot(end, y, marker='o', color='C1')\n", " \n", " decorate(xlabel='Time (weeks)',\n", " ylabel='Dog index')\n", "\n", " plt.gca().invert_yaxis()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAUIElEQVR4nO3df7BndX3f8edLWHRFWFIhnZTd6WqDGkIX0IuRkjgIxpISNBntogFtSMY1mQho0sZop0iZdmwTEwIZw7gRaQaJuoOYiDoCJbHEhtC9C3T5bSmi3ADlkk4WcLawwLt/fM+Vu5u7y3fv7rnfz/1+n4+ZO997Pt9zz+e9Z3b3dc/nnO/nk6pCkqTWvGTUBUiStBADSpLUJANKktQkA0qS1CQDSpLUpANHXcB8hx9+eK1du3bUZUiSltCWLVser6ojdm1vKqDWrl3L9PT0qMuQJC2hJN9dqN0hPklSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpN6DagkpyW5L8n9SX6rz76kibd1E1x8DFx42OB166ZRVyTtk94+B5XkAOBTwE8DM8DmJF+pqrv76lOaWFs3wbXnwY7tg+1tDw22AdatH11d0j7o84O6bwTur6oHAJJ8AXgH0G9AXXF6r4eXmjSzGZ57eue2HdvhxosMKC1bfQ7xHQk8NG97pmvbSZINSaaTTM/OzvZYjjTGdg2nOdtmlrYOaT/q8woqC7T9veV7q2ojsBFgampq35f3Pedr+3wIadm5+JjBsN6uVq1e+lqk/aTPK6gZYM287dXAwz32J02uUy+AFSt3bluxctAuLVN9BtRm4Kgkr0pyEPBu4Cs99idNrnXr4YxLYdUaIIPXMy71/pOWtd6G+Krq2SQfBK4DDgA+W1V39dWfNPHWrTeQNFZ6XW6jqr4OfL3PPiRJ48mZJCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU3qLaCSfDbJY0nu7KsP7aOtm+DiY+DCwwavWzeNuiJJ+oE+r6D+C3Baj8fXvti6Ca49D7Y9BNTg9drzDClJzTiwrwNX1U1J1vZ1/GXpitNHXcELZjbDc0/v3LZjO9x4EaxbP5qaJGmekd+DSrIhyXSS6dnZ2VGXMzl2Dac522aWtg5J2o3erqCGVVUbgY0AU1NTNeJy+nXO10ZdwQsuPqYb3tvFqtVLX4skLWDkV1AakVMvgBUrd25bsXLQLkkNMKAm1br1cMalsGoNkMHrGZd6/0lSM3ob4kvyeeBk4PAkM8DHq+ryvvrTIqxbbyBJalafT/G9p69jS5LGn0N8kqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCb1FlBJ1iT5iyT3JLkryfl99SWpcVs3wcXHwIWHDV63bhp1RVoGDuzx2M8Cv1FVtyY5BNiS5IaqurvHPiW1ZusmuPY82LF9sL3tocE2wLr1o6tLzestoKrqEeCR7vsnk9wDHAkYUK254vRRV6BxNrMZnnt657Yd2+HGiwwo7dGS3INKshY4Hrhlgfc2JJlOMj07O7sU5UhaSruG05xtM0tbh5adPof4AEjyCuBLwIeq6old36+qjcBGgKmpqeq7Hi3gnK+NugKNs4uPGQzr7WrV6qWvRctKr1dQSVYwCKerquqaPvuS1KhTL4AVK3duW7Fy0C7tQZ9P8QW4HLinqn6vr34kNW7dejjjUli1Bsjg9YxLvf+kF9XnEN9JwHuBO5Lc3rV9rKq+3mOfklq0br2BpL3W51N83wLS1/ElSePNmSQkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNetGASnL0Am0n91KNJEmdYa6gNiX5SAZWJvkD4BN9FyZJmmzDBNRPAGuAvwI2Aw8zmCVCkqTeDBNQO4DtwErgZcB3qur5XquSJE28YQJqM4OAOgH4SeA9Sa7utSpJ0sQbZi6+X66q6e77R4F3JHlvjzVJkvTiV1BVNZ3kJ5OcA5DkcOBbvVcmSZpowzxm/nHgI8BHu6aDgM/1WZQkScPcg/p54O3A9wGq6mHgkD6LkiRpmIB6pqoKKIAkB/dbkiRJw39Q99PAYUneD/xX4I/6LUuSNOle9Cm+qvpkkp8GngBeC1xQVTf0XpkkaaINteR7F0iGkiRpyew2oJI8SXffaSFVdWgvFUmSxB4CqqoOAUhyEYMP6F4JBDgLn+KTJPVsmIck/nlV/WFVPVlVT1TVZcA7+y5MkjTZhgmo55KcleSAJC9JchbwXN+FSZIm2zAB9QvAeuD/dF//smuTJKk3wzxm/iDwjv5LkSTpBS8aUEmOAN4PrJ2/f1X9Un9lSZIm3TCfg/oz4C8ZzCDhvSdJ0pIYJqBeXlUf2dsDJ3kZcBPw0q6fq6vq43t7HEnSZBrmIYmvJvkXizj208ApVXUscBxwWpI3LeI4kqQJNMwV1PnAx5I8Dexg8GHderGZJLoZ0J/qNld0X7udmUKjc+anbx51CZKWoS9+4MRejz/MirqHVNVLqmplVR3abQ81zVH32anbgceAG6rqlgX22ZBkOsn07Ozs3v8JJEljKYMLnQXeSF5XVfcmef1C71fVrUN3khwGfBk4t6ru3N1+U1NTNT09PexhJUljIMmWqpratX1PQ3y/DmwAfneB9wo4ZdjOq+rvknwTOA3YbUBJkjRnT5PFbuhe37KYA3efn9rRhdNK4K3Af15UlZKkiTPUelCL9CPAHyc5gMG9rk1V9dUe+5MkjZHeAqqqtgLH93V8SdJ4G+ZzUJIkLblh5uJb6Cm+bcB3q+rZ/V+SJEnDDfH9IfB6YCuDD+ke033/yiS/UlXX91ifJGlCDTPE9yBwfFVNVdUbGNxXupPBU3m/3WNtkqQJNkxAva6q7prbqKq7GQTWA/2VJUmadMMM8d2X5DLgC932mcC3k7yUwdx8kiTtd8NcQf0icD/wIeDDwANd2w5gUR/ilSTpxQyz5Pv2JH8AXM9giqP7qmruyump3f+kJEmLN8xj5icDf8zgYYkAa5L8q6q6qd/SJEmTbJh7UL8LvK2q7gNI8hrg88Ab+ixMkjTZhrkHtWIunACq6tsMFh+UJKk3w1xBTSe5HLiy2z4L2NJfSZIkDRdQvwr8GnAeg3tQNzGYXUKSpN4M8xTf00muBK6sKtdklyQtid3eg8rAhUkeB+5l8IHd2SQXLF15kqRJtaeHJD4EnAScUFWvrKp/APwEcFKSDy9JdZKkibWngHof8J6q+s5cQzf/3tnde5Ik9WZPAbWiqh7ftbG7D+Vj5pKkXu0poJ5Z5HuSJO2zPT3Fd2ySJxZoD/CynuqRJAnYQ0BV1QFLWYgkSfMNM9WRJElLzoCSJDXJgJIkNcmAkiQ1yYCSJDXJgJIkNan3gEpyQJLbkny1774kSeNjKa6gzgfuWYJ+JEljZJgFCxctyWrgdOA/Ar/eZ1/SUjvz0zePugRppL74gRN7PX7fV1C/D/wm8PzudkiyIcl0kunZWddDlCQN9HYFleRngceqakuSk3e3X1VtBDYCTE1NVV/1SPtb3789SpOuzyuok4C3J3kQ+AJwSpLP9difJGmM9BZQVfXRqlpdVWuBdwN/XlVn99WfJGm8+DkoSVKTen2Kb05VfRP45lL0JUkaD15BSZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkpp0YJ8HT/Ig8CTwHPBsVU312Z8kjZWtm+DGi2DbDKxaDadeAOvWj7qqJdNrQHXeUlWPL0E/kjQ+tm6Ca8+DHdsH29seGmzDxITUUgSUJA1ccfqoK1g+ZjbDc0/v3LZj++CKakICqu97UAVcn2RLkg0L7ZBkQ5LpJNOzs7M9lyNJy8Su4TRn28zS1jFCfV9BnVRVDyf5YeCGJPdW1U3zd6iqjcBGgKmpqeq5HkmjdM7XRl3B8nHxMYNhvV2tWr30tYxIr1dQVfVw9/oY8GXgjX32J0lj49QLYMXKndtWrBy0T4jeAirJwUkOmfseeBtwZ1/9SdJYWbcezrgUVq0BMng949KJuf8E/Q7x/UPgy0nm+vmTqvpGj/1J0nhZt36iAmlXvQVUVT0AHNvX8SVJ482ZJCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU3qNaCSHJbk6iT3JrknyYl99idJGh8H9nz8S4BvVNW7khwEvLzn/iRJY6K3gEpyKPBm4BcBquoZ4Jm++pOkfXHmp28edQnLzhc/0O+gWJ9DfK8GZoErktyW5DNJDt51pyQbkkwnmZ6dne2xHEnScpKq6ufAyRTw18BJVXVLkkuAJ6rq3+3uZ6ampmp6erqXeiRJbUqypaqmdm3v8wpqBpipqlu67auB1/fYnyRpjPQWUFX1KPBQktd2TacCd/fVnyRpvPT9FN+5wFXdE3wPAOf03J8kaUz0GlBVdTvw98YVJUl6Mc4kIUlqkgElSWqSASVJapIBJUlqkgElSWpSbzNJLEaSWeC7+3iYw4HH90M5k8Rztjiet73nOVuccT9v/7iqjti1samA2h+STC80ZYZ2z3O2OJ63vec5W5xJPW8O8UmSmmRASZKaNI4BtXHUBSxDnrPF8bztPc/Z4kzkeRu7e1CSpPEwjldQkqQxYEBJkpo0NgGV5LQk9yW5P8lvjbqe5SDJmiR/keSeJHclOX/UNS0XSQ5IcluSr466luUiyWFJrk5yb/d37sRR19S6JB/u/m3emeTzSV426pqW0lgEVJIDgE8BPwMcDbwnydGjrWpZeBb4jar6MeBNwK953oZ2PnDPqItYZi4BvlFVrwOOxfO3R0mOBM4DpqrqGOAA4N2jrWppjUVAAW8E7q+qB6rqGeALwDtGXFPzquqRqrq1+/5JBv9hHDnaqtqXZDVwOvCZUdeyXCQ5FHgzcDlAVT1TVX832qqWhQOBlUkOBF4OPDziepbUuATUkcBD87Zn8D/avZJkLXA8cMtoK1kWfh/4TeD5UReyjLwamAWu6IZGP5Pk4FEX1bKq+hvgk8D3gEeAbVV1/WirWlrjElBZoM3n54eU5BXAl4APVdUTo66nZUl+FnisqraMupZl5kDg9cBlVXU88H3Ae8V7kOSHGIwEvQr4R8DBSc4ebVVLa1wCagZYM297NRN2KbxYSVYwCKerquqaUdezDJwEvD3JgwyGkk9J8rnRlrQszAAzVTV3hX41g8DS7r0V+E5VzVbVDuAa4J+NuKYlNS4BtRk4KsmrkhzE4EbiV0ZcU/OShME9gXuq6vdGXc9yUFUfrarVVbWWwd+zP6+qifqtdjGq6lHgoSSv7ZpOBe4eYUnLwfeANyV5efdv9VQm7MGSA0ddwP5QVc8m+SBwHYMnXT5bVXeNuKzl4CTgvcAdSW7v2j5WVV8fYU0aX+cCV3W/RD4AnDPieppWVbckuRq4lcETt7cxYVMeOdWRJKlJ4zLEJ0kaMwaUJKlJBpQkqUkGlCSpSQaUJKlJBpQmXpJXJrm9+3o0yd/M2/6rnvo8Pkkvc/klWZvkziH3PSjJTd1cb1JT/EupiVdVfwscB5DkQuCpqvpkz91+DPgPPffxoqrqmSQ3AmcCV426Hmk+r6CkPUjyVPd6cpL/lmRTkm8n+U9JzkryP5LckeSfdPsdkeRLSTZ3XyctcMxDgHVV9T+77Tu6tZKS5G+TvK9rvzLJW7u1p36nO97WJB+Yd6x/M6/93y/Q16u7yVlPSPLjXb23d/sf1e32p8BZ+/3kSfvIgJKGdyyDdaD+KYMZOF5TVW9ksOzGud0+lwAXV9UJwDtZeEmOKWD+ENx/ZzCrx48zmGHhp7r2NwF/Dfwyg5msTwBOAN7fTev1NuAoBsvNHAe8Icmb5w7aTSv0JeCcqtoM/ApwSVUd19Uw0+16Z3dcqSkO8UnD21xVjwAk+d/A3NIHdwBv6b5/K3D0YOo0AA5Ncki33tacH2Gw9MScv2SwVtJ3gcuADd1idf+3qp7qgmhdknd1+69iEExv675u69pf0bV/DzgC+DPgnfOm/boZ+LfdelbXVNX/Aqiq55I8s0Cd0kh5BSUN7+l53z8/b/t5Xvhl7yXAiVV1XPd15AL/6W8H5i/dfRODq6afAr7JILzexSC4YLCczLnzjvmqbl2gAJ+Y1/6jVXV59zPbGKyR9oMhxqr6E+DtXf/XJTllXg0vBf7f3pwMqW8GlLR/XQ98cG4jyXEL7HMP8KNzG1X1EHA4cFRVPQB8C/jXvBBQ1wG/2i2NQpLXdIv9XQf8UreeF0mOTPLD3c88A/wc8L4kv9C9/2rggaq6lMFs/+u69lcCc0s6SM1wiE/av84DPpVkK4N/XzcxuPfzA1V1b5JVuwyp3cJgJn4YBNMnGAQVDO5jrQVu7ZZdmAV+rqquT/JjwM3dkOJTwNnAc10/3+8WWLwhyfeBo4Gzk+wAHgUu6o7/FsAZ7NUcZzOXRiDJh4Enq6qXz0LtZS3XAB+tqvtGXYs0n0N80mhcxs73tEaiW5vpTw0ntcgrKElSk7yCkiQ1yYCSJDXJgJIkNcmAkiQ1yYCSJDXp/wOqDxCEPg1DKQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_lifelines(obs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see, for example:\n", "\n", "* The dog with index 0 arrived during week 0, and was adopted during week 5.\n", "\n", "* The dog with index 3 arrived during week 2, and had not been adopted at the end of week 9.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimating survival\n", "\n", "Now suppose we want to know the distribution of \"survival time\" from arrival to adoption.\n", "\n", "For the dogs that were adopted, we have all the data we need. \n", "\n", "For the others, we have only partial information: if a dog hasn't been adopted yet, we don't know when it will be, but we can put a lower bound on it.\n", "\n", "It might be tempting to ignore the dogs that haven't been adopted and use only the observations that are \"complete\". \n", "\n", "That will turn out to be a bad idea, but I'll do it anyway because it will lead us to a better idea.\n", "\n", "First, I'll compute durations for each dog:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5\n", "1 1\n", "2 4\n", "3 7\n", "4 5\n", "5 2\n", "6 2\n", "dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "duration = obs['end'] - obs['start']\n", "duration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll extract the durations for the dogs that were adopted, with `status==1`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5\n", "1 1\n", "2 4\n", "5 2\n", "dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "complete = duration[obs['status']==1]\n", "complete" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this data, we can use `value_counts` to compute an unnormalized PMF:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 1\n", "2 1\n", "4 1\n", "5 1\n", "dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pmf = pd.value_counts(complete).sort_index()\n", "pmf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we can compute the cumulative sum of the PMF, which is the CDF." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 1\n", "2 2\n", "4 3\n", "5 4\n", "dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdf = pmf.cumsum()\n", "cdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that 1 dog was adopted in 1 week or less; two dogs were adopted in 2 weeks or less, and so on.\n", "\n", "The survival function is the complement of the CDF, so we subtract from the total number of observations:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 3\n", "2 2\n", "4 1\n", "5 0\n", "dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "surv = 4 - cdf\n", "surv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that 3 dogs took more than 1 week to be adopted, 2 dogs took more than 2 weeks, and so on.\n", "\n", "Now, to compute the hazard rate, we compute the ratio of\n", "\n", "1) The number of dogs adopted at each time value, which is the PMF.\n", "\n", "2) The number of dogs that were available at each time value, which is the sum of the PMF and the survival function.\n", "\n", "So we compute the hazard function like this:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 0.250000\n", "2 0.333333\n", "4 0.500000\n", "5 1.000000\n", "dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "haz = pmf / (pmf + surv)\n", "haz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hazard function indicates that \n", "\n", "1) Of the dogs available at 1 week, `1/4` were adopted.\n", "\n", "2) Of the dogs available at 2 weeks, `1/3` were adopted.\n", "\n", "3) Of the dogs available at 4 weeks, `1/2` were adopted.\n", "\n", "3) Of the dogs available at 5 weeks, all were adopted.\n", "\n", "In summary, when we have an unbiased sample of complete durations, we can compute the hazard function by computing the PMF, CDF, and survival function.\n", "\n", "However, in this example, we don't have an unbiased sample of adoption times, because we ignored the dogs that were not adopted during the observation period. In the next section, we'll see how to solve this problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kaplan-Meier estimation\n", "\n", "When we have a mixture of complete and incomplete observations -- adopted and unadopted dogs -- we can't compute the PMF directly, as in the previous example.\n", "\n", "Instead, we have to work backwards: we estimate the hazard function first, then use it to compute the survival function, CDF, and PMF.\n", "\n", "Specifically, we'll use Kaplan-Meier estimation, which is based on two key ideas.\n", "\n", "The first idea is that we can ignore the arrival time in the observed data, and consider only the durations. In effect, we can take the actual lifelines, which look like this:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAUIElEQVR4nO3df7BndX3f8edLWHRFWFIhnZTd6WqDGkIX0IuRkjgIxpISNBntogFtSMY1mQho0sZop0iZdmwTEwIZw7gRaQaJuoOYiDoCJbHEhtC9C3T5bSmi3ADlkk4WcLawwLt/fM+Vu5u7y3fv7rnfz/1+n4+ZO997Pt9zz+e9Z3b3dc/nnO/nk6pCkqTWvGTUBUiStBADSpLUJANKktQkA0qS1CQDSpLUpANHXcB8hx9+eK1du3bUZUiSltCWLVser6ojdm1vKqDWrl3L9PT0qMuQJC2hJN9dqN0hPklSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpN6DagkpyW5L8n9SX6rz76kibd1E1x8DFx42OB166ZRVyTtk94+B5XkAOBTwE8DM8DmJF+pqrv76lOaWFs3wbXnwY7tg+1tDw22AdatH11d0j7o84O6bwTur6oHAJJ8AXgH0G9AXXF6r4eXmjSzGZ57eue2HdvhxosMKC1bfQ7xHQk8NG97pmvbSZINSaaTTM/OzvZYjjTGdg2nOdtmlrYOaT/q8woqC7T9veV7q2ojsBFgampq35f3Pedr+3wIadm5+JjBsN6uVq1e+lqk/aTPK6gZYM287dXAwz32J02uUy+AFSt3bluxctAuLVN9BtRm4Kgkr0pyEPBu4Cs99idNrnXr4YxLYdUaIIPXMy71/pOWtd6G+Krq2SQfBK4DDgA+W1V39dWfNPHWrTeQNFZ6XW6jqr4OfL3PPiRJ48mZJCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU3qLaCSfDbJY0nu7KsP7aOtm+DiY+DCwwavWzeNuiJJ+oE+r6D+C3Baj8fXvti6Ca49D7Y9BNTg9drzDClJzTiwrwNX1U1J1vZ1/GXpitNHXcELZjbDc0/v3LZjO9x4EaxbP5qaJGmekd+DSrIhyXSS6dnZ2VGXMzl2Dac522aWtg5J2o3erqCGVVUbgY0AU1NTNeJy+nXO10ZdwQsuPqYb3tvFqtVLX4skLWDkV1AakVMvgBUrd25bsXLQLkkNMKAm1br1cMalsGoNkMHrGZd6/0lSM3ob4kvyeeBk4PAkM8DHq+ryvvrTIqxbbyBJalafT/G9p69jS5LGn0N8kqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCb1FlBJ1iT5iyT3JLkryfl99SWpcVs3wcXHwIWHDV63bhp1RVoGDuzx2M8Cv1FVtyY5BNiS5IaqurvHPiW1ZusmuPY82LF9sL3tocE2wLr1o6tLzestoKrqEeCR7vsnk9wDHAkYUK254vRRV6BxNrMZnnt657Yd2+HGiwwo7dGS3INKshY4Hrhlgfc2JJlOMj07O7sU5UhaSruG05xtM0tbh5adPof4AEjyCuBLwIeq6old36+qjcBGgKmpqeq7Hi3gnK+NugKNs4uPGQzr7WrV6qWvRctKr1dQSVYwCKerquqaPvuS1KhTL4AVK3duW7Fy0C7tQZ9P8QW4HLinqn6vr34kNW7dejjjUli1Bsjg9YxLvf+kF9XnEN9JwHuBO5Lc3rV9rKq+3mOfklq0br2BpL3W51N83wLS1/ElSePNmSQkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNetGASnL0Am0n91KNJEmdYa6gNiX5SAZWJvkD4BN9FyZJmmzDBNRPAGuAvwI2Aw8zmCVCkqTeDBNQO4DtwErgZcB3qur5XquSJE28YQJqM4OAOgH4SeA9Sa7utSpJ0sQbZi6+X66q6e77R4F3JHlvjzVJkvTiV1BVNZ3kJ5OcA5DkcOBbvVcmSZpowzxm/nHgI8BHu6aDgM/1WZQkScPcg/p54O3A9wGq6mHgkD6LkiRpmIB6pqoKKIAkB/dbkiRJw39Q99PAYUneD/xX4I/6LUuSNOle9Cm+qvpkkp8GngBeC1xQVTf0XpkkaaINteR7F0iGkiRpyew2oJI8SXffaSFVdWgvFUmSxB4CqqoOAUhyEYMP6F4JBDgLn+KTJPVsmIck/nlV/WFVPVlVT1TVZcA7+y5MkjTZhgmo55KcleSAJC9JchbwXN+FSZIm2zAB9QvAeuD/dF//smuTJKk3wzxm/iDwjv5LkSTpBS8aUEmOAN4PrJ2/f1X9Un9lSZIm3TCfg/oz4C8ZzCDhvSdJ0pIYJqBeXlUf2dsDJ3kZcBPw0q6fq6vq43t7HEnSZBrmIYmvJvkXizj208ApVXUscBxwWpI3LeI4kqQJNMwV1PnAx5I8Dexg8GHderGZJLoZ0J/qNld0X7udmUKjc+anbx51CZKWoS9+4MRejz/MirqHVNVLqmplVR3abQ81zVH32anbgceAG6rqlgX22ZBkOsn07Ozs3v8JJEljKYMLnQXeSF5XVfcmef1C71fVrUN3khwGfBk4t6ru3N1+U1NTNT09PexhJUljIMmWqpratX1PQ3y/DmwAfneB9wo4ZdjOq+rvknwTOA3YbUBJkjRnT5PFbuhe37KYA3efn9rRhdNK4K3Af15UlZKkiTPUelCL9CPAHyc5gMG9rk1V9dUe+5MkjZHeAqqqtgLH93V8SdJ4G+ZzUJIkLblh5uJb6Cm+bcB3q+rZ/V+SJEnDDfH9IfB6YCuDD+ke033/yiS/UlXX91ifJGlCDTPE9yBwfFVNVdUbGNxXupPBU3m/3WNtkqQJNkxAva6q7prbqKq7GQTWA/2VJUmadMMM8d2X5DLgC932mcC3k7yUwdx8kiTtd8NcQf0icD/wIeDDwANd2w5gUR/ilSTpxQyz5Pv2JH8AXM9giqP7qmruyump3f+kJEmLN8xj5icDf8zgYYkAa5L8q6q6qd/SJEmTbJh7UL8LvK2q7gNI8hrg88Ab+ixMkjTZhrkHtWIunACq6tsMFh+UJKk3w1xBTSe5HLiy2z4L2NJfSZIkDRdQvwr8GnAeg3tQNzGYXUKSpN4M8xTf00muBK6sKtdklyQtid3eg8rAhUkeB+5l8IHd2SQXLF15kqRJtaeHJD4EnAScUFWvrKp/APwEcFKSDy9JdZKkibWngHof8J6q+s5cQzf/3tnde5Ik9WZPAbWiqh7ftbG7D+Vj5pKkXu0poJ5Z5HuSJO2zPT3Fd2ySJxZoD/CynuqRJAnYQ0BV1QFLWYgkSfMNM9WRJElLzoCSJDXJgJIkNcmAkiQ1yYCSJDXJgJIkNan3gEpyQJLbkny1774kSeNjKa6gzgfuWYJ+JEljZJgFCxctyWrgdOA/Ar/eZ1/SUjvz0zePugRppL74gRN7PX7fV1C/D/wm8PzudkiyIcl0kunZWddDlCQN9HYFleRngceqakuSk3e3X1VtBDYCTE1NVV/1SPtb3789SpOuzyuok4C3J3kQ+AJwSpLP9difJGmM9BZQVfXRqlpdVWuBdwN/XlVn99WfJGm8+DkoSVKTen2Kb05VfRP45lL0JUkaD15BSZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkpp0YJ8HT/Ig8CTwHPBsVU312Z8kjZWtm+DGi2DbDKxaDadeAOvWj7qqJdNrQHXeUlWPL0E/kjQ+tm6Ca8+DHdsH29seGmzDxITUUgSUJA1ccfqoK1g+ZjbDc0/v3LZj++CKakICqu97UAVcn2RLkg0L7ZBkQ5LpJNOzs7M9lyNJy8Su4TRn28zS1jFCfV9BnVRVDyf5YeCGJPdW1U3zd6iqjcBGgKmpqeq5HkmjdM7XRl3B8nHxMYNhvV2tWr30tYxIr1dQVfVw9/oY8GXgjX32J0lj49QLYMXKndtWrBy0T4jeAirJwUkOmfseeBtwZ1/9SdJYWbcezrgUVq0BMng949KJuf8E/Q7x/UPgy0nm+vmTqvpGj/1J0nhZt36iAmlXvQVUVT0AHNvX8SVJ482ZJCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU3qNaCSHJbk6iT3JrknyYl99idJGh8H9nz8S4BvVNW7khwEvLzn/iRJY6K3gEpyKPBm4BcBquoZ4Jm++pOkfXHmp28edQnLzhc/0O+gWJ9DfK8GZoErktyW5DNJDt51pyQbkkwnmZ6dne2xHEnScpKq6ufAyRTw18BJVXVLkkuAJ6rq3+3uZ6ampmp6erqXeiRJbUqypaqmdm3v8wpqBpipqlu67auB1/fYnyRpjPQWUFX1KPBQktd2TacCd/fVnyRpvPT9FN+5wFXdE3wPAOf03J8kaUz0GlBVdTvw98YVJUl6Mc4kIUlqkgElSWqSASVJapIBJUlqkgElSWpSbzNJLEaSWeC7+3iYw4HH90M5k8Rztjiet73nOVuccT9v/7iqjti1samA2h+STC80ZYZ2z3O2OJ63vec5W5xJPW8O8UmSmmRASZKaNI4BtXHUBSxDnrPF8bztPc/Z4kzkeRu7e1CSpPEwjldQkqQxYEBJkpo0NgGV5LQk9yW5P8lvjbqe5SDJmiR/keSeJHclOX/UNS0XSQ5IcluSr466luUiyWFJrk5yb/d37sRR19S6JB/u/m3emeTzSV426pqW0lgEVJIDgE8BPwMcDbwnydGjrWpZeBb4jar6MeBNwK953oZ2PnDPqItYZi4BvlFVrwOOxfO3R0mOBM4DpqrqGOAA4N2jrWppjUVAAW8E7q+qB6rqGeALwDtGXFPzquqRqrq1+/5JBv9hHDnaqtqXZDVwOvCZUdeyXCQ5FHgzcDlAVT1TVX832qqWhQOBlUkOBF4OPDziepbUuATUkcBD87Zn8D/avZJkLXA8cMtoK1kWfh/4TeD5UReyjLwamAWu6IZGP5Pk4FEX1bKq+hvgk8D3gEeAbVV1/WirWlrjElBZoM3n54eU5BXAl4APVdUTo66nZUl+FnisqraMupZl5kDg9cBlVXU88H3Ae8V7kOSHGIwEvQr4R8DBSc4ebVVLa1wCagZYM297NRN2KbxYSVYwCKerquqaUdezDJwEvD3JgwyGkk9J8rnRlrQszAAzVTV3hX41g8DS7r0V+E5VzVbVDuAa4J+NuKYlNS4BtRk4KsmrkhzE4EbiV0ZcU/OShME9gXuq6vdGXc9yUFUfrarVVbWWwd+zP6+qifqtdjGq6lHgoSSv7ZpOBe4eYUnLwfeANyV5efdv9VQm7MGSA0ddwP5QVc8m+SBwHYMnXT5bVXeNuKzl4CTgvcAdSW7v2j5WVV8fYU0aX+cCV3W/RD4AnDPieppWVbckuRq4lcETt7cxYVMeOdWRJKlJ4zLEJ0kaMwaUJKlJBpQkqUkGlCSpSQaUJKlJBpQmXpJXJrm9+3o0yd/M2/6rnvo8Pkkvc/klWZvkziH3PSjJTd1cb1JT/EupiVdVfwscB5DkQuCpqvpkz91+DPgPPffxoqrqmSQ3AmcCV426Hmk+r6CkPUjyVPd6cpL/lmRTkm8n+U9JzkryP5LckeSfdPsdkeRLSTZ3XyctcMxDgHVV9T+77Tu6tZKS5G+TvK9rvzLJW7u1p36nO97WJB+Yd6x/M6/93y/Q16u7yVlPSPLjXb23d/sf1e32p8BZ+/3kSfvIgJKGdyyDdaD+KYMZOF5TVW9ksOzGud0+lwAXV9UJwDtZeEmOKWD+ENx/ZzCrx48zmGHhp7r2NwF/Dfwyg5msTwBOAN7fTev1NuAoBsvNHAe8Icmb5w7aTSv0JeCcqtoM/ApwSVUd19Uw0+16Z3dcqSkO8UnD21xVjwAk+d/A3NIHdwBv6b5/K3D0YOo0AA5Ncki33tacH2Gw9MScv2SwVtJ3gcuADd1idf+3qp7qgmhdknd1+69iEExv675u69pf0bV/DzgC+DPgnfOm/boZ+LfdelbXVNX/Aqiq55I8s0Cd0kh5BSUN7+l53z8/b/t5Xvhl7yXAiVV1XPd15AL/6W8H5i/dfRODq6afAr7JILzexSC4YLCczLnzjvmqbl2gAJ+Y1/6jVXV59zPbGKyR9oMhxqr6E+DtXf/XJTllXg0vBf7f3pwMqW8GlLR/XQ98cG4jyXEL7HMP8KNzG1X1EHA4cFRVPQB8C/jXvBBQ1wG/2i2NQpLXdIv9XQf8UreeF0mOTPLD3c88A/wc8L4kv9C9/2rggaq6lMFs/+u69lcCc0s6SM1wiE/av84DPpVkK4N/XzcxuPfzA1V1b5JVuwyp3cJgJn4YBNMnGAQVDO5jrQVu7ZZdmAV+rqquT/JjwM3dkOJTwNnAc10/3+8WWLwhyfeBo4Gzk+wAHgUu6o7/FsAZ7NUcZzOXRiDJh4Enq6qXz0LtZS3XAB+tqvtGXYs0n0N80mhcxs73tEaiW5vpTw0ntcgrKElSk7yCkiQ1yYCSJDXJgJIkNcmAkiQ1yYCSJDXp/wOqDxCEPg1DKQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_lifelines(obs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And shift them so they all start at 0, like this:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAV9UlEQVR4nO3df7RdZZ3f8ffHECUySFZFXdZkGu34o5ZGcC5aJo4DMlqmiNg1NkrRqcyMsbNUdGyXv7qKlq75YafKyFgdo8hYxR8pjh1Rxh/jSEErNDdIAwi6KGJJQbms1gCzIoT47R9733ITb8LJTfY9zz33/Vor69z9nHOe57sJySfP/vHsVBWSJLXmEeMuQJKk+RhQkqQmGVCSpCYZUJKkJhlQkqQmHTHuAuY69thja926deMuQ5K0iLZt23Z3VT1u3/amAmrdunVMT0+PuwxJ0iJK8oP52j3EJ0lqkgElSWqSASVJapIBJUlqkgElSWqSASVJatKgAZXktCTfTXJLkrcNOZakCbd9C1xwHLxrdfe6fcu4K9LABrsPKskK4D8CLwR2AFuTfL6qvjPUmJIm1PYtcNm5sHtXt73z9m4bYP3G8dWlQQ15o+5zgFuq6laAJJ8GzgSGDaiLTx+0e0ljsGMr7Ll/77bdu+Br5xtQE2zIQ3xPAm6fs72jb9tLkk1JppNMz8zMDFiOpCVr33CatXPH4tahRTXkDCrztP3M43urajOwGWBqaurQH+97zhcPuQtJjbnguO6w3r6OWbP4tWjRDDmD2gGsnbO9BrhjwPEkTapTz4OVq/ZuW7mqa9fEGjKgtgJPTfLkJI8EXgF8fsDxJE2q9RvhjAvhmLVAutczLvT804Qb7BBfVT2Y5PXAl4EVwEer6sahxpM04dZvNJCWmUEft1FVlwOXDzmGJGkyuZKEJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSQaUJKlJBpQkqUkGlCSpSYMFVJKPJrkryQ1DjTGRtm+BC46Dd63uXrdvGXdFkjQWQ86g/gw4bcD+J8/2LXDZubDzdqC618vONaQkLUtHDNVxVV2ZZN1Q/e/Xxacv+pCHzY6tsOf+vdt274KvnQ/rN46nJkkak7Gfg0qyKcl0kumZmZlxlzNe+4bTrJ07FrcOSWrAYDOoUVXVZmAzwNTUVB1yh+d88ZC7GJsLjusP7+3jmDWLX4skjdnYZ1Ca49TzYOWqvdtWruraJWmZMaBasn4jnHEhHLMWSPd6xoWef5K0LA12iC/Jp4CTgWOT7ADeWVUXDTXexFi/0UCSJIa9iu+sofqWJE0+D/FJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmjRYQCVZm+TrSW5KcmOSNw41lqQD2L4FLjgO3rW6e92+ZdwVSSM5YsC+HwT+ZVVdm+RoYFuSr1bVdwYcU9Jc27fAZefC7l3d9s7bu22A9RvHV5c0gsECqqruBO7sf743yU3Ak4BhA+ri0wftXlpSdmyFPffv3bZ7F3ztfANKzVuUc1BJ1gEnANfM896mJNNJpmdmZhajHGn52DecZu3csbh1SAsw5CE+AJL8HPBZ4E1Vdc++71fVZmAzwNTUVB3ygOd88ZC7kCbGBcd1h/X2dcyaxa9FOkiDzqCSrKQLp0uq6s+HHEvSPE49D1au2rtt5aquXWrckFfxBbgIuKmq3jvUOJIOYP1GOONCOGYtkO71jAs9/6QlYchDfBuAVwHXJ7mub3tHVV0+4JiS9rV+o4GkJWnIq/i+AWSo/iVJk82VJCRJTTKgJElNMqAkSU0yoCRJTTKgJElNMqAkSU162IBK8sx52k4epBpJknqjzKC2JHlrOquS/AnwB0MXJkla3kYJqOcCa4H/BmwF7qBbJUKSpMGMElC7gV3AKuBI4PtV9dNBq5IkLXujBNRWuoA6EXgecFaSSwetSpK07I2yFt9vVdV0//MPgTOTvGrAmiRJevgZVFVNJ3leknMAkhwLfGPwyiRJy9ool5m/E3gr8Pa+6ZHAJ4YsSpKkUc5B/RPgJcDfAFTVHcDRQxYlSdIoAfVAVRVQAEmOGrYkSZJGv1H3Q8DqJK8B/gr48LBlSZKWu4e9iq+q/kOSFwL3AE8Hzquqrw5emSRpWRvpke99IBlKkqRFs9+ASnIv/Xmn+VTVYwapSJIkDhBQVXU0QJLz6W7Q/TgQ4Gy8ik+SNLBRLpL4R1X1gaq6t6ruqaoPAr8+dGGSpOVtlIDak+TsJCuSPCLJ2cCeoQuTJC1vowTUPwM2Aj/qf/3Tvk2SpMGMcpn5bcCZw5ciSdJDHjagkjwOeA2wbu7nq+o3hytLkrTcjXIf1F8AV9GtIOG5J0nSohgloB5dVW892I6THAlcCTyqH+fSqnrnwfYjSVqeRrlI4gtJ/vEC+r4feEFVPQs4HjgtyT9cQD+SpGVolBnUG4F3JLkf2E13s2493EoS/Qro9/WbK/tf+12Z4nB5+Ye+NfQQkiTgM689adD+R3mi7tFV9YiqWlVVj+m3R1rmqL936jrgLuCrVXXNPJ/ZlGQ6yfTMzMzB74EkaSKlm+jM80byjKq6Ocmz53u/qq4deZBkNfA54A1VdcP+Pjc1NVXT09OjditJmgBJtlXV1L7tBzrE92ZgE/Ceed4r4AWjDl5VP05yBXAasN+AkiRp1oEWi93Uv56ykI77+6d29+G0CvhV4N0LqlKStOyM9DyoBXoi8LEkK+jOdW2pqi8MOJ4kaYIMFlBVtR04Yaj+JUmTbZT7oCRJWnSjrMU331V8O4EfVNWDh78kSZJGO8T3AeDZwHa6m3SP639+bJJ/UVVfGbA+SdIyNcohvtuAE6pqqqp+ke680g10V+X9+wFrkyQtY6ME1DOq6sbZjar6Dl1g3TpcWZKk5W6UQ3zfTfJB4NP99suB7yV5FN3afJIkHXajzKBeDdwCvAn4XeDWvm03sKCbeCVJejijPPJ9V5I/Ab5Ct8TRd6tqduZ03/6/KUnSwo1ymfnJwMfoLpYIsDbJP6+qK4ctTZK0nI1yDuo9wIuq6rsASZ4GfAr4xSELkyQtb6Ocg1o5G04AVfU9uocPSpI0mFFmUNNJLgI+3m+fDWwbriRJkkYLqN8BXgecS3cO6kq61SUkSRrMKFfx3Z/k48DHq8pnskuSFsV+z0Gl864kdwM3092wO5PkvMUrT5K0XB3oIok3ARuAE6vqsVX1t4DnAhuS/O6iVCdJWrYOFFC/AZxVVd+fbejX33tl/54kSYM5UECtrKq7923sz0N5mbkkaVAHCqgHFvieJEmH7EBX8T0ryT3ztAc4cqB6JEkCDhBQVbViMQuRJGmuUZY6kiRp0RlQkqQmGVCSpCYZUJKkJhlQkqQmGVCSpCYNHlBJViT5dpIvDD2WJGlyLMYM6o3ATYswjiRpgozywMIFS7IGOB34PeDNQ4416+Uf+tZiDCNpTD7z2pPGXYIWydAzqD8G3gL8dH8fSLIpyXSS6ZkZn4coSeoMNoNK8mLgrqraluTk/X2uqjYDmwGmpqbqUMf1X1eSNBmGnEFtAF6S5Dbg08ALknxiwPEkSRNksICqqrdX1ZqqWge8AvjrqnrlUONJkiaL90FJkpo06FV8s6rqCuCKxRhLkjQZnEFJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmmRASZKaZEBJkppkQEmSmnTEkJ0nuQ24F9gDPFhVU0OOp0Zs3wJfOx927oBj1sCp58H6jeOuStISM2hA9U6pqrsXYRy1YPsWuOxc2L2r2955e7cNhpSkg7IYAbW4Lj593BUsbzu2wp77927bvaubURlQkg7C0OegCvhKkm1JNs33gSSbkkwnmZ6ZmRm4HA1u33CatXPH4tYhackbega1oaruSPJ44KtJbq6qK+d+oKo2A5sBpqam6pBHPOeLh9yFDsEFx3WH9fZ1zJrFr0XSkjboDKqq7uhf7wI+BzxnyPHUgFPPg5Wr9m5buaprl6SDMFhAJTkqydGzPwMvAm4Yajw1Yv1GOONCOGYtkO71jAs9/yTpoA15iO8JwOeSzI7zyar60oDjqRXrNxpIkg7ZYAFVVbcCzxqqf0nSZHMlCUlSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMMKElSkwwoSVKTDChJUpMGDagkq5NcmuTmJDclOWnI8SRJk+OIgft/H/ClqnpZkkcCjx54PEnShBgsoJI8Bng+8GqAqnoAeGCo8Wa9/EPfGnoIjegzr3XCLGnhhjzE9xRgBrg4ybeTfCTJUft+KMmmJNNJpmdmZgYsR5K0lKSqhuk4mQKuBjZU1TVJ3gfcU1X/Zn/fmZqaqunp6UHqkSS1Kcm2qprat33IGdQOYEdVXdNvXwo8e8DxJEkTZLCAqqofArcneXrfdCrwnaHGkyRNlqGv4nsDcEl/Bd+twDkDjydJmhCDBlRVXQf8zHFFSZIejitJSJKaZEBJkppkQEmSmmRASZKaZEBJkpo02EoSC5FkBvjBIXZzLHD3YShn3CZhPyZhH8D9aM0k7Mck7AMcvv34O1X1uH0bmwqowyHJ9HxLZiw1k7Afk7AP4H60ZhL2YxL2AYbfDw/xSZKaZEBJkpo0iQG1edwFHCaTsB+TsA/gfrRmEvZjEvYBBt6PiTsHJUmaDJM4g5IkTQADSpLUpIkJqCSnJflukluSvG3c9SxUko8muSvJDeOuZaGSrE3y9SQ3JbkxyRvHXdNCJDkyyX9P8j/6/fi3465poZKsSPLtJF8Ydy0LleS2JNcnuS7Jkn30dpLVSS5NcnP/Z+Skcdd0sJI8vf99mP11T5I3HfZxJuEcVJIVwPeAF9I9yXcrcFZVLbkHJCZ5PnAf8J+q6rhx17MQSZ4IPLGqrk1yNLANeOlS+/1IEuCoqrovyUrgG8Abq+rqMZd20JK8me7RN4+pqhePu56FSHIbMFVVS/oG1yQfA66qqo/0z8p7dFX9eNx1LVT/9+//Bp5bVYe60MJeJmUG9Rzglqq6taoeAD4NnDnmmhakqq4E/s+46zgUVXVnVV3b/3wvcBPwpPFWdfCqc1+/ubL/teT+RZdkDXA68JFx17LcJXkM8HzgIoCqemAph1PvVOB/Hu5wgskJqCcBt8/Z3sES/AtxEiVZB5wAXDPeShamPzR2HXAX8NWqWor78cfAW4CfjruQQ1TAV5JsS7Jp3MUs0FOAGeDi/pDrR5IcNe6iDtErgE8N0fGkBFTmaVty/9KdNEl+Dvgs8Kaqumfc9SxEVe2pquOBNcBzkiypw65JXgzcVVXbxl3LYbChqp4N/Brwuv5w+FJzBPBs4INVdQLwN8BSPmf+SOAlwH8eov9JCagdwNo522uAO8ZUi4D+nM1ngUuq6s/HXc+h6g/DXAGcNuZSDtYG4CX9+ZtPAy9I8onxlrQwVXVH/3oX8Dm6Q/tLzQ5gx5yZ+KV0gbVU/RpwbVX9aIjOJyWgtgJPTfLkPtFfAXx+zDUtW/3FBRcBN1XVe8ddz0IleVyS1f3Pq4BfBW4eb1UHp6reXlVrqmod3Z+Lv66qV465rIOW5Kj+ghv6Q2IvApbcla5V9UPg9iRP75tOBZbUxUP7OIuBDu9BN91c8qrqwSSvB74MrAA+WlU3jrmsBUnyKeBk4NgkO4B3VtVF463qoG0AXgVc35+/AXhHVV0+xpoW4onAx/qrlB4BbKmqJXuZ9hL3BOBz3b99OAL4ZFV9abwlLdgbgEv6f0zfCpwz5noWJMmj6a6cfu1gY0zCZeaSpMkzKYf4JEkTxoCSJDXJgJIkNcmAkiQ1yYCSJDXJgNJES7KnX235xn5V8jcnOWz/3yd5dZK/PWf7I0meeZj6fmmS8w5HX/P0ffKoK5v394Mt1Uu6tYRNxH1Q0gHs6pcqIsnjgU8CxwDvHLWDJCuqas9+3n413Q2js6sc/PYhVbu3t9AtIzNWVTWT5M4kG6rqm+OuR8uHMygtG/0SOZuA16fz6iTvn30/yReSnNz/fF+S85NcA5yU5LwkW5PckGRz//2X0T3C4pJ+lrYqyRVJpvo+zuqfX3RDknfPGee+JL/Xz+iuTvKEfWtN8jTg/qq6u1+w9tZ+zNVJfjq7Dl2Sq5L8Qr/Swkf7Gr+d5Mz+/RVJ/qhv357kZ26qTHJi/52nJPmVPPSMn2/Prt4A/Bfg7MPyGyGNyIDSslJVt9L9f//4h/noUcANVfXcqvoG8P6qOrF/Rtcq4MVVdSkwDZxdVcdX1a7ZL/eH/d4NvAA4HjgxyUvn9H11VT0LuBJ4zTzjbwBmH1myh+55Z88Enkf3fK1fTvIoYE1V3QL8a7pljE4ETgH+qF8S6LeAnX37icBrkjx5Tp2/BPwpcGb/3+ZfAa/rZ52/DMzu03S/LS0aA0rL0Xyr3+9rD91it7NOSXJNkuvpQufvP8z3TwSuqKqZqnoQuITuOUAADwCz53+2Aevm+f4T6R7LMOuq/vvPB/6ALqhOpFuHErq16d7WLy11BXAk8PN9+2/07dcAjwWe2n/n7wGbgTOq6n/1bd8E3pvkXGB1Xzt0jxv5/+fapMVgQGlZSfIUuvC5C3iQvf8MHDnn55/MnndKciTwAeBlVfUPgA/v89l5hzrAe7vroTXG9jD/ueBd+4xxFd0M5jnA5cBqujUbr5wz3q/3M7njq+rnq+qmvv0Nc9qfXFVf6b9zJ/ATuud1AVBVfwj8Nt0s8eokz+jfOpKHZlPSojCgtGwkeRzd4az39wFxG3B8kkckWcv+H98wGxR3p3vG1cvmvHcvcPTPfoVrgF9Jcmy/2OxZwH89iHJvAn5hn/5+CfhpVf0EuI5ukc6r+ve/DLyhX0meJCfMaf+ddI8/IcnT8tAD8n5M96Td359z7u3vVtX1VfVuusN6swH1NJbg6uFa2ryKT5NuVX94ayXdjOnjwOwjQL4JfB+4nu4v32vn66Cqfpzkw/3nbuOhw2oAfwb8aZJdwElzvnNnkrcDX6ebxVxeVX9xEHVfCbwnSfpHz9+f5Hbg6v79q+hC7/p++9/RPTl3ex9StwEvpnvM+zrg2r59Bpg9F0ZV/SjJGcBfJvlN4JVJTqGb2X0H+Mv+o6cAXzyI+qVD5mrmUqOSvA+4rKr+qoFarqS7kOL/jrsWLR8e4pPa9fvAo8ddRH9o9L2GkxabMyhJUpOcQUmSmmRASZKaZEBJkppkQEmSmmRASZKa9P8AGCPPmyuP+lMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "shifted = pd.DataFrame(index=obs.index)\n", "shifted['start'] = 0\n", "shifted['end'] = duration\n", "shifted['status'] = obs['status']\n", "\n", "plot_lifelines(shifted)\n", "\n", "plt.xlabel('Duration (weeks)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the x-axis in this figure is duration, not time.\n", "\n", "From this figure, you can pretty much read off the hazard function:\n", "\n", "* At duration 1, there is 1 adoption out of 7 dogs at risk, so the hazard rate is `1/7`.\n", "\n", "* At duration 2, there is 1 adoption out of 6 dogs at risk, so the hazard rate is `1/6`.\n", "\n", "* At duration 4, there is 1 adoption out of 4 dogs at risk, so the hazard rate is `1/4`.\n", "\n", "And so on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which brings us to the second key idea, which is that we can estimate the hazard rate by computing the ratio of\n", "\n", "1) The number of cases that ended at each duration, and\n", "\n", "2) The number of cases that *could have* ended at each duration, that is, the number that were \"at risk\".\n", "\n", "The numerator at each duration, `d`, is the PMF of complete cases.\n", "\n", "The denominator at `d` is the sum of\n", "\n", "1) The number of adopted dogs whose duration was greater than or equal to `d`, and\n", "\n", "2) The number of unadopted dogs whose duration was greater than or equal to `d`.\n", "\n", "Recall that the survival function is the number of cases that *exceed* `d`, and the PMF if the number of cases with duration *equal to* `d`.\n", "\n", "So, to get the number that are *greater than or equal*, we can add the PMF and the survival function.\n", "\n", "That's the big picture; now let's see how it's done." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimating the hazard function\n", "\n", "First I'll select the durations of the complete and ongoing cases." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5\n", "1 1\n", "2 4\n", "5 2\n", "dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "complete = duration[obs['status']==1]\n", "complete" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3 7\n", "4 5\n", "6 2\n", "dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ongoing = duration[obs['status']==0]\n", "ongoing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll compute the PMF for complete and ongoing cases, using unnormalized `Pmf` objects from `empiricaldist`: " ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
11
21
41
51
\n", "
" ], "text/plain": [ "1 1\n", "2 1\n", "4 1\n", "5 1\n", "dtype: int64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pmf_complete = Pmf.from_seq(complete, normalize=False)\n", "pmf_complete" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
21
51
71
\n", "
" ], "text/plain": [ "2 1\n", "5 1\n", "7 1\n", "dtype: int64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pmf_ongoing = Pmf.from_seq(ongoing, normalize=False)\n", "pmf_ongoing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we add two `Pmf` objects, the index of the result is the union of the two indices." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 4, 5, 7])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "observed = pmf_complete + pmf_ongoing\n", "ts = observed.qs\n", "ts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what the PMFs look like in a table." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pmf_completepmf_ongoing
11.0NaN
21.01.0
41.0NaN
51.01.0
7NaN1.0
\n", "
" ], "text/plain": [ " pmf_complete pmf_ongoing\n", "1 1.0 NaN\n", "2 1.0 1.0\n", "4 1.0 NaN\n", "5 1.0 1.0\n", "7 NaN 1.0" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(index=ts)\n", "df['pmf_complete'] = pmf_complete\n", "df['pmf_ongoing'] = pmf_ongoing\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This table indicates that one dog was adopted at 1 week, another dog was adopted at 2 weeks, and a third dog was observed, unadopted, having been in the shelter for 2 weeks.\n", "\n", "We can use the PMFs to compute survival functions for the two groups:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "surv_complete = pmf_complete.make_surv()\n", "surv_ongoing = pmf_ongoing.make_surv()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we evaluate the survival functions at the times when we observed the cases." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "s_complete = surv_complete(ts)\n", "s_ongoing = surv_ongoing(ts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what that looks like in the table." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pmf_completepmf_ongoings_completes_ongoing
11.0NaN3.03.0
21.01.02.02.0
41.0NaN1.02.0
51.01.00.01.0
7NaN1.00.00.0
\n", "
" ], "text/plain": [ " pmf_complete pmf_ongoing s_complete s_ongoing\n", "1 1.0 NaN 3.0 3.0\n", "2 1.0 1.0 2.0 2.0\n", "4 1.0 NaN 1.0 2.0\n", "5 1.0 1.0 0.0 1.0\n", "7 NaN 1.0 0.0 0.0" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['s_complete'] = s_complete\n", "df['s_ongoing'] = s_ongoing\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first row of this table indicates that 3 of the adopted dogs and 3 of the unadopted dogs were in the shelter for more than 1 week.\n", "\n", "We can use this table to compute the number of dogs available at each duration, `d`, which is the sum of:\n", "\n", "1) The number cases that end at, `d`, which is `pmf_complete`.\n", "\n", "2) The number of ongoing cases observed at `d`, which is `pmf_ongoing`,\n", "\n", "3) The number of complete cases that survive past `d`, which is `s_complete`.\n", "\n", "4) The number of ongoing cases that survive past `d`, which is `s_complete`.\n", "\n", "Here's what that looks like." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pmf_completepmf_ongoings_completes_ongoingat_risk
11.0NaN3.03.07.0
21.01.02.02.06.0
41.0NaN1.02.04.0
51.01.00.01.03.0
7NaN1.00.00.01.0
\n", "
" ], "text/plain": [ " pmf_complete pmf_ongoing s_complete s_ongoing at_risk\n", "1 1.0 NaN 3.0 3.0 7.0\n", "2 1.0 1.0 2.0 2.0 6.0\n", "4 1.0 NaN 1.0 2.0 4.0\n", "5 1.0 1.0 0.0 1.0 3.0\n", "7 NaN 1.0 0.0 0.0 1.0" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "at_risk = pmf_complete + pmf_ongoing + s_complete + s_ongoing\n", "df['at_risk'] = at_risk\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Finally, the hazard function is the ratio of `pmf_complete` and `at_risk`:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pmf_completepmf_ongoings_completes_ongoingat_riskhazard
11.0NaN3.03.07.00.142857
21.01.02.02.06.00.166667
41.0NaN1.02.04.00.250000
51.01.00.01.03.00.333333
7NaN1.00.00.01.00.000000
\n", "
" ], "text/plain": [ " pmf_complete pmf_ongoing s_complete s_ongoing at_risk hazard\n", "1 1.0 NaN 3.0 3.0 7.0 0.142857\n", "2 1.0 1.0 2.0 2.0 6.0 0.166667\n", "4 1.0 NaN 1.0 2.0 4.0 0.250000\n", "5 1.0 1.0 0.0 1.0 3.0 0.333333\n", "7 NaN 1.0 0.0 0.0 1.0 0.000000" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hazard = pmf_complete / at_risk\n", "\n", "df['hazard'] = hazard\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working backwards\n", "\n", "With the hazard function, we can work backwards to compute the survival curve:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pmf_completepmf_ongoings_completes_ongoingat_riskhazardsurv
11.0NaN3.03.07.00.1428570.857143
21.01.02.02.06.00.1666670.714286
41.0NaN1.02.04.00.2500000.535714
51.01.00.01.03.00.3333330.357143
7NaN1.00.00.01.00.0000000.357143
\n", "
" ], "text/plain": [ " pmf_complete pmf_ongoing s_complete s_ongoing at_risk hazard \\\n", "1 1.0 NaN 3.0 3.0 7.0 0.142857 \n", "2 1.0 1.0 2.0 2.0 6.0 0.166667 \n", "4 1.0 NaN 1.0 2.0 4.0 0.250000 \n", "5 1.0 1.0 0.0 1.0 3.0 0.333333 \n", "7 NaN 1.0 0.0 0.0 1.0 0.000000 \n", "\n", " surv \n", "1 0.857143 \n", "2 0.714286 \n", "4 0.535714 \n", "5 0.357143 \n", "7 0.357143 " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "surv = (1 - hazard).cumprod()\n", "df['surv'] = surv\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the CDF:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pmf_completepmf_ongoings_completes_ongoingat_riskhazardsurvcdf
11.0NaN3.03.07.00.1428570.8571430.142857
21.01.02.02.06.00.1666670.7142860.285714
41.0NaN1.02.04.00.2500000.5357140.464286
51.01.00.01.03.00.3333330.3571430.642857
7NaN1.00.00.01.00.0000000.3571430.642857
\n", "
" ], "text/plain": [ " pmf_complete pmf_ongoing s_complete s_ongoing at_risk hazard \\\n", "1 1.0 NaN 3.0 3.0 7.0 0.142857 \n", "2 1.0 1.0 2.0 2.0 6.0 0.166667 \n", "4 1.0 NaN 1.0 2.0 4.0 0.250000 \n", "5 1.0 1.0 0.0 1.0 3.0 0.333333 \n", "7 NaN 1.0 0.0 0.0 1.0 0.000000 \n", "\n", " surv cdf \n", "1 0.857143 0.142857 \n", "2 0.714286 0.285714 \n", "4 0.535714 0.464286 \n", "5 0.357143 0.642857 \n", "7 0.357143 0.642857 " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdf = 1 - surv\n", "df['cdf'] = cdf\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the PMF:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pmf_completepmf_ongoings_completes_ongoingat_riskhazardsurvcdfpmf
11.0NaN3.03.07.00.1428570.8571430.1428570.142857
21.01.02.02.06.00.1666670.7142860.2857140.142857
41.0NaN1.02.04.00.2500000.5357140.4642860.178571
51.01.00.01.03.00.3333330.3571430.6428570.178571
7NaN1.00.00.01.00.0000000.3571430.6428570.000000
\n", "
" ], "text/plain": [ " pmf_complete pmf_ongoing s_complete s_ongoing at_risk hazard \\\n", "1 1.0 NaN 3.0 3.0 7.0 0.142857 \n", "2 1.0 1.0 2.0 2.0 6.0 0.166667 \n", "4 1.0 NaN 1.0 2.0 4.0 0.250000 \n", "5 1.0 1.0 0.0 1.0 3.0 0.333333 \n", "7 NaN 1.0 0.0 0.0 1.0 0.000000 \n", "\n", " surv cdf pmf \n", "1 0.857143 0.142857 0.142857 \n", "2 0.714286 0.285714 0.142857 \n", "4 0.535714 0.464286 0.178571 \n", "5 0.357143 0.642857 0.178571 \n", "7 0.357143 0.642857 0.000000 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pmf = cdf.diff()\n", "pmf[1] = cdf[1]\n", "\n", "df['pmf'] = pmf\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## lifelines\n", "\n", "I implemented Kaplan-Meier estimation to show how it works, but I didn't have to; it's available in a library called `lifelines`.\n", "\n", "First I'll import it and create a `KaplanMeierFitter`." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "from lifelines import KaplanMeierFitter\n", "kmf = KaplanMeierFitter()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5\n", "1 1\n", "2 4\n", "3 7\n", "4 5\n", "5 2\n", "6 2\n", "dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T = duration\n", "T" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "E = obs['status']" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kmf.fit(T, E)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
KM_estimate
timeline
0.01.000000
1.00.857143
2.00.714286
4.00.535714
5.00.357143
7.00.357143
\n", "
" ], "text/plain": [ " KM_estimate\n", "timeline \n", "0.0 1.000000\n", "1.0 0.857143\n", "2.0 0.714286\n", "4.0 0.535714\n", "5.0 0.357143\n", "7.0 0.357143" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kmf.survival_function_" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.6653345369377348e-16" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "max(abs(kmf.survival_function_['KM_estimate'] - df['surv']).dropna())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }