{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AI4M Course 2 Week 3 lecture notebook"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outline\n",
"\n",
"[Count patients](#count-patients)\n",
"\n",
"[Kaplan-Meier](#kaplan-meier)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='count-patients'></a>\n",
"## Count patients"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll work with data where:\n",
"- `Time`: the number of days from diagnosis until the patient either died or left the hospital's supervision.\n",
"- `Event`:\n",
"  - 1 if the patient died\n",
"  - 0 if the patient was still alive when they left supervision at the given `Time`, so whether they died afterward is unknown (their data is censored)\n",
"\n",
"Notice that these are the same numbers as in the lecture video about estimating survival."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Time Event\n",
"0 10 1\n",
"1 8 0\n",
"2 60 1\n",
"3 20 1\n",
"4 12 0\n",
"5 30 1\n",
"6 15 0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'Time': [10,8,60,20,12,30,15],\n",
" 'Event': [1,0,1,1,0,1,0]\n",
" })\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count the number of censored patients"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 True\n",
"2 False\n",
"3 False\n",
"4 True\n",
"5 False\n",
"6 True\n",
"Name: Event, dtype: bool"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Event'] == 0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Patients 1, 4, and 6 (by row index) were censored."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Count how many patient records were censored.\n",
"\n",
"When we sum a series of booleans, `True` is treated as 1 and `False` is treated as 0."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(df['Event'] == 0)"
]
},
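{
"cell_type": "markdown",
"metadata": {},
"source": [
"The built-in `sum` works here, but pandas also offers the vectorized `Series.sum()` method, which gives the same count. The sketch below re-creates the same `df` so it can run on its own, and also uses `value_counts()` to tally both `Event` values (0 = censored, 1 = died) in a single call. `n_censored` is an illustrative variable name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Same 7-patient data as above\n",
"df = pd.DataFrame({'Time': [10, 8, 60, 20, 12, 30, 15],\n",
"                   'Event': [1, 0, 1, 1, 0, 1, 0]})\n",
"\n",
"# Vectorized count of censored records (Event == 0)\n",
"n_censored = (df['Event'] == 0).sum()\n",
"print(n_censored)\n",
"\n",
"# Tally of each Event value: 0 = censored, 1 = died\n",
"print(df['Event'].value_counts())"
]
},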
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count the number of patients who definitely survived past time t\n",
"\n",
"This count assumes that every censored patient died at the moment they were censored (**died immediately**).\n",
"\n",
"If a patient definitely survived past time `t`:\n",
"- Their `Time` is greater than `t`.\n",
"- Their `Event` can be either 1 or 0; only their `Time` value matters."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 True\n",
"3 False\n",
"4 False\n",
"5 True\n",
"6 False\n",
"Name: Time, dtype: bool"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t = 25\n",
"df['Time'] > t"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(df['Time'] > t)"
]
},
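{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how this count changes with `t`, the comparison can be wrapped in a small function and evaluated at several times. `count_definitely_survived` is a hypothetical helper name used only for illustration. Because the comparison is a strict `>`, a patient whose `Time` equals `t` is not counted; that is why `t = 60` gives 0 even though one patient has `Time` 60."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'Time': [10, 8, 60, 20, 12, 30, 15],\n",
"                   'Event': [1, 0, 1, 1, 0, 1, 0]})\n",
"\n",
"def count_definitely_survived(df, t):\n",
"    # Hypothetical helper: count patients whose Time is strictly greater than t.\n",
"    # Event is ignored; only the Time value matters.\n",
"    return int((df['Time'] > t).sum())\n",
"\n",
"for t in [0, 10, 25, 60]:\n",
"    print(t, count_definitely_survived(df, t))"
]
},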
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count the number of patients who may have survived past t\n",
"\n",
"This count assumes that censored patients **never die**. A patient is counted if either:\n",
"- The patient was censored (`Event` is 0) at any time; we assume they live forever.\n",
"- The patient died (`Event` is 1), but after time `t`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 True\n",
"2 True\n",
"3 False\n",
"4 True\n",
"5 True\n",
"6 True\n",
"dtype: bool"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t = 25\n",
"(df['Time'] > t) | (df['Event'] == 0)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum( (df['Time'] > t) | (df['Event'] == 0) )"
]
},
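{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dividing each of the two counts above by the total number of patients turns it into a rough estimate of the probability of surviving past `t`. Since the two assumptions are the extremes for censored patients, the fraction we would observe with no censoring must lie between the two values, here 2/7 and 5/7. A minimal sketch; `lower` and `upper` are illustrative names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'Time': [10, 8, 60, 20, 12, 30, 15],\n",
"                   'Event': [1, 0, 1, 1, 0, 1, 0]})\n",
"t = 25\n",
"n = len(df)\n",
"\n",
"# Censored patients treated as dying immediately: lower bound\n",
"lower = (df['Time'] > t).sum() / n\n",
"# Censored patients treated as never dying: upper bound\n",
"upper = ((df['Time'] > t) | (df['Event'] == 0)).sum() / n\n",
"\n",
"print(round(lower, 3), round(upper, 3))"
]
},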
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count the number of patients who were not censored before time t"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a patient was not censored before time `t`, then either:\n",
"- They had an event (death) at any time: before `t`, at `t`, or after `t`.\n",
"- Or their `Time` is after `t` (they may have died or been censored at some later time)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 False\n",
"2 True\n",
"3 True\n",
"4 False\n",
"5 True\n",
"6 False\n",
"dtype: bool"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t = 25\n",
"(df['Event'] == 1) | (df['Time'] > t)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum( (df['Event'] == 1) | (df['Time'] > t) )"
]
},
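{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on the boolean logic: assuming `Event` only takes the values 0 and 1, De Morgan's law says a patient fails the condition above exactly when `Event == 0` and `Time <= t`, i.e. they were censored at or before `t`. Every patient should fall in exactly one of the two groups, so the counts add up to 7. A minimal sketch; the variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'Time': [10, 8, 60, 20, 12, 30, 15],\n",
"                   'Event': [1, 0, 1, 1, 0, 1, 0]})\n",
"t = 25\n",
"\n",
"not_censored_before_t = (df['Event'] == 1) | (df['Time'] > t)\n",
"# Negation via De Morgan's law (assumes Event is 0 or 1)\n",
"censored_by_t = (df['Event'] == 0) & (df['Time'] <= t)\n",
"\n",
"# Every patient falls in exactly one of the two groups\n",
"assert (not_censored_before_t == ~censored_by_t).all()\n",
"print(not_censored_before_t.sum(), censored_by_t.sum())"
]
},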
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='kaplan-meier'></a>\n",
"## Kaplan-Meier\n",
"\n",
"The Kaplan-Meier estimate of the survival probability is:\n",
"\n",
"$$\n",
"S(t) = \\prod_{t_i \\leq t} (1 - \\frac{d_i}{n_i})\n",
"$$\n",
"\n",
"- $t_i$ are the distinct times at which events (deaths) are observed in the dataset\n",
"- $d_i$ is the number of deaths at time $t_i$\n",
"- $n_i$ is the number of patients known to have survived up to time $t_i$ (their `Time` is at least $t_i$)\n"
]
},
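{
"cell_type": "markdown",
"metadata": {},
"source": [
"The product can be computed directly with pandas. The sketch below is one straightforward way to do it, not the assignment's reference solution: `kaplan_meier` is a hypothetical helper that walks over the distinct times at which a death occurs, takes $n_i$ as the number of patients whose `Time` is at least $t_i$ and $d_i$ as the number of deaths at exactly $t_i$, and multiplies the factors $(1 - d_i / n_i)$ for every $t_i \\leq t$. It reuses the 7-patient data from the previous section. At `t = 25` this gives $\\frac{5}{6} \\cdot \\frac{2}{3} = \\frac{5}{9} \\approx 0.556$, which lies between 2/7 and 5/7, the two counts from that section divided by the 7 patients."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def kaplan_meier(df, t):\n",
"    # Hypothetical helper: Kaplan-Meier estimate of S(t) from Time/Event columns\n",
"    S = 1.0\n",
"    # Distinct times t_i at which at least one death was observed, in increasing order\n",
"    death_times = np.sort(df.loc[df['Event'] == 1, 'Time'].unique())\n",
"    for t_i in death_times:\n",
"        if t_i > t:\n",
"            break  # only factors with t_i <= t enter the product\n",
"        n_i = (df['Time'] >= t_i).sum()  # patients still at risk at t_i\n",
"        d_i = ((df['Event'] == 1) & (df['Time'] == t_i)).sum()  # deaths at t_i\n",
"        S *= 1 - d_i / n_i\n",
"    return S\n",
"\n",
"df = pd.DataFrame({'Time': [10, 8, 60, 20, 12, 30, 15],\n",
"                   'Event': [1, 0, 1, 1, 0, 1, 0]})\n",
"print(kaplan_meier(df, 25))"
]
},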
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Time Event\n",
"0 3 0\n",
"1 3 1\n",
"2 2 0\n",
"3 2 1"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'Time': [3,3,2,2],\n",
" 'Event': [0,1,0,1]\n",
" })\n",
"df"
]
},
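{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check on the formula, here it is worked by hand on this 4-patient data. Deaths occur at $t_1 = 2$ (patient 3) and $t_2 = 3$ (patient 1). At $t_1 = 2$, all four patients have `Time` $\\geq 2$, so $n_1 = 4$ and $d_1 = 1$; the patient censored at time 2 still counts as at risk, consistent with the `df['Time'] >= t_i` check used in this notebook. At $t_2 = 3$, only patients 0 and 1 have `Time` $\\geq 3$, so $n_2 = 2$ and $d_2 = 1$:\n",
"\n",
"$$\n",
"S(2) = 1 - \\frac{1}{4} = 0.75, \\qquad S(3) = \\left(1 - \\frac{1}{4}\\right)\\left(1 - \\frac{1}{2}\\right) = 0.375\n",
"$$"
]
},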
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Find those who survived up to time $t_i$\n",
"\n",
"If a patient survived up to time $t_i$, then either:\n",
"- Their `Time` is greater than $t_i$.\n",
"- Or their `Time` is equal to $t_i$."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"Name: Time, dtype: bool"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t_i = 2\n",
"df['Time'] >= t_i"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use this to help you calculate $n_i$."
]
},
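{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get $n_i$ for every distinct time at once, the same comparison can be repeated in a loop. A minimal sketch on the 4-patient data; `at_risk` is an illustrative name for a dictionary that maps each distinct `Time` to its $n_i$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'Time': [3, 3, 2, 2],\n",
"                   'Event': [0, 1, 0, 1]})\n",
"\n",
"# Map each distinct time t_i to n_i, the number of patients with Time >= t_i\n",
"at_risk = {\n",
"    int(t_i): int((df['Time'] >= t_i).sum())\n",
"    for t_i in sorted(df['Time'].unique())\n",
"}\n",
"print(at_risk)"
]
},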
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Find those who died at time $t_i$\n",
"\n",
"If a patient died at $t_i$:\n",
"- Their `Event` value is 1.\n",
"- Their `Time` is equal to $t_i$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t_i = 2\n",
"(df['Event'] == 1) & (df['Time'] == t_i)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use this to help you calculate $d_i$.\n",
"\n",
"You'll implement Kaplan-Meier in this week's assignment!"
]
}
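,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Repeating the death mask for every distinct time gives each $d_i$ in the same way; together with $n_i$, these are the inputs to the product in the Kaplan-Meier formula. A minimal sketch on the same 4-patient data; `deaths` is an illustrative name for a dictionary that maps each distinct `Time` to its $d_i$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'Time': [3, 3, 2, 2],\n",
"                   'Event': [0, 1, 0, 1]})\n",
"\n",
"# Map each distinct time t_i to d_i, the number of patients with Event == 1 and Time == t_i\n",
"deaths = {\n",
"    int(t_i): int(((df['Event'] == 1) & (df['Time'] == t_i)).sum())\n",
"    for t_i in sorted(df['Time'].unique())\n",
"}\n",
"print(deaths)"
]
}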
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}