{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "3Q2KuaRamHfr" }, "source": [ "# Survival Estimates that Vary with Time\n", "\n", "Welcome to the third assignment of Course 2. In this assignment, we'll use Python to build some of the statistical models we learned this past week to analyze survival estimates for a dataset of lymphoma patients. We'll also evaluate these models and interpret their outputs. Along the way, you will learn about the following:\n", "\n", "- Censored Data\n", "- Kaplan-Meier Estimates\n", "- Subgroup Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outline\n", "\n", "- [1. Import Packages](#1)\n", "- [2. Load the Dataset](#2)\n", "- [3. Censored Data](#)\n", " - [Exercise 1](#Ex-1)\n", "- [4. Survival Estimates](#4)\n", " - [Exercise 2](#Ex-2)\n", " - [Exercise 3](#Ex-3)\n", "- [5. Subgroup Analysis](#5)\n", " - [5.1 Bonus: Log Rank Test](#5-1)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "UopnLTeLkViX" }, "source": [ "\n", "## 1. Import Packages\n", "\n", "We'll first import all the packages that we need for this assignment. \n", "\n", "- `lifelines` is an open-source library for survival analysis.\n", "- `numpy` is the fundamental package for scientific computing in Python.\n", "- `pandas` is what we'll use to manipulate our data.\n", "- `matplotlib` is a plotting library." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "TZyXoADQmYlt" }, "outputs": [], "source": [ "import lifelines\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "from util import load_data\n", "\n", "from lifelines import KaplanMeierFitter as KM\n", "from lifelines.statistics import logrank_test" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "5rp2TD1qnGmp" }, "source": [ "\n", "## 2. 
Load the Dataset\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "WEbu3MtrVsnU" }, "source": [ "Run the next cell to load the lymphoma dataset." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": {}, "colab_type": "code", "id": "e3wHdLrEnSNa" }, "outputs": [], "source": [ "data = load_data()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "3hrHa0dPqU08" }, "source": [ "As always, first take a look at your data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 221 }, "colab_type": "code", "id": "QEd504pKqWuc", "outputId": "7297830a-d316-4623-bb6a-77f8f96b8805" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data shape: (80, 3)\n" ] }, { "data": { "text/html": [ "
| \n", " | Stage_group | \n", "Time | \n", "Event | \n", "
|---|---|---|---|
| 0 | \n", "1 | \n", "6 | \n", "1 | \n", "
| 1 | \n", "1 | \n", "19 | \n", "1 | \n", "
| 2 | \n", "1 | \n", "32 | \n", "1 | \n", "
| 3 | \n", "1 | \n", "42 | \n", "1 | \n", "
| 4 | \n", "1 | \n", "42 | \n", "1 | \n", "
\n", "
'Event' column will give you the number of observations where censorship has NOT occurred.\n", "
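As a minimal sketch, assuming `Event` is 1 when the event was observed and 0 when the patient was censored (consistent with the sentence above), both counts can be read straight off a pandas DataFrame. The toy frame below is hypothetical: it only mirrors the `Stage_group`, `Time` and `Event` columns shown earlier and is not the lymphoma data.

```python
import pandas as pd

# Hypothetical toy frame with the same columns as the lymphoma data; values are made up.
df = pd.DataFrame({
    "Stage_group": [1, 1, 1, 2, 2],
    "Time": [6, 19, 32, 42, 50],
    "Event": [1, 1, 0, 1, 0],  # assumed coding: 1 = event observed, 0 = censored
})

n_observed = int(df["Event"].sum())         # rows where censorship did NOT occur
n_censored = int((df["Event"] == 0).sum())  # rows where it did
frac_censored = n_censored / len(df)        # share of censored rows

print(n_observed, n_censored, frac_censored)  # 3 2 0.4
```

Because `Event` is a 0/1 indicator, its sum counts observed events, and the censored count is simply the remainder of the rows.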
| \n", " | time | \n", "Group 1 | \n", "Group 2 | \n", "
|---|---|---|---|
| 0 | \n", "90 | \n", "0.736842 | \n", "0.424529 | \n", "
| 1 | \n", "180 | \n", "0.680162 | \n", "0.254066 | \n", "
| 2 | \n", "270 | \n", "0.524696 | \n", "0.195436 | \n", "
| 3 | \n", "360 | \n", "0.524696 | \n", "0.195436 | \n", "
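The table above reports, for each subgroup, an estimated probability of surviving past times 90, 180, 270 and 360. Assuming these come from a Kaplan-Meier fit per group (as the `KM` import suggests), the pure-Python sketch below shows the product-limit estimator and how a fitted curve is read as a step function at fixed times. The functions `kaplan_meier` and `survival_at` and the toy data are hypothetical illustrations; in the notebook you would use lifelines' `KaplanMeierFitter` instead. Whether two such curves differ significantly is the question the log-rank test (`logrank_test`, also imported above) addresses.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (event_time, S(event_time)) pairs.

    times  -- follow-up time per patient (like the 'Time' column)
    events -- 1 if the event was observed, 0 if censored (like 'Event')
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Patients censored at exactly t are still counted as at risk at t.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Evaluate the step function: S(t) stays 1.0 before the first event."""
    s = 1.0
    for event_time, value in curve:
        if event_time > t:
            break
        s = value
    return s


# Hypothetical toy subgroups (made-up numbers, not the lymphoma data).
group_1 = kaplan_meier([60, 100, 150, 200, 300, 400], [1, 0, 1, 1, 0, 1])
group_2 = kaplan_meier([30, 50, 80, 120, 250, 400], [1, 1, 1, 0, 1, 0])

for t in (90, 180, 270, 360):
    print(t, round(survival_at(group_1, t), 4), round(survival_at(group_2, t), 4))
```

Between event times the estimate is flat, which is why adjacent rows in a table like the one above (270 and 360 here) can repeat the same value when no events fall between them.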