{ "cells": [ { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Survival analysis with lifelines\n", "\n", "\n", "The [lifelines package](http://lifelines.readthedocs.org/) is a well-documented, easy-to-use Python package for survival analysis.\n", "\n", "I had never done any survival analysis, but the package's great documentation encouraged me to venture into the field. From the documentation I was able to understand the key concepts of survival analysis and run a few simple analyses on clinical data gathered by our collaborators from a cohort of cancer patients. This is obviously no replacement for proper study of the field, but I nonetheless highly recommend reading the whole documentation to beginners on the topic, and using the package to anyone working in the field.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting our hands dirty\n", "\n", "Although all one needs for survival analysis are two arrays, one with how long each patient was observed and one with whether death was observed during that time, in reality you're more likely to get from clinicians an Excel file with dates of birth, diagnosis, and death, along with other relevant information on the clinical cohort." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's read some data in and transform those date fields into the time each patient has been under observation (from diagnosis to the last checkup):\n", "\n", "> Hint: make sure you tell pandas which columns hold dates and the format they are in for correct date parsing.\n", "\n", "Although these data were already anonymized, I have added some jitter so that the values shown differ from the real ones." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
| | patient_last_checkup_date | diagnosis_date | patient_death_date | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 | t12 | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-12-05 | 1977-08-23 | 2011-12-19 | F | A0 | 1 | 1 | 1 | False | False | False | True | False | False | False | 12522 days |
| 1 | 2015-01-15 | 1997-08-06 | NaT | M | A0 | 1 | NaN | NaN | True | False | False | False | True | False | False | 6371 days |
| 2 | 2011-11-14 | 1987-03-11 | NaT | F | A0 | 1 | 1 | 1 | True | False | False | False | False | False | False | 9014 days |
| 3 | 2008-11-15 | 1992-04-27 | 2008-12-7 | F | A0 | 1 | 2 | 1 | True | False | True | False | False | False | False | 6046 days |
| 4 | 2008-10-09 | 1994-07-19 | 2009-12-22 | M | A0 | 2 | 2 | 2 | True | True | False | False | False | False | False | 5196 days |
\n", "