{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "abd678aa-fb2a-47b7-bd31-2123bbea17e4",
   "metadata": {},
   "source": [
    "# Replication: Agosta *et al*, 2013\n",
    "\n",
    "## Introduction\n",
    "\n",
    "This notebook attempts to replicate the following paper with the [PPMI](http://ppmi-info.org) dataset:\n",
    "\n",
    "<div class=\"alert alert-block alert-success\">\n",
    "Agosta, Federica, et al. <a href=\"https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.22101?casa_token=vStnHL7zZ9MAAAAA:Ua2XOtP-ZFkkN-ebLPWi-84jCctqFpC8I89zyE3Ia9Lvk8bJWl7wj7-15xciyX50gzG05TlCvv6DFk0\">\n",
    "The topography of brain damage at different stages of Parkinson's disease.</a>, Human brain mapping 34.11 (2013): 2798-2807.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab86edb8-d5ab-402a-b133-2a77530600a4",
   "metadata": {},
   "source": [
    "<img src='images/table.png' width=800/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9189c63b-94ad-469b-b9d0-328eb52cc8b8",
   "metadata": {},
   "source": [
    "## Initial setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "52b4e9c2-4f75-43bc-b956-dbe9f1ba608d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Installing notebook dependencies (see log in install.log)... \n",
      "This notebook was run on 2022-07-21 15:38:37 UTC +0000\n"
     ]
    }
   ],
   "source": [
    "import livingpark_utils\n",
    "\n",
    "utils = livingpark_utils.LivingParkUtils(\"agosta-etal\")\n",
    "utils.notebook_init()\n",
    "random_seed = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff5646ca-49a8-4072-b28c-cf8a82debb42",
   "metadata": {
    "tags": []
   },
   "source": [
    "## PPMI cohort preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "003cc8c9-96aa-4772-af38-fb769c4fdff3",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Study data download"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "be13ea39-957e-440f-b27b-32113c81f6a0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Download skipped: No missing files!\n"
     ]
    }
   ],
   "source": [
    "required_files = [\n",
    "    \"iu_genetic_consensus_20220310.csv\",\n",
    "    \"Cognitive_Categorization.csv\",\n",
    "    \"Primary_Clinical_Diagnosis.csv\",\n",
    "    \"Demographics.csv\",  # SEX\n",
    "    \"Socio-Economics.csv\",  # EDUCYRS\n",
    "    \"PD_Diagnosis_History.csv\",  # Disease duration\n",
    "    \"Montreal_Cognitive_Assessment__MoCA_.csv\",\n",
    "    \"REM_Sleep_Behavior_Disorder_Questionnaire.csv\",  # STROKE\n",
    "]\n",
    "\n",
    "utils.download_ppmi_metadata(required_files)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a7719ba4-28c1-4b6e-a8a7-08e163c2dd94",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "File /home/glatard/code/livingpark/agosta-etal/inputs/study_files/MDS_UPDRS_Part_III_clean.csv is now available\n",
      "File /home/glatard/code/livingpark/agosta-etal/inputs/study_files/MRI_info.csv is now available\n"
     ]
    }
   ],
   "source": [
    "# H&Y scores\n",
    "\n",
    "# TODO: move this to livingpark_utils\n",
    "import os.path as op\n",
    "\n",
    "file_path = op.join(utils.study_files_dir, \"MDS_UPDRS_Part_III_clean.csv\")\n",
    "if not op.exists(file_path):\n",
    "    !(cd {utils.study_files_dir} && python -m wget \"https://raw.githubusercontent.com/LivingPark-MRI/ppmi-treatment-and-on-off-status/main/PPMI medication and ON-OFF status.ipynb\")  # use requests to improve portability\n",
    "    npath = op.join(utils.study_files_dir, \"PPMI medication and ON-OFF status.ipynb\")\n",
    "    %run \"{npath}\"\n",
    "print(f\"File {file_path} is now available\")\n",
    "\n",
    "# TODO: move this to livingpark_utils\n",
    "import os.path as op\n",
    "\n",
    "file_path = op.join(utils.study_files_dir, \"MRI_info.csv\")\n",
    "if not op.exists(file_path):\n",
    "    !(cd {utils.study_files_dir} && python -m wget \"https://raw.githubusercontent.com/LivingPark-MRI/ppmi-MRI-metadata/main/MRI metadata.ipynb\")  # use requests to improve portability\n",
    "    npath = op.join(utils.study_files_dir, \"MRI metadata.ipynb\")\n",
    "    %run \"{npath}\"\n",
    "print(f\"File {file_path} is now available\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01671975-07d0-48a0-8774-4349abc3f090",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Inclusion criteria\n",
    "\n",
    "We obtain the following group sizes:\n",
    "\n",
    "<!-- and a cutoff score of 6 to identify RBD subjects among PD subjects, consistently with the results presented in [[2]](https://www.sciencedirect.com# /science/article/pii/S138994571100164X). -->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7984652c-62b1-465f-bfdb-dccbb9191dcb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# H&Y during OFF time\n",
    "# Exclusion criteria: Patients were excluded if they had:\n",
    "# (a) parkin, leucine-rich repeat kinase 2 (LRRK2), and glu-\n",
    "# cocerebrosidase (GBA) gene mutations;  OK\n",
    "# (b) cerebrovascular\n",
    "# disorders, traumatic brain injury history, or intracranial\n",
    "# mass; only stroke found in\n",
    "#\n",
    "# (c) other major neurological and medical diseases;\n",
    "# (d) dementia: OK\n",
    "\n",
    "# DARTEL:  8-mm full width\n",
    "# at half maximum (FWHM) Gaussian filter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "b7589aaf-1a2f-411a-afbe-6aa8840fdddd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Download skipped: No missing files!\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# UPDRS3\n",
    "updrs3 = pd.read_csv(op.join(utils.study_files_dir, \"MDS_UPDRS_Part_III_clean.csv\"))[\n",
    "    [\"PATNO\", \"EVENT_ID\", \"PDSTATE\", \"NHY\", \"NP3TOT\"]\n",
    "]\n",
    "# Genetics\n",
    "genetics = pd.read_csv(\n",
    "    op.join(utils.study_files_dir, \"iu_genetic_consensus_20220310.csv\")\n",
    ")[[\"PATNO\", \"GBA_PATHVAR\", \"LRRK2_PATHVAR\", \"NOTES\"]]\n",
    "# Cognitive Categorization\n",
    "cog_cat = pd.read_csv(op.join(utils.study_files_dir, \"Cognitive_Categorization.csv\"))[\n",
    "    [\"PATNO\", \"EVENT_ID\", \"COGSTATE\"]\n",
    "]\n",
    "# Diagnosis\n",
    "diag = pd.read_csv(op.join(utils.study_files_dir, \"Primary_Clinical_Diagnosis.csv\"))[\n",
    "    [\"PATNO\", \"EVENT_ID\", \"PRIMDIAG\", \"OTHNEURO\"]\n",
    "]\n",
    "# MRI\n",
    "mri = pd.read_csv(op.join(utils.study_files_dir, \"MRI_info.csv\"))[\n",
    "    [\"Subject ID\", \"Visit code\", \"Description\", \"Age\"]\n",
    "]\n",
    "mri.rename(columns={\"Subject ID\": \"PATNO\", \"Visit code\": \"EVENT_ID\"}, inplace=True)\n",
    "# Demographics\n",
    "demographics = pd.read_csv(op.join(utils.study_files_dir, \"Demographics.csv\"))[\n",
    "    [\"PATNO\", \"SEX\"]\n",
    "]\n",
    "# Soci-economics\n",
    "socio_eco = pd.read_csv(op.join(utils.study_files_dir, \"Socio-Economics.csv\"))[\n",
    "    [\"PATNO\", \"EDUCYRS\"]\n",
    "]\n",
    "# Disease duration\n",
    "disease_dur = utils.disease_duration()\n",
    "# MoCA\n",
    "moca = pd.read_csv(\n",
    "    op.join(utils.study_files_dir, \"Montreal_Cognitive_Assessment__MoCA_.csv\")\n",
    ")[[\"PATNO\", \"EVENT_ID\", \"MCATOT\"]]\n",
    "moca[\"MMSETOT\"] = moca[\"MCATOT\"].apply(utils.moca2mmse)\n",
    "# Stroke\n",
    "rem = pd.read_csv(\n",
    "    op.join(utils.study_files_dir, \"REM_Sleep_Behavior_Disorder_Questionnaire.csv\")\n",
    ")[[\"PATNO\", \"EVENT_ID\", \"STROKE\"]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "7bdfd288-bf39-44c5-a194-a3baa873be64",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of controls: 46\n",
      "Number of PD subjects: 125\n",
      "Number of PD subject visits by H&Y score:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PATNO</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NHY</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     PATNO\n",
       "NHY       \n",
       "1      157\n",
       "2      537\n",
       "3       34\n",
       "4        6"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remove genetics data that should be excluded according to PPMI doc\n",
    "genetics_clean = genetics[\n",
    "    ~(\n",
    "        (genetics[\"NOTES\"].notna())\n",
    "        & (\n",
    "            genetics[\"NOTES\"].str.contains(\"exclude GBA\")\n",
    "            | genetics[\"NOTES\"].str.contains(\"exclude LRRK2\")\n",
    "        )\n",
    "    )\n",
    "]\n",
    "\n",
    "# Inclusion / exclusion criteria\n",
    "subjects = (\n",
    "    updrs3[\n",
    "        (updrs3[\"PDSTATE\"] == \"OFF\") & (updrs3[\"NHY\"].notna()) & (updrs3[\"NHY\"] != \"UR\")\n",
    "    ]  # H&Y score is available and was obtained during OFF time\n",
    "    .merge(\n",
    "        genetics_clean[  # No pathogenic GBA or LRRK2 variant present (Note: on-going email thread with Madeleine)\n",
    "            (genetics_clean[\"GBA_PATHVAR\"] == 0)\n",
    "            & (genetics_clean[\"LRRK2_PATHVAR\"] == 0)\n",
    "        ],\n",
    "        how=\"inner\",\n",
    "        on=\"PATNO\",\n",
    "    )\n",
    "    .merge(\n",
    "        cog_cat[cog_cat[\"COGSTATE\"] != 3], how=\"inner\", on=[\"PATNO\", \"EVENT_ID\"]\n",
    "    )  # No dementia\n",
    "    .merge(\n",
    "        diag[\n",
    "            (diag[\"PRIMDIAG\"].isin([1, 17])) & (diag[\"OTHNEURO\"].isna())\n",
    "        ],  # Primary diagnosis is Idiopathic PD or healthy control\n",
    "        how=\"inner\",\n",
    "        on=[\"PATNO\", \"EVENT_ID\"],\n",
    "    )\n",
    "    .merge(mri, how=\"inner\", on=[\"PATNO\", \"EVENT_ID\"])\n",
    "    .merge(demographics, how=\"left\", on=\"PATNO\")\n",
    "    .merge(socio_eco, how=\"left\", on=\"PATNO\")\n",
    "    .merge(disease_dur, how=\"left\", on=[\"PATNO\", \"EVENT_ID\"])\n",
    "    .merge(moca, how=\"left\", on=[\"PATNO\", \"EVENT_ID\"])\n",
    "    .merge(rem, how=\"inner\", on=[\"PATNO\", \"EVENT_ID\"])\n",
    ")\n",
    "\n",
    "pds = subjects[\n",
    "    (subjects[\"PRIMDIAG\"] == 1) & (subjects[\"STROKE\"] == 0)\n",
    "]  # Include only idiopathic PD with no history of stroke\n",
    "controls = subjects[\n",
    "    (subjects[\"PRIMDIAG\"] == 17) & (subjects[\"NHY\"] == \"0\") & (subjects[\"STROKE\"] == 0)\n",
    "]  # Exclude controls with H&Y > 0 and history of stroke\n",
    "subjects = pd.concat([pds, controls])\n",
    "\n",
    "\n",
    "print(f\"Number of controls: {len(pd.unique(controls['PATNO']))}\")\n",
    "print(f\"Number of PD subjects: {len(pd.unique(pds['PATNO']))}\")\n",
    "\n",
    "print(f\"Number of PD subject visits by H&Y score:\")\n",
    "pds.groupby([\"NHY\"], dropna=False).count()[[\"PATNO\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7bb0228-4b3a-443f-89da-c94bca364f2b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Cohort matching\n",
    "\n",
    "Matching sex balance as much as possible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6f8b6fee-c179-4330-a264-999eafd91300",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selecting {'total': 12, 'women': 5} visits of unique patients with H&Y=4\n",
      "Warning: found only 2 visits of unique patients with H&Y=4 while 12 were required. Including them all regardless of sex balance.\n",
      "Selecting {'total': 14, 'women': 7} visits of unique patients with H&Y=3\n",
      "Warning: found only 9 visits of unique patients with H&Y=3 while 14 were required. Including them all regardless of sex balance.\n",
      "Selecting {'total': 46, 'women': 15} visits of unique patients with H&Y=2\n",
      "Selecting {'total': 17, 'women': 7} visits of unique patients with H&Y=1\n",
      "Selecting {'total': 42, 'women': 17} visits of unique patients with H&Y=0\n",
      "Warning: found only 14 women with H&Y=0 while 17 were required: including them all\n"
     ]
    }
   ],
   "source": [
    "sample_size = {\n",
    "    0: {\"total\": 42, \"women\": 17},\n",
    "    1: {\"total\": 17, \"women\": 7},\n",
    "    2: {\"total\": 46, \"women\": 15},\n",
    "    3: {\"total\": 14, \"women\": 7},\n",
    "    4: {\"total\": 12, \"women\": 5},\n",
    "}\n",
    "\n",
    "samples = {}\n",
    "\n",
    "subjects_ = subjects.copy()  # copy subjects DF to leave it intact\n",
    "for x in sorted(\n",
    "    sample_size, reverse=True\n",
    "):  # sampling patients in descending H&Y score order as there are less patients with high H&Y scores\n",
    "    print(f\"Selecting {sample_size[x]} visits of unique patients with H&Y={x}\")\n",
    "\n",
    "    # Sample visits with H&Y=x and make sure they belong to different patients\n",
    "    hy = (\n",
    "        subjects_[subjects_[\"NHY\"] == str(x)]\n",
    "        .groupby([\"PATNO\"])\n",
    "        .sample(1, random_state=random_seed)\n",
    "    )\n",
    "\n",
    "    # Sample subjects in hy\n",
    "    if len(hy) < sample_size[x][\"total\"]:\n",
    "        print(\n",
    "            f'Warning: found only {len(hy)} visits of unique patients with H&Y={x} while {sample_size[x][\"total\"]} were required. '\n",
    "            + \"Including them all regardless of sex balance.\"\n",
    "        )\n",
    "        samples[x] = hy\n",
    "    else:\n",
    "        # We have enough subjects. Trying to match sex balance\n",
    "        hy_women = hy[hy[\"SEX\"] == 0]\n",
    "        required_women = sample_size[x][\"women\"]\n",
    "\n",
    "        hy_men = hy[hy[\"SEX\"] == 1]\n",
    "        required_men = sample_size[x][\"total\"] - required_women\n",
    "\n",
    "        if len(hy_women) < required_women:  # We don't have enough women\n",
    "            print(\n",
    "                f\"Warning: found only {len(hy_women)} women with H&Y={x} while {required_women} were required: including them all\"\n",
    "            )\n",
    "            # We have enough men since we have enough subjects\n",
    "            required_men += required_women - len(hy_women)\n",
    "            required_women = len(hy_women)\n",
    "\n",
    "        if len(hy_men) < required_men:\n",
    "            # We don't have enough men\n",
    "            print(\n",
    "                f\"Warning: found only {len(hy_men)} men with H&Y={x} while {required_men} were required: including them all\"\n",
    "            )\n",
    "            # We have enough women since we have enough subjects\n",
    "            required_women += required_men - len(hy_men)\n",
    "            required_men = len(hy_men)\n",
    "\n",
    "        samples[x] = pd.concat(\n",
    "            [\n",
    "                hy_women.sample(required_women, random_state=random_seed),\n",
    "                hy_men.sample(required_men, random_state=random_seed),\n",
    "            ]\n",
    "        )\n",
    "\n",
    "    # Remove sampled patients from subjects_\n",
    "    subjects_ = subjects_[~(subjects_[\"PATNO\"].isin(samples[x][\"PATNO\"]))]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c86d3e08-d089-4dcd-a404-f41fc3416aab",
   "metadata": {},
   "outputs": [],
   "source": [
    "cohort = pd.concat([samples[x] for x in samples])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c1c616ca-5e4f-4ed9-82f7-d1d3f86dead7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def num_stats(column_name, skip_healthy=False):\n",
    "    def round_(x):\n",
    "        if str(x) == \"nan\":\n",
    "            return float(\"nan\")\n",
    "        else:\n",
    "            return round(x)\n",
    "\n",
    "    a = [\n",
    "        f'{round_(cohort[cohort[\"NHY\"]==str(x)][column_name].mean())} '\n",
    "        + f'+/- {round_(cohort[cohort[\"NHY\"]==str(x)][column_name].std())} '\n",
    "        + f'({round_(cohort[cohort[\"NHY\"]==str(x)][column_name].min())}-'\n",
    "        + f'{round_(cohort[cohort[\"NHY\"]==str(x)][column_name].max())})'\n",
    "        for x in range(5)\n",
    "    ]\n",
    "    if skip_healthy:\n",
    "        a[0] = \"—\"\n",
    "    return a\n",
    "\n",
    "\n",
    "cohort_stats = pd.DataFrame(\n",
    "    columns=[\"Healthy controls\"] + [f\"HY {x}\" for x in range(1, 5)]\n",
    ")\n",
    "cohort_stats.loc[\"Number\"] = [len(cohort[cohort[\"NHY\"] == str(x)]) for x in range(5)]\n",
    "cohort_stats.loc[\"Women / men \"] = [\n",
    "    f'{len(cohort[(cohort[\"NHY\"]==str(x)) & (cohort[\"SEX\"]==0)])} / {len(cohort[(cohort[\"NHY\"]==str(x)) & (cohort[\"SEX\"]==1)])}'\n",
    "    for x in range(5)\n",
    "]\n",
    "cohort_stats.loc[\"Age (years)\"] = num_stats(\"Age\")\n",
    "cohort_stats.loc[\"Education (years)\"] = num_stats(\"EDUCYRS\")\n",
    "cohort_stats.loc[\"Disease duration (years)\"] = num_stats(\"PDXDUR\", skip_healthy=True)\n",
    "cohort_stats.loc[\"UPDRS III\"] = num_stats(\"NP3TOT\", skip_healthy=True)\n",
    "cohort_stats.loc[\"MMSE\"] = num_stats(\"MMSETOT\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "23529c9e-e8a6-4dbf-9a3e-7e50226a9c93",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Healthy controls</th>\n",
       "      <th>HY 1</th>\n",
       "      <th>HY 2</th>\n",
       "      <th>HY 3</th>\n",
       "      <th>HY 4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Number</th>\n",
       "      <td>42</td>\n",
       "      <td>17</td>\n",
       "      <td>46</td>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Women / men</th>\n",
       "      <td>14 / 28</td>\n",
       "      <td>7 / 10</td>\n",
       "      <td>15 / 31</td>\n",
       "      <td>4 / 5</td>\n",
       "      <td>1 / 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Age (years)</th>\n",
       "      <td>61 +/- 12 (33-83)</td>\n",
       "      <td>62 +/- 10 (40-79)</td>\n",
       "      <td>63 +/- 10 (42-78)</td>\n",
       "      <td>69 +/- 8 (55-80)</td>\n",
       "      <td>69 +/- 6 (64-73)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Education (years)</th>\n",
       "      <td>16 +/- 3 (12-24)</td>\n",
       "      <td>14 +/- 3 (8-20)</td>\n",
       "      <td>15 +/- 3 (9-22)</td>\n",
       "      <td>14 +/- 3 (8-18)</td>\n",
       "      <td>20 +/- nan (20-20)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Disease duration (years)</th>\n",
       "      <td>—</td>\n",
       "      <td>4 +/- 3 (0-10)</td>\n",
       "      <td>4 +/- 3 (0-10)</td>\n",
       "      <td>4 +/- 2 (0-7)</td>\n",
       "      <td>2 +/- 2 (1-4)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>UPDRS III</th>\n",
       "      <td>—</td>\n",
       "      <td>14 +/- 5 (6-22)</td>\n",
       "      <td>26 +/- 10 (9-51)</td>\n",
       "      <td>40 +/- 9 (29-56)</td>\n",
       "      <td>nan +/- nan (nan-nan)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MMSE</th>\n",
       "      <td>30 +/- 1 (28-30)</td>\n",
       "      <td>29 +/- 1 (26-30)</td>\n",
       "      <td>29 +/- 2 (22-30)</td>\n",
       "      <td>28 +/- 2 (26-30)</td>\n",
       "      <td>29 +/- 0 (29-29)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           Healthy controls               HY 1  \\\n",
       "Number                                   42                 17   \n",
       "Women / men                         14 / 28             7 / 10   \n",
       "Age (years)               61 +/- 12 (33-83)  62 +/- 10 (40-79)   \n",
       "Education (years)          16 +/- 3 (12-24)    14 +/- 3 (8-20)   \n",
       "Disease duration (years)                  —     4 +/- 3 (0-10)   \n",
       "UPDRS III                                 —    14 +/- 5 (6-22)   \n",
       "MMSE                       30 +/- 1 (28-30)   29 +/- 1 (26-30)   \n",
       "\n",
       "                                       HY 2              HY 3  \\\n",
       "Number                                   46                 9   \n",
       "Women / men                         15 / 31             4 / 5   \n",
       "Age (years)               63 +/- 10 (42-78)  69 +/- 8 (55-80)   \n",
       "Education (years)           15 +/- 3 (9-22)   14 +/- 3 (8-18)   \n",
       "Disease duration (years)     4 +/- 3 (0-10)     4 +/- 2 (0-7)   \n",
       "UPDRS III                  26 +/- 10 (9-51)  40 +/- 9 (29-56)   \n",
       "MMSE                       29 +/- 2 (22-30)  28 +/- 2 (26-30)   \n",
       "\n",
       "                                           HY 4  \n",
       "Number                                        2  \n",
       "Women / men                               1 / 1  \n",
       "Age (years)                    69 +/- 6 (64-73)  \n",
       "Education (years)            20 +/- nan (20-20)  \n",
       "Disease duration (years)          2 +/- 2 (1-4)  \n",
       "UPDRS III                 nan +/- nan (nan-nan)  \n",
       "MMSE                           29 +/- 0 (29-29)  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cohort_stats"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "713fd44e-d137-42da-97d5-9c30369d730b",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Image analysis\n",
    "\n",
    "Structural MRI analysis in the original paper is a straightforward VBM analysis implemented with SPM using the DARTEL toolbox. To replicate it, we mostly followed the excellent tutorial on VBM in SPM available at https://www.fil.ion.ucl.ac.uk/~john/misc/VBMclass15.pdf"
   ]
  },
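  {
   "cell_type": "markdown",
   "id": "d2f0a5c1-8c3e-4b7a-9f21-5a6e0c3d9b10",
   "metadata": {},
   "source": [
    "The unexecuted cell below is an illustrative sketch of how such a VBM pipeline could be expressed with Nipype's SPM interfaces: unified segmentation, group-wise DARTEL template creation, and modulated normalization to MNI space with 8-mm FWHM Gaussian smoothing. The interface inputs shown are assumptions about one possible implementation, not the exact code used in the original study."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3a9c7e2-1b4d-4c8a-9d5e-7b2a6c1f0e44",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only: a possible expression of the paper's VBM\n",
    "# preprocessing with Nipype's SPM interfaces (assumes Nipype, SPM12 and\n",
    "# MATLAB are available; inputs below are placeholders).\n",
    "from nipype.interfaces import spm\n",
    "\n",
    "# 1. Unified segmentation, producing DARTEL-ready tissue-class images\n",
    "segment = spm.NewSegment()\n",
    "# segment.inputs.channel_files = <list of T1-weighted Nifti paths>\n",
    "\n",
    "# 2. Group-wise DARTEL template creation from the segmented tissue maps\n",
    "dartel = spm.DARTEL()\n",
    "# dartel.inputs.image_files = <rc1*/rc2* images produced by NewSegment>\n",
    "\n",
    "# 3. Modulated normalization of grey-matter maps to MNI space, smoothed\n",
    "#    with an 8-mm FWHM Gaussian kernel as in the original paper\n",
    "norm2mni = spm.DARTELNorm2MNI()\n",
    "norm2mni.inputs.fwhm = 8.0\n",
    "norm2mni.inputs.modulate = True"
   ]
  },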
  {
   "cell_type": "markdown",
   "id": "bb80d0ea-0767-40d3-a4c6-0af0024f65d0",
   "metadata": {},
   "source": [
    "### Imaging data download\n",
    "\n",
    "Let's first check if the Nifti image files associated with the subjects and visits selected in our cohort are available in cache:"
   ]
  },
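  {
   "cell_type": "markdown",
   "id": "b7c4e1a9-3d2f-4e6b-8a1c-9f0d5e2b7c33",
   "metadata": {},
   "source": [
    "The unexecuted sketch below illustrates what such a check could look like, assuming images are cached locally as one Nifti file per subject and visit; the cache directory and file-naming scheme used here are assumptions for illustration, not the actual layout used by `livingpark_utils`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8d5f2b0-4e3a-4f7c-9b2d-0a1e6f3c8d44",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch of a cache check: assumes one cached Nifti file per\n",
    "# (subject, visit), named \"<PATNO>_<EVENT_ID>.nii.gz\" under a hypothetical\n",
    "# local cache directory.\n",
    "from pathlib import Path\n",
    "\n",
    "cache_dir = Path(\"inputs\") / \"image_cache\"  # hypothetical cache location\n",
    "\n",
    "missing = [\n",
    "    (row.PATNO, row.EVENT_ID)\n",
    "    for row in cohort.itertuples()\n",
    "    if not (cache_dir / f\"{row.PATNO}_{row.EVENT_ID}.nii.gz\").exists()\n",
    "]\n",
    "print(f\"{len(cohort) - len(missing)} cached, {len(missing)} missing\")"
   ]
  },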
  {
   "cell_type": "markdown",
   "id": "3b44e1a4-64c1-4444-875b-eef795c2b41d",
   "metadata": {},
   "source": [
    "We will now download the missing image files and move them to the data cache:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "e6e48c07-487d-48fd-b87b-20751618cae6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# utils.download_missing_nifti_files(cohort, link_in_outputs=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}