{ "cells": [ { "cell_type": "markdown", "id": "ede17860", "metadata": {}, "source": [ "# Replication: Hanganu *et al.*, 2014\n", "\n", "## Introduction\n", "\n", "This notebook attemps to replicate the following paper with the PPMI dataset:\n", "\n", "A. Hanganu et al. <a href=https://academic.oup.com/brain/article/137/4/1120/372146> “Mild cognitive impairment is linked with faster rate of cortical thinning in patients with Parkinson’s disease longitudinally”</a> Brain, vol. 137, no. 4, pp. 1120–1129, 2014.\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "a7b736be", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Abbreviations:\n", "\n", "GMV - gray matter volume <br>\n", "HC- healthy controls <br>\n", "MCI - mild cognitive impairement <br>\n", "MoCA - Montreal Cognitive Assessement <br>\n", "PD-MCI - Parkinson's disease with MCI <br>\n", "PD-non-MCI - Parkinson's disease without MCI" ] }, { "cell_type": "markdown", "id": "c592434a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This study involved 17 PD-MCI patients, 15 PD-non-MCI patients, and 18 HC. The demografic parameters were as follows (table extracted from the paper):\n", "\n", "<img src=\"images/demographics.png\"/>\n", "Table extracted from the paper A. Hanganu et al. <a href=https://academic.oup.com/brain/article/137/4/1120/372146> “Mild cognitive impairment is linked with faster rate of cortical thinning in patients with Parkinson’s disease longitudinally”</a> Brain, vol. 137, no. 4, pp. 1120–1129, 2014.\n" ] }, { "cell_type": "markdown", "id": "91cac3e7", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The main results of the paper can be divided into three sections:\n", " \n", " 1. 
**Difference in changes of cortical thickness between groups** \n", " \n", " - **PD-MCI** showed an increased rate of overall cortical thinning compared with both PD-non-MCI and HC (Table 2).\n", " \n", " - **PD-MCI vs PD-non-MCI**: increased thinning was detected in the right temporal lobe (middle temporal gyrus, transverse temporal gyrus, temporal pole), the right insula, right inferior frontal gyrus and the right supplementary motor area (Figure 1). \n", " \n", " - **PD-MCI vs HC**: increased thinning was again detected in the right temporal lobe and right supplementary motor area. Additional significant clusters included the bilateral precuneus, bilateral cuneus, bilateral lingual, as well as right inferior parietal, right lateral occipital and left orbitofrontal region (Figure 1). \n", " \n", " - **PD-non-MCI vs HC**: an increased rate of thinning was detected only in the left lateral occipital and left fusiform regions (Figure 1).\n", " \n", "\n", " \n", "<img src=\"images/org_results_figure_1.png\"/> \n", "<img src=\"images/org_results_table_2.png\"/>\n", "Figures extracted from the paper A. Hanganu et al. <a href=https://academic.oup.com/brain/article/137/4/1120/372146> “Mild cognitive impairment is linked with faster rate of cortical thinning in patients with Parkinson’s disease longitudinally”</a> Brain, vol. 137, no. 4, pp. 1120–1129, 2014.\n", "\n", " \n", " 2. **Correlation between change of cortical thickness and MoCA** \n", " \n", " - a positive correlation between rate of change of cortical thickness and MoCA scores was observed when considering all of the patients with Parkinson’s disease (All-PD group), which was driven by the PD-MCI group. Significant clusters were revealed in the temporal lobe bilaterally, the right occipital medial lobe and the left postcentral gyrus. 
Clusters of negative correlation were revealed in the anterior cingulate region in the All Parkinson’s disease group and in the transverse temporal gyrus in PD-MCI (Figure 2).\n", " \n", "<img src=\"images/org_results_figure_2.png\"/>\n", "Figure extracted from the paper A. Hanganu et al. <a href=https://academic.oup.com/brain/article/137/4/1120/372146> “Mild cognitive impairment is linked with faster rate of cortical thinning in patients with Parkinson’s disease longitudinally”</a> Brain, vol. 137, no. 4, pp. 1120–1129, 2014.\n", "\n", " \n", " \n", " 3. **Differences in change of subcortical volume between groups**\n", " \n", " - decreased volumes of the thalamus, caudate nucleus, putamen and hippocampus in both Parkinson’s disease groups. \n", " \n", " - a significant decrease in volume of the amygdala and nucleus accumbens was observed in the PD-MCI group vs both PD-non-MCI and HC (Table 2). *(Note: this statement is not fully consistent with Table 2, where the PD-MCI vs PD-non-MCI comparison is significant only for the amygdala.)*\n", " \n", " \n", " 4. **Correlation between subcortical volume differences and MoCA** \n", " \n", " - a significant correlation between change of cognition over time and change of amygdala volume was identified in the All Parkinson’s disease group, and this was driven by the near-significant result observed in the Parkinson’s disease with MCI group (P = 0.059). Additionally, both the Parkinson’s disease with MCI and All Parkinson’s disease groups revealed a correlation between cognition and the volume of the thalamus (Table 3).\n", "\n", "<img src=\"images/org_results_table_3.png\"/>\n", "Figure extracted from the paper A. Hanganu et al. <a href=https://academic.oup.com/brain/article/137/4/1120/372146> “Mild cognitive impairment is linked with faster rate of cortical thinning in patients with Parkinson’s disease longitudinally”</a> Brain, vol. 137, no. 4, pp. 
1120–1129, 2014.\n", "\n", " " ] }, { "cell_type": "markdown", "id": "438b3bc1", "metadata": {}, "source": [ "The remainder of this notebook is an attempt to replicate this result using the PPMI dataset." ] }, { "cell_type": "markdown", "id": "52363b72", "metadata": {}, "source": [ "## Initial setup\n", "\n", "<!-- LivingPark notebooks use a *cache* directory to store analysis inputs and outputs. Inputs typically include PPMI Study Data and imaging data whereas outputs include processed images and other derivatives. Cache directories allow LivingPark notebooks to run in a few minutes as they reuse previously computed results. However, cache directories cannot be made public due to the PPMI Data Usage Agreement (DUA). Instead, they are stored on `login.bic.mni.mcgill.ca`, which requires a specific user name and password. In case you don't have access to the cache directory of this notebook, the next sections will download all the required imaging data from PPMI and recompute the results, which will take a few hours depending on your computer configuration. In the future, we will aim at storing this cache dataset on PPMI servers so that they can be accessed with a PPMI account. -->\n", "\n", "Let's initialize the notebook directory and software dependencies:" ] }, { "cell_type": "code", "execution_count": 1, "id": "bc75df34", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This notebook was run on 2023-02-27 17:48:03 UTC +0000\n" ] } ], "source": [ "import livingpark_utils\n", "\n", "utils = livingpark_utils.LivingParkUtils()\n", "random_seed = 2\n", "utils.notebook_init()\n", "\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "id": "5e5e5826", "metadata": {}, "source": [ "## PPMI cohort preparation\n", "\n", "We will build a PPMI cohort that matches the one used in the original study (Table 1) as much as possible. 
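\n", "\n", "For reference, the group sizes to be matched from the original study are 17 PD-MCI, 15 PD-non-MCI, and 18 HC. They can be written down as a small dictionary (the name \"TARGET_N\" is purely illustrative, not part of the analysis code):\n", "\n", "```python\n", "TARGET_N = {\"PD-MCI\": 17, \"PD-non-MCI\": 15, \"HC\": 18}  # group sizes from Hanganu et al., 2014\n", "```\n", "\n", "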
As in other LivingPark replications, we will use the same sample size as the original study. Our cohort will be built directly from PPMI Study Data files so that it can be replicated and updated whenever necessary." ] }, { "cell_type": "markdown", "id": "0e28da8f", "metadata": {}, "source": [ "### Study data download\n", "\n", "We will start by downloading the PPMI Study Data files required to build our cohort: \n", "* demographics\n", "* disease duration\n", "* Montreal Cognitive Assessment\n", "* UPDRS\n", "* Hoehn and Yahr score\n", "* primary clinical diagnosis\n", "* cognitive categorization\n", "* medical condition\n" ] }, { "cell_type": "markdown", "id": "80f75e88", "metadata": {}, "source": [ "We will use the LivingPark utils library to download these files from the notebook. If files are already present in the notebook cache, they won't be downloaded again. Otherwise, you will need to enter your PPMI username and password. **In case you don't have a PPMI account, you can request one [here](http://ppmi-info.org).**" ] }, { "cell_type": "code", "execution_count": 2, "id": "0e9d06ad", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This notebook was run on 2023-02-27 17:48:13 UTC +0000\n", "Download skipped: No missing files!\n" ] } ], "source": [ "import livingpark_utils\n", "from livingpark_utils import download\n", "\n", "utils = livingpark_utils.LivingParkUtils()\n", "downloader = download.ppmi.Downloader(utils.study_files_dir)\n", "\n", "required_files = [\n", " \"Demographics.csv\",\n", " \"Age_at_visit.csv\",\n", " \"Primary_Clinical_Diagnosis.csv\",\n", " \"Cognitive_Categorization.csv\",\n", " \"Medical_Conditions_Log.csv\",\n", " \"Concomitant_Medication_Log.csv\",\n", " \"MDS-UPDRS_Part_III.csv\",\n", " \"Participant_Status.csv\",\n", " \"Socio-Economics.csv\",\n", " \"Montreal_Cognitive_Assessment__MoCA_.csv\",\n", " \"PD_Diagnosis_History.csv\",\n", " \"LEDD_Concomitant_Medication_Log.csv\",\n", "]\n", "\n", 
"]\n", "\n", 
"utils.notebook_init()\n", "utils.get_study_files(required_files, default=downloader)" ] }, { "cell_type": "markdown", "id": "1febe243", "metadata": {}, "source": [ "We will also need file `MRI_info.csv` produced by another LivingPark notebook available at https://github.com/LivingPark-MRI/livingpark-utils/blob/main/livingpark_utils/notebooks/mri_metadata.ipynb. This file contains a list of T1-weighted MRIs usable for VBM. " ] }, { "cell_type": "code", "execution_count": 3, "id": "9dfe55b0", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This notebook was run on 2023-02-27 17:48:18 UTC +0000\n", "['COR', 'Coronal', 'Cal Head 24', 'Transverse', 'tra_T1_MPRAGE', 'TRA']\n", "['AX', 'Ax', 'axial', 'Phantom', 'T2']\n", "{'Screening': 'SC', 'Baseline': 'BL', 'Month 6': 'V02', 'Month 12': 'V04', 'Month 24': 'V06', 'Month 36': 'V08', 'Month 48': 'V10', 'Symptomatic Therapy': 'ST', 'Unscheduled Visit 01': 'U01', 'Unscheduled Visit 02': 'U02', 'Premature Withdrawal': 'PW'}\n", "Saved in MRI_info.csv\n" ] } ], "source": [ "from livingpark_utils.scripts import run\n", "from livingpark_utils.scripts import mri_metadata\n", "\n", "run.mri_metadata()" ] }, { "cell_type": "markdown", "id": "3bd761d6", "metadata": {}, "source": [ "### Inclusion criteria\n", "\n", "To replicate the cohort in the original study, we used the following inclusion and exclusion criteria among PPMI subjects.\n", "\n", "1. Early stage of the illness (Hoehn and Yahr I and II stage).\n", "\n", "2. T1-weighted MRI available and usable for VBM (see [MRI metadata notebook](https://github.com/LivingPark-MRI/livingpark-utils/blob/main/livingpark_utils/notebooks/mri_metadata.ipynb)).\n", "\n", "3. Testing at two timepoints.\n", "\n", "4. MRI and MoCA data available.\n", "\n", "5. 
During this evaluation, all patients were OFF medication (at both time points) and had not received any drugs related to Parkinson’s disease for at least 12 h before the sessions.\n", "\n", "6. MCI inclusion criteria:\n", " - (i) objective: performance > 1.5 SD below the standardized mean on two or more subtests within a cognitive domain. \n", " - (ii)\tsubjective complaint of cognitive decline.\n", " - (iii)\tabsence of significant decline in daily living activities.\n", " - (iv)\tabsence of dementia as diagnosed by the evaluating neuropsychologist.\n", " - (v)\tevidence of cognitive abnormalities that cannot be attributed to age.\n", "\n", " Our sample: met PPMI criteria for MCI\n", "\n", "\n", "7. Control group: met PPMI criteria for healthy controls\n", "\n", "Healthy controls also underwent a neuropsychological assessment and those with MCI were excluded. \n", "\n", "\n", "### Exclusion criteria\n", "\n", "1. Patients were excluded if they had other comorbidities.\n", "2. Cognitively stable patients who converted to MCI at the neuropsychological assessment at Time 2 were excluded.\n", "\n", "### Group matching\n", "\n", "•\tNo significant differences were observed between the three groups with respect to sex, age and education. 
<br>\n", "•\tNo significant differences existed between the two patients groups with respect to time since diagnosis or disease advancement as measured by the motor part of the Unified Parkinson’s Disease Rating Scale at Time 1.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "e5acc2b6", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "\n", "# Read data files\n", "\n", "\n", "# Demographics\n", "dem = pd.read_csv(os.path.join(utils.study_files_dir, \"Demographics.csv\"))[\n", " [\"PATNO\", \"SEX\", \"BIRTHDT\"]\n", "]\n", "\n", "# Age at visit\n", "age = pd.read_csv(os.path.join(utils.study_files_dir, \"Age_at_visit.csv\"))[\n", " [\"PATNO\", \"EVENT_ID\", \"AGE_AT_VISIT\"]\n", "]\n", "\n", "# Education\n", "edu = pd.read_csv(os.path.join(utils.study_files_dir, \"Socio-Economics.csv\"))[\n", " [\"PATNO\", \"EDUCYRS\"]\n", "]\n", "\n", "# Diagnosis\n", "diag = pd.read_csv(\n", " os.path.join(utils.study_files_dir, \"Primary_Clinical_Diagnosis.csv\")\n", ")[[\"PATNO\", \"EVENT_ID\", \"PRIMDIAG\", \"OTHNEURO\"]]\n", "\n", "\n", "# Dx status\n", "dx_status = pd.read_csv(os.path.join(utils.study_files_dir, \"Participant_Status.csv\"))[\n", " [\"PATNO\", \"COHORT\"]\n", "]\n", "\n", "# Disease duration / year of diagnosis\n", "# med_cond = pd.read_csv(os.path.join(utils.study_files_dir, \"Medical_Conditions_Log.csv\"))[\n", "# [\"PATNO\", \"EVENT_ID\", \"MHDIAGYR\", \"MHCAT\"]\n", "# ]\n", "\n", "\n", "# PD dx history / disease duration calc\n", "pd_hist = pd.read_csv(os.path.join(utils.study_files_dir, \"PD_Diagnosis_History.csv\"))[\n", " [\"PATNO\", \"EVENT_ID\", \"PDDXDT\"]\n", "]\n", "\n", "\n", "# Cognitive Categorization\n", "cog_cat = pd.read_csv(\n", " os.path.join(utils.study_files_dir, \"Cognitive_Categorization.csv\")\n", ")[[\"PATNO\", \"EVENT_ID\", \"COGSTATE\"]]\n", "\n", "\n", "# Medication\n", "meds = (\n", " pd.read_csv(os.path.join(utils.study_files_dir, \"Concomitant_Medication_Log.csv\"))[\n", " 
[\"PATNO\", \"EVENT_ID\", \"SEQNO\", \"CMTRT\"]\n", " ]\n", " .groupby([\"PATNO\", \"EVENT_ID\"])[[\"CMTRT\"]]\n", " .aggregate(lambda x: tuple(x))\n", ") # aggregate all meds in a tuple\n", "\n", "# L-Dopa\n", "ldopa = pd.read_csv(\n", " os.path.join(utils.study_files_dir, \"LEDD_Concomitant_Medication_Log.csv\")\n", ")[[\"PATNO\", \"EVENT_ID\", \"LEDD\"]]\n", "\n", "# meds = pd.read_csv(os.path.join(utils.study_files_dir, \"Concomitant_Medication_Log.csv\"))[\n", "# [\"PATNO\", \"EVENT_ID\", \"SEQNO\", \"CMTRT\"]\n", "# ]\n", "\"\"\"\n", "# UPDRS and Hoehh Yahr\n", "updrs = pd.read_csv(os.path.join(utils.study_files_dir, \"MDS_UPDRS_Part_III.csv\"))[\n", " [\"PATNO\", \"EVENT_ID\", \"PDSTATE\", \"NP3TOT\", \"NHY\"]\n", "]\n", "\"\"\"\n", "# Clean UPDRS file. Impute missing ON/OFF values.\n", "# It produces MDS_UPDRS_Part_III_clean.csv file\n", "# from livingpark_utils.scripts import pd_status\n", "\n", "updrs = pd.read_csv(\n", " os.path.join(utils.study_files_dir, \"MDS_UPDRS_Part_III_clean.csv\")\n", ")[[\"PATNO\", \"EVENT_ID\", \"PDSTATE\", \"NP3TOT\", \"NHY\", \"PDTRTMNT\"]]\n", "\n", "# MoCA\n", "moca = pd.read_csv(\n", " os.path.join(utils.study_files_dir, \"Montreal_Cognitive_Assessment__MoCA_.csv\")\n", ")[[\"PATNO\", \"EVENT_ID\", \"MCATOT\", \"INFODT\"]]\n", "# MoCA - Use screening instead of baseline.\n", "moca = moca[moca[\"EVENT_ID\"] != \"BL\"]\n", "moca[\"EVENT_ID\"].mask(moca[\"EVENT_ID\"] == \"SC\", \"BL\", inplace=True)" ] }, { "cell_type": "code", "execution_count": 5, "id": "1e2c4910", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Calculate disease duration\n", "\n", "from dateutil.parser import parse\n", "from dateutil.relativedelta import relativedelta\n", "\n", "pdxhist = pd_hist[(pd_hist[\"EVENT_ID\"] == \"SC\") & pd_hist[\"PDDXDT\"].notna()]\n", "\n", "pd_dur = pd.read_csv(\n", " os.path.join(utils.study_files_dir, \"MDS_UPDRS_Part_III_clean.csv\"),\n", " low_memory=False,\n", ")[[\"PATNO\", \"EVENT_ID\", 
\"INFODT\"]]\n", "\n", "PDDXDT_map = dict(zip(pdxhist[\"PATNO\"].values, pdxhist[\"PDDXDT\"].values))\n", "pd_dur[\"PDDXDT\"] = pd_dur[\"PATNO\"].map(PDDXDT_map)\n", "\n", "pd_dur[\"PDXDUR\"] = pd_dur.apply(\n", " lambda row: relativedelta(parse(row[\"INFODT\"]), parse(row[\"PDDXDT\"])).months\n", " if row[\"PDDXDT\"] is not np.nan\n", " else np.nan,\n", " axis=1,\n", ")\n", "pd_dur.drop(labels=[\"INFODT\", \"PDDXDT\"], inplace=True, axis=1);" ] }, { "cell_type": "code", "execution_count": 6, "id": "a0c03385", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "EVENT_ID\n", "BL 1961\n", "V10 442\n", "V04 344\n", "V06 332\n", "ST 10\n", "dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# MRI availability\n", "\n", "mri = pd.read_csv(os.path.join(utils.study_files_dir, \"MRI_info.csv\"))\n", "mri[\"EVENT_ID\"] = mri[\"Visit code\"]\n", "mri[\"PATNO\"] = mri[\"Subject ID\"]\n", "mri[\"Sex\"] = mri[\"Sex\"].map({\"F\": 0, \"M\": 1})\n", "mri = mri.drop([\"Subject ID\", \"Visit code\", \"Visit\", \"Age\", \"Sex\"], axis=1)\n", "mri.groupby(\"EVENT_ID\").size().sort_values(ascending=False).head(5)" ] }, { "cell_type": "markdown", "id": "e5cbf4f3", "metadata": {}, "source": [ "## Data aggregation\n", "\n", "Merge the data into a single dataframe\n", "- MRI scan,\n", "- Demographics (sex, age at visit),\n", "- Education (years of education),\n", "- Diagnosis (primary diagnosis),\n", "- Dx status (cohort),\n", "- Medical condition (year of dx, dx category),\n", "- Cognitive category (cognitive state),\n", "- Medication\n", "- UPDRS (ON/OFF state, UPDRS III total score, Hoehn and Yahr stage)\n", "- MoCA (total score)\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "81b98e79", "metadata": {}, "outputs": [], "source": [ "# Merge into a single df\n", "\n", "df = (\n", " mri.merge(diag, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(age, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(dem, 
on=[\"PATNO\"])\n", " .merge(edu, on=[\"PATNO\"], how=\"left\")\n", " .merge(dx_status, on=[\"PATNO\"])\n", " .merge(pd_hist, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(cog_cat, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(meds, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(updrs, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(moca, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(pd_dur, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .drop_duplicates()\n", " .groupby(\"PATNO\")\n", " .filter(lambda g: g[\"EVENT_ID\"].nunique() > 1)\n", ")" ] }, { "cell_type": "markdown", "id": "c0b0de92", "metadata": {}, "source": [ "# Pair visits" ] }, { "cell_type": "code", "execution_count": 8, "id": "a941a1c5", "metadata": {}, "outputs": [], "source": [ "# Find visit pairs\n", "\n", "from collections import defaultdict\n", "\n", "visit2month = {\n", " \"BL\": 0,\n", " \"V01\": 3,\n", " \"V02\": 6,\n", " \"V03\": 9,\n", " \"V04\": 12,\n", " \"V05\": 18,\n", " \"V06\": 24,\n", " \"V07\": 30,\n", " \"V08\": 36,\n", " \"V09\": 42,\n", " \"V10\": 48,\n", " \"V11\": 54,\n", " \"V12\": 60,\n", " \"V13\": 72,\n", " \"V14\": 84,\n", " \"V15\": 96,\n", " \"V16\": 108,\n", " \"V17\": 120,\n", " \"V18\": 132,\n", " \"V19\": 144,\n", " \"V20\": 156,\n", "}\n", "\n", "\n", "def find_visit_pairs(months: int) -> int:\n", " \"\"\"Return the pairs of visits closest to each other, given a target time difference in months.\"\"\"\n", "\n", " diff = float(\"inf\")\n", " diff_hist = defaultdict(dict)\n", "\n", " for (k, v), (k_, v_) in combinations(visit2month.items(), 2):\n", " if (diff_ := abs(abs(v - v_) - months)) <= diff:\n", " diff = diff_\n", " diff_hist[diff][k] = k_\n", "\n", " return diff_hist[diff]" ] }, { "cell_type": "code", "execution_count": 9, "id": "ad1c45be", "metadata": {}, "outputs": [], "source": [ "def sample_cohort(df, /, *, n):\n", " _df = df.drop_duplicates(subset=[\"PATNO\"])\n", " n = min(_df.index.size, n)\n", " return _df.sample(n=n, replace=False, 
random_state=1)" ] }, { "cell_type": "markdown", "id": "a5f863f7", "metadata": {}, "source": [ "# Select healthy controls" ] }, { "cell_type": "code", "execution_count": 10, "id": "57d3f98d", "metadata": {}, "outputs": [], "source": [ "# diagnosis - use screening instead of baseline when PRIMDIAG is missing at baseline\n", "\n", "diag_bl = diag[diag[\"EVENT_ID\"] == \"BL\"]\n", "diag_other = diag[diag[\"EVENT_ID\"] != \"BL\"]\n", "diag_other[\"EVENT_ID\"].mask(diag_other[\"EVENT_ID\"] == \"SC\", \"BL\", inplace=True)\n", "\n", "diag_hc = pd.concat([diag_bl, diag_other])\n", "diag_hc = diag_hc.drop_duplicates()" ] }, { "cell_type": "code", "execution_count": 11, "id": "764ae541", "metadata": {}, "outputs": [], "source": [ "# merge into a single df\n", "\n", "df_hc = (\n", " mri.merge(diag_hc, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(age, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(dem, on=[\"PATNO\"], how=\"left\")\n", " .merge(dx_status, on=[\"PATNO\"])\n", " .merge(edu, on=[\"PATNO\"], how=\"left\")\n", " .merge(cog_cat, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(moca, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(pd_hist, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(meds, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(updrs, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(pd_dur, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .drop_duplicates()\n", " .groupby(\"PATNO\")\n", " .filter(lambda g: g[\"EVENT_ID\"].nunique() > 1)\n", ")" ] }, { "cell_type": "code", "execution_count": 12, "id": "12083a57", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unique HC subjects per visit pairs:\n", "BL & V04 = 62 | Month difference: 12\n", "BL & V06 = 9 | Month difference: 24\n", "BL & V08 = 2 | Month difference: 36\n", "BL & V10 = 11 | Month difference: 48\n", 
"V04 & V06 = 8 | Month difference: 12\n", "V04 & V08 = 1 | Month difference: 24\n", "V04 & V10 = 10 | Month difference: 36\n", "V06 & V08 = 2 | Month difference: 12\n", "V06 & V10 = 2 | Month difference: 24\n" ] } ], "source": [ "# find how many visit pairs are available for HC group\n", "\n", "from itertools import combinations\n", "\n", "events = [\"BL\", \"V04\", \"V06\", \"V08\", \"V10\"]\n", "\n", "print(\"Unique HC subjects per visit pairs:\")\n", "for c in combinations(events, 2):\n", " v0 = set(\n", " df_hc[\n", " (df_hc[\"EVENT_ID\"] == c[0])\n", " & (df_hc[\"PRIMDIAG\"] == 17)\n", " & (df_hc[\"COHORT\"] == 2)\n", " ][\"PATNO\"].values\n", " )\n", " v1 = set(\n", " df_hc[\n", " (df_hc[\"EVENT_ID\"] == c[1])\n", " & (df_hc[\"PRIMDIAG\"] == 17)\n", " & (df_hc[\"COHORT\"] == 2)\n", " ][\"PATNO\"].values\n", " )\n", " if len(v0 & v1):\n", " print(\n", " f\"{c[0]:3} & {c[1]:3} = {len(v0 & v1):>3}\"\n", " f\" | Month difference: {visit2month[c[1]] - visit2month[c[0]]}\"\n", " )\n", "# print(v0 & v1)" ] }, { "cell_type": "code", "execution_count": 13, "id": "35a45f41", "metadata": {}, "outputs": [], "source": [ "def pairs_hc(arg):\n", "\n", " visit_pairs = find_visit_pairs(arg)\n", " visit_df = df_hc.copy()\n", " visit_df[\"NEXT_VISIT\"] = visit_df[\"EVENT_ID\"].map(visit_pairs)\n", "\n", " visit_df = visit_df.merge(\n", " visit_df.drop(\n", " [\"AGE_AT_VISIT\", \"SEX\", \"NEXT_VISIT\", \"EDUCYRS\"],\n", " axis=1,\n", " ),\n", " left_on=[\n", " \"PATNO\",\n", " \"NEXT_VISIT\",\n", " ],\n", " right_on=[\n", " \"PATNO\",\n", " \"EVENT_ID\",\n", " ],\n", " suffixes=(None, \"_NX\"),\n", " ).drop_duplicates()\n", "\n", " return visit_df.loc[\n", " (visit_df[\"PRIMDIAG\"] == 17)\n", " & (visit_df[\"COHORT\"] == 2)\n", " & (visit_df[\"PRIMDIAG_NX\"] == 17)\n", " & (visit_df[\"COHORT_NX\"] == 2)\n", " ]" ] }, { "cell_type": "code", "execution_count": 14, "id": "760a4823", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unique HC 
number before selection: 67\n", "18 HC subjects were randomly selected\n" ] } ], "source": [ "# build database of all available HC\n", "hc_12 = pairs_hc(12)\n", "hc_24 = pairs_hc(24)\n", "hc = pd.concat([hc_12, hc_24], ignore_index=True)\n", "hc = hc.drop_duplicates(subset=[\"PATNO\"])\n", "print(\"Unique HC number before selection: \", hc[\"PATNO\"].unique().size)\n", "\n", "# select 18 HC patients\n", "\n", "hc = hc.sample(n=18, random_state=3)\n", "hc[\"dx_group\"] = \"HC\"\n", "print(len(hc), \"HC subjects were randomly selected\")" ] }, { "cell_type": "markdown", "id": "1265eec2", "metadata": {}, "source": [ "## Data aggregation for PD\n", "\n", "Merge the data into a single dataframe\n", "- MRI scan,\n", "- Demographics (sex, age at visit),\n", "- Education (years of education),\n", "- Diagnosis (primary diagnosis),\n", "- Dx status (cohort),\n", "- Medical condition (year of dx, dx category),\n", "- Cognitive category (cognitive state),\n", "- Medication\n", "- UPDRS (ON/OFF state, UPDRS III total score, Hoehn and Yahr stage)\n", "- MoCA (total score)\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "921e614e", "metadata": {}, "outputs": [], "source": [ "# Merge into a single df for PD\n", "\n", "df = (\n", " mri.merge(diag, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(age, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(dem, on=[\"PATNO\"])\n", " .merge(edu, on=[\"PATNO\"], how=\"left\")\n", " .merge(dx_status, on=[\"PATNO\"])\n", " .merge(pd_hist, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(cog_cat, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(meds, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .merge(updrs, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(moca, on=[\"PATNO\", \"EVENT_ID\"])\n", " .merge(pd_dur, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\")\n", " .drop_duplicates()\n", " .groupby(\"PATNO\")\n", " .filter(lambda g: g[\"EVENT_ID\"].nunique() > 1)\n", ")" ] }, { "cell_type": "markdown", "id": "1e94d8a4", "metadata": 
{}, "source": [ "## Number of available visit pairs - PD meeting all the criteria" ] }, { "cell_type": "code", "execution_count": 16, "id": "88d316e5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unique PD-non-MCI subjects per visit pairs:\n", "BL & V04 = 27 | Month difference: 12\n", "V04 & V06 = 22 | Month difference: 12\n", "V04 & V10 = 24 | Month difference: 36\n", "V06 & V10 = 32 | Month difference: 24\n" ] } ], "source": [ "# Pair PD-non-MCI\n", "# Include Primdiag 1 & Cogstate 1 & Cohort 1 & [NHY 1 | 2] & MoCA & no comorb & UPDRS OFF\n", "\n", "from itertools import combinations\n", "\n", "events = [\"BL\", \"V04\", \"V06\", \"V08\", \"V10\"]\n", "\n", "print(\"Unique PD-non-MCI subjects per visit pairs:\")\n", "for c in combinations(events, 2):\n", " v0 = set(\n", " df[\n", " (df[\"EVENT_ID\"] == c[0])\n", " & (df[\"PRIMDIAG\"] == 1)\n", " & (df[\"COGSTATE\"] == 1)\n", " & (df[\"COHORT\"] == 1)\n", " & df[\"NHY\"].isin([\"1\", \"2\"])\n", " & (df[\"MCATOT\"].notnull())\n", " & (df[\"OTHNEURO\"].isnull())\n", " & (df[\"PDSTATE\"] == \"OFF\")\n", " ][\"PATNO\"].values\n", " )\n", " v1 = set(\n", " df[\n", " (df[\"EVENT_ID\"] == c[1])\n", " & (df[\"PRIMDIAG\"] == 1)\n", " & (df[\"COGSTATE\"] == 1)\n", " & (df[\"COHORT\"] == 1)\n", " & df[\"NHY\"].isin([\"1\", \"2\"])\n", " & (df[\"MCATOT\"].notnull())\n", " & (df[\"OTHNEURO\"].isnull())\n", " & (df[\"PDSTATE\"] == \"OFF\")\n", " ][\"PATNO\"].values\n", " )\n", " if len(v0 & v1):\n", " print(\n", " f\"{c[0]:3} & {c[1]:3} = {len(v0 & v1):>3}\"\n", " f\" | Month difference: {visit2month[c[1]] - visit2month[c[0]]}\"\n", " )" ] }, { "cell_type": "code", "execution_count": 17, "id": "3c2bdd32", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unique PD-MCI subjects per visit pairs:\n", "BL & V04 = 2 | Month difference: 12\n", "V04 & V06 = 1 | Month difference: 12\n", "V04 & V10 = 2 | Month difference: 36\n", "V06 & V10 = 5 | Month 
difference: 24\n" ] } ], "source": [ "# Pair PD-MCI\n", "# Include Primdiag 1 & Cogstate 1 & Cohort 1 & [NHY 1 | 2] & MoCA & no comorb & UPDRS OFF\n", "\n", "from itertools import combinations\n", "\n", "events = [\"BL\", \"V04\", \"V06\", \"V08\", \"V10\"]\n", "\n", "print(\"Unique PD-MCI subjects per visit pairs:\")\n", "for c in combinations(events, 2):\n", " v0 = set(\n", " df[\n", " (df[\"EVENT_ID\"] == c[0])\n", " & (df[\"PRIMDIAG\"] == 1)\n", " & (df[\"COGSTATE\"] == 2)\n", " & (df[\"COHORT\"] == 1)\n", " & df[\"NHY\"].isin([\"1\", \"2\"])\n", " & (df[\"MCATOT\"].notnull())\n", " & (df[\"OTHNEURO\"].isnull())\n", " & (df[\"PDSTATE\"] == \"OFF\")\n", " ][\"PATNO\"].values\n", " )\n", " v1 = set(\n", " df[\n", " (df[\"EVENT_ID\"] == c[1])\n", " & (df[\"PRIMDIAG\"] == 1)\n", " & (df[\"COGSTATE\"] == 2)\n", " & (df[\"COHORT\"] == 1)\n", " & df[\"NHY\"].isin([\"1\", \"2\"])\n", " & (df[\"MCATOT\"].notnull())\n", " & (df[\"OTHNEURO\"].isnull())\n", " & (df[\"PDSTATE\"] == \"OFF\")\n", " ][\"PATNO\"].values\n", " )\n", " if len(v0 & v1):\n", " print(\n", " f\"{c[0]:3} & {c[1]:3} = {len(v0 & v1):>3}\"\n", " f\" | Month difference: {visit2month[c[1]] - visit2month[c[0]]}\"\n", " )\n", " # print(v0 & v1)" ] }, { "cell_type": "markdown", "id": "7a91e536", "metadata": {}, "source": [ "# Select PD-MCI patients\n", "\n", "This script has been written with a prior knowledge that there are less PD-MCI patients who meet the inclusion criteria than PD-non-MCI patients in PPMI." 
] }, { "cell_type": "code", "execution_count": 18, "id": "ff16f546", "metadata": {}, "outputs": [], "source": [ "def pairs_mci(arg):\n", "\n", " visit_pairs = find_visit_pairs(arg)\n", " visit_df = df.copy()\n", " visit_df[\"NEXT_VISIT\"] = visit_df[\"EVENT_ID\"].map(visit_pairs)\n", "\n", " visit_df = visit_df.merge(\n", " visit_df.drop(\n", " [\"AGE_AT_VISIT\", \"SEX\", \"NEXT_VISIT\", \"EDUCYRS\"],\n", " axis=1,\n", " ),\n", " left_on=[\n", " \"PATNO\",\n", " \"NEXT_VISIT\",\n", " ],\n", " right_on=[\n", " \"PATNO\",\n", " \"EVENT_ID\",\n", " ],\n", " suffixes=(None, \"_NX\"),\n", " ).drop_duplicates()\n", "\n", " return visit_df.loc[\n", " (visit_df[\"COGSTATE\"] == 2)\n", " & (visit_df[\"PRIMDIAG\"] == 1)\n", " & (visit_df[\"COHORT\"] == 1)\n", " & visit_df[\"NHY\"].isin([\"1\", \"2\"])\n", " & (visit_df[\"MCATOT\"].notnull())\n", " & (visit_df[\"OTHNEURO\"].isnull())\n", " & (visit_df[\"PDSTATE\"] == \"OFF\")\n", " & (visit_df[\"COGSTATE_NX\"] == 2)\n", " & (visit_df[\"PRIMDIAG_NX\"] == 1)\n", " & (visit_df[\"COHORT_NX\"] == 1)\n", " & visit_df[\"NHY_NX\"].isin([\"1\", \"2\"])\n", " & (visit_df[\"MCATOT_NX\"].notnull())\n", " & (visit_df[\"OTHNEURO_NX\"].isnull())\n", " & (visit_df[\"PDSTATE_NX\"] == \"OFF\")\n", " ]" ] }, { "cell_type": "code", "execution_count": 19, "id": "ce3823ef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The final PD-MCI sample size is 10 patients.\n" ] } ], "source": [ "mci = pairs_mci(12)\n", "mci = mci.drop_duplicates(subset=[\"PATNO\"])\n", "\n", "if len(mci) > 16:\n", " print(\"There is enough PD-MCI patients. \", len(mci), \" PD-MCI patients found\")\n", "else:\n", " mci_24 = pairs_mci(24)\n", " mci = pd.concat([mci, mci_24], ignore_index=True)\n", " mci = mci.drop_duplicates(subset=[\"PATNO\"])\n", "\n", " if len(mci) > 16:\n", " print(\"There is enough PD-MCI patients. 
\", len(mci), \" PD-MCI patients found\")\n", " else:\n", " mci_36 = pairs_mci(36)\n", " mci = pd.concat([mci, mci_36], ignore_index=True)\n", " mci = mci.drop_duplicates(subset=[\"PATNO\"])\n", " print(\"The final PD-MCI sample size is \", len(mci), \" patients.\")\n", "\n", "mci[\"dx_group\"] = \"PD-MCI\"" ] }, { "cell_type": "markdown", "id": "e7760bdf", "metadata": {}, "source": [ "# Select PD-non-MCI patients\n", "\n", "All available pairs for PD-non-MCI patients are selected in this step. They will be matched to a smaller PD-MCI cohort" ] }, { "cell_type": "code", "execution_count": 20, "id": "e7d88d88", "metadata": {}, "outputs": [], "source": [ "def pairs_nonmci(arg):\n", "\n", " visit_pairs = find_visit_pairs(arg)\n", " visit_df = df.copy()\n", " visit_df[\"NEXT_VISIT\"] = visit_df[\"EVENT_ID\"].map(visit_pairs)\n", "\n", " visit_df = visit_df.merge(\n", " visit_df.drop(\n", " [\"AGE_AT_VISIT\", \"SEX\", \"NEXT_VISIT\", \"EDUCYRS\"],\n", " axis=1,\n", " ),\n", " left_on=[\n", " \"PATNO\",\n", " \"NEXT_VISIT\",\n", " ],\n", " right_on=[\n", " \"PATNO\",\n", " \"EVENT_ID\",\n", " ],\n", " suffixes=(None, \"_NX\"),\n", " ).drop_duplicates()\n", "\n", " return visit_df.loc[\n", " (visit_df[\"COGSTATE\"] == 1)\n", " & (visit_df[\"PRIMDIAG\"] == 1)\n", " & (visit_df[\"COHORT\"] == 1)\n", " & visit_df[\"NHY\"].isin([\"1\", \"2\"])\n", " & (visit_df[\"MCATOT\"].notnull())\n", " & (visit_df[\"OTHNEURO\"].isnull())\n", " & (visit_df[\"PDSTATE\"] == \"OFF\")\n", " & (visit_df[\"COGSTATE_NX\"] == 1)\n", " & (visit_df[\"PRIMDIAG_NX\"] == 1)\n", " & (visit_df[\"COHORT_NX\"] == 1)\n", " & visit_df[\"NHY_NX\"].isin([\"1\", \"2\"])\n", " & (visit_df[\"MCATOT_NX\"].notnull())\n", " & (visit_df[\"OTHNEURO_NX\"].isnull())\n", " & (visit_df[\"PDSTATE_NX\"] == \"OFF\")\n", " ]" ] }, { "cell_type": "code", "execution_count": 21, "id": "e7f27671", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 69 PD-non-MCI patients to match PD-MCI 
group.\n" ] } ], "source": [ "wo_mci_12 = pairs_nonmci(12)\n", "wo_mci_24 = pairs_nonmci(24)\n", "wo_mci_all = pd.concat([wo_mci_12, wo_mci_24], ignore_index=True)\n", "wo_mci_all = wo_mci_all.drop_duplicates(subset=[\"PATNO\"])\n", "\n", "print(\"There are \", len(wo_mci_all), \" PD-non-MCI patients to match the PD-MCI group.\")" ] }, { "cell_type": "markdown", "id": "1487ffb9", "metadata": {}, "source": [ "# Group matching\n", "\n", "Match the PD-non-MCI sample to the smaller PD-MCI cohort.\n", "\n", "We implemented a nearest-neighbor matching loop based on the Euclidean distance. \n", "We match the PD-MCI and PD-non-MCI groups on sex, MoCA, education, UPDRS III, and disease duration." ] }, { "cell_type": "code", "execution_count": 22, "id": "be0acf37", "metadata": {}, "outputs": [], "source": [ "def nn(x, df, matched_vars):\n", " \"\"\"\n", " Find the nearest neighbor of x in df\n", " \"\"\"\n", "\n", " # Work on a copy so the caller's DataFrame is not modified\n", " df_match = df.copy()\n", "\n", " # Compute squared distance between x and all elements in df, using normalized variables\n", " df_match[\"dist\"] = sum(\n", " (df_match[f\"{var}\"] - x[f\"{var}\"].values[0]) ** 2 for var in matched_vars\n", " )\n", "\n", " # Return the element in df with the smallest distance\n", " return df_match.nsmallest(1, \"dist\")\n", "\n", "\n", "def match(n, group1_df, group2_df, matched_vars):\n", " \"\"\"\n", " Randomly pick n elements in group1_df, then find n matching elements in group2_df.\n", " Ensure that each group contains at most one element from each patient and that\n", " no patient has elements in both groups.\n", " \"\"\"\n", "\n", " from numpy.random import choice, seed\n", "\n", " # Select n random patients in group1\n", " group1_patnos = sorted(pd.unique(group1_df[\"PATNO\"]))\n", " seed(0) # change this to bootstrap population\n", " group1_patnos_sample = choice(\n", " group1_patnos, n, replace=True\n", " ) # 
sampling with replacement is needed because the cohort is small\n", "\n", " # Remove the selected patients from group2\n", " for p in group1_patnos_sample:\n", " group2_df = group2_df[group2_df[\"PATNO\"] != p]\n", "\n", " group1_matched = pd.DataFrame(columns=group1_df.columns)\n", " group2_matched = pd.DataFrame(columns=group1_df.columns)\n", "\n", " for p in group1_patnos_sample: # for each patient in sampled list\n", " # Pick a random element from this patient in group1\n", " s = group1_df[group1_df[\"PATNO\"] == p].sample(1)\n", " # Find the best match in group2\n", " t = nn(s, group2_df, matched_vars)\n", " # Add s and t to matched groups\n", " group1_matched = pd.concat([group1_matched, s])\n", " group2_matched = pd.concat([group2_matched, t])\n", " # Remove t's patient from group 2 so that it doesn't get selected again\n", " group2_df = group2_df[group2_df[\"PATNO\"] != t[\"PATNO\"].values[0]]\n", "\n", " return group1_matched, group2_matched" ] }, { "cell_type": "code", "execution_count": 23, "id": "a5d7cbfe", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are enough PD-non-MCI patients. 15 patients included in the analysis.\n" ] } ], "source": [ "# Age is not used for matching; when it is included, the groups differ in MoCA and T2-T1 interval\n", "\n", "matched_vars = [\"SEX\", \"MCATOT\", \"EDUCYRS\", \"NP3TOT\", \"PDXDUR\"]\n", "\n", "\n", "# Apply matching to get 15 PD-non-MCI patients\n", "mci_matched, wo_mci_matched = match(15, mci, wo_mci_all, matched_vars)\n", "patids = pd.unique(pd.concat([mci_matched, wo_mci_matched], axis=0)[\"PATNO\"])\n", "mci_matched = mci_matched.drop_duplicates(subset=[\"PATNO\"])\n", "\n", "if len(wo_mci_matched) > 14:\n", " print(\n", " \"There are enough PD-non-MCI patients. \",\n", " len(wo_mci_matched),\n", " \" patients included in the analysis.\",\n", " )\n", "else:\n", " print(\n", " \"There are not enough PD-non-MCI patients. 
\",\n", " len(wo_mci_matched),\n", " \" patients included in the analysis.\",\n", " )\n", "\n", "wo_mci_matched[\"dx_group\"] = \"PD-non-MCI\"" ] }, { "cell_type": "markdown", "id": "d99c0bc7", "metadata": {}, "source": [ "# Descriptive statistics" ] }, { "cell_type": "code", "execution_count": 24, "id": "b8ff232f", "metadata": {}, "outputs": [], "source": [ "# get UPDRS for patients during ON state\n", "\n", "wo_mci_patno = wo_mci_matched[\"PATNO\"]\n", "wo_mci_patno = wo_mci_patno.tolist()\n", "wo_mci_updrs_on = updrs[[\"PATNO\", \"EVENT_ID\", \"PDSTATE\", \"NP3TOT\"]]\n", "wo_mci_updrs_on = wo_mci_updrs_on[\n", " (wo_mci_updrs_on[\"PDSTATE\"] == \"ON\") & (wo_mci_updrs_on[\"PATNO\"].isin(wo_mci_patno))\n", "]\n", "\n", "mci_patno = mci_matched[\"PATNO\"]\n", "mci_patno = mci_patno.tolist()\n", "mci_updrs_on = updrs[[\"PATNO\", \"EVENT_ID\", \"PDSTATE\", \"NP3TOT\"]]\n", "mci_updrs_on = mci_updrs_on[\n", " (mci_updrs_on[\"PDSTATE\"] == \"ON\") & (mci_updrs_on[\"PATNO\"].isin(mci_patno))\n", "]\n", "\n", "wo_mci_updrs = wo_mci_matched.merge(\n", " wo_mci_updrs_on, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\", suffixes=(None, \"_on\")\n", ")\n", "mci_updrs = mci_matched.merge(\n", " mci_updrs_on, on=[\"PATNO\", \"EVENT_ID\"], how=\"left\", suffixes=(None, \"_on\")\n", ")" ] }, { "cell_type": "code", "execution_count": 25, "id": "8b228aa5", "metadata": {}, "outputs": [], "source": [ "# convert NHY and PATNO string to int\n", "mci_updrs[\"NHY\"] = pd.to_numeric(mci_updrs[\"NHY\"])\n", "mci_updrs[\"NHY_NX\"] = pd.to_numeric(mci_updrs[\"NHY_NX\"])\n", "wo_mci_updrs[\"NHY\"] = pd.to_numeric(wo_mci_updrs[\"NHY\"])\n", "wo_mci_updrs[\"NHY_NX\"] = pd.to_numeric(wo_mci_updrs[\"NHY_NX\"])\n", "hc[\"NHY\"] = pd.to_numeric(hc[\"NHY\"])\n", "hc[\"NHY_NX\"] = pd.to_numeric(hc[\"NHY_NX\"])\n", "\n", "mci_updrs[\"PATNO\"] = pd.to_numeric(mci_updrs[\"PATNO\"])\n", "wo_mci_updrs[\"PATNO\"] = pd.to_numeric(wo_mci_updrs[\"PATNO\"])\n", "\n", "mci_final = mci_updrs\n", "wo_mci_final = 
wo_mci_updrs" ] }, { "cell_type": "code", "execution_count": 26, "id": "2cc8e663", "metadata": {}, "outputs": [], "source": [ "# calculate Duration T2-T1 (months)\n", "# based on MoCA assessment\n", "\n", "mci_final[\"INFODT\"] = pd.to_datetime(mci_final[\"INFODT\"])\n", "mci_final[\"INFODT_NX\"] = pd.to_datetime(mci_final[\"INFODT_NX\"])\n", "wo_mci_final[\"INFODT\"] = pd.to_datetime(wo_mci_final[\"INFODT\"])\n", "wo_mci_final[\"INFODT_NX\"] = pd.to_datetime(wo_mci_final[\"INFODT_NX\"])\n", "hc[\"INFODT_x\"] = pd.to_datetime(hc[\"INFODT_x\"])\n", "hc[\"INFODT_x_NX\"] = pd.to_datetime(hc[\"INFODT_x_NX\"])\n", "\n", "mci_final[\"durationT2_T1\"] = (\n", " mci_final[\"INFODT_NX\"] - mci_final[\"INFODT\"]\n", ") / np.timedelta64(1, \"M\")\n", "wo_mci_final[\"durationT2_T1\"] = (\n", " wo_mci_final[\"INFODT_NX\"] - wo_mci_final[\"INFODT\"]\n", ") / np.timedelta64(1, \"M\")\n", "hc[\"durationT2_T1\"] = (hc[\"INFODT_x_NX\"] - hc[\"INFODT_x\"]) / np.timedelta64(1, \"M\")" ] }, { "cell_type": "code", "execution_count": 27, "id": "010aa2d2", "metadata": {}, "outputs": [], "source": [ "from collections.abc import Iterable\n", "\n", "import rich\n", "from rich.console import Console\n", "from rich.table import Table\n", "\n", "\n", "def cohort_summary(*, hc, nc, mci, title):\n", " def gen_row(D, *, agg, col, f=\"4.1f\", sep=\" ± \"):\n", " if not isinstance(agg, str) and isinstance(agg, Iterable):\n", " return [f\"{sep}\".join([f\"{d.loc[a][col]:{f}}\" for a in agg]) for d in D]\n", " else:\n", " return [f\"{d.loc[agg][col]:{f}}\" for d in D]\n", "\n", " def gender_ratio(df):\n", " male_count = df[df[\"SEX\"] == 1][\"PATNO\"].nunique()\n", " return f\"{male_count:.0f}, {male_count / df['PATNO'].nunique() * 100:.1f}%\"\n", "\n", " D = [hc.describe(), nc.describe(), mci.describe()]\n", "\n", " table = Table(title=title, box=rich.box.SIMPLE_HEAVY, show_footer=True)\n", "\n", " table.add_column(\"Subject groups\", footer=\"Values expressed as mean ± SD.\")\n", " 
table.add_column(\"HC\")\n", " table.add_column(\"PD-non-MCI\")\n", " table.add_column(\"PD-MCI\")\n", " # table.add_column(\"[italic]p\") # TODO\n", "\n", " table.add_row(\"n\", *gen_row(D, agg=\"count\", col=\"PATNO\", f=\".0f\"))\n", " table.add_row(\"Age (y)\", *gen_row(D, agg=[\"mean\", \"std\"], col=\"AGE_AT_VISIT\"))\n", " table.add_row(\n", " \"Age range\", *gen_row(D, agg=[\"min\", \"max\"], col=\"AGE_AT_VISIT\", sep=\" - \")\n", " )\n", " table.add_row(\n", " \"Gender (male, %)\", gender_ratio(hc), gender_ratio(nc), gender_ratio(mci)\n", " )\n", " table.add_row(\"Education (y)\", *gen_row(D, agg=[\"mean\", \"std\"], col=\"EDUCYRS\"))\n", " table.add_row(\n", " \"Disease duration (y)\", \"\", *gen_row(D[1:], agg=[\"mean\", \"std\"], col=\"PDXDUR\")\n", " )\n", " # table.add_row(\"LEDD (mg/d) baseline\", ) # TODO\n", "\n", " table.add_row(\"H&Y baseline\", *gen_row(D, agg=[\"mean\", \"std\"], col=\"NHY\"))\n", " table.add_row(\"H&Y follow-up\", *gen_row(D, agg=[\"mean\", \"std\"], col=\"NHY_NX\"))\n", " table.add_row(\n", " \"UPDRS III OFF baseline\", \"\", *gen_row(D[1:], agg=[\"mean\", \"std\"], col=\"NP3TOT\")\n", " )\n", " table.add_row(\n", " \"UPDRS III ON baseline\",\n", " \"\",\n", " *gen_row(D[1:], agg=[\"mean\", \"std\"], col=\"NP3TOT_on\"),\n", " )\n", " table.add_row(\n", " \"UPDRS III OFF follow-up\",\n", " \"\",\n", " *gen_row(D[1:], agg=[\"mean\", \"std\"], col=\"NP3TOT_NX\"),\n", " )\n", " table.add_row(\n", " \"MoCA baseline\", \"\", *gen_row(D[1:], agg=[\"mean\", \"std\"], col=\"MCATOT\")\n", " )\n", " table.add_row(\n", " \"MoCA follow-up\", \"\", *gen_row(D[1:], agg=[\"mean\", \"std\"], col=\"MCATOT_NX\")\n", " )\n", " table.add_row(\n", " \"Duration T2 - T1 (m)\", *gen_row(D, agg=[\"mean\", \"std\"], col=\"durationT2_T1\")\n", " )\n", "\n", " console = Console()\n", " console.print(table)" ] }, { "cell_type": "code", "execution_count": 28, "id": "db2f8af6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "There are 5 out of 10 missing UPDRS T1 ON values in PD-MCI.\n", "There are 8 out of 15 missing UPDRS T1 ON values in PD-non-MCI.\n", "\n", " PD-MCI vs PD-non-MCI p values: sex p 1.0 edu p nan dx dur p 0.8983646622916739 updrs off t1 p 0.5450424659830888 updrs on t1 p 0.7236375666275758 moca p 0.05359221099902946 T2-T1 p 0.1105534586592145 \n", "\n", "There is a difference between PD-MCI and PD-non-MCI in sex frequency\n", "There is a difference between PD-MCI and PD-non-MCI in years of education\n", "There is no difference between PD-MCI and PD-non-MCI in disease duration\n", "There is no difference between PD-MCI and PD-non-MCI in UPDRS T1 OFF\n", "There is no difference between PD-MCI and PD-non-MCI in UPDRS T1 ON\n", "There is no difference between PD-MCI and PD-non-MCI in MoCA\n", "There is no difference between PD-MCI and PD-non-MCI in the time between T1 and T2\n", "\n", " HC vs PD p values: T2-T1 mci-hc 0.0003446193620317317 T2-T1 nonmci-hc 0.003927490083541382 age mci-hc 0.7625053437811584 age nonmci-hc 0.19923697998278472 \n", "\n", "There is a difference between HC and PD-MCI in the time between T1 and T2\n", "There is a difference between HC and PD-non-MCI in the time between T1 and T2\n", "There is no difference between HC and PD-MCI in age\n", "There is no difference between HC and PD-non-MCI in age\n" ] } ], "source": [ "# test for group differences\n", "\n", "from scipy.stats import ttest_ind, chi2_contingency\n", "\n", "# chi-squared test of sex frequencies requires a contingency table of counts\n", "sex_table = pd.crosstab(\n", " pd.concat([mci_final[\"SEX\"], wo_mci_final[\"SEX\"]], ignore_index=True),\n", " [\"PD-MCI\"] * len(mci_final) + [\"PD-non-MCI\"] * len(wo_mci_final),\n", ")\n", "[chi2_sex, p_sex, dof_sex, expected_sex] = chi2_contingency(sex_table)\n", "\n", "[t_age, p_age] = ttest_ind(mci_final[\"AGE_AT_VISIT\"], wo_mci_final[\"AGE_AT_VISIT\"])\n", "[t_edu, p_edu] = ttest_ind(mci_final[\"EDUCYRS\"], wo_mci_final[\"EDUCYRS\"])\n", "[t_dur, p_dur] = ttest_ind(mci_final[\"PDXDUR\"], wo_mci_final[\"PDXDUR\"])\n", "[t_updrs_off, p_updrs_off] = ttest_ind(mci_final[\"NP3TOT\"], wo_mci_final[\"NP3TOT\"])\n", "print(\n", " \"There are\",\n", " 
mci_final[\"NP3TOT_on\"].isna().sum(),\n", " \"out of\",\n", " len(mci_final),\n", " \"missing UPDRS T1 ON values in PD-MCI.\",\n", ")\n", "print(\n", " \"There are\",\n", " wo_mci_final[\"NP3TOT_on\"].isna().sum(),\n", " \"out of\",\n", " len(wo_mci_final),\n", " \"missing UPDRS T1 ON values in PD-non-MCI.\",\n", ")\n", "[t_updrs_on, p_updrs_on] = ttest_ind(\n", " mci_final[\"NP3TOT_on\"].dropna(), wo_mci_final[\"NP3TOT_on\"].dropna()\n", ")\n", "[t_moca, p_moca] = ttest_ind(mci_final[\"MCATOT\"], wo_mci_final[\"MCATOT\"])\n", "[t_tdiff, p_tdiff] = ttest_ind(\n", " mci_final[\"durationT2_T1\"], wo_mci_final[\"durationT2_T1\"]\n", ")\n", "\n", "print(\n", " \"\\n\",\n", " \"PD-MCI vs PD-non-MCI p values:\",\n", " \"sex p\",\n", " p_sex,\n", " \"edu p\",\n", " p_edu,\n", " \"dx dur p\",\n", " p_dur,\n", " \"updrs off t1 p\",\n", " p_updrs_off,\n", " \"updrs on t1 p\",\n", " p_updrs_on,\n", " \"moca p\",\n", " p_moca,\n", " \"T2-T1 p\",\n", " p_tdiff,\n", " \"\\n\",\n", ")\n", "\n", "if p_sex > 0.05:\n", " print(\"There is no difference between PD-MCI and PD-non-MCI in sex frequency\")\n", "else:\n", " print(\"There is a difference between PD-MCI and PD-non-MCI in sex frequency\")\n", "if p_edu > 0.05:\n", " print(\"There is no difference between PD-MCI and PD-non-MCI in years of education\")\n", "else:\n", " print(\"There is a difference between PD-MCI and PD-non-MCI in years of education\")\n", "if p_dur > 0.05:\n", " print(\"There is no difference between PD-MCI and PD-non-MCI in disease duration\")\n", "else:\n", " print(\"There is a difference between PD-MCI and PD-non-MCI in disease duration\")\n", "if p_updrs_off > 0.05:\n", " print(\"There is no difference between PD-MCI and PD-non-MCI in UPDRS T1 OFF\")\n", "else:\n", " print(\"There is a difference between PD-MCI and PD-non-MCI in UPDRS T1 OFF\")\n", "if p_updrs_on > 0.05:\n", " print(\"There is no difference between PD-MCI and PD-non-MCI in UPDRS T1 ON\")\n", "else:\n", " print(\"There is a difference 
between PD-MCI and PD-non-MCI in UPDRS T1 ON\")\n", "if p_moca > 0.05:\n", " print(\"There is no difference between PD-MCI and PD-non-MCI in MoCA\")\n", "else:\n", " print(\"There is a difference between PD-MCI and PD-non-MCI in MoCA\")\n", "if p_tdiff > 0.05:\n", " print(\n", " \"There is no difference between PD-MCI and PD-non-MCI in the time between T1 and T2\"\n", " )\n", "else:\n", " print(\n", " \"There is a difference between PD-MCI and PD-non-MCI in the time between T1 and T2\"\n", " )\n", "\n", "\n", "[t_tdiff_mci_hc, p_tdiff_mci_hc] = ttest_ind(\n", " mci_final[\"durationT2_T1\"], hc[\"durationT2_T1\"]\n", ")\n", "[t_tdiff_womci_hc, p_tdiff_womci_hc] = ttest_ind(\n", " wo_mci_final[\"durationT2_T1\"], hc[\"durationT2_T1\"]\n", ")\n", "[t_age_mci_hc, p_age_mci_hc] = ttest_ind(mci_final[\"AGE_AT_VISIT\"], hc[\"AGE_AT_VISIT\"])\n", "[t_age_womci_hc, p_age_womci_hc] = ttest_ind(\n", " wo_mci_final[\"AGE_AT_VISIT\"], hc[\"AGE_AT_VISIT\"]\n", ")\n", "\n", "\n", "print(\n", " \"\\n\",\n", " \"HC vs PD p values:\",\n", " \"T2-T1 mci-hc\",\n", " p_tdiff_mci_hc,\n", " \"T2-T1 nonmci-hc\",\n", " p_tdiff_womci_hc,\n", " \"age mci-hc\",\n", " p_age_mci_hc,\n", " \"age nonmci-hc\",\n", " p_age_womci_hc,\n", " \"\\n\",\n", ")\n", "\n", "if p_tdiff_mci_hc > 0.05:\n", " print(\"There is no difference between HC and PD-MCI in the time between T1 and T2\")\n", "else:\n", " print(\"There is a difference between HC and PD-MCI in the time between T1 and T2\")\n", "if p_tdiff_womci_hc > 0.05:\n", " print(\n", " \"There is no difference between HC and PD-non-MCI in the time between T1 and T2\"\n", " )\n", "else:\n", " print(\n", " \"There is a difference between HC and PD-non-MCI in the time between T1 and T2\"\n", " )\n", "if p_age_mci_hc > 0.05:\n", " print(\"There is no difference between HC and PD-MCI in age\")\n", "else:\n", " print(\"There is a difference between HC and PD-MCI in age\")\n", "if p_age_womci_hc > 0.05:\n", " print(\"There is no difference between HC and 
PD-non-MCI in age\")\n", "else:\n", " print(\"There is a difference between HC and PD-non-MCI in age\")" ] }, { "cell_type": "code", "execution_count": 29, "id": "e3d2479a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-style: italic\"> Demographic and clinical characteristics </span>\n", " \n", " <span style=\"font-weight: bold\"> Subject groups </span> <span style=\"font-weight: bold\"> HC </span> <span style=\"font-weight: bold\"> PD-non-MCI </span> <span style=\"font-weight: bold\"> PD-MCI </span> \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " n 18 15 10 \n", " Age (y) 66.9 ± 6.1 63.4 ± 9.4 67.6 ± 5.8 \n", " Age range 53.2 - 76.8 42.8 - 75.3 61.0 - 79.2 \n", " Gender (male, %) 15, 83.3% 10, 66.7% 10, 100.0% \n", " Education (y) 16.1 ± 2.9 14.5 ± 1.9 14.6 ± 2.4 \n", " Disease duration (y) 5.1 ± 3.1 4.9 ± 3.3 \n", " H&Y baseline 0.0 ± 0.0 1.7 ± 0.5 1.8 ± 0.4 \n", " H&Y follow-up 0.1 ± 0.5 1.7 ± 0.5 1.8 ± 0.4 \n", " UPDRS III OFF baseline 22.3 ± 10.8 25.2 ± 13.0 \n", " UPDRS III ON baseline 17.7 ± 9.6 20.2 ± 14.3 \n", " UPDRS III OFF follow-up 25.7 ± 11.3 29.3 ± 13.6 \n", " MoCA baseline 25.9 ± 1.8 24.3 ± 1.9 \n", " MoCA follow-up 27.0 ± 1.7 24.5 ± 2.4 \n", " Duration T2 - T1 (m) 14.5 ± 2.9 19.1 ± 5.4 23.9 ± 9.0 \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " <span style=\"font-weight: bold\"> Values expressed as mean ± SD. 
</span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> \n", " \n", "</pre>\n" ], "text/plain": [ "\u001b[3m Demographic and clinical characteristics \u001b[0m\n", " \n", " \u001b[1m \u001b[0m\u001b[1mSubject groups \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mHC \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-non-MCI \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-MCI \u001b[0m\u001b[1m \u001b[0m \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " n 18 15 10 \n", " Age (y) 66.9 ± 6.1 63.4 ± 9.4 67.6 ± 5.8 \n", " Age range 53.2 - 76.8 42.8 - 75.3 61.0 - 79.2 \n", " Gender (male, %) 15, 83.3% 10, 66.7% 10, 100.0% \n", " Education (y) 16.1 ± 2.9 14.5 ± 1.9 14.6 ± 2.4 \n", " Disease duration (y) 5.1 ± 3.1 4.9 ± 3.3 \n", " H&Y baseline 0.0 ± 0.0 1.7 ± 0.5 1.8 ± 0.4 \n", " H&Y follow-up 0.1 ± 0.5 1.7 ± 0.5 1.8 ± 0.4 \n", " UPDRS III OFF baseline 22.3 ± 10.8 25.2 ± 13.0 \n", " UPDRS III ON baseline 17.7 ± 9.6 20.2 ± 14.3 \n", " UPDRS III OFF follow-up 25.7 ± 11.3 29.3 ± 13.6 \n", " MoCA baseline 25.9 ± 1.8 24.3 ± 1.9 \n", " MoCA follow-up 27.0 ± 1.7 24.5 ± 2.4 \n", " Duration T2 - T1 (m) 14.5 ± 2.9 19.1 ± 5.4 23.9 ± 9.0 \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " \u001b[1m \u001b[0m\u001b[1mValues expressed as mean ± SD.\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \n", " \n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cohort_summary(\n", " hc=hc,\n", " nc=wo_mci_final,\n", " mci=mci_final,\n", " title=\"Demographic and clinical characteristics\",\n", ")" ] }, { "cell_type": "markdown", "id": "2ec8000a", "metadata": {}, "source": [ "# Get MRI data" ] }, { "cell_type": "code", "execution_count": 
30, "id": "9f0c177f", "metadata": {}, "outputs": [], "source": [ "merged_data = pd.concat([mci_final, wo_mci_final, hc], ignore_index=True)\n", "merged_data[\"visit_no\"] = \"first\"\n", "\n", "second_visit = merged_data.copy()\n", "second_visit = second_visit.drop([\"Description\", \"EVENT_ID\"], axis=1)\n", "second_visit.rename(\n", " columns={\"Description_NX\": \"Description\", \"EVENT_ID_NX\": \"EVENT_ID\"}, inplace=True\n", ")\n", "second_visit[\"visit_no\"] = \"second\"\n", "download_data = pd.concat([merged_data, second_visit])" ] }, { "cell_type": "code", "execution_count": 31, "id": "901f5dad", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Download skipped: No missing files!\n" ] } ], "source": [ "# download MRI data\n", "\n", "utils.get_T1_nifti_files(download_data, default=downloader, symlink=True)" ] }, { "cell_type": "code", "execution_count": 32, "id": "05be8413", "metadata": {}, "outputs": [], "source": [ "# rename PATNO to distinguish between the first and the second visit after preprocessing\n", "# PATNO is used as dir name during preprocessing\n", "\n", "download_data[\"PATNO_id\"] = (\n", " download_data[\"PATNO\"].astype(str) + \"_\" + download_data[\"visit_no\"]\n", ")" ] }, { "cell_type": "markdown", "id": "a6a03d27", "metadata": {}, "source": [ "# Image preprocessing \n", "\n", "\n", "\n", "Data are preprocessed using FreeSurfer's recon-all." ] }, { "cell_type": "markdown", "id": "179c10e8", "metadata": {}, "source": [ "**Segmentation**\n", "\n", "We only segment images for which an existing segmentation is not found in the `outputs` directory." 
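The skip-if-already-segmented check described above can be sketched as follows. This is a minimal illustration, not code from the notebook: the `needs_segmentation` helper and the assumption that each segmented subject has a directory named after it under `outputs/` are hypothetical.

```python
import os


def needs_segmentation(subject_ids, output_dir="outputs"):
    """Return only the subject IDs that have no FreeSurfer output directory yet."""
    # A subject counts as segmented when a directory named after it already exists.
    return [
        s for s in subject_ids
        if not os.path.isdir(os.path.join(output_dir, s))
    ]
```

Filtering the download list through such a helper before submitting jobs avoids re-running recon-all on subjects that already completed.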
] }, { "cell_type": "markdown", "id": "e0db52ab", "metadata": {}, "source": [ "## Preprocessing pipeline by Hanganu et al.\n", "\n", "* Cortical reconstruction and volumetric segmentation were performed with FreeSurfer 5.3\n", "\n", "* motion correction and averaging of multiple volumetric T1-weighted images, removal of non-brain tissue using a hybrid surface deformation procedure, automated Talairach transformation, segmentation of the subcortical white matter and deep grey matter volumetric structures, intensity normalization, tessellation of the grey/white matter boundary, automated topology correction, and surface deformation following intensity gradients to optimally place the grey/white and grey/cerebrospinal fluid borders at the location where the greatest shift in intensity defines the transition to the other tissue class. \n", "\n", "* Images were automatically processed with the longitudinal stream. Specifically, an unbiased within-subject template space and image was created using robust, inverse consistent registration. \n", "\n", "* Several processing steps, such as skull stripping, Talairach transforms, atlas registration as well as spherical surface maps and parcellations were then initialized with common information from the within-subject template.\n", "\n", "* Misclassification of tissue types was corrected by minimal manual adjustment. Cortical thickness was calculated as the closest distance from the grey/white matter boundary to the grey/cerebrospinal fluid boundary at each vertex on the tessellated surface.\n", "\n", "* Cortical thickness was smoothed with a 10-mm full-width at half-maximum Gaussian kernel to reduce local variations in the measurements (used for the whole-brain correlation with MoCA).\n", "\n", "* For the analysis of subcortical longitudinal changes, the subcortical structures were segmented in order to obtain their volumes. 
The volumes were then corrected by regressing out the estimated total intracranial volume.\n" ] }, { "cell_type": "markdown", "id": "dd792582", "metadata": {}, "source": [ "Analyses were performed on Compute Canada servers (Advanced Research Computing facilities provided by the Compute Canada Federation). If you don't have a Compute Canada account you may be able to request one [here](https://ccdb.computecanada.ca/security/login).\n", "\n", "Otherwise, please use any other available server or your local machine to run the analyses. You may need to adjust the following code depending on the method you use." ] }, { "cell_type": "code", "execution_count": null, "id": "cfe6471a", "metadata": {}, "outputs": [], "source": [ "# copy working directory to Compute Canada server\n", "\n", "# please provide the following:\n", "hostname = \"\" # please change to the hostname you use\n", "username = \"\" # please provide your Compute Canada username\n", "password = \"\" # please provide your Compute Canada password\n", "\n", "from paramiko import SSHClient\n", "from scp import SCPClient\n", "\n", "ssh = SSHClient()\n", "ssh.load_system_host_keys()\n", "ssh.connect(\n", " hostname=hostname,\n", " username=username,\n", " password=password,\n", ")\n", "\n", "\n", "# SCPClient takes a paramiko transport as its only argument\n", "scp = SCPClient(ssh.get_transport())\n", "\n", "cdir = \"./\"\n", "scp.put(cdir, \"hanganu\", recursive=True)\n", "\n", "# scp.get('file_path_on_remote_machine', 'file_path_on_local_machine')\n", "\n", "scp.close()" ] }, { "cell_type": "markdown", "id": "90cf90ca", "metadata": {}, "source": [ "# Preprocessing \n", "\n", "Data are preprocessed using the longitudinal stream implemented in FreeSurfer. <a href=https://surfer.nmr.mgh.harvard.edu/fswiki/LongitudinalProcessing> Click here for details.</a> \n", "\n", "Preprocessing is performed on an external server. 
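The intracranial-volume correction mentioned in the pipeline description above is commonly implemented as a residual adjustment: regress each structure's volume on the estimated total intracranial volume (eTIV) and keep the residuals. The following is a minimal sketch under that assumption; the `icv_adjust` helper and the least-squares fit are illustrative, not code from the paper.

```python
import numpy as np


def icv_adjust(volumes, etiv):
    """Adjust subcortical volumes for head size by regressing on estimated
    total intracranial volume (eTIV) and keeping the residuals."""
    volumes = np.asarray(volumes, dtype=float)
    etiv = np.asarray(etiv, dtype=float)
    # Fit volume ~ intercept + slope * eTIV by ordinary least squares
    slope, intercept = np.polyfit(etiv, volumes, 1)
    # The residuals are the head-size-adjusted volumes
    return volumes - (intercept + slope * etiv)
```

Variants exist (e.g. dividing by eTIV, or regressing on the control group only); the residual approach above is one common choice.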
<a href=https://github.com/boutiques> Boutiques</a> is used to manage FreeSurfer's functions within a container. " ] }, { "cell_type": "code", "execution_count": null, "id": "36af9583", "metadata": {}, "outputs": [], "source": [ "%load_ext slurm_magic" ] }, { "cell_type": "markdown", "id": "e0dc858f", "metadata": {}, "source": [ "## preprocessing step 1 - cross-sectional" ] }, { "cell_type": "code", "execution_count": null, "id": "9a24c517", "metadata": {}, "outputs": [], "source": [ "# prepare node for computation. it includes downloading the container\n", "\n", "! module load singularity\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "\n", "zid = \"zenodo.4043546\"\n", "bosh([\"exec\", \"prepare\", zid])\n", "freesurfer = boutiques.descriptor2func.function(zid)" ] }, { "cell_type": "code", "execution_count": null, "id": "1157471b", "metadata": {}, "outputs": [], "source": [ "# exclude subjects from the dataset if the image was already preprocessed\n", "\n", "import os.path\n", "\n", "data_to_process = download_data\n", "\n", "for patno in data_to_process[\"PATNO\"]:\n", "\n", " exitcode_path = \"preproc_FS_codes/exitcode_{0}.txt\".format(patno)\n", " if os.path.exists(exitcode_path):\n", " with open(exitcode_path, \"r\") as exitcode_file:\n", " exitcode = exitcode_file.read()\n", "\n", " if exitcode == \"0\":\n", " data_to_process = data_to_process[data_to_process[\"PATNO\"] != patno]\n", " print(\"The image ID\", patno, \"has been preprocessed.\")\n", " else:\n", " print(\"The image ID\", patno, \"has not been preprocessed successfully.\")\n", "\n", " else:\n", " print(\"The image ID\", patno, \"has not been preprocessed at all.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "14f0ce28", "metadata": {}, "outputs": [], "source": [ "# save df with all timepoints as json\n", "\n", "import json\n", "\n", "download_data[\"PATNO_id\"] = download_data[\"PATNO_id\"].astype(str)\n", "\n", "data_to_process = download_data.reset_index()\n", "small_df = data_to_process[[\"PATNO_id\", \"File 
name\"]]\n", "json_data = small_df.to_json()\n", "meta = json.loads(json_data)\n", "with open(\"json_data.json\", \"w\") as fout:\n", " json.dump(meta, fout, indent=4)" ] }, { "cell_type": "code", "execution_count": null, "id": "d06ed452", "metadata": {}, "outputs": [], "source": [ "%%writefile preprocess_FS.py\n", "\n", "# save preprocessing script to submit jobs to the server later\n", "# copy your FreeSurfer license to FS_license/license.txt or update the license path below\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.4043546\"\n", "from boutiques.descriptor2func import function\n", "#bosh([\"exec\", \"prepare\", zid])\n", "freesurfer = function(zid)\n", "\n", "task_id = str(os.environ[\"SLURM_ARRAY_TASK_ID\"])\n", "\n", "with open('json_data.json') as fin:\n", " subject_map = json.load(fin)\n", "\n", " \n", "out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " input=subject_map[\"File name\"][task_id], qcache_flag=True,\n", " license=\"FS_license/license.txt\",\n", " subjid=str(subject_map[\"PATNO_id\"][task_id]),\n", " )\n", "\n", "exitcode = str(out_fs.exit_code)\n", "\n", "\n", "with open('preproc_FS_codes/exitcode_{0}.txt'.format(str(subject_map[\"PATNO_id\"][task_id])), 'w') as f:\n", " f.write(exitcode)" ] }, { "cell_type": "code", "execution_count": null, "id": "10924389", "metadata": {}, "outputs": [], "source": [ "%%sbatch --array=0-85\n", "#!/bin/bash\n", "#SBATCH --job-name=FS_preproc\n", "#SBATCH --mem=4G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_preproc.out\n", "#SBATCH --error=FS_preproc.err\n", "#SBATCH --time=10:0:0\n", "\n", ". venv/bin/activate # opens virtual environment. 
change depending on where you preprocess the data\n", "\n", "module load singularity\n", "\n", "python preprocess_FS.py" ] }, { "cell_type": "markdown", "id": "c3b9f3c1", "metadata": {}, "source": [ "## preprocessing step 2 - base template" ] }, { "cell_type": "code", "execution_count": null, "id": "1f7f1fcb", "metadata": {}, "outputs": [], "source": [ "# save df with a single input for each subject as json\n", "\n", "import json\n", "\n", "download_data[\"PATNO\"] = download_data[\"PATNO\"].astype(str)\n", "\n", "data_to_process = download_data.reset_index()\n", "small_df = data_to_process[[\"PATNO\"]]\n", "small_df = small_df.drop_duplicates()\n", "json_data = small_df.to_json()\n", "meta = json.loads(json_data)\n", "with open(\"json_data_base.json\", \"w\") as fout:\n", " json.dump(meta, fout, indent=4)" ] }, { "cell_type": "code", "execution_count": null, "id": "b2acf2b4", "metadata": {}, "outputs": [], "source": [ "%%writefile preprocess_FS_base.py\n", "\n", "# Step 2. Create an unbiased template from all time points for each subject and process it with recon-all:\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7327935\"\n", "from boutiques.descriptor2func import function\n", "#bosh([\"exec\", \"prepare\", zid])\n", "freesurfer = function(zid)\n", "\n", "task_id = str(os.environ[\"SLURM_ARRAY_TASK_ID\"])\n", "\n", "with open('json_data_base.json') as fin:\n", " subject_map = json.load(fin)\n", "\n", "out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " tp1=subject_map[\"PATNO\"][task_id] + \"_first\",\n", " tp2=subject_map[\"PATNO\"][task_id] + \"_second\",\n", " license=\"FS_license/license.txt\",\n", " outputdir=subject_map[\"PATNO\"][task_id] + \"_base\",\n", " )\n", "\n", "\n", "exitcode = str(out_fs.exit_code)\n", "\n", "\n", "with open('preproc_FS_codes/exitcode_base_{0}.txt'.format(str(subject_map[\"PATNO\"][task_id])), 'w') as f:\n", " f.write(exitcode)" ] }, 
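For reference, the steps the Boutiques descriptors wrap correspond to the FreeSurfer longitudinal-stream `recon-all` invocations documented on the FreeSurfer wiki. The sketch below only builds the command strings; the subject ID, input file names, and the `longitudinal_commands` helper are illustrative, not part of the notebook.

```python
def longitudinal_commands(patno, timepoints=("first", "second")):
    """Build the FreeSurfer longitudinal-stream recon-all commands for one subject."""
    tps = [f"{patno}_{tp}" for tp in timepoints]
    base = f"{patno}_base"
    # Step 1: cross-sectional run per timepoint (input file names are illustrative)
    cross = [f"recon-all -s {tp} -i {tp}.nii -all -qcache" for tp in tps]
    # Step 2: unbiased within-subject template from all timepoints
    tp_flags = " ".join(f"-tp {tp}" for tp in tps)
    base_cmd = f"recon-all -base {base} {tp_flags} -all"
    # Steps 3-4: longitudinal run of each timepoint against the template (+qcache)
    long_cmds = [f"recon-all -long {tp} {base} -all -qcache" for tp in tps]
    return cross + [base_cmd] + long_cmds
```

Each SLURM array task above runs one of these commands inside the Singularity container via the Boutiques descriptor.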
{ "cell_type": "code", "execution_count": null, "id": "96da1698", "metadata": {}, "outputs": [], "source": [ "%%sbatch --array=0-42\n", "#!/bin/bash\n", "#SBATCH --job-name=FS_preproc\n", "#SBATCH --mem=4G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_preproc_base.out\n", "#SBATCH --error=FS_preproc_base.err\n", "#SBATCH --time=10:0:0\n", "\n", ". venv/bin/activate # opens virtual environment. change depending on where you preprocess the data\n", "\n", "module load singularity\n", "\n", "python preprocess_FS_base.py" ] }, { "cell_type": "markdown", "id": "51b7cc21", "metadata": {}, "source": [ "## preprocessing step 3 - longitudinally processed timepoints" ] }, { "cell_type": "code", "execution_count": null, "id": "2d3469fe", "metadata": {}, "outputs": [], "source": [ "# prepare node for computation\n", "# this should be run on a server connected to the internet\n", "# load recon-all long descriptor\n", "\n", "! module load singularity\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "\n", "zid = \"zenodo.7339689\"\n", "# bosh([\"exec\", \"prepare\", zid])\n", "freesurfer = boutiques.descriptor2func.function(zid)" ] }, { "cell_type": "code", "execution_count": null, "id": "b4dd20b9", "metadata": {}, "outputs": [], "source": [ "# save df with all timepoints and base as json\n", "\n", "import json\n", "\n", "download_data[\"PATNO\"] = download_data[\"PATNO\"].astype(str)\n", "download_data[\"PATNO_base\"] = download_data[\"PATNO\"] + \"_base\"\n", "download_data[\"PATNO_base\"] = download_data[\"PATNO_base\"].astype(str)\n", "\n", "data_to_process = download_data.reset_index()\n", "small_df = data_to_process[[\"PATNO_id\", \"PATNO_base\"]]\n", "json_data = small_df.to_json()\n", "meta = json.loads(json_data)\n", "with open(\"json_data_long.json\", \"w\") as fout:\n", " json.dump(meta, fout, indent=4)" ] }, { "cell_type": "code", "execution_count": null, "id": "0b1f76e4", "metadata": {}, "outputs": [], "source": [ 
"%%writefile preprocess_FS_long.py\n", "\n", "# Step 3. \"-long\" longitudinally process all timepoints (recon-all -long):\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7339689\"\n", "from boutiques.descriptor2func import function\n", "#bosh([\"exec\", \"prepare\", zid])\n", "freesurfer = function(zid)\n", "\n", "task_id = str(os.environ[\"SLURM_ARRAY_TASK_ID\"])\n", "\n", "with open('json_data_long.json') as fin:\n", " subject_map = json.load(fin)\n", "\n", "out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " tp=subject_map[\"PATNO_id\"][task_id],\n", " base=subject_map[\"PATNO_base\"][task_id],\n", " license=\"FS_license/license.txt\",\n", " )\n", "\n", "\n", "exitcode = str(out_fs.exit_code)\n", "\n", "\n", "with open('preproc_FS_codes/exitcode_long_{0}.txt'.format(str(subject_map[\"PATNO_id\"][task_id])), 'w') as f:\n", " f.write(exitcode)" ] }, { "cell_type": "code", "execution_count": null, "id": "4af849c0", "metadata": {}, "outputs": [], "source": [ "%%sbatch --array=0-85\n", "#!/bin/bash\n", "#SBATCH --job-name=FS_preproc\n", "#SBATCH --mem=4G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_preproc_long.out\n", "#SBATCH --error=FS_preproc_long.err\n", "#SBATCH --time=10:0:0\n", "\n", ". venv/bin/activate # opens virtual environment. 
change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python preprocess_FS_long.py" ] }, { "cell_type": "markdown", "id": "4f6ab06a", "metadata": {}, "source": [ "## preprocessing step 4 - Qcache" ] }, { "cell_type": "code", "execution_count": null, "id": "9dbcdd18", "metadata": {}, "outputs": [], "source": [ "%%writefile preprocess_FS_long_qcache.py\n", "\n", "# save preprocessing script to submit jobs to the server later \n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7343403\"\n", "from boutiques.descriptor2func import function\n", "#bosh([\"exec\", \"prepare\", zid])\n", "freesurfer = function(zid)\n", "\n", "task_id = str(os.environ[\"SLURM_ARRAY_TASK_ID\"])\n", "\n", "with open('json_data_long.json') as fin:\n", " subject_map = json.load(fin)\n", "\n", " \n", "out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " tp=subject_map[\"PATNO_id\"][task_id],\n", " base=subject_map[\"PATNO_base\"][task_id],\n", " license=\"FS_license/license.txt\",\n", " )\n", "\n", "exitcode = str(out_fs.exit_code)\n", "\n", "\n", "with open('preproc_FS_codes/exitcode_longQcache_{0}.txt'.format(str(subject_map[\"PATNO_id\"][task_id])), 'w') as f:\n", " f.write(exitcode)" ] }, { "cell_type": "code", "execution_count": null, "id": "f18c1649", "metadata": {}, "outputs": [], "source": [ "%%sbatch --array=0-85\n", "#!/bin/bash\n", "#SBATCH --job-name=FS_preproc_longQcache\n", "#SBATCH --mem=4G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_preproc_longQcache.out\n", "#SBATCH --error=FS_preproc_longQcache.err\n", "#SBATCH --time=2:0:0\n", "\n", ". venv/bin/activate # opens virtual environment. 
change depending where you proprocess the data \n", "\n", "module load singularity\n", "\n", "python preprocess_FS_long_qcache.py" ] }, { "cell_type": "markdown", "id": "6b9c811d", "metadata": {}, "source": [ "# Image quality control" ] }, { "cell_type": "code", "execution_count": null, "id": "2131d330", "metadata": { "scrolled": true }, "outputs": [], "source": [ "import imageio as iio\n", "from pathlib import Path\n", "\n", "for view in [\"axial\", \"sagittal\", \"coronal\"]:\n", " images = list()\n", " for file in Path(\"segm/{view}\".format(view=view)).iterdir():\n", " if not file.is_file():\n", " continue\n", "\n", " images.append(iio.imread(file))\n", " iio.mimsave(\"segm_{view}.gif\".format(view=view), images, duration=1)" ] }, { "cell_type": "markdown", "id": "a071a48e", "metadata": {}, "source": [ "## Axial\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "99151f05", "metadata": {}, "source": [ "## Coronal\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "acd5136d", "metadata": {}, "source": [ "## Sagittal\n", "\n", "" ] }, { "cell_type": "markdown", "id": "cf4bdfea", "metadata": {}, "source": [ "# Postprocessing \n", "\n", "# ROI analysis" ] }, { "cell_type": "code", "execution_count": 33, "id": "d99773a3", "metadata": {}, "outputs": [], "source": [ "# calculate MoCA T1 - T2 difference\n", "merged_data[\"MCATOT_diff\"] = merged_data[\"MCATOT\"] - merged_data[\"MCATOT_NX\"]\n", "\n", "\n", "# extract cortical thickness and volumes\n", "\n", "aseg_table = pd.DataFrame()\n", "aseg_table[\"PATNO\"] = merged_data[\"PATNO\"]\n", "\n", "\n", "ROIs = [\n", " \"Left-Thalamus\",\n", " \"Left-Caudate\",\n", " \"Left-Putamen\",\n", " \"Left-Pallidum\",\n", " \"Left-Hippocampus\",\n", " \"Left-Amygdala\",\n", " \"Left-Accumbens-area\",\n", " \"Right-Thalamus\",\n", " \"Right-Caudate\",\n", " \"Right-Putamen\",\n", " \"Right-Pallidum\",\n", " \"Right-Hippocampus\",\n", " \"Right-Amygdala\",\n", " \"Right-Accumbens-area\",\n", "]\n", "\n", "\n", "for subj in 
aseg_table[\"PATNO\"]:\n", " for session in [\"_first\", \"_second\"]:\n", "\n", " # extract TIV\n", " file = \"{subidd}{ses}.long.{subidd}_base/stats/aseg.stats\".format(\n", " subidd=subj, ses=session\n", " )\n", " with open(file, \"r\") as fp:\n", " # read all lines in a list\n", " lines = fp.readlines()\n", " for line in lines:\n", " # check if string present on a current line\n", " if line.find(\"Estimated Total Intracranial Volume\") != -1:\n", " aseg_table.loc[\n", " aseg_table[\"PATNO\"] == subj, \"TIV{ses}\".format(ses=session)\n", " ] = float(line.split(\",\")[3])\n", "\n", " # aseg_table[\"TIV{ses}\".format(ses = session)] = float(out)\n", "\n", " # extract cortical thickness\n", " for hemi in [\"lh\", \"rh\"]:\n", " file = \"{subidd}{ses}.long.{subidd}_base/stats/{hemisphere}.aparc.a2009s.stats\".format(\n", " subidd=subj, ses=session, hemisphere=hemi\n", " )\n", " with open(file, \"r\") as fp:\n", " # read all lines in a list\n", " lines = fp.readlines()\n", " for line in lines:\n", " if line.find(\"Mean Thickness\") != -1:\n", " aseg_table.loc[\n", " aseg_table[\"PATNO\"] == subj,\n", " \"thickness_{hemi}{ses}\".format(hemi=hemi, ses=session),\n", " ] = float(line.split(\",\")[3])\n", "\n", " aseg_table[\"thickness_bil{ses}\".format(ses=session)] = (\n", " aseg_table[\"thickness_lh{ses}\".format(ses=session)]\n", " + aseg_table[\"thickness_rh{ses}\".format(ses=session)]\n", " ) / 2\n", "\n", " # extract ROIs volume\n", " for roi in ROIs:\n", "\n", " file = \"{subidd}{ses}.long.{subidd}_base/stats/aseg.stats\".format(\n", " subidd=subj, ses=session\n", " )\n", " with open(file, \"r\") as fp:\n", " lines = fp.readlines()\n", " for line in lines:\n", " if line.find(roi) != -1:\n", " aseg_table.loc[\n", " aseg_table[\"PATNO\"] == subj,\n", " roi + \"{ses}\".format(ses=session),\n", " ] = float(line.split()[3])\n", "\n", " # calculate bilateral volume\n", "\n", " ROIs_bil = [\n", " \"Thalamus{ses}\".format(ses=session),\n", " 
\"Caudate{ses}\".format(ses=session),\n", " \"Putamen{ses}\".format(ses=session),\n", " \"Pallidum{ses}\".format(ses=session),\n", " \"Hippocampus{ses}\".format(ses=session),\n", " \"Amygdala{ses}\".format(ses=session),\n", " \"Accumbens-area{ses}\".format(ses=session),\n", " ]\n", "\n", " for roi_bil in ROIs_bil:\n", " aseg_table[roi_bil] = (\n", " aseg_table[\"Right-{0}\".format(roi_bil)]\n", " + aseg_table[\"Left-{0}\".format(roi_bil)]\n", " )" ] }, { "cell_type": "code", "execution_count": 34, "id": "74e9f70a", "metadata": {}, "outputs": [], "source": [ "ROI_data = merged_data.merge(aseg_table, on=[\"PATNO\"]).drop_duplicates()\n", "ROI_data[\"TIV_mean\"] = (ROI_data[\"TIV_first\"] + ROI_data[\"TIV_second\"]) / 2" ] }, { "cell_type": "markdown", "id": "f25bb5ec", "metadata": {}, "source": [ "For the analysis of subcortical longitudinal changes, the subcortical structures were segmented in order to obtain their volumes. **The volumes were then corrected by performing a regression over the estimated total intracranial volume.** Previous studies have shown that estimated total intracranial volume provides a robust method for head size correction that is equivalent to manual intracranial volume correction (Buckner et al., 2004) and the regression-based correctional method may provide advantages over the proportion- based method (Sanfilipo et al., 2004). \n", "\n", "\n", "After this, the differences between the two time points of corrected subcortical volumes were computed, and **additional corrections were performed using the averages of the two time points for each subject as a regression.** This was done to reduce the variability linked to the differences in the initial volumes for each participant. The percentage of change was computed with respect to the first time point for each subject. \n", "\n", "Significant differences between groups were computed using the Student’s t-test analysis and were set to P < 0.05. 
We also correlated the evolution of subcortical volumes between Time 1 and Time 2 and the evolution of Montreal Cognitive Assessment scores between Times 1 and 2. " ] }, { "cell_type": "code", "execution_count": 35, "id": "bcb84035", "metadata": {}, "outputs": [], "source": [ "# this calculates the change in subcortical volume and cortical thickness change\n", "\n", "from sklearn.linear_model import LinearRegression\n", "\n", "reg = LinearRegression()\n", "\n", "# correct the bilateral ROIs volumes by performing a regression over the estimated total intracranial volume\n", "\n", "ROIs_bil = [\n", " \"Thalamus\",\n", " \"Caudate\",\n", " \"Putamen\",\n", " \"Pallidum\",\n", " \"Hippocampus\",\n", " \"Amygdala\",\n", " \"Accumbens-area\",\n", "]\n", "\n", "for roi_bil in ROIs_bil:\n", " for session in [\"_first\", \"_second\"]:\n", " x = ROI_data[\"TIV_mean\"].array.reshape(-1, 1)\n", " y = ROI_data[roi_bil + session].array\n", " reg.fit(x, y)\n", " y_pred = reg.predict(x)\n", " residual = y - y_pred\n", " ROI_data[roi_bil + session + \"_pred\"] = residual\n", "\n", " # calculate change in subcortical volume\n", " ROI_data[roi_bil + \"_change\"] = (\n", " ROI_data[roi_bil + \"_second_pred\"] - ROI_data[roi_bil + \"_first_pred\"]\n", " )\n", "\n", " # perform additional correction using the averages of the two time points\n", " ROI_data[roi_bil + \"_meanT1T2\"] = (\n", " ROI_data[roi_bil + \"_first_pred\"] + ROI_data[roi_bil + \"_second_pred\"]\n", " ) / 2\n", " x = ROI_data[\"{roi}_meanT1T2\".format(roi=roi_bil)].array.reshape(-1, 1)\n", " y = ROI_data[roi_bil + \"_change\"].array\n", " reg.fit(x, y)\n", " y_pred = reg.predict(x)\n", " residual = y - y_pred\n", " ROI_data[roi_bil + \"_change_pred\"] = residual\n", "\n", " # calculate percentage of change\n", " # ROI_data[roi_bil + \"_change_pred_pct\"] = (\n", " # ((ROI_data[roi_bil + \"_second_pred\"] - ROI_data[roi_bil + \"_first_pred\"]) /\n", " # ROI_data[roi_bil + \"_first_pred\"]) * 100\n", " # )\n", " 
ROI_data[roi_bil + \"_change_pred_pct\"] = (\n", " (ROI_data[roi_bil + \"_second\"] - ROI_data[roi_bil + \"_first\"])\n", " / ROI_data[roi_bil + \"_first\"]\n", " ) * 100\n", "\n", "# correct the unilateral ROIs volumes by performing a regression over the estimated total intracranial volume\n", "\n", "for roi_bil in ROIs_bil:\n", " for side in [\"Right-\", \"Left-\"]:\n", " for session in [\"_first\", \"_second\"]:\n", " x = ROI_data[\"TIV_mean\"].array.reshape(-1, 1)\n", " y = ROI_data[side + roi_bil + session].array\n", " reg.fit(x, y)\n", " y_pred = reg.predict(x)\n", " residual = y - y_pred\n", " ROI_data[side + roi_bil + session + \"_pred\"] = residual\n", "\n", " # calculate change in subcortical volume\n", " ROI_data[side + roi_bil + \"_change\"] = (\n", " ROI_data[side + roi_bil + \"_second_pred\"]\n", " - ROI_data[side + roi_bil + \"_first_pred\"]\n", " )\n", "\n", " # perform additional correction using the averages of the two time points\n", " ROI_data[side + roi_bil + \"_meanT1T2\"] = (\n", " ROI_data[side + roi_bil + \"_first_pred\"]\n", " + ROI_data[side + roi_bil + \"_second_pred\"]\n", " ) / 2\n", " x = ROI_data[side + roi_bil + \"_meanT1T2\"].array.reshape(-1, 1)\n", " y = ROI_data[side + roi_bil + \"_change\"].array\n", " reg.fit(x, y)\n", " y_pred = reg.predict(x)\n", " residual = y - y_pred\n", " ROI_data[side + roi_bil + \"_change_pred\"] = residual\n", "\n", "\n", "# calculate cortical thickness change index\n", "# formula: (Thickness at Time 1 - Thickness at Time 2)/(Time 2 - Time 1)\n", "ROI_data[\"durationT2_T1_y\"] = (\n", " ROI_data[\"durationT2_T1\"] / 12\n", ") # calculate difference in years\n", "ROI_data[\"thickness_change\"] = (\n", " ROI_data[\"thickness_bil_first\"] - ROI_data[\"thickness_bil_second\"]\n", ") / ROI_data[\"durationT2_T1_y\"]\n", "\n", "ROI_data[\"thickness_change_pct\"] = (\n", " (ROI_data[\"thickness_bil_second\"] - ROI_data[\"thickness_bil_first\"])\n", " / ROI_data[\"thickness_bil_first\"]\n", ") * 100" ] }, { 
"cell_type": "code", "execution_count": 36, "id": "a0a2d5d1", "metadata": {}, "outputs": [], "source": [ "# Correlation analysis\n", "\n", "from scipy.stats import pearsonr\n", "\n", "ROIs_bil = [\n", " \"Thalamus\",\n", " \"Caudate\",\n", " \"Putamen\",\n", " \"Pallidum\",\n", " \"Hippocampus\",\n", " \"Amygdala\",\n", " \"Accumbens-area\",\n", "]\n", "\n", "\n", "def correlation(data, yvar, xvars):\n", " Y = data[yvar]\n", " X = data[xvars]\n", " [corr_r, corr_p] = pearsonr(Y, X)\n", " return [corr_r, corr_p]\n", "\n", "\n", "ROI_PDMCI = ROI_data.loc[ROI_data[\"dx_group\"] == \"PD-MCI\"]\n", "ROI_PDnonMCI = ROI_data.loc[ROI_data[\"dx_group\"] == \"PD-non-MCI\"]\n", "ROI_PDall = pd.concat([ROI_PDMCI, ROI_PDnonMCI])\n", "corr = {}\n", "\n", "# MoCA x volume correlation\n", "for roi_bil in ROIs_bil:\n", " for side in [\"Right-\", \"Left-\"]:\n", " [\n", " corr[\"corr_r_\" + side + roi_bil + \"_MoCA_PDMCI\"],\n", " corr[\"corr_p_\" + side + roi_bil + \"_MoCA_PDMCI\"],\n", " ] = correlation(\n", " ROI_PDMCI, \"{s}{roi}_change_pred\".format(s=side, roi=roi_bil), \"MCATOT_diff\"\n", " )\n", " [\n", " corr[\"corr_r_\" + side + roi_bil + \"_MoCA_PDnonMCI\"],\n", " corr[\"corr_p_\" + side + roi_bil + \"_MoCA_PDnonMCI\"],\n", " ] = correlation(\n", " ROI_PDnonMCI,\n", " \"{s}{roi}_change_pred\".format(s=side, roi=roi_bil),\n", " \"MCATOT_diff\",\n", " )\n", " [\n", " corr[\"corr_r_\" + side + roi_bil + \"_MoCA_PDall\"],\n", " corr[\"corr_p_\" + side + roi_bil + \"_MoCA_PDall\"],\n", " ] = correlation(\n", " ROI_PDall, \"{s}{roi}_change_pred\".format(s=side, roi=roi_bil), \"MCATOT_diff\"\n", " )\n", "\n", "# MoCA x thickness correlation\n", "[corr[\"corr_r_thick_MoCA_PDMCI\"], corr[\"corr_p_thick_MoCA_PDMCI\"]] = correlation(\n", " ROI_PDMCI, \"thickness_change\", \"MCATOT_diff\"\n", ")\n", "[corr[\"corr_r_thick_MoCA_PDnonMCI\"], corr[\"corr_p_thick_MoCA_PDnonMCI\"]] = correlation(\n", " ROI_PDnonMCI, \"thickness_change\", \"MCATOT_diff\"\n", ")\n", 
"[corr[\"corr_r_thick_MoCA_PDall\"], corr[\"corr_p_thick_MoCA_PDall\"]] = correlation(\n", " ROI_PDall, \"thickness_change\", \"MCATOT_diff\"\n", ")" ] }, { "cell_type": "code", "execution_count": 37, "id": "71267253", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The p value of the correlation between MoCA score and right thalamus in PD-all group is 0.8861297955138382. The original significant correlation between the change of the right thalamus volume and MoCA scores in PD-all has not been replicated. The result is non-significant.\n", "\n", "The p value of the correlation between MoCA score and right thalamus in PD-MCI group is 0.8059076167947397. The original significant correlation between the change of the right thalamus volume and MoCA scores in PD-MCI has not been replicated. The result is non-significant.\n", "\n", "The p value of the correlation between MoCA score and right amygdala in PD-all group is 0.2549423489414147. The original significant correlation between the change of the right amygdala volume and MoCA scores in PD-all has not been replicated. The result is non-significant.\n", "\n" ] } ], "source": [ "# check if the original correlation results have been replicated\n", "\n", "\n", "if corr[\"corr_p_Right-Thalamus_MoCA_PDall\"] < 0.05:\n", " print(\n", " \"The p value of the correlation between MoCA score and right thalamus in PD-all group is \"\n", " + str(corr[\"corr_p_Right-Thalamus_MoCA_PDall\"])\n", " + \". The original significant correlation between the change of the right thalamus volume and MoCA scores in PD-all has been replicated. The result is significant.\\n\"\n", " )\n", "else:\n", " print(\n", " \"The p value of the correlation between MoCA score and right thalamus in PD-all group is \"\n", " + str(corr[\"corr_p_Right-Thalamus_MoCA_PDall\"])\n", " + \". The original significant correlation between the change of the right thalamus volume and MoCA scores in PD-all has not been replicated. 
The result is non-significant.\\n\"\n", " )\n", "\n", "if corr[\"corr_p_Right-Thalamus_MoCA_PDMCI\"] < 0.05:\n", " print(\n", " \"The p value of the correlation between MoCA score and right thalamus in PD-MCI group is \"\n", " + str(corr[\"corr_p_Right-Thalamus_MoCA_PDMCI\"])\n", " + \". The original significant correlation between the change of the right thalamus volume and MoCA scores in PD-MCI has been replicated. The result is significant.\\n\"\n", " )\n", "else:\n", " print(\n", " \"The p value of the correlation between MoCA score and right thalamus in PD-MCI group is \"\n", " + str(corr[\"corr_p_Right-Thalamus_MoCA_PDMCI\"])\n", " + \". The original significant correlation between the change of the right thalamus volume and MoCA scores in PD-MCI has not been replicated. The result is non-significant.\\n\"\n", " )\n", "\n", "if corr[\"corr_p_Right-Amygdala_MoCA_PDall\"] < 0.05:\n", " print(\n", " \"The p value of the correlation between MoCA score and right amygdala in PD-all group is \"\n", " + str(corr[\"corr_p_Right-Amygdala_MoCA_PDall\"])\n", " + \". The original significant correlation between the change of the right amygdala volume and MoCA scores in PD-all has been replicated. The result is significant.\\n\"\n", " )\n", "else:\n", " print(\n", " \"The p value of the correlation between MoCA score and right amygdala in PD-all group is \"\n", " + str(corr[\"corr_p_Right-Amygdala_MoCA_PDall\"])\n", " + \". The original significant correlation between the change of the right amygdala volume and MoCA scores in PD-all has not been replicated. 
The result is non-significant.\\n\"\n", " )" ] }, { "cell_type": "code", "execution_count": 38, "id": "7ee68315", "metadata": {}, "outputs": [], "source": [ "# Group differences in cortical thickness change and subcortical volume change\n", "\n", "variables = [\n", " \"Thalamus_change_pred\",\n", " \"Caudate_change_pred\",\n", " \"Putamen_change_pred\",\n", " \"Pallidum_change_pred\",\n", " \"Hippocampus_change_pred\",\n", " \"Amygdala_change_pred\",\n", " \"Accumbens-area_change_pred\",\n", " \"thickness_change\",\n", "]\n", "\n", "from scipy.stats import ttest_ind\n", "\n", "ROI_PDMCI = ROI_data.loc[ROI_data[\"dx_group\"] == \"PD-MCI\"]\n", "ROI_PDnonMCI = ROI_data.loc[ROI_data[\"dx_group\"] == \"PD-non-MCI\"]\n", "ROI_HC = ROI_data.loc[ROI_data[\"dx_group\"] == \"HC\"]\n", "ttest = {}\n", "\n", "for var in variables:\n", " [\n", " ttest[\"ttest_t_\" + var + \"_PDMCI_HC\"],\n", " ttest[\"ttest_p_\" + var + \"_PDMCI_HC\"],\n", " ] = ttest_ind(ROI_PDMCI[var], ROI_HC[var])\n", " [\n", " ttest[\"ttest_t_\" + var + \"_PDMCI_PDnonMCI\"],\n", " ttest[\"ttest_p_\" + var + \"_PDMCI_PDnonMCI\"],\n", " ] = ttest_ind(ROI_PDMCI[var], ROI_PDnonMCI[var])\n", " [\n", " ttest[\"ttest_t_\" + var + \"_PDnonMCI_HC\"],\n", " ttest[\"ttest_p_\" + var + \"_PDnonMCI_HC\"],\n", " ] = ttest_ind(ROI_PDnonMCI[var], ROI_HC[var])" ] }, { "cell_type": "code", "execution_count": 39, "id": "4e40d028", "metadata": {}, "outputs": [], "source": [ "def cohort_summary(title):\n", "\n", " hc = ROI_data[ROI_data[\"dx_group\"] == \"HC\"]\n", " nc = ROI_data[ROI_data[\"dx_group\"] == \"PD-non-MCI\"]\n", " mci = ROI_data[ROI_data[\"dx_group\"] == \"PD-MCI\"]\n", "\n", " table = Table(title=title, box=rich.box.SIMPLE_HEAVY, show_footer=True)\n", "\n", " table.add_column(\"\")\n", " table.add_column(\"HC mean\")\n", " table.add_column(\"HC %\")\n", " table.add_column(\"PD-non-MCI mean\")\n", " table.add_column(\"PD-non-MCI %\")\n", " table.add_column(\"PD-MCI mean\")\n", " table.add_column(\"PD-MCI %\")\n", 
" table.add_column(\"PD-MCI vs PD-non-MCI\")\n", " table.add_column(\"PD-MCI vs HC\")\n", " table.add_column(\"PD-non-MCI vs HC\")\n", " table.add_row(\"ROI\", \"mean\", \"%\", \"mean\", \"%\", \"mean\", \"%\", \"p\", \"p\", \"p\")\n", " table.add_row(\n", " \"CoTh\",\n", " str(hc[\"thickness_change\"].mean().round(decimals=3)),\n", " str(hc[\"thickness_change_pct\"].mean().round(decimals=3)),\n", " str(nc[\"thickness_change\"].mean().round(decimals=3)),\n", " str(nc[\"thickness_change_pct\"].mean().round(decimals=3)),\n", " str(mci[\"thickness_change\"].mean().round(decimals=3)),\n", " str(mci[\"thickness_change_pct\"].mean().round(decimals=3)),\n", " str(ttest[\"ttest_p_thickness_change_PDMCI_PDnonMCI\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_thickness_change_PDMCI_HC\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_thickness_change_PDnonMCI_HC\"].round(decimals=2)),\n", " )\n", " table.add_row(\n", " \"Thal\",\n", " str(hc[\"Thalamus_change_pred\"].mean().round(decimals=2)),\n", " str(hc[\"Thalamus_change_pred_pct\"].mean().round(decimals=2)),\n", " str(nc[\"Thalamus_change_pred\"].mean().round(decimals=2)),\n", " str(nc[\"Thalamus_change_pred_pct\"].mean().round(decimals=2)),\n", " str(mci[\"Thalamus_change_pred\"].mean().round(decimals=2)),\n", " str(mci[\"Thalamus_change_pred_pct\"].mean().round(decimals=2)),\n", " str(ttest[\"ttest_p_Thalamus_change_pred_PDMCI_PDnonMCI\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Thalamus_change_pred_PDMCI_HC\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Thalamus_change_pred_PDnonMCI_HC\"].round(decimals=2)),\n", " )\n", " table.add_row(\n", " \"Caud\",\n", " str(hc[\"Caudate_change_pred\"].mean().round(decimals=2)),\n", " str(hc[\"Caudate_change_pred_pct\"].mean().round(decimals=2)),\n", " str(nc[\"Caudate_change_pred\"].mean().round(decimals=2)),\n", " str(nc[\"Caudate_change_pred_pct\"].mean().round(decimals=2)),\n", " str(mci[\"Caudate_change_pred\"].mean().round(decimals=2)),\n", " 
str(mci[\"Caudate_change_pred_pct\"].mean().round(decimals=2)),\n", " str(ttest[\"ttest_p_Caudate_change_pred_PDMCI_PDnonMCI\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Caudate_change_pred_PDMCI_HC\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Caudate_change_pred_PDnonMCI_HC\"].round(decimals=2)),\n", " )\n", " table.add_row(\n", " \"Put\",\n", " str(hc[\"Putamen_change_pred\"].mean().round(decimals=2)),\n", " str(hc[\"Putamen_change_pred_pct\"].mean().round(decimals=2)),\n", " str(nc[\"Putamen_change_pred\"].mean().round(decimals=2)),\n", " str(nc[\"Putamen_change_pred_pct\"].mean().round(decimals=2)),\n", " str(mci[\"Putamen_change_pred\"].mean().round(decimals=2)),\n", " str(mci[\"Putamen_change_pred_pct\"].mean().round(decimals=2)),\n", " str(ttest[\"ttest_p_Putamen_change_pred_PDMCI_PDnonMCI\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Putamen_change_pred_PDMCI_HC\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Putamen_change_pred_PDnonMCI_HC\"].round(decimals=2)),\n", " )\n", " table.add_row(\n", " \"Hipp\",\n", " str(hc[\"Hippocampus_change_pred\"].mean().round(decimals=2)),\n", " str(hc[\"Hippocampus_change_pred_pct\"].mean().round(decimals=2)),\n", " str(nc[\"Hippocampus_change_pred\"].mean().round(decimals=2)),\n", " str(nc[\"Hippocampus_change_pred_pct\"].mean().round(decimals=2)),\n", " str(mci[\"Hippocampus_change_pred\"].mean().round(decimals=2)),\n", " str(mci[\"Hippocampus_change_pred_pct\"].mean().round(decimals=2)),\n", " str(ttest[\"ttest_p_Hippocampus_change_pred_PDMCI_PDnonMCI\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Hippocampus_change_pred_PDMCI_HC\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Hippocampus_change_pred_PDnonMCI_HC\"].round(decimals=2)),\n", " )\n", " table.add_row(\n", " \"Amyg\",\n", " str(hc[\"Amygdala_change_pred\"].mean().round(decimals=2)),\n", " str(hc[\"Amygdala_change_pred_pct\"].mean().round(decimals=2)),\n", " str(nc[\"Amygdala_change_pred\"].mean().round(decimals=2)),\n", " 
str(nc[\"Amygdala_change_pred_pct\"].mean().round(decimals=2)),\n", " str(mci[\"Amygdala_change_pred\"].mean().round(decimals=2)),\n", " str(mci[\"Amygdala_change_pred_pct\"].mean().round(decimals=2)),\n", " str(ttest[\"ttest_p_Amygdala_change_pred_PDMCI_PDnonMCI\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Amygdala_change_pred_PDMCI_HC\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Amygdala_change_pred_PDnonMCI_HC\"].round(decimals=2)),\n", " )\n", " table.add_row(\n", " \"Nacc\",\n", " str(hc[\"Accumbens-area_change_pred\"].mean().round(decimals=2)),\n", " str(hc[\"Accumbens-area_change_pred_pct\"].mean().round(decimals=2)),\n", " str(nc[\"Accumbens-area_change_pred\"].mean().round(decimals=2)),\n", " str(nc[\"Accumbens-area_change_pred_pct\"].mean().round(decimals=2)),\n", " str(mci[\"Accumbens-area_change_pred\"].mean().round(decimals=2)),\n", " str(mci[\"Accumbens-area_change_pred_pct\"].mean().round(decimals=2)),\n", " str(\n", " ttest[\"ttest_p_Accumbens-area_change_pred_PDMCI_PDnonMCI\"].round(decimals=2)\n", " ),\n", " str(ttest[\"ttest_p_Accumbens-area_change_pred_PDMCI_HC\"].round(decimals=2)),\n", " str(ttest[\"ttest_p_Accumbens-area_change_pred_PDnonMCI_HC\"].round(decimals=2)),\n", " )\n", "\n", " console = Console()\n", " console.print(table)" ] }, { "cell_type": "code", "execution_count": 40, "id": "bccea90a", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-style: italic\"> Longitudinal differences between groups </span>\n", "<span style=\"font-style: italic\"> Mean and percentage of change </span>\n", " \n", " <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> PD-non-MCI </span> <span style=\"font-weight: bold\"> PD-non-MCI </span> <span 
style=\"font-weight: bold\"> PD-MCI </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> PD-MCI vs </span> <span style=\"font-weight: bold\"> PD-MCI vs </span> <span style=\"font-weight: bold\"> PD-non-MCI </span> \n", " <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> HC mean </span> <span style=\"font-weight: bold\"> HC % </span> <span style=\"font-weight: bold\"> mean </span> <span style=\"font-weight: bold\"> % </span> <span style=\"font-weight: bold\"> mean </span> <span style=\"font-weight: bold\"> PD-MCI % </span> <span style=\"font-weight: bold\"> PD-non-MCI </span> <span style=\"font-weight: bold\"> HC </span> <span style=\"font-weight: bold\"> vs HC </span> \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " ROI mean % mean % mean % p p p \n", " CoTh 0.006 -0.211 -0.0 -0.114 0.006 -0.57 0.44 1.0 0.5 \n", " Thal -51.26 -0.68 210.34 1.25 -223.24 -2.56 0.01 0.28 0.09 \n", " Caud -9.36 -0.2 -12.99 -0.25 36.33 0.51 0.63 0.56 0.95 \n", " Put -86.9 -0.92 115.21 1.03 -16.38 -0.66 0.21 0.26 0.02 \n", " Hipp 24.22 -0.49 4.56 -1.14 -50.43 -1.76 0.43 0.22 0.74 \n", " Amyg 25.98 -0.65 -5.15 -2.24 -39.03 -4.97 0.57 0.2 0.5 \n", " Nacc 8.04 0.23 -3.06 -0.82 -9.89 -1.23 0.81 0.52 0.58 \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> <span style=\"font-weight: bold\"> </span> \n", " \n", "</pre>\n" ], "text/plain": [ "\u001b[3m Longitudinal differences between groups 
\u001b[0m\n", "\u001b[3m Mean and percentage of change \u001b[0m\n", " \n", " \u001b[1m \u001b[0m \u001b[1m \u001b[0m \u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-non-MCI\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-non-MCI\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-MCI \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-MCI vs \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-MCI vs\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-non-MCI\u001b[0m\u001b[1m \u001b[0m \n", " \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mHC mean\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mHC % \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mmean \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m% \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mmean \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-MCI %\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mPD-non-MCI\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mHC \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mvs HC \u001b[0m\u001b[1m \u001b[0m \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " ROI mean % mean % mean % p p p \n", " CoTh 0.006 -0.211 -0.0 -0.114 0.006 -0.57 0.44 1.0 0.5 \n", " Thal -51.26 -0.68 210.34 1.25 -223.24 -2.56 0.01 0.28 0.09 \n", " Caud -9.36 -0.2 -12.99 -0.25 36.33 0.51 0.63 0.56 0.95 \n", " Put -86.9 -0.92 115.21 1.03 -16.38 -0.66 0.21 0.26 0.02 \n", " Hipp 24.22 -0.49 4.56 -1.14 -50.43 -1.76 0.43 0.22 0.74 \n", " Amyg 25.98 -0.65 -5.15 -2.24 -39.03 -4.97 0.57 0.2 0.5 \n", " Nacc 8.04 0.23 -3.06 -0.82 -9.89 -1.23 0.81 0.52 0.58 \n", " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \n", " \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m 
\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m \n", " \n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Amyg, amygdala; Caud, caudate; CoTh, cortical thickness; Hipp, hippocampus; Nacc, nucleus accumbens; Put, putamen; Thal, thalamus\n" ] } ], "source": [ "cohort_summary(\n", " title=\"Longitudinal differences between groups\\n Mean and percentage of change\",\n", ")\n", "print(\n", " \"Amyg, amygdala; Caud, caudate; CoTh, cortical thickness; Hipp, hippocampus; Nacc, nucleus accumbens; Put, putamen; Thal, thalamus\"\n", ")" ] }, { "cell_type": "code", "execution_count": 41, "id": "3db0ce00", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The p value of the difference in cortical thinkness change between PD-MCI and HC is 0.9979758052157945. The original significant difference has not been replicated. The result is non-significant.\n", "\n", "The p value of the difference in cortical thinkness change between PD-MCI and PD-non-MCI is 0.44316857788867625. The original significant difference has not been replicated. The result is non-significant.\n", "\n", "The p value of the difference in change of the bilateral amygdala volume between PD-MCI and HC is 0.20002293252735706. The original significant difference has not been replicated. The result is non-significant.\n", "\n", "The p value of the difference in change of the bilateral amygdala volume between PD-MCI and PD-non-MCI is 0.5742146221493709. 
The original significant difference has not been replicated. The result is non-significant.\n", "\n", "The p value of the difference in change of the bilateral nucleus accumbens volume between PD-MCI and HC is 0.5201460743279115. The original significant difference has not been replicated. The result is non-significant.\n", "\n" ] } ], "source": [ "# check if the original results have been replicated\n", "\n", "\n", "if ttest[\"ttest_p_thickness_change_PDMCI_HC\"] < 0.05:\n", "    print(\n", "        \"The p value of the difference in cortical thickness change between PD-MCI and HC is \"\n", "        + str(ttest[\"ttest_p_thickness_change_PDMCI_HC\"])\n", "        + \". The original significant difference has been replicated. The result is significant.\\n\"\n", "    )\n", "else:\n", "    print(\n", "        \"The p value of the difference in cortical thickness change between PD-MCI and HC is \"\n", "        + str(ttest[\"ttest_p_thickness_change_PDMCI_HC\"])\n", "        + \". The original significant difference has not been replicated. The result is non-significant.\\n\"\n", "    )\n", "\n", "if ttest[\"ttest_p_thickness_change_PDMCI_PDnonMCI\"] < 0.05:\n", "    print(\n", "        \"The p value of the difference in cortical thickness change between PD-MCI and PD-non-MCI is \"\n", "        + str(ttest[\"ttest_p_thickness_change_PDMCI_PDnonMCI\"])\n", "        + \". The original significant difference has been replicated. The result is significant.\\n\"\n", "    )\n", "else:\n", "    print(\n", "        \"The p value of the difference in cortical thickness change between PD-MCI and PD-non-MCI is \"\n", "        + str(ttest[\"ttest_p_thickness_change_PDMCI_PDnonMCI\"])\n", "        + \". The original significant difference has not been replicated. The result is non-significant.\\n\"\n", "    )\n", "\n", "if ttest[\"ttest_p_Amygdala_change_pred_PDMCI_HC\"] < 0.05:\n", "    print(\n", "        \"The p value of the difference in change of the bilateral amygdala volume between PD-MCI and HC is \"\n", "        + str(ttest[\"ttest_p_Amygdala_change_pred_PDMCI_HC\"])\n", "        + \". 
The original significant difference has been replicated. The result is significant.\\n\"\n", " )\n", "else:\n", " print(\n", " \"The p value of the difference in change of the bilateral amygdala volume between PD-MCI and HC is \"\n", " + str(ttest[\"ttest_p_Amygdala_change_pred_PDMCI_HC\"])\n", " + \". The original significant difference has not been replicated. The result is non-significant.\\n\"\n", " )\n", "\n", "if ttest[\"ttest_p_Amygdala_change_pred_PDMCI_PDnonMCI\"] < 0.05:\n", " print(\n", " \"The p value of the difference in change of the bilateral amygdala volume between PD-MCI and PD-non-MCI is \"\n", " + str(ttest[\"ttest_p_Amygdala_change_pred_PDMCI_PDnonMCI\"])\n", " + \". The original significant difference has been replicated. The result is significant.\\n\"\n", " )\n", "else:\n", " print(\n", " \"The p value of the difference in change of the bilateral amygdala volume between PD-MCI and PD-non-MCI is \"\n", " + str(ttest[\"ttest_p_Amygdala_change_pred_PDMCI_PDnonMCI\"])\n", " + \". The original significant difference has not been replicated. The result is non-significant.\\n\"\n", " )\n", "\n", "if ttest[\"ttest_p_Accumbens-area_change_pred_PDMCI_HC\"] < 0.05:\n", " print(\n", " \"The p value of the difference in change of the bilateral nucleus accumbens volume between PD-MCI and HC is \"\n", " + str(ttest[\"ttest_p_Accumbens-area_change_pred_PDMCI_HC\"])\n", " + \". The original significant difference has been replicated. The result is significant.\\n\"\n", " )\n", "else:\n", " print(\n", " \"The p value of the difference in change of the bilateral nucleus accumbens volume between PD-MCI and HC is \"\n", " + str(ttest[\"ttest_p_Accumbens-area_change_pred_PDMCI_HC\"])\n", " + \". The original significant difference has not been replicated. 
The result is non-significant.\\n\"\n", " )" ] }, { "cell_type": "code", "execution_count": 42, "id": "1cc0032d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-style: italic\">Correlation between volumes differences of subcortical segmentations at Times 1 and </span>\n", "<span style=\"font-style: italic\"> 2 with MoCA scores differences at Times 1 and 2 </span>\n", "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┓\n", "┃<span style=\"font-weight: bold\"> Subcortical segmentations </span>┃<span style=\"font-weight: bold\"> PD-All </span>┃<span style=\"font-weight: bold\"> </span>┃<span style=\"font-weight: bold\"> PC-MCA </span>┃<span style=\"font-weight: bold\"> </span>┃<span style=\"font-weight: bold\"> PC-non-MCI </span>┃<span style=\"font-weight: bold\"> </span>┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━┩\n", "│ │ r │ p │ r │ p │ r │ p │\n", "│ R Thalamus │ -0.03 │ 0.886 │ 0.089 │ 0.806 │ 0.07 │ 0.803 │\n", "│ R Caudate │ 0.088 │ 0.676 │ -0.184 │ 0.61 │ 0.284 │ 0.305 │\n", "│ R Putamen │ 0.022 │ 0.918 │ 0.141 │ 0.697 │ 0.098 │ 0.728 │\n", "│ R Pallidum │ -0.0 │ 1.0 │ -0.074 │ 0.839 │ 0.334 │ 0.224 │\n", "│ R Hippocampus │ -0.099 │ 0.639 │ -0.037 │ 0.92 │ -0.054 │ 0.848 │\n", "│ R Amygdala │ -0.237 │ 0.255 │ -0.253 │ 0.481 │ -0.27 │ 0.331 │\n", "│ R Nucleus accumbens │ 0.035 │ 0.868 │ 0.384 │ 0.273 │ -0.123 │ 0.661 │\n", "│ L Thalamus │ 0.186 │ 0.374 │ 0.135 │ 0.711 │ 0.483 │ 0.068 │\n", "│ L Caudate │ -0.0 │ 1.0 │ -0.286 │ 0.424 │ 0.417 │ 0.123 │\n", "│ L Putamen │ -0.027 │ 0.898 │ -0.173 │ 0.633 │ 0.189 │ 0.501 │\n", "│ L Pallidum │ 0.052 │ 0.806 │ 0.015 │ 0.968 │ 0.145 │ 0.607 │\n", "│ L Hippocampus │ -0.012 │ 0.955 │ -0.082 │ 0.822 │ 0.067 │ 0.814 │\n", "│ L Amygdala │ 0.235 │ 0.257 │ 0.323 │ 0.363 │ 0.336 │ 0.22 
│\n", "│ L Nucleus accumbens │ 0.022 │ 0.917 │ -0.07 │ 0.848 │ 0.122 │ 0.666 │\n", "└───────────────────────────┴────────┴───────┴────────┴───────┴────────────┴───────┘\n", "</pre>\n" ], "text/plain": [ "\u001b[3mCorrelation between volumes differences of subcortical segmentations at Times 1 and \u001b[0m\n", "\u001b[3m 2 with MoCA scores differences at Times 1 and 2 \u001b[0m\n", "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┓\n", "┃\u001b[1m \u001b[0m\u001b[1mSubcortical segmentations\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPD-All\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPC-MCA\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPC-non-MCI\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[1m \u001b[0m┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━┩\n", "│ │ r │ p │ r │ p │ r │ p │\n", "│ R Thalamus │ -0.03 │ 0.886 │ 0.089 │ 0.806 │ 0.07 │ 0.803 │\n", "│ R Caudate │ 0.088 │ 0.676 │ -0.184 │ 0.61 │ 0.284 │ 0.305 │\n", "│ R Putamen │ 0.022 │ 0.918 │ 0.141 │ 0.697 │ 0.098 │ 0.728 │\n", "│ R Pallidum │ -0.0 │ 1.0 │ -0.074 │ 0.839 │ 0.334 │ 0.224 │\n", "│ R Hippocampus │ -0.099 │ 0.639 │ -0.037 │ 0.92 │ -0.054 │ 0.848 │\n", "│ R Amygdala │ -0.237 │ 0.255 │ -0.253 │ 0.481 │ -0.27 │ 0.331 │\n", "│ R Nucleus accumbens │ 0.035 │ 0.868 │ 0.384 │ 0.273 │ -0.123 │ 0.661 │\n", "│ L Thalamus │ 0.186 │ 0.374 │ 0.135 │ 0.711 │ 0.483 │ 0.068 │\n", "│ L Caudate │ -0.0 │ 1.0 │ -0.286 │ 0.424 │ 0.417 │ 0.123 │\n", "│ L Putamen │ -0.027 │ 0.898 │ -0.173 │ 0.633 │ 0.189 │ 0.501 │\n", "│ L Pallidum │ 0.052 │ 0.806 │ 0.015 │ 0.968 │ 0.145 │ 0.607 │\n", "│ L Hippocampus │ -0.012 │ 0.955 │ -0.082 │ 0.822 │ 0.067 │ 0.814 │\n", "│ L Amygdala │ 0.235 │ 0.257 │ 0.323 │ 0.363 │ 0.336 │ 0.22 │\n", "│ L Nucleus accumbens │ 0.022 │ 0.917 │ -0.07 │ 
0.848 │ 0.122 │ 0.666 │\n", "└───────────────────────────┴────────┴───────┴────────┴───────┴────────────┴───────┘\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "table = Table(\n", "    title=\"Correlation between volumes differences of subcortical segmentations at Times 1 and 2 with MoCA scores differences at Times 1 and 2\"\n", ")\n", "\n", "table.add_column(\"Subcortical segmentations\", justify=\"left\")\n", "table.add_column(\"PD-All\")\n", "table.add_column(\"\")\n", "table.add_column(\"PD-MCI\")\n", "table.add_column(\"\")\n", "table.add_column(\"PD-non-MCI\")\n", "table.add_column(\"\")\n", "\n", "table.add_row(\"\", \"r\", \"p\", \"r\", \"p\", \"r\", \"p\")\n", "\n", "table.add_row(\n", "    \"R Thalamus\",\n", "    str(corr[\"corr_r_Right-Thalamus_MoCA_PDall\"].round(decimals=3)),\n", "    str(corr[\"corr_p_Right-Thalamus_MoCA_PDall\"].round(decimals=3)),\n", "    str(corr[\"corr_r_Right-Thalamus_MoCA_PDMCI\"].round(decimals=3)),\n", "    str(corr[\"corr_p_Right-Thalamus_MoCA_PDMCI\"].round(decimals=3)),\n", "    str(corr[\"corr_r_Right-Thalamus_MoCA_PDnonMCI\"].round(decimals=3)),\n", "    str(corr[\"corr_p_Right-Thalamus_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", "    \"R Caudate\",\n", "    str(corr[\"corr_r_Right-Caudate_MoCA_PDall\"].round(decimals=3)),\n", "    str(corr[\"corr_p_Right-Caudate_MoCA_PDall\"].round(decimals=3)),\n", "    str(corr[\"corr_r_Right-Caudate_MoCA_PDMCI\"].round(decimals=3)),\n", "    str(corr[\"corr_p_Right-Caudate_MoCA_PDMCI\"].round(decimals=3)),\n", "    str(corr[\"corr_r_Right-Caudate_MoCA_PDnonMCI\"].round(decimals=3)),\n", "    str(corr[\"corr_p_Right-Caudate_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", "    \"R Putamen\",\n", "    str(corr[\"corr_r_Right-Putamen_MoCA_PDall\"].round(decimals=3)),\n", "    str(corr[\"corr_p_Right-Putamen_MoCA_PDall\"].round(decimals=3)),\n", "    str(corr[\"corr_r_Right-Putamen_MoCA_PDMCI\"].round(decimals=3)),\n", "    
str(corr[\"corr_p_Right-Putamen_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Putamen_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Putamen_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"R Pallidum\",\n", " str(corr[\"corr_r_Right-Pallidum_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Pallidum_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Pallidum_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Pallidum_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Pallidum_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Pallidum_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"R Hippocampus\",\n", " str(corr[\"corr_r_Right-Hippocampus_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Hippocampus_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Hippocampus_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Hippocampus_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Hippocampus_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Hippocampus_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"R Amygdala\",\n", " str(corr[\"corr_r_Right-Amygdala_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Amygdala_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Amygdala_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Amygdala_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Amygdala_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Amygdala_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"R Nucleus accumbens\",\n", " str(corr[\"corr_r_Right-Accumbens-area_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Accumbens-area_MoCA_PDall\"].round(decimals=3)),\n", " 
str(corr[\"corr_r_Right-Accumbens-area_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Accumbens-area_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Right-Accumbens-area_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Right-Accumbens-area_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "\n", "table.add_row(\n", " \"L Thalamus\",\n", " str(corr[\"corr_r_Left-Thalamus_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Thalamus_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Thalamus_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Thalamus_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Thalamus_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Thalamus_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"L Caudate\",\n", " str(corr[\"corr_r_Left-Caudate_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Caudate_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Caudate_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Caudate_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Caudate_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Caudate_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"L Putamen\",\n", " str(corr[\"corr_r_Left-Putamen_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Putamen_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Putamen_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Putamen_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Putamen_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Putamen_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"L Pallidum\",\n", " str(corr[\"corr_r_Left-Pallidum_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Pallidum_MoCA_PDall\"].round(decimals=3)),\n", " 
str(corr[\"corr_r_Left-Pallidum_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Pallidum_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Pallidum_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Pallidum_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"L Hippocampus\",\n", " str(corr[\"corr_r_Left-Hippocampus_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Hippocampus_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Hippocampus_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Hippocampus_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Hippocampus_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Hippocampus_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"L Amygdala\",\n", " str(corr[\"corr_r_Left-Amygdala_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Amygdala_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Amygdala_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Amygdala_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Amygdala_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Amygdala_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "table.add_row(\n", " \"L Nucleus accumbens\",\n", " str(corr[\"corr_r_Left-Accumbens-area_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Accumbens-area_MoCA_PDall\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Accumbens-area_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Accumbens-area_MoCA_PDMCI\"].round(decimals=3)),\n", " str(corr[\"corr_r_Left-Accumbens-area_MoCA_PDnonMCI\"].round(decimals=3)),\n", " str(corr[\"corr_p_Left-Accumbens-area_MoCA_PDnonMCI\"].round(decimals=3)),\n", ")\n", "\n", "\n", "console = Console()\n", "console.print(table)" ] }, { "cell_type": "markdown", "id": "a49404db", "metadata": {}, "source": [ "# Vertex-wise analysis\n", "\n", 
"## Prepare the vertex data with long_mris_slopes for longitudinal two stage model" ] }, { "cell_type": "code", "execution_count": 43, "id": "db7b2dfb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zsh:1: command not found: module\r\n" ] } ], "source": [ "# prepare node for computation. it includes downloading the container\n", "# this should be run on a server connected to the internet\n", "# load recon-all base descriptor\n", "\n", "! module load singularity\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "\n", "zid = \"zenodo.7378416\"\n", "# bosh([\"exec\", \"prepare\", zid])\n", "freesurfer = boutiques.descriptor2func.function(zid)" ] }, { "cell_type": "code", "execution_count": 44, "id": "3123ca7f", "metadata": {}, "outputs": [], "source": [ "# Create longitudinal QDEC table\n", "\n", "ROI_data_dup = pd.DataFrame(np.repeat(ROI_data.values, 2, axis=0))\n", "ROI_data_dup.columns = ROI_data.columns\n", "qdec_table = ROI_data_dup[[\"PATNO\", \"dx_group\"]]\n", "qdec_table[\"fsid\"] = qdec_table[\"PATNO\"].astype(str) + np.where(\n", " qdec_table.index % 2 == 0, \"_first\", \"_second\"\n", ")\n", "qdec_table[\"fsid-base\"] = qdec_table[\"PATNO\"].astype(str) + \"_base\"\n", "qdec_table[\"years\"] = np.where(\n", " qdec_table.index % 2 == 0, 0, ROI_data_dup[\"durationT2_T1_y\"]\n", ")\n", "qdec_table = qdec_table[[\"fsid\", \"fsid-base\", \"years\", \"dx_group\"]]\n", "qdec_table\n", "\n", "qdec_table.to_csv(\"qdec_long_groups.dat\", sep=\" \", index=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "abc2a359", "metadata": {}, "outputs": [], "source": [ "%%writefile preprocess_FS_long_mris_slopes.py\n", "\n", "# Prepare the data with long_mris_slopes for longitudinal two stage model\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7378416\"\n", "from boutiques.descriptor2func import function\n", "#bosh([\"exec\", \"prepare\", 
zid])\n", "freesurfer = function(zid)\n", "\n", "\n", "for hemi in [\"lh\", \"rh\"]:\n", " out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " qdec='qdec_long_groups.dat',\n", " meas='thickness',\n", " hemi=hemi,\n", " time='years',\n", " license=\"FS_license/license.txt\",\n", " stack_avg='{hemi}.long.thickness-avg.stack.mgh'.format(hemi=hemi),\n", " stack_rate='{hemi}.long.thickness-rate.stack.mgh'.format(hemi=hemi),\n", " stack_pc1fit='{hemi}.long.thickness-pc1fit.stack.mgh'.format(hemi=hemi),\n", " stack_pc1='{hemi}.long.thickness-pc1.stack.mgh'.format(hemi=hemi),\n", " stack_spc='{hemi}.long.thickness-spc.stack.mgh'.format(hemi=hemi),\n", " )\n", "\n", "\n", " exitcode = str(out_fs.exit_code)\n", "\n", "\n", " with open('preproc_FS_codes/exitcode_long_mris_slopes_{hemi}.txt'.format(hemi=hemi), 'w') as f:\n", " f.write(exitcode)" ] }, { "cell_type": "code", "execution_count": null, "id": "e17ff299", "metadata": {}, "outputs": [], "source": [ "%%sbatch\n", "#!/bin/bash\n", "#SBATCH --job-name=FS_mris_slopes\n", "#SBATCH --mem=4G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_mris_slopes.out\n", "#SBATCH --error=FS_mris_slopes.err\n", "#SBATCH --time=10:0:0\n", "\n", ". venv/bin/activate # opens virtual environment. 
change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python preprocess_FS_long_mris_slopes.py" ] }, { "cell_type": "markdown", "id": "f6529144", "metadata": {}, "source": [ "## Create FSGD file for the correlational analysis" ] }, { "cell_type": "code", "execution_count": 45, "id": "5fbdc645", "metadata": {}, "outputs": [], "source": [ "fsgd_cortThick_mcat = ROI_data[[\"PATNO\", \"durationT2_T1_y\", \"dx_group\", \"MCATOT_diff\"]]\n", "fsgd_cortThick_mcat[\"Input\"] = \"Input\"\n", "fsgd_cortThick_mcat[\"PATNO\"] = fsgd_cortThick_mcat[\"PATNO\"].astype(str) + \"_base\"\n", "fsgd_cortThick_mcat = fsgd_cortThick_mcat[\n", "    [\"Input\", \"PATNO\", \"dx_group\", \"durationT2_T1_y\", \"MCATOT_diff\"]\n", "]\n", "fsgd_cortThick_mcat\n", "\n", "fsgd_cortThick_mcat_PDMCI = fsgd_cortThick_mcat.loc[\n", "    fsgd_cortThick_mcat[\"dx_group\"] == \"PD-MCI\"\n", "]\n", "fsgd_cortThick_mcat_PDMCI[\"dx_group\"] = \"PDMCI\"\n", "fsgd_cortThick_mcat_PDnonMCI = fsgd_cortThick_mcat.loc[\n", "    fsgd_cortThick_mcat[\"dx_group\"] == \"PD-non-MCI\"\n", "]\n", "fsgd_cortThick_mcat_PDnonMCI[\"dx_group\"] = \"PDnonMCI\"\n", "fsgd_cortThick_mcat_PDall = fsgd_cortThick_mcat.loc[\n", "    fsgd_cortThick_mcat[\"dx_group\"].isin([\"PD-MCI\", \"PD-non-MCI\"])\n", "]\n", "fsgd_cortThick_mcat_PDall[\"dx_group\"] = \"PDall\"" ] }, { "cell_type": "code", "execution_count": 46, "id": "be93cb7c", "metadata": {}, "outputs": [], "source": [ "# generate fsgd files for correlational analysis\n", "# make sure that the order of subjects in each FSGD file is the same as in long.qdec table\n", "\n", "for group in [\"PDMCI\", \"PDnonMCI\", \"PDall\"]:\n", "\n", "    # generate sample matrix\n", "    exec(\n", "        \"fsgd_cortThick_mcat_%s.to_csv('fsgd_corr_%s_cort.txt', sep='\\t', index=False, header=None)\"\n", "        % (group, group)\n", "    )\n", "\n", "    # generate file header\n", "    with open(\"fsgd_corr_{group}_desc.txt\".format(group=group), \"w\") as f:\n", "        f.write(\n", "            
\"GroupDescriptorFile 1\\nTitle GroupDifferences\\nMeasurementName MoCA\\nClass {group}\\nVariables durationT2_T1_y MoCA\\n\".format(\n", " group=group\n", " )\n", " )\n", "\n", " # generate fsgd file\n", " with open(\"fsgd_corr_{group}_group.fsgd\".format(group=group), \"wb\") as outfile:\n", " for f in [\n", " \"fsgd_corr_{group}_desc.txt\".format(group=group),\n", " \"fsgd_corr_{group}_cort.txt\".format(group=group),\n", " ]:\n", " with open(f, \"rb\") as infile:\n", " outfile.write(infile.read())" ] }, { "cell_type": "markdown", "id": "f71710c1", "metadata": {}, "source": [ "## Stack images for correlational analyses" ] }, { "cell_type": "code", "execution_count": null, "id": "72caad73", "metadata": {}, "outputs": [], "source": [ "# prepare node for computation. it includes downloading the container\n", "# this should be run on a server connected to the internet\n", "# load recon-all base descriptor\n", "\n", "! module load singularity\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "\n", "zid = \"zenodo.7387748\"\n", "freesurfer = boutiques.descriptor2func.function(zid)" ] }, { "cell_type": "code", "execution_count": null, "id": "f199fb79", "metadata": {}, "outputs": [], "source": [ "%%writefile preprocess_FS_mris_preproc_corr.py\n", "\n", "# Concatenate images from correlation analysis\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7387748\"\n", "from boutiques.descriptor2func import function\n", "freesurfer = function(zid)\n", "\n", "\n", "for hemi in [\"lh\", \"rh\"]:\n", " for group in [\"PDMCI\", \"PDnonMCI\", \"PDall\"]:\n", " out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " hemi=hemi,\n", " license=\"FS_license/license.txt\",\n", " cachein=\"long.thickness-rate.fwhm10.fsaverage\",\n", " target=\"fsaverage\",\n", " fsgd='fsgd_corr_{group}_group.fsgd'.format(group=group),\n", " 
out='stack.{hemi}.corr.{group}.thickness.rate.10.mgh'.format(hemi=hemi, group=group),\n", "        )\n", "\n", "\n", "    exitcode = str(out_fs.exit_code)\n", "\n", "\n", "    with open('preproc_FS_codes/exitcode_mris_preproc_{hemi}.txt'.format(hemi=hemi), 'w') as f:\n", "        f.write(exitcode)" ] }, { "cell_type": "code", "execution_count": null, "id": "ad91f659", "metadata": {}, "outputs": [], "source": [ "%%sbatch\n", "#!/bin/bash\n", "#SBATCH --job-name=FS_mris_preproc_corr\n", "#SBATCH --mem=1G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_mris_preproc_corr.out\n", "#SBATCH --error=FS_mris_preproc_corr.err\n", "#SBATCH --time=0:30:0\n", "\n", ". venv/bin/activate # opens virtual environment. change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python preprocess_FS_mris_preproc_corr.py" ] }, { "cell_type": "markdown", "id": "b6a0e5fd", "metadata": {}, "source": [ "## Create FSGD file for the group analysis" ] }, { "cell_type": "code", "execution_count": 47, "id": "7a02666c", "metadata": {}, "outputs": [], "source": [ "fsgd_cortThick_diff = ROI_data[[\"PATNO\", \"durationT2_T1_y\", \"dx_group\"]]\n", "fsgd_cortThick_diff[\"Input\"] = \"Input\"\n", "fsgd_cortThick_diff[\"PATNO\"] = fsgd_cortThick_diff[\"PATNO\"].astype(str) + \"_base\"\n", "fsgd_cortThick_diff = fsgd_cortThick_diff[\n", "    [\"Input\", \"PATNO\", \"dx_group\", \"durationT2_T1_y\"]\n", "]\n", "\n", "\n", "# generate sample matrix\n", "fsgd_cortThick_diff.to_csv(\"fsgd_group_cort.txt\", sep=\"\\t\", index=False, header=None)\n", "\n", "# generate file header\n", "with open(\"fsgd_group_desc.txt\", \"w\") as f:\n", "    f.write(\n", "        \"GroupDescriptorFile 1\\nTitle GroupDifferences\\nClass PD-MCI\\nClass PD-non-MCI\\nClass HC\\nVariables durationT2_T1_y\\n\"\n", "    )\n", "\n", "\n", "with open(\"fsgd_cort_group.fsgd\", \"wb\") as outfile:\n", "    for f in [\"fsgd_group_desc.txt\", \"fsgd_group_cort.txt\"]:\n", "        with open(f, \"rb\") as infile:\n", "            
outfile.write(infile.read())\n", "\n", "# make sure that the order of subjects in FSGD file is the same as in long.qdec table" ] }, { "cell_type": "markdown", "id": "fabe1532", "metadata": {}, "source": [ "## Stack images for group analyses" ] }, { "cell_type": "code", "execution_count": null, "id": "09b55607", "metadata": {}, "outputs": [], "source": [ "%%writefile preprocess_FS_mris_preproc_group.py\n", "\n", "# Concatenate images from group analysis\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7387748\"\n", "from boutiques.descriptor2func import function\n", "freesurfer = function(zid)\n", "\n", "\n", "for hemi in [\"lh\", \"rh\"]:\n", " out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " hemi=hemi,\n", " license=\"FS_license/license.txt\",\n", " cachein=\"long.thickness-rate.fwhm10.fsaverage\",\n", " target=\"fsaverage\",\n", " fsgd='fsgd_cort_group.fsgd',\n", " out='stack.{hemi}.group.thickness.rate.10.mgh'.format(hemi=hemi),\n", " )\n", "\n", "\n", " exitcode = str(out_fs.exit_code)\n", "\n", "\n", " with open('preproc_FS_codes/exitcode_mris_preproc_group_{hemi}.txt'.format(hemi=hemi), 'w') as f:\n", " f.write(exitcode)" ] }, { "cell_type": "code", "execution_count": null, "id": "54c2b920", "metadata": {}, "outputs": [], "source": [ "%%sbatch\n", "#!/bin/bash\n", "#SBATCH --job-name=FS_mris_preproc_group\n", "#SBATCH --mem=1G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_mris_preproc_group.out\n", "#SBATCH --error=FS_mris_preproc_group.err\n", "#SBATCH --time=0:10:0\n", "\n", ". venv/bin/activate # opens virtual environment. 
change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python preprocess_FS_mris_preproc_group.py" ] }, { "cell_type": "markdown", "id": "5ae16361", "metadata": {}, "source": [ "## Run GLM model for the correlational analysis" ] }, { "cell_type": "code", "execution_count": null, "id": "8bb9b43e", "metadata": {}, "outputs": [], "source": [ "# create contrast for MoCA scores\n", "\n", "with open(\"con_corr_MoCA.mtx\", \"w\") as f:\n", "    f.write(\"0 0 1\")" ] }, { "cell_type": "code", "execution_count": null, "id": "9d34b573", "metadata": {}, "outputs": [], "source": [ "%%writefile glm_corr.py\n", "\n", "# GLM model for the correlational analysis\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7411854\"\n", "from boutiques.descriptor2func import function\n", "freesurfer = function(zid)\n", "\n", "\n", "for hemi in [\"lh\", \"rh\"]:\n", "    for group in [\"PDMCI\", \"PDnonMCI\", \"PDall\"]: \n", "        out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", "                    hemi=hemi,\n", "                    license=\"FS_license/license.txt\",\n", "                    outdir='results_corr_{group}_{hemi}'.format(group=group, hemi=hemi),\n", "                    inputdata='stack.{hemi}.corr.{group}.thickness.rate.10.mgh'.format(hemi=hemi,group=group),\n", "                    fsgd='fsgd_corr_{group}_group.fsgd'.format(group=group),\n", "                    con='con_corr_MoCA.mtx'\n", "        )\n", "\n", "\n", "        exitcode = str(out_fs.exit_code)\n", "\n", "\n", "        with open('preproc_FS_codes/exitcode_glm_corr_{group}_{hemi}.txt'.format(group=group, hemi=hemi), 'w') as f:\n", "            f.write(exitcode)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "18d69a75", "metadata": {}, "outputs": [], "source": [ "%%sbatch\n", "#!/bin/bash\n", "#SBATCH --job-name=glm_corr\n", "#SBATCH --mem=1G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_glm_corr.out\n", "#SBATCH --error=FS_glm_corr.err\n", "#SBATCH --time=0:10:0\n", "\n", ". 
venv/bin/activate # opens virtual environment. change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python glm_corr.py" ] }, { "cell_type": "markdown", "id": "20e25810", "metadata": {}, "source": [ "## Run GLM model for the group analysis" ] }, { "cell_type": "code", "execution_count": null, "id": "6f51e0a1", "metadata": {}, "outputs": [], "source": [ "# create group contrasts\n", "\n", "with open(\"con_group_PDMCI_PDnonMCI.mtx\", \"w\") as f:\n", "    f.write(\"1 -1 0 0 0 0\")\n", "with open(\"con_group_PDMCI_HC.mtx\", \"w\") as f:\n", "    f.write(\"1 0 -1 0 0 0\")\n", "with open(\"con_group_PDnonMCI_HC.mtx\", \"w\") as f:\n", "    f.write(\"0 1 -1 0 0 0\")" ] }, { "cell_type": "code", "execution_count": null, "id": "82e7f3e6", "metadata": {}, "outputs": [], "source": [ "%%writefile glm_group.py\n", "\n", "# GLM model for the group analysis\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7411856\"\n", "from boutiques.descriptor2func import function\n", "freesurfer = function(zid)\n", "\n", "\n", "for hemi in [\"lh\", \"rh\"]: \n", "    out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", "                hemi=hemi,\n", "                license=\"FS_license/license.txt\",\n", "                outdir='results_group_{hemi}'.format(hemi=hemi),\n", "                inputdata='stack.{hemi}.group.thickness.rate.10.mgh'.format(hemi=hemi),\n", "                fsgd='fsgd_cort_group.fsgd',\n", "                con1=\"con_group_PDMCI_PDnonMCI.mtx\", \n", "                con2=\"con_group_PDMCI_HC.mtx\", \n", "                con3=\"con_group_PDnonMCI_HC.mtx\"\n", "    )\n", "\n", "\n", "    exitcode = str(out_fs.exit_code)\n", "\n", "\n", "    with open('preproc_FS_codes/exitcode_glm_group_{hemi}.txt'.format(hemi=hemi), 'w') as f:\n", "        f.write(exitcode)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e2f20676", "metadata": {}, "outputs": [], "source": [ "%%sbatch\n", "#!/bin/bash\n", "#SBATCH --job-name=glm_group\n", "#SBATCH --mem=1G\n", "#SBATCH 
--cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_glm_group.out\n", "#SBATCH --error=FS_glm_group.err\n", "#SBATCH --time=0:10:0\n", "\n", ". venv/bin/activate # opens virtual environment. change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python glm_group.py" ] }, { "cell_type": "markdown", "id": "27d85cf1", "metadata": {}, "source": [ "## Correction for multiple comparisons (with mri_glmfit-sim)" ] }, { "cell_type": "code", "execution_count": null, "id": "0c596f9c", "metadata": {}, "outputs": [], "source": [ "%%writefile glm_group_sim.py\n", "\n", "# Correction for multiple comparisons for the group analysis\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7411893\"\n", "from boutiques.descriptor2func import function\n", "freesurfer = function(zid)\n", "\n", "\n", "for hemi in [\"lh\", \"rh\"]: \n", "    out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", "                license=\"FS_license/license.txt\",\n", "                dir='results_group_{hemi}'.format(hemi=hemi),\n", "                CACHE_abs='1.3',\n", "                cwp=\"0.05\"\n", "    )\n", "\n", "\n", "    exitcode = str(out_fs.exit_code)\n", "\n", "\n", "    with open('preproc_FS_codes/exitcode_glm_sim_group_{hemi}.txt'.format(hemi=hemi), 'w') as f:\n", "        f.write(exitcode)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "472aebab", "metadata": {}, "outputs": [], "source": [ "%%sbatch\n", "#!/bin/bash\n", "#SBATCH --job-name=glm_group_sim\n", "#SBATCH --mem=1G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_glm_sim_group.out\n", "#SBATCH --error=FS_glm_sim_group.err\n", "#SBATCH --time=0:10:0\n", "\n", ". venv/bin/activate # opens virtual environment. 
Change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python glm_group_sim.py" ] }, { "cell_type": "code", "execution_count": null, "id": "622134a8", "metadata": {}, "outputs": [], "source": [ "%%writefile glm_corr_sim.py\n", "\n", "# Multiple-comparisons correction for the correlational analysis\n", "\n", "import os\n", "import json\n", "\n", "import boutiques\n", "from boutiques import bosh\n", "zid = \"zenodo.7411893\"\n", "from boutiques.descriptor2func import function\n", "freesurfer = function(zid)\n", "\n", "\n", "for hemi in [\"lh\", \"rh\"]:\n", " for group in [\"PDMCI\", \"PDnonMCI\", \"PDall\"]: \n", " out_fs = freesurfer('--imagepath', 'freesurfer-freesurfer-7.1.1.simg.tmp',\n", " license=\"FS_license/license.txt\",\n", " dir='results_corr_{group}_{hemi}'.format(group=group, hemi=hemi),\n", " CACHE_abs='1.3',\n", " cwp=\"0.05\"\n", " )\n", "\n", "\n", " exitcode = str(out_fs.exit_code)\n", "\n", "\n", " with open('preproc_FS_codes/exitcode_glm_sim_corr_{group}_{hemi}.txt'.format(group=group, hemi=hemi), 'w') as f:\n", " f.write(exitcode)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5885d6dd", "metadata": {}, "outputs": [], "source": [ "%%sbatch\n", "#!/bin/bash\n", "#SBATCH --job-name=glm_corr_sim\n", "#SBATCH --mem=1G\n", "#SBATCH --cpus-per-task=2\n", "#SBATCH --nodes=1\n", "#SBATCH --output=FS_glm_sim_corr.out\n", "#SBATCH --error=FS_glm_sim_corr.err\n", "#SBATCH --time=0:10:0\n", "\n", ". venv/bin/activate # activates the virtual environment. 
Change depending on where you preprocess the data \n", "\n", "module load singularity\n", "\n", "python glm_corr_sim.py" ] }, { "cell_type": "markdown", "id": "86d26239", "metadata": {}, "source": [ "# Vertex-wise results" ] }, { "cell_type": "markdown", "id": "2e09a270", "metadata": {}, "source": [ "## Correlation with MoCA" ] }, { "cell_type": "code", "execution_count": 49, "id": "9c39a05d", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# correlation results\n", "\n", "result_corr = open(\"result_corr.txt\", \"w\")\n", "\n", "# print(\"Correlation analysis\")\n", "for group in [\"PDMCI\", \"PDnonMCI\", \"PDall\"]:\n", " # print(\"Significant correlation with MoCA in {group} group\\n\".format(group=group))\n", " for hemi in [\"lh\", \"rh\"]:\n", " # print(\"Hemisphere {hemi}\\n\".format(hemi=hemi))\n", " file = open(\n", " \"results_corr_{group}_{hemi}/con_corr_MoCA/cache.th13.abs.sig.cluster.summary\".format(\n", " hemi=hemi, group=group\n", " ),\n", " \"r\",\n", " )\n", " always_print = False\n", " lines = file.readlines()\n", " file.close()\n", " result_corr.write(group + \"_\" + hemi + \"\\n\")\n", " for line in lines:\n", " if always_print or \"ClusterNo\" in line:\n", " # print(line)\n", " result_corr.write(line)\n", " always_print = True\n", "\n", "# close so the buffered output is flushed before the next cell reads it\n", "result_corr.close()" ] }, { "cell_type": "code", "execution_count": 50, "id": "72e518a0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>ClusterNo</th>\n", " <th>Max</th>\n", " <th>VtxMax</th>\n", " <th>Size(mm^2)</th>\n", " <th>MNIX</th>\n", " <th>MNIY</th>\n", " <th>MNIZ</th>\n", " <th>CWP</th>\n", " <th>CWPLow</th>\n", " <th>CWPHi</th>\n", " 
<th>NVtxs</th>\n", " <th>WghtVtx</th>\n", " <th>Annot</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>PDMCI_lh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>PDMCI_rh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>1</td>\n", " <td>-4.0378</td>\n", " <td>138382</td>\n", " <td>749.67</td>\n", " <td>30.1</td>\n", " <td>43.1</td>\n", " <td>16.9</td>\n", " <td>0.00040</td>\n", " <td>0.00000</td>\n", " <td>0.00080</td>\n", " <td>1129</td>\n", " <td>-2223.62</td>\n", " <td>rostralmiddlefrontal</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>PDnonMCI_lh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>PDnonMCI_rh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>PDall_lh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>1</td>\n", " <td>3.4012</td>\n", " <td>161438</td>\n", " <td>592.67</td>\n", " <td>-53.3</td>\n", " <td>-20.1</td>\n", " <td>-33.1</td>\n", " <td>0.01772</td>\n", " <td>0.01534</td>\n", " <td>0.02010</td>\n", " <td>876</td>\n", " 
<td>1635.12</td>\n", " <td>inferiortemporal</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>2</td>\n", " <td>2.5887</td>\n", " <td>138009</td>\n", " <td>523.77</td>\n", " <td>-24.4</td>\n", " <td>-16.8</td>\n", " <td>64.5</td>\n", " <td>0.04058</td>\n", " <td>0.03705</td>\n", " <td>0.04410</td>\n", " <td>1206</td>\n", " <td>2085.15</td>\n", " <td>precentral</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>PDall_rh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " ClusterNo Max VtxMax Size(mm^2) MNIX MNIY MNIZ CWP \\\n", "0 PDMCI_lh \n", "1 PDMCI_rh \n", "2 1 -4.0378 138382 749.67 30.1 43.1 16.9 0.00040 \n", "3 PDnonMCI_lh \n", "4 PDnonMCI_rh \n", "5 PDall_lh \n", "6 1 3.4012 161438 592.67 -53.3 -20.1 -33.1 0.01772 \n", "7 2 2.5887 138009 523.77 -24.4 -16.8 64.5 0.04058 \n", "8 PDall_rh \n", "\n", " CWPLow CWPHi NVtxs WghtVtx Annot \n", "0 \n", "1 \n", "2 0.00000 0.00080 1129 -2223.62 rostralmiddlefrontal \n", "3 \n", "4 \n", "5 \n", "6 0.01534 0.02010 876 1635.12 inferiortemporal \n", "7 0.03705 0.04410 1206 2085.15 precentral \n", "8 " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results_corr = pd.read_csv(\n", " \"result_corr.txt\",\n", " sep=\"\\\\s+\",\n", " keep_default_na=False,\n", " na_values=\" \",\n", " comment=\"#\",\n", " names=[\n", " \"ClusterNo\",\n", " \"Max\",\n", " \"VtxMax\",\n", " \"Size(mm^2)\",\n", " \"MNIX\",\n", " \"MNIY\",\n", " \"MNIZ\",\n", " \"CWP\",\n", " \"CWPLow\",\n", " \"CWPHi\",\n", " \"NVtxs\",\n", " \"WghtVtx\",\n", " \"Annot\",\n", " ],\n", ")\n", "results_corr" ] }, { "cell_type": "markdown", "id": "e5454622", "metadata": {}, "source": [ "Correlation between rate of change of cortical thickness and the Montreal Cognitive Assessment scores in PD-MCI 
group\n", "\n", "<img src=\"images/img_corr_PDMCI_rh.jpg\"/>\n", "\n", "Correlation between rate of change of cortical thickness and the Montreal Cognitive Assessment scores in PD-all group\n", "\n", "<img src=\"images/img_corr_PDall_lh.jpg\"/>\n" ] }, { "cell_type": "markdown", "id": "00c92a60", "metadata": {}, "source": [ "## Group differences" ] }, { "cell_type": "code", "execution_count": 52, "id": "4b167c68", "metadata": {}, "outputs": [], "source": [ "# group differences\n", "\n", "result_group = open(\"result_group.txt\", \"w\")\n", "\n", "# print(\"Group analysis\\n\")\n", "for pair in [\"PDMCI_PDnonMCI\", \"PDMCI_HC\", \"PDnonMCI_HC\"]:\n", " # print(\"** Significant group differences between {pair} **\\n\".format(pair=pair))\n", " for hemi in [\"lh\", \"rh\"]:\n", " # print(\"Hemisphere {hemi}\\n\".format(hemi=hemi))\n", " file = open(\n", " \"results_group_{hemi}/con_group_{pair}/cache.th13.abs.sig.cluster.summary\".format(\n", " hemi=hemi, pair=pair\n", " ),\n", " \"r\",\n", " )\n", " always_print = False\n", " lines = file.readlines()\n", " file.close()\n", " result_group.write(pair + \"_\" + hemi + \"\\n\")\n", " for line in lines:\n", " if always_print or \"ClusterNo\" in line:\n", " # print(line)\n", " result_group.write(line)\n", " always_print = True\n", "\n", "# close so the buffered output is flushed before the next cell reads it\n", "result_group.close()" ] }, { "cell_type": "code", "execution_count": 53, "id": "23db829e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>ClusterNo</th>\n", " <th>Max</th>\n", " <th>VtxMax</th>\n", " <th>Size(mm^2)</th>\n", " <th>MNIX</th>\n", " <th>MNIY</th>\n", " <th>MNIZ</th>\n", " <th>CWP</th>\n", " <th>CWPLow</th>\n", " <th>CWPHi</th>\n", " 
<th>NVtxs</th>\n", " <th>WghtVtx</th>\n", " <th>Annot</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>PDMCI_PDnonMCI_lh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>1</td>\n", " <td>2.8711</td>\n", " <td>67258</td>\n", " <td>901.62</td>\n", " <td>-15.4</td>\n", " <td>-45.2</td>\n", " <td>65.8</td>\n", " <td>0.00040</td>\n", " <td>0.00000</td>\n", " <td>0.00080</td>\n", " <td>2169</td>\n", " <td>4071.31</td>\n", " <td>precuneus</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>PDMCI_PDnonMCI_rh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>1</td>\n", " <td>-4.3574</td>\n", " <td>34834</td>\n", " <td>808.31</td>\n", " <td>45.0</td>\n", " <td>-4.9</td>\n", " <td>49.8</td>\n", " <td>0.00260</td>\n", " <td>0.00180</td>\n", " <td>0.00360</td>\n", " <td>1540</td>\n", " <td>-3489.04</td>\n", " <td>precentral</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>2</td>\n", " <td>-4.2850</td>\n", " <td>679</td>\n", " <td>689.83</td>\n", " <td>36.1</td>\n", " <td>-16.4</td>\n", " <td>-3.4</td>\n", " <td>0.00659</td>\n", " <td>0.00519</td>\n", " <td>0.00798</td>\n", " <td>1752</td>\n", " <td>-3804.45</td>\n", " <td>insula</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>3</td>\n", " <td>-4.7243</td>\n", " <td>120680</td>\n", " <td>636.56</td>\n", " <td>42.1</td>\n", " <td>7.1</td>\n", " <td>-37.1</td>\n", " <td>0.01077</td>\n", " <td>0.00898</td>\n", " <td>0.01256</td>\n", " <td>1030</td>\n", " <td>-2140.64</td>\n", " <td>middletemporal</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>PDMCI_HC_lh</td>\n", " <td></td>\n", " 
<td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>1</td>\n", " <td>3.4112</td>\n", " <td>109333</td>\n", " <td>1596.42</td>\n", " <td>-12.2</td>\n", " <td>-55.4</td>\n", " <td>59.9</td>\n", " <td>0.00020</td>\n", " <td>0.00000</td>\n", " <td>0.00040</td>\n", " <td>3613</td>\n", " <td>7252.89</td>\n", " <td>superiorparietal</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>2</td>\n", " <td>4.2357</td>\n", " <td>103723</td>\n", " <td>816.25</td>\n", " <td>-8.3</td>\n", " <td>35.1</td>\n", " <td>34.0</td>\n", " <td>0.00060</td>\n", " <td>0.00020</td>\n", " <td>0.00100</td>\n", " <td>1259</td>\n", " <td>2412.37</td>\n", " <td>superiorfrontal</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>PDMCI_HC_rh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>1</td>\n", " <td>2.4693</td>\n", " <td>137873</td>\n", " <td>619.04</td>\n", " <td>15.0</td>\n", " <td>-12.2</td>\n", " <td>61.8</td>\n", " <td>0.01435</td>\n", " <td>0.01216</td>\n", " <td>0.01653</td>\n", " <td>1396</td>\n", " <td>2247.74</td>\n", " <td>precentral</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>2</td>\n", " <td>3.0369</td>\n", " <td>151977</td>\n", " <td>578.61</td>\n", " <td>37.3</td>\n", " <td>-29.7</td>\n", " <td>39.5</td>\n", " <td>0.02208</td>\n", " <td>0.01950</td>\n", " <td>0.02484</td>\n", " <td>1561</td>\n", " <td>2734.26</td>\n", " <td>supramarginal</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>3</td>\n", " <td>3.9183</td>\n", " <td>2591</td>\n", " <td>563.81</td>\n", " <td>29.5</td>\n", " <td>-48.4</td>\n", " <td>42.7</td>\n", " <td>0.02563</td>\n", " <td>0.02287</td>\n", " <td>0.02859</td>\n", " 
<td>1354</td>\n", " <td>2888.27</td>\n", " <td>superiorparietal</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>PDnonMCI_HC_lh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>PDnonMCI_HC_rh</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>1</td>\n", " <td>4.4171</td>\n", " <td>109273</td>\n", " <td>1010.34</td>\n", " <td>35.7</td>\n", " <td>-30.6</td>\n", " <td>39.6</td>\n", " <td>0.00020</td>\n", " <td>0.00000</td>\n", " <td>0.00040</td>\n", " <td>2607</td>\n", " <td>5628.62</td>\n", " <td>supramarginal</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>2</td>\n", " <td>3.8495</td>\n", " <td>54008</td>\n", " <td>580.49</td>\n", " <td>43.5</td>\n", " <td>-9.4</td>\n", " <td>46.2</td>\n", " <td>0.02109</td>\n", " <td>0.01851</td>\n", " <td>0.02366</td>\n", " <td>1391</td>\n", " <td>2904.78</td>\n", " <td>precentral</td>\n", " </tr>\n", " <tr>\n", " <th>17</th>\n", " <td>3</td>\n", " <td>3.6102</td>\n", " <td>116416</td>\n", " <td>537.11</td>\n", " <td>8.6</td>\n", " <td>-71.2</td>\n", " <td>40.0</td>\n", " <td>0.03842</td>\n", " <td>0.03489</td>\n", " <td>0.04195</td>\n", " <td>996</td>\n", " <td>1920.65</td>\n", " <td>precuneus</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " ClusterNo Max VtxMax Size(mm^2) MNIX MNIY MNIZ \\\n", "0 PDMCI_PDnonMCI_lh \n", "1 1 2.8711 67258 901.62 -15.4 -45.2 65.8 \n", "2 PDMCI_PDnonMCI_rh \n", "3 1 -4.3574 34834 808.31 45.0 -4.9 49.8 \n", "4 2 -4.2850 679 689.83 36.1 -16.4 -3.4 \n", "5 3 -4.7243 120680 636.56 42.1 7.1 -37.1 \n", "6 PDMCI_HC_lh \n", "7 1 3.4112 109333 1596.42 -12.2 -55.4 59.9 \n", 
"8 2 4.2357 103723 816.25 -8.3 35.1 34.0 \n", "9 PDMCI_HC_rh \n", "10 1 2.4693 137873 619.04 15.0 -12.2 61.8 \n", "11 2 3.0369 151977 578.61 37.3 -29.7 39.5 \n", "12 3 3.9183 2591 563.81 29.5 -48.4 42.7 \n", "13 PDnonMCI_HC_lh \n", "14 PDnonMCI_HC_rh \n", "15 1 4.4171 109273 1010.34 35.7 -30.6 39.6 \n", "16 2 3.8495 54008 580.49 43.5 -9.4 46.2 \n", "17 3 3.6102 116416 537.11 8.6 -71.2 40.0 \n", "\n", " CWP CWPLow CWPHi NVtxs WghtVtx Annot \n", "0 \n", "1 0.00040 0.00000 0.00080 2169 4071.31 precuneus \n", "2 \n", "3 0.00260 0.00180 0.00360 1540 -3489.04 precentral \n", "4 0.00659 0.00519 0.00798 1752 -3804.45 insula \n", "5 0.01077 0.00898 0.01256 1030 -2140.64 middletemporal \n", "6 \n", "7 0.00020 0.00000 0.00040 3613 7252.89 superiorparietal \n", "8 0.00060 0.00020 0.00100 1259 2412.37 superiorfrontal \n", "9 \n", "10 0.01435 0.01216 0.01653 1396 2247.74 precentral \n", "11 0.02208 0.01950 0.02484 1561 2734.26 supramarginal \n", "12 0.02563 0.02287 0.02859 1354 2888.27 superiorparietal \n", "13 \n", "14 \n", "15 0.00020 0.00000 0.00040 2607 5628.62 supramarginal \n", "16 0.02109 0.01851 0.02366 1391 2904.78 precentral \n", "17 0.03842 0.03489 0.04195 996 1920.65 precuneus " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results_group = pd.read_csv(\n", " \"result_group.txt\",\n", " sep=\"\\\\s+\",\n", " keep_default_na=False,\n", " na_values=\" \",\n", " comment=\"#\",\n", " names=[\n", " \"ClusterNo\",\n", " \"Max\",\n", " \"VtxMax\",\n", " \"Size(mm^2)\",\n", " \"MNIX\",\n", " \"MNIY\",\n", " \"MNIZ\",\n", " \"CWP\",\n", " \"CWPLow\",\n", " \"CWPHi\",\n", " \"NVtxs\",\n", " \"WghtVtx\",\n", " \"Annot\",\n", " ],\n", ")\n", "results_group" ] }, { "cell_type": "markdown", "id": "8aaa5d5c", "metadata": {}, "source": [ "### Rate of change of cortical thickness in patients with Parkinson’s disease.\n", "\n", "PD-MCI versus PD-non-MCI, left hemisphere\n", "\n", "<img src=\"images/img_group_PDMCI_PDnonMCI_lh.jpg\"/>\n", 
"\n", "<img src=\"images/img_group_PDMCI_PDnonMCI_lh_mid.jpg\"/>\n", "\n", "PD-MCI versus PD-non-MCI, right hemisphere\n", "\n", "<img src=\"images/img_group_PDnonMCI_PDMCI_rh.jpg\"/>\n", "\n", "PD-MCI versus HC, left hemisphere\n", "\n", "<img src=\"images/img_group_PDMCI_HC_lh_mid.jpg\"/>\n", "\n", "PD-MCI versus HC, right hemisphere\n", "\n", "<img src=\"images/img_group_PDMCI_HC_rh.jpg\"/>\n", "\n", "PD-non-MCI versus HC, right hemisphere\n", "\n", "<img src=\"images/img_group_PDNonMCI_HC_rh.jpg\"/>\n", "\n", "<img src=\"images/img_group_PDNonMCI_HC_rh_mid.jpg\"/>" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 5 }