{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Schizophrenia Diagnosis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook contains an analysis of the COBRE dataset available through Nilearn. The dataset contains resting-state fMRI data from 146 participants. Approximately half of the subjects are patients diagnosed with schizophrenia and the remainder are healthy controls. The analysis in this notebook attempts to predict schizophrenia diagnosis using resting-state fMRI data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#import data\n", "from nilearn import datasets\n", "data = datasets.fetch_cobre(n_subjects=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Phenotypic info for the subjects is included with the data but requires some cleaning first." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#import phenotypic data\n", "import pandas\n", "pheno = pandas.DataFrame(data.phenotypic)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll extract each subject ID from the NIfTI file names using index slicing and then merge the fMRI file paths with the phenotypic data." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#extract participant id from file names (e.g. fmri_0040061.nii.gz -> 40061)\n", "import os\n", "file_names = []\n", "for path in data.func:\n", "    #slice the file name, not the full path, so the offset does not depend on the home directory\n", "    file_names.append(os.path.basename(path)[7:12])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#create dataframe of file paths and ids\n", "files = pandas.DataFrame(data.func, columns = ['path'])\n", "files['id'] = file_names\n", "files['id'] = files.id.astype(int)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#merge phenotypic data with file paths\n", "pheno = pandas.merge(pheno, files, on = 'id')\n", "\n", "#decode byte strings to regular strings\n", "pheno['gender'] = pheno['gender'].map(lambda x: x.decode('utf-8'))\n", "pheno['handedness'] = pheno['handedness'].map(lambda x: x.decode('utf-8'))\n", "pheno['subject_type'] = pheno['subject_type'].map(lambda x: x.decode('utf-8'))\n", "pheno['diagnosis'] = pheno['diagnosis'].map(lambda x: x.decode('utf-8'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at what we have now. Optionally, the cleaned phenotypic data can also be saved to a CSV file." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id current_age gender handedness subject_type \\\n", "0 40061 18 Male Right Control \n", "1 40090 18 Female Right Control \n", "2 40046 18 Male Left Patient \n", "3 40002 19 Male Right Patient \n", "4 40117 19 Male Right Patient \n", ".. ... ... ... ... ... \n", "141 40089 62 Male Right Patient \n", "142 40040 63 Male Right Patient \n", "143 40028 64 Male Right Patient \n", "144 40086 65 Male Right Control \n", "145 40007 65 Female Right Patient \n", "\n", " diagnosis frames_ok fd fd_scrubbed \\\n", "0 None 133 0.25512 0.22657 \n", "1 None 150 0.16963 0.16963 \n", "2 295.70 depressed type 76 0.37504 0.30042 \n", "3 295.3 67 0.40006 0.21575 \n", "4 295.3 133 0.20975 0.18410 \n", ".. ... ... ... ... \n", "141 295.3 40 0.70368 0.72439 \n", "142 295.3 42 0.58301 0.40646 \n", "143 295.3 55 0.42364 0.26393 \n", "144 None 48 0.39595 0.32296 \n", "145 295.3 40 0.70044 0.72077 \n", "\n", " path \n", "0 /home/aalbury/nilearn_data/cobre/fmri_0040061.... \n", "1 /home/aalbury/nilearn_data/cobre/fmri_0040090.... \n", "2 /home/aalbury/nilearn_data/cobre/fmri_0040046.... \n", "3 /home/aalbury/nilearn_data/cobre/fmri_0040002.... \n", "4 /home/aalbury/nilearn_data/cobre/fmri_0040117.... \n", ".. ... \n", "141 /home/aalbury/nilearn_data/cobre/fmri_0040089.... \n", "142 /home/aalbury/nilearn_data/cobre/fmri_0040040.... \n", "143 /home/aalbury/nilearn_data/cobre/fmri_0040028.... \n", "144 /home/aalbury/nilearn_data/cobre/fmri_0040086.... \n", "145 /home/aalbury/nilearn_data/cobre/fmri_0040007.... \n", "\n", "[146 rows x 10 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#pheno.to_csv('pheno.csv', index=False)\n", "pheno" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the file paths matched with the phenotypic data, we can easily make subsets for patients and controls." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "#create lists of filepaths for patients and controls\n", "patients = []\n", "controls = []\n", "\n", "for i in pheno.index:\n", "    if pheno.loc[i, 'subject_type']=='Patient':\n", "        patients.append(pheno.loc[i, 'path'])\n", "    else:\n", "        controls.append(pheno.loc[i, 'path'])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below builds an interactive Dash app that uses Plotly Express to plot a histogram of subject age, with a dropdown to switch between patients and controls." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly.express as px\n", "from jupyter_dash import JupyterDash\n", "import dash_core_components as dcc\n", "import dash_html_components as html\n", "from dash.dependencies import Input, Output\n", "# Load Data\n", "df = pheno\n", "# Build App\n", "app = JupyterDash(__name__)\n", "app.layout = html.Div([\n", "    html.H1(\"Age\"),\n", "    dcc.Graph(id='graph'),\n", "    html.Label([\n", "        \"Participant type\",\n", "        dcc.Dropdown(\n", "            id='subject_type', clearable=False,\n", "            value='Patient', options=[\n", "                {'label': c, 'value': c}\n", "                for c in df.subject_type.unique() #get all unique values from column\n", "            ])\n", "    ]),\n", "])\n", "# Define callback to update graph\n", "@app.callback(\n", "    Output('graph', 'figure'),\n", "    [Input(\"subject_type\", \"value\")]\n", ")\n", "def update_figure(subject_type):\n", "    return px.histogram(\n", "        df[df[\"subject_type\"]==subject_type], x=\"current_age\", color=\"gender\"\n", "    )\n", "# Run app and display result inline in the notebook\n", "app.run_server(mode='inline')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connectivity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This analysis uses the BASC atlas to define ROIs. 
We'll focus on 64 ROIs for this analysis." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "#import atlas\n", "parcellations = datasets.fetch_atlas_basc_multiscale_2015(version='sym')\n", "atlas_filename = parcellations.scale064" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# visualize atlas\n", "from nilearn import plotting\n", "plotting.plot_roi(atlas_filename, draw_cross = False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's generate correlation matrices for each subject and then merge them to the phenotypic data." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from nilearn.input_data import NiftiLabelsMasker\n", "from nilearn.connectome import ConnectivityMeasure\n", "\n", "# create mask\n", "mask = NiftiLabelsMasker(labels_img=atlas_filename, \n", " standardize=True, \n", " memory='nilearn_cache', \n", " verbose=1)\n", "\n", "# initialize correlation measure\n", "correlation_measure = ConnectivityMeasure(kind='correlation', vectorize=True,\n", " discard_diagonal=True)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "Resampling labels\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from 
/home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", 
"[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "________________________________________________________________________________\n", "[Memory] Calling nilearn.input_data.base_masker.filter_and_extract...\n", "filter_and_extract('/home/aalbury/nilearn_data/cobre/fmri_0040072.nii.gz', , \n", "{ 'background_label': 0,\n", " 'detrend': False,\n", " 'dtype': None,\n", " 'high_pass': None,\n", " 'labels_img': '/home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz',\n", " 'low_pass': None,\n", " 'mask_img': None,\n", " 'smoothing_fwhm': None,\n", " 'standardize': True,\n", " 'strategy': 'mean',\n", " 't_r': None,\n", " 'target_affine': None,\n", " 'target_shape': None}, confounds='/home/aalbury/nilearn_data/cobre/fmri_0040072.tsv', dtype=None, memory=Memory(location=nilearn_cache/joblib), memory_level=1, verbose=1)\n", "[NiftiLabelsMasker.transform_single_imgs] Loading data from /home/aalbury/nilearn_data/cobre/fmri_0040072.nii.gz\n", "[NiftiLabelsMasker.transform_single_imgs] Extracting region signals\n", "[NiftiLabelsMasker.transform_single_imgs] Cleaning extracted signals\n", 
"_______________________________________________filter_and_extract - 0.9s, 0.0min\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from 
/home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", 
"[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from /home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n", "[NiftiLabelsMasker.fit_transform] loading data from 
/home/aalbury/nilearn_data/basc_multiscale_2015/template_cambridge_basc_multiscale_nii_sym/template_cambridge_basc_multiscale_sym_scale064.nii.gz\n" ] } ], "source": [ "import pandas as pd\n", "\n", "# collect one row per subject, then build the dataframe once\n", "# (DataFrame.append was removed in pandas 2.0)\n", "rows = []\n", "\n", "for i, sub in enumerate(data.func):\n", "    # extract the timeseries from the ROIs in the atlas\n", "    time_series = mask.fit_transform(sub, confounds=data.confounds[i])\n", "    # create a region x region correlation matrix\n", "    correlation_matrix = correlation_measure.fit_transform([time_series])[0]\n", "    # add features and file name to the list of rows\n", "    rows.append({'features': correlation_matrix, 'file': sub})\n", "    # uncomment below to keep track of status\n", "    #print('finished %s of %s'%(i+1,len(data.func)))\n", "\n", "all_features = pd.DataFrame(rows, columns=['features', 'file'])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# create pandas 
dataframe of features and phenotypic data\n", "full = pandas.merge(pheno, all_features, left_on = 'path', right_on = 'file')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a Pandas dataframe with all of our demographic data and a column that contains the correlation matrix for each subject as an array." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n", "<tr><th></th><th>id</th><th>current_age</th><th>gender</th><th>handedness</th><th>subject_type</th><th>diagnosis</th><th>frames_ok</th><th>fd</th><th>fd_scrubbed</th><th>path</th><th>features</th><th>file</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>40061</td><td>18</td><td>Male</td><td>Right</td><td>Control</td><td>None</td><td>133</td><td>0.25512</td><td>0.22657</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040061....</td><td>[0.12785325357282862, 0.24422311479417322, 0.0...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040061....</td></tr>\n",
"<tr><th>1</th><td>40090</td><td>18</td><td>Female</td><td>Right</td><td>Control</td><td>None</td><td>150</td><td>0.16963</td><td>0.16963</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040090....</td><td>[0.05584897620355883, 0.11246991477285287, 0.0...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040090....</td></tr>\n",
"<tr><th>2</th><td>40046</td><td>18</td><td>Male</td><td>Left</td><td>Patient</td><td>295.70 depressed type</td><td>76</td><td>0.37504</td><td>0.30042</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040046....</td><td>[0.08678037911430761, 0.06380639929297223, 0.2...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040046....</td></tr>\n",
"<tr><th>3</th><td>40002</td><td>19</td><td>Male</td><td>Right</td><td>Patient</td><td>295.3</td><td>67</td><td>0.40006</td><td>0.21575</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040002....</td><td>[0.1456258349041035, -0.06313977045762048, 0.0...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040002....</td></tr>\n",
"<tr><th>4</th><td>40117</td><td>19</td><td>Male</td><td>Right</td><td>Patient</td><td>295.3</td><td>133</td><td>0.20975</td><td>0.18410</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040117....</td><td>[0.17462308889197792, -0.11825290188441862, -0...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040117....</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>141</th><td>40089</td><td>62</td><td>Male</td><td>Right</td><td>Patient</td><td>295.3</td><td>40</td><td>0.70368</td><td>0.72439</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040089....</td><td>[0.010289989396440208, 0.05385253837186407, 0....</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040089....</td></tr>\n",
"<tr><th>142</th><td>40040</td><td>63</td><td>Male</td><td>Right</td><td>Patient</td><td>295.3</td><td>42</td><td>0.58301</td><td>0.40646</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040040....</td><td>[-0.05942811380550243, 0.005379451667081145, 0...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040040....</td></tr>\n",
"<tr><th>143</th><td>40028</td><td>64</td><td>Male</td><td>Right</td><td>Patient</td><td>295.3</td><td>55</td><td>0.42364</td><td>0.26393</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040028....</td><td>[0.1526906170886252, 0.2197449376315366, 0.301...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040028....</td></tr>\n",
"<tr><th>144</th><td>40086</td><td>65</td><td>Male</td><td>Right</td><td>Control</td><td>None</td><td>48</td><td>0.39595</td><td>0.32296</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040086....</td><td>[0.44868677474053265, -0.1957847795785304, -0....</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040086....</td></tr>\n",
"<tr><th>145</th><td>40007</td><td>65</td><td>Female</td><td>Right</td><td>Patient</td><td>295.3</td><td>40</td><td>0.70044</td><td>0.72077</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040007....</td><td>[0.08900283860731034, 0.11650348658751322, 0.3...</td><td>/home/aalbury/nilearn_data/cobre/fmri_0040007....</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>146 rows × 12 columns</p>\n", "</div>
" ], "text/plain": [ " id current_age gender handedness subject_type \\\n", "0 40061 18 Male Right Control \n", "1 40090 18 Female Right Control \n", "2 40046 18 Male Left Patient \n", "3 40002 19 Male Right Patient \n", "4 40117 19 Male Right Patient \n", ".. ... ... ... ... ... \n", "141 40089 62 Male Right Patient \n", "142 40040 63 Male Right Patient \n", "143 40028 64 Male Right Patient \n", "144 40086 65 Male Right Control \n", "145 40007 65 Female Right Patient \n", "\n", " diagnosis frames_ok fd fd_scrubbed \\\n", "0 None 133 0.25512 0.22657 \n", "1 None 150 0.16963 0.16963 \n", "2 295.70 depressed type 76 0.37504 0.30042 \n", "3 295.3 67 0.40006 0.21575 \n", "4 295.3 133 0.20975 0.18410 \n", ".. ... ... ... ... \n", "141 295.3 40 0.70368 0.72439 \n", "142 295.3 42 0.58301 0.40646 \n", "143 295.3 55 0.42364 0.26393 \n", "144 None 48 0.39595 0.32296 \n", "145 295.3 40 0.70044 0.72077 \n", "\n", " path \\\n", "0 /home/aalbury/nilearn_data/cobre/fmri_0040061.... \n", "1 /home/aalbury/nilearn_data/cobre/fmri_0040090.... \n", "2 /home/aalbury/nilearn_data/cobre/fmri_0040046.... \n", "3 /home/aalbury/nilearn_data/cobre/fmri_0040002.... \n", "4 /home/aalbury/nilearn_data/cobre/fmri_0040117.... \n", ".. ... \n", "141 /home/aalbury/nilearn_data/cobre/fmri_0040089.... \n", "142 /home/aalbury/nilearn_data/cobre/fmri_0040040.... \n", "143 /home/aalbury/nilearn_data/cobre/fmri_0040028.... \n", "144 /home/aalbury/nilearn_data/cobre/fmri_0040086.... \n", "145 /home/aalbury/nilearn_data/cobre/fmri_0040007.... \n", "\n", " features \\\n", "0 [0.12785325357282862, 0.24422311479417322, 0.0... \n", "1 [0.05584897620355883, 0.11246991477285287, 0.0... \n", "2 [0.08678037911430761, 0.06380639929297223, 0.2... \n", "3 [0.1456258349041035, -0.06313977045762048, 0.0... \n", "4 [0.17462308889197792, -0.11825290188441862, -0... \n", ".. ... \n", "141 [0.010289989396440208, 0.05385253837186407, 0.... \n", "142 [-0.05942811380550243, 0.005379451667081145, 0... 
\n", "143 [0.1526906170886252, 0.2197449376315366, 0.301... \n", "144 [0.44868677474053265, -0.1957847795785304, -0.... \n", "145 [0.08900283860731034, 0.11650348658751322, 0.3... \n", "\n", " file \n", "0 /home/aalbury/nilearn_data/cobre/fmri_0040061.... \n", "1 /home/aalbury/nilearn_data/cobre/fmri_0040090.... \n", "2 /home/aalbury/nilearn_data/cobre/fmri_0040046.... \n", "3 /home/aalbury/nilearn_data/cobre/fmri_0040002.... \n", "4 /home/aalbury/nilearn_data/cobre/fmri_0040117.... \n", ".. ... \n", "141 /home/aalbury/nilearn_data/cobre/fmri_0040089.... \n", "142 /home/aalbury/nilearn_data/cobre/fmri_0040040.... \n", "143 /home/aalbury/nilearn_data/cobre/fmri_0040028.... \n", "144 /home/aalbury/nilearn_data/cobre/fmri_0040086.... \n", "145 /home/aalbury/nilearn_data/cobre/fmri_0040007.... \n", "\n", "[146 rows x 12 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "full" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing Connectivity" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from matplotlib.pyplot import figure, savefig\n", "\n", "patient_features = list(full.loc[full['subject_type']=='Patient']['features'])\n", "control_features = list(full.loc[full['subject_type']=='Control']['features'])\n", "\n", "figure(figsize=(16,6))\n", "\n", "plt.subplot(1, 2, 1)\n", "plt.imshow(patient_features, aspect='auto')\n", "plt.colorbar()\n", "plt.title('Patients')\n", "plt.xlabel('features')\n", "plt.ylabel('subjects')\n", "\n", "\n", "plt.subplot(1, 2, 2)\n", "plt.imshow(control_features, aspect='auto')\n", "plt.colorbar()\n", "plt.title('Controls')\n", "plt.xlabel('features')\n", "plt.ylabel('subjects')\n", "\n", "savefig('features.png', transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section contains the main analysis of this notebook: predicting schizophrenia diagnosis. The features used are the correlation matrices generated previously, and diagnosis labels are contained in the `subject_type` column from our phenotypic data.\n", "\n", "We first split the data into training and validation sets, with a ratio of 80/20." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "# Split the sample to training/validation with a 80/20 ratio\n", "\n", "x_train, x_val, y_train, y_val = train_test_split(\n", " list(full['features']), # x\n", " full['subject_type'], # y\n", " test_size = 0.2, # 80%/20% split \n", " shuffle = True, # shuffle dataset\n", " stratify = full['subject_type'],\n", " random_state = 242 \n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our starting classifier will be a linear support vector machine, specified as `SVC(kernel='linear')` in scikit-learn. 
This is often the [first recommendation](https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) for classification problems with small sample sizes. \n", "\n", "We'll be using 10-fold cross-validation to get a rough benchmark of performance for each classifier. We'll use macro-averaged F1 as our performance metric. After each run we'll look at the performance of the classifier across the folds as well as the average performance." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# build SVC classifier\n", "from sklearn.svm import SVC\n", "svc = SVC(kernel='linear')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.7981132756132755\n", "[0.82857143 0.74825175 0.74825175 0.74825175 1. 0.74825175\n", " 0.90598291 0.81666667 0.80357143 0.63333333]\n" ] } ], "source": [ "# F1 score by averaging each fold\n", "from sklearn.model_selection import cross_val_score\n", "import numpy as np\n", "svc_score = cross_val_score(svc, x_train, y_train, cv=10, scoring = 'f1_macro')\n", "print(np.mean(svc_score))\n", "print(svc_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Linear SVC performs well, with an average F1 score of ~0.80, although individual folds range from 0.63 to 1.0.\n", "\n", "We'll try gradient boosting next. The gradient boost model will use a greater number of estimators and a larger max depth than the defaults to try to improve performance." 
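, "

For reference, the scikit-learn defaults that the larger values are measured against can be read off an unconfigured estimator. A minimal sketch (the variable name `gb_defaults` is only illustrative and is not used elsewhere in this notebook):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Hyperparameters of an estimator built with no arguments, i.e. the defaults
gb_defaults = GradientBoostingClassifier().get_params()

# Defaults for the two settings that are raised in this notebook
print('n_estimators:', gb_defaults['n_estimators'])
print('max_depth:', gb_defaults['max_depth'])
```

Raising both values makes every fit slower, and the model is refit once per fold during 10-fold cross-validation."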
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# build gradient boost classifier\n", "from sklearn.ensemble import GradientBoostingClassifier\n", "boost = GradientBoostingClassifier(n_estimators=500,\n", " max_depth=4, \n", " random_state=242\n", " )" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.5752838827838829\n", "[0.48571429 0.5 0.74825175 0.625 0.24475524 0.82857143\n", " 0.60714286 0.71794872 0.54545455 0.45 ]\n" ] } ], "source": [ "#train model\n", "boost.fit(x_train, y_train)\n", "\n", "# F1 score by averaging each fold\n", "from sklearn.model_selection import cross_val_score\n", "import numpy as np\n", "boost_score = cross_val_score(boost, x_train, y_train, cv=10, scoring = 'f1_macro')\n", "print(np.mean(boost_score))\n", "print(boost_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The gradient boost model is highly variable across folds (0.24 to 0.83) and doesn't come close to matching the performance of the SVC. \n", "We'll try K Nearest Neighbors next." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.6219476356976357\n", "[0.73333333 0.625 0.58041958 0.48571429 0.48571429 0.4375\n", " 0.71794872 0.71794872 0.71794872 0.71794872]\n" ] } ], "source": [ "# K Nearest Neighbours\n", "from sklearn.neighbors import KNeighborsClassifier\n", "\n", "knn = KNeighborsClassifier()\n", "\n", "knn_score = cross_val_score(knn, x_train, y_train, cv=10, scoring = 'f1_macro')\n", "print(np.mean(knn_score))\n", "print(knn_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "K Nearest Neighbors performs poorly with default parameters. Given the large gap between KNN (~0.62) and the linear SVC (~0.80), I won't try to tune this algorithm.\n", "\n", "Lastly we'll try a Random Forest classifier. 
We'll increase the number of estimators like we did with the gradient boost model." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.7322227772227772\n", "[0.83333333 0.73333333 0.65714286 0.58041958 0.74825175 0.91608392\n", " 0.68571429 0.81666667 0.63333333 0.71794872]\n" ] } ], "source": [ "# Random Forest\n", "from sklearn.ensemble import RandomForestClassifier\n", "\n", "rfc = RandomForestClassifier(n_estimators = 500, random_state = 242)\n", "\n", "rfc_score = cross_val_score(rfc, x_train, y_train, cv=10, scoring = 'f1_macro')\n", "print(np.mean(rfc_score))\n", "print(rfc_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Random Forest model performs reasonably well (~0.73) but not as well as the linear SVC (~0.80). With some hyperparameter tuning it might reach the same performance, but since the random forest classifier is more complex and takes longer to train, we'll use SVC as the final model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hyperparameter Tuning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've committed to a model, let's see if we can get a little more out of it by tuning the hyperparameters. For a linear SVC, the main hyperparameter to tune is the regularization parameter `C`.\n", "\n", "We can create a range of values for `C` and then compare each using cross-validation." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import validation_curve\n", "\n", "C_range = 10. 
** np.arange(-3, 8) # A range of different values for C\n", "\n", "train_scores, valid_scores = validation_curve(svc, x_train, y_train, \n", " param_name= \"C\",\n", " param_range = C_range,\n", " cv=10,\n", " scoring='f1_macro')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Creating a Pandas dataframe of the results\n", "tScores = pandas.DataFrame(train_scores).stack().reset_index()\n", "tScores.columns = ['C','Fold','Score']\n", "tScores.loc[:,'Type'] = ['Train' for x in range(len(tScores))]\n", "\n", "vScores = pandas.DataFrame(valid_scores).stack().reset_index()\n", "vScores.columns = ['C','Fold','Score']\n", "vScores.loc[:,'Type'] = ['Validate' for x in range(len(vScores))]\n", "\n", "ValCurves = pandas.concat([tScores,vScores]).reset_index(drop=True)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plotting the performance of different values of C\n", "import seaborn as sns\n", "g = sns.catplot(x='C', y='Score', hue='Type', data=ValCurves, kind='point')\n", "\n", "g.set_xticklabels(C_range, rotation=90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The best performance seems to be at a C value of 0.1 but it's a negligible difference. But there's one more thing to try.\n", "\n", "What if we changed the SVC kernel to the default 'rbf' which would let us adjust C and gamma? Let's use a grid search to see if optimizing an rbf kernel would perform better than a linear kernel." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "GridSearchCV(cv=10, error_score=nan,\n", " estimator=SVC(C=1.0, break_ties=False, cache_size=200,\n", " class_weight=None, coef0=0.0,\n", " decision_function_shape='ovr', degree=3,\n", " gamma='scale', kernel='rbf', max_iter=-1,\n", " probability=False, random_state=None, shrinking=True,\n", " tol=0.001, verbose=False),\n", " iid='deprecated', n_jobs=None,\n", " param_grid={'C': array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04,\n", " 1.e+05, 1.e+06, 1.e+07]),\n", " 'gamma': array([1.e-08, 1.e-07, 1.e-06, 1.e-05, 1.e-04, 1.e-03, 1.e-02, 1.e-01,\n", " 1.e+00, 1.e+01, 1.e+02])},\n", " pre_dispatch='2*n_jobs', refit=True, return_train_score=False,\n", " scoring=None, verbose=0)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# RBF SVC model\n", "from sklearn.model_selection import GridSearchCV\n", "\n", "svc_rbf = SVC(kernel='rbf')\n", "\n", "C_range = 10. ** np.arange(-3, 8)\n", "gamma_range = 10. 
** np.arange(-8, 3)\n", "\n", "param_grid = dict(gamma=gamma_range, C=C_range)\n", "\n", "grid = GridSearchCV(svc_rbf, param_grid=param_grid, cv=10)\n", "\n", "grid.fit(x_train, y_train)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'C': 100.0, 'gamma': 0.001}\n" ] } ], "source": [ "print(grid.best_params_)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8061452436452436\n", "[0.82857143 0.82857143 0.74825175 0.74825175 1. 0.74825175\n", " 0.90598291 0.81666667 0.80357143 0.63333333]\n" ] } ], "source": [ "svc_rbf = SVC(kernel='rbf', C=100.0, gamma=0.001)\n", "\n", "svc_rbf_score = cross_val_score(svc_rbf, x_train, y_train, cv=10, scoring = 'f1_macro')\n", "print(np.mean(svc_rbf_score))\n", "print(svc_rbf_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems like SVC with an RBF kernel and tuned hyperparameters performs slightly better than linear SVC (~0.81 vs ~0.80), so we'll use this as the final model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing The Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now run the model on the held-out data and see how it performs." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "F1: 0.6875\n", "Accuracy: 0.6666666666666666\n" ] } ], "source": [ "# Validation\n", "from sklearn.metrics import f1_score, accuracy_score\n", "svc_rbf.fit(x_train, y_train)\n", "final_pred = svc_rbf.predict(x_val)\n", "print('F1:', f1_score(y_val, final_pred, pos_label='Patient'))\n", "print('Accuracy:', accuracy_score(y_val, final_pred))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An F1 score of 0.69 (accuracy 0.67) on the held-out data is lower than the cross-validated macro F1 of ~0.81, but still above chance for a balanced binary classification problem. 
Let's see how the model is handling the labels by taking a look at the confusion matrix." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 9 6]\n", " [ 4 11]]\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from sklearn.metrics import ConfusionMatrixDisplay\n", "\n", "# plot_confusion_matrix was removed in scikit-learn 1.2; from_estimator replaces it\n", "disp = ConfusionMatrixDisplay.from_estimator(svc_rbf, x_val, y_val,\n", "                                             cmap=plt.cm.Blues,\n", "                                             normalize=None)\n", "disp.ax_.set_title('SVC Schizophrenia Labels')\n", "\n", "print(disp.confusion_matrix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model handles the two classes similarly: 9 of 15 controls and 11 of 15 patients are classified correctly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting Schizophrenia Subtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The phenotypic data also includes the schizophrenia subtype that each patient was diagnosed with. Maybe we can predict subtype as well. Let's take a look at how they are distributed." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 72\n", "295.3 41\n", "295.6 12\n", "295.7 5\n", "295.9 5\n", "295.1 3\n", "296.26 1\n", "296.4 1\n", "290.3 1\n", "295.92 1\n", "311 1\n", "295.70 bipolar type 1\n", "295.70 depressed type 1\n", "295.2 1\n", "Name: diagnosis, dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "full.diagnosis.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The distribution of schizophrenia subtypes is highly unbalanced. Most of the patients were diagnosed with the label \"295.3\", which refers to paranoid schizophrenia. There are very few observations for the other subtypes, so it's unlikely that any model could predict these with so little data. Maybe we can distinguish paranoid schizophrenia from the other subtypes." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# creating a new variable for subtype\n", "diagnosis=[]\n", "\n", "for i in full.index:\n", "    if full.loc[i, 'diagnosis']=='295.3':\n", "        diagnosis.append('Paranoid')\n", "    elif full.loc[i, 'diagnosis']=='None':\n", "        diagnosis.append('None')\n", "    else:\n", "        diagnosis.append('Other')\n", "\n", "full['type'] = diagnosis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll split the data again, this time stratified by our new subtype variable." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "# Split the sample to training/validation with a 80/20 ratio\n", "\n", "x_train2, x_val2, y_train2, y_val2 = train_test_split(\n", " list(full['features']), # x\n", " full['type'], # y\n", " test_size = 0.2, # 80%/20% split \n", " shuffle = True, # shuffle dataset\n", " stratify = full['type'],\n", " random_state = 242 \n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's avoid running all of the models separately again. It would be much easier to compare a lot of models at once. The cell below defines several models and then loops over them to generate cross-validated performance metrics. A more detailed example of this can be found [here]()." 
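, "

With three unbalanced groups, macro-averaged F1 scores are easier to interpret next to a trivial baseline. A minimal sketch, assuming toy labels with group sizes of 72, 41 and 33 (taken from the diagnosis counts) and random noise as features; `X_toy`, `y_toy` and `baseline_f1` are illustrative names, not part of the analysis:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Toy labels with the three group sizes; the features are pure noise
rng = np.random.RandomState(242)
y_toy = np.array(['None'] * 72 + ['Paranoid'] * 41 + ['Other'] * 33)
X_toy = rng.normal(size=(len(y_toy), 5))

# A majority-class baseline ignores the features and always predicts None
baseline = DummyClassifier(strategy='most_frequent')
baseline_f1 = cross_val_score(baseline, X_toy, y_toy, cv=10, scoring='f1_macro')

print(round(baseline_f1.mean(), 3))
```

scikit-learn may warn that precision is undefined for the two classes that are never predicted; that is expected for this baseline. Any classifier whose macro F1 sits near the baseline value has effectively learned to predict the majority class."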
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nearest Neighbors 0.39345543345543355\n", "Linear SVM 0.4076200651200651\n", "RBF SVM 0.21960784313725487\n", "Gaussian Process 0.1431135531135531\n", "Decision Tree 0.34980260480260483\n", "Random Forest 0.36240516564045977\n", "Neural Net 0.3840762723115664\n", "AdaBoost 0.3908056540409482\n", "Naive Bayes 0.41458892958892957\n" ] } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.neural_network import MLPClassifier\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.svm import SVC\n", "from sklearn.gaussian_process import GaussianProcessClassifier\n", "from sklearn.gaussian_process.kernels import RBF\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier\n", "from sklearn.naive_bayes import GaussianNB\n", "\n", "np.random.seed(242)\n", "\n", "names = [\"Nearest Neighbors\", \"Linear SVM\", \"RBF SVM\", \"Gaussian Process\",\n", " \"Decision Tree\", \"Random Forest\", \"Neural Net\", \"AdaBoost\",\n", " \"Naive Bayes\"]\n", "\n", "classifiers = [\n", " KNeighborsClassifier(3),\n", " SVC(kernel=\"linear\"),\n", " SVC(gamma=2, C=1),\n", " GaussianProcessClassifier(1.0 * RBF(1.0)),\n", " DecisionTreeClassifier(max_depth=5),\n", " RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),\n", " MLPClassifier(alpha=1, max_iter=1000),\n", " AdaBoostClassifier(),\n", " GaussianNB()]\n", "\n", "for name, clf in zip(names, classifiers):\n", " \n", " score = cross_val_score(clf, x_train2, y_train2, cv=10, scoring='f1_macro')\n", " \n", " print(name, np.mean(score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Gaussian Naive Bayes model performs slightly better than linear SVC, so we'll use it in this case. 
Still, the linear SVM is a close second, which I think is another example of how powerful SVM is as an approach." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.4464646464646464" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Validation\n", "NB = GaussianNB()\n", "\n", "NB.fit(x_train2, y_train2)\n", "type_pred = NB.predict(x_val2)\n", "f1_score(y_val2, type_pred, average='macro')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[10 3 2]\n", " [ 4 3 0]\n", " [ 4 2 2]]\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from sklearn.metrics import plot_confusion_matrix\n", "\n", "\n", "disp = plot_confusion_matrix(NB, x_val2, y_val2,\n", " #display_labels=class_names,\n", " cmap=plt.cm.Blues,\n", " normalize=None)\n", "disp.ax_.set_title('Naive Bayes: Schizophrenia Type')\n", "\n", "\n", "print(disp.confusion_matrix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems like the model predicts controls reasonably well (10 of 15 correct) but differentiates poorly between schizophrenia subtypes, and it labels 8 of the 15 patients as controls." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }