{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Preparation Machine Learning\n",
    "\n",
    "For the machine learning tutorial in the [Nibabel & PyMVPA](machine_learning_nilearn_and_pymvpa.ipynb) and the [Keras](machine_learning_keras.ipynb) notebook, we will need some pre-processed and prepared data. This notebook covers how one could prepare their data. But it's always important to consider what the data looks like and what we want to do.\n",
    "\n",
    "## What do we want to do?\n",
    "\n",
    "For the machine learning tutorial, we will use the dataset from [Zang et al.](https://link.springer.com/article/10.1007%2Fs12021-013-9187-0). It contains 48 subjects, where each subject did two resting-state fMRI recordings. Once with eyes **open** and once with eyes **closed**. The data was already pre-processed and is already ready for the machine learning notebooks. ***Note***: The data diverges from the original data in the way, that we only consider the first 100 volumes for this tutorial. The original dataset had 240 volumes per run.\n",
    "\n",
    "What we would like to do is to see if we can predict if a person had their eyes open or not, solely based on the fMRI signal.\n",
    "\n",
    "<img src=\"./data/resting_state_eyes.gif\" style=\"width:500px;\">\n",
    "\n",
    "With such an obvious signal difference we should be able to classify the two categories `'closed'` and `'open'` if we leave the signal around the eyes in the data. But before we can do any machine learning we need to prepare the data accordingly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why do we need to prepare the data?\n",
    "\n",
    "Let's take a look at the data of one subject:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nilearn.plotting import plot_anat\n",
    "%matplotlib inline\n",
    "import nibabel as nb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_func = nb.load('data/sub-01_rest-EC.nii.gz')\n",
    "plot_anat(img_func.slicer[..., 0], dim='auto', draw_cross=False, annotate=False,\n",
    "          cmap='magma', vmax=1250, cut_coords=[33, -20, 20], colorbar=True,\n",
    "          title='Normalized and Cleaned Functional Image; Vox.Res.: 4mm^3')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This all looks fine. Why do we need to prepare something? Well, let's take a look at two things.\n",
    "\n",
    "### 1. What does the signal time-course of a voxel look like?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from matplotlib.pyplot import plot\n",
    "plot(img_func.get_data()[19, 16, 17, :])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, the data still has a slight linear trend and is centered around a value of 800 (for this particular voxel). To be able to do some machine learning on this data we therefore need to remove the linear trend and ideally zscore the data.\n",
    "\n",
    "This can be done with the following commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.signal import detrend\n",
    "from scipy.stats import zscore\n",
    "\n",
    "# Detrend and zscore the data\n",
    "data = img_func.get_data()\n",
    "data = detrend(data)\n",
    "data = np.nan_to_num(zscore(data, axis=0))\n",
    "\n",
    "# Plot the cleaned signal\n",
    "plot(data[19, 16, 17, :]);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Perfect, that looks much better!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. How many nonzero voxels do we have?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_func.get_data().shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.sum(img_func.get_data().mean(axis=-1)!=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, those are all voxels of the 40 x 52 x 38 matrix. It doesn't make sense that we run machine learning outside of the brain.\n",
    "\n",
    "So let's use a mask to only keep those voxels that we're interested in. For this purpose we will use the MNI-152 template brain mask and an eye mask that we've created for this workshop. Both can be found under `/home/neuro/workshop/notebooks/data/templates`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nilearn.image import math_img\n",
    "\n",
    "# Specify location of the brain and eye image\n",
    "brain = '/home/neuro/workshop/notebooks/data/templates/MNI152_T1_1mm_brain.nii.gz'\n",
    "eyes = '/home/neuro/workshop/notebooks/data/templates/MNI152_T1_1mm_eye.nii.gz'\n",
    "\n",
    "# Combine the two template images\n",
    "img_roi = math_img(\"img1 + img2\", img1=brain, img2=eyes)\n",
    "\n",
    "# Plot the region-of-interest (ROI) template\n",
    "plot_anat(img_roi, dim='auto', draw_cross=False, annotate=False,\n",
    "          cut_coords=[33, -20, 20], title='Brain & Eye MNI Template; Vox.Res.: 1mm^3')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great, now we just need to binarize this template to get a mask, dilate this mask a bit to be sure that we keep all relevant voxels and multiply it with the functional image. But before we can do any of this we also need to resample the ROI template to a voxel resolution for 4x4x4mm, as the functional images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Resample ROI template to functional image resolution\n",
    "from nilearn.image import resample_to_img\n",
    "\n",
    "img_resampled = resample_to_img(img_roi, img_func)\n",
    "\n",
    "plot_anat(img_resampled, dim='auto', draw_cross=False, annotate=False,\n",
    "          cut_coords=[33, -20, 20], title='Brain & Eye MNI Template; Vox.Res.: 4mm^3')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.ndimage import binary_dilation\n",
    "\n",
    "# Binarize ROI template\n",
    "data_binary = np.array(img_resampled.get_data()>=10, dtype=np.int8)\n",
    "\n",
    "# Dilate binary mask once\n",
    "data_dilated = binary_dilation(data_binary, iterations=1).astype(np.int8)\n",
    "\n",
    "# Save binary mask in NIfTI image\n",
    "img_mask = nb.Nifti1Image(data_dilated, img_resampled.affine, img_resampled.header)\n",
    "img_mask.set_data_dtype('i1')\n",
    "\n",
    "# Plot binary mask (overlayed over MNI-152_Template)\n",
    "from nilearn.plotting import plot_roi\n",
    "plot_roi(img_mask, draw_cross=False, annotate=False, black_bg=True,\n",
    "         bg_img='/home/neuro/workshop/notebooks/data/templates/MNI152_T1_1mm.nii.gz', cut_coords=[33, -20, 20],\n",
    "         title='Dilated Brain & Eye Mask; Vox.Res.: 4mm^3', cmap='magma_r', dim=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cool. How many voxels do we have now?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.sum(img_mask.get_data())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great, that's a 53% reduction of datapoints that we need to consider in our machine-learning approach!\n",
    "\n",
    "Now we only have to multiply this mask with our functional images and remove tailing zeros from the 3D matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Multiply the functional image with the mask\n",
    "img_cleaned = math_img('img1 * img2', img1=img_func, img2=img_mask.slicer[..., None])\n",
    "\n",
    "# Remove as many zero rows in the data matrix to reduce overall volume size\n",
    "from nilearn.image import crop_img\n",
    "img_crop = crop_img(img_cleaned)\n",
    "\n",
    "# Plot the \n",
    "from nilearn.plotting import plot_anat\n",
    "plot_anat(img_crop.slicer[..., 0], dim='auto', draw_cross=False, annotate=False,\n",
    "          cmap='magma', vmax=1250, cut_coords=[33, -20, 20], colorbar=True,\n",
    "          title='Masked functional image; Vox.Res.: 4mm^3')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing the data\n",
    "\n",
    "If we do all the steps that we discussed above in one go, it looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "in_file = 'data/sub-01_rest-EC.nii.gz'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load functional image\n",
    "img_func = nb.load(in_file)\n",
    "\n",
    "# Detrend and zscore data and save it under a new NIfTI file\n",
    "data = img_func.get_data()\n",
    "data = detrend(data)\n",
    "data = np.nan_to_num(zscore(data, axis=0))\n",
    "img_standardized = nb.Nifti1Image(data, img_func.affine, img_func.header)\n",
    "\n",
    "# Create MNI-152 template brain and eye mask\n",
    "brain = '/home/neuro/workshop/notebooks/data/templates/MNI152_T1_1mm_brain.nii.gz'\n",
    "eyes = '/home/neuro/workshop/notebooks/data/templates/MNI152_T1_1mm_eye.nii.gz'\n",
    "img_roi = math_img(\"img1 + img2\", img1=brain, img2=eyes)\n",
    "img_resampled = resample_to_img(img_roi, img_func)\n",
    "data_binary = np.array(img_resampled.get_data()>=10, dtype=np.int8)\n",
    "data_dilated = binary_dilation(data_binary, iterations=1).astype(np.int8)\n",
    "img_mask = nb.Nifti1Image(data_dilated, img_resampled.affine, img_resampled.header)\n",
    "img_mask.set_data_dtype('i1')\n",
    "\n",
    "# Multiply functional image with mask and crop image\n",
    "img_cleaned = math_img('img1 * img2',\n",
    "                       img1=img_standardized, img2=img_mask.slicer[..., None])\n",
    "img_crop = crop_img(img_cleaned)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the result looks as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_anat(img_crop.slicer[..., 0], dim='auto', draw_cross=False, annotate=False,\n",
    "          cmap='magma', cut_coords=[33, -20, 20], colorbar=True,\n",
    "          title='Masked and standardized functional image; Vox.Res.: 4mm^3')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating the machine learning dataset\n",
    "\n",
    "Above we showed you how to prepare the data of an individual run for machine-learning. We now could use the 100 volumes per run and try to do machine learning on this. But this might not be the best approach.\n",
    "\n",
    "Let's consider again what we want to do. We want to predict if a person has their eyes closed or open during a resting state scan. Our assumption is that during the **eyes open** there is more eye movement, more visual stimulation, i.e. more variance in certain regions. Therefore, we want to look at the standard deviation over time (i.e. over the 100 volumes per run).\n",
    "\n",
    "**Keep in mind** that this approach is more or less \"randomly\" chosen by us to be appropriate for this particular classification and might differ a lot to other datasets, research questions etc.\n",
    "\n",
    "To nonetheless keep enough data points, let's take the 100 volumes, and compute the standard deviation for 4 equally long sections:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_std1 = nb.Nifti1Image(img_crop.get_data()[...,0:25].std(axis=-1), img_crop.affine, img_crop.header)\n",
    "img_std2 = nb.Nifti1Image(img_crop.get_data()[...,25:50].std(axis=-1), img_crop.affine, img_crop.header)\n",
    "img_std3 = nb.Nifti1Image(img_crop.get_data()[...,50:75].std(axis=-1), img_crop.affine, img_crop.header)\n",
    "img_std4 = nb.Nifti1Image(img_crop.get_data()[...,75:100].std(axis=-1), img_crop.affine, img_crop.header)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_anat(img_std1, draw_cross=False, annotate=False, cmap='magma',\n",
    "          cut_coords=[33, -20, 20], vmax=3, colorbar=True,\n",
    "          title='Standard Deviation for Segment 1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we do this now for each of the eyes **closed** and **open** run, for each of the total 48 subjects in the dataset, we will get 4 segments x 2 eye_state x 48 subjects = 384 datapoints per voxel. The pre-processing of all 48 subjects would explode the scope of this workshop, we therefore already pre-processed all subjects and prepared the data for the machine-learning approach.\n",
    "\n",
    "### The dataset ready for the machine-learning approach can be found under:\n",
    "\n",
    "`/home/neuro/workshop/notebooks/data/dataset_ML.nii.gz`"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}