{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\nA introduction tutorial to fMRI decoding\n==========================================\n\nHere is a simple tutorial on decoding with nilearn. It reproduces the\nHaxby 2001 study on a face vs cat discrimination task in a mask of the\nventral stream.\n\nThis tutorial is meant as an introduction to the various steps of a\ndecoding analysis.\n\nIt is not a minimalistic example, as it strives to be didactic. It is not\nmeant to be copied to analyze new data: many of the steps are unecessary.\n :depth: 1\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Retrieve and load the fMRI data from the Haxby study\n------------------------------------------------------\n\nFirst download the data\n........................\n\nThe :func:`nilearn.datasets.fetch_haxby` function will download the\nHaxby dataset if not present on the disk, in the nilearn data directory.\nIt can take a while to download about 310 Mo of data from the Internet.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from nilearn import datasets\n# By default 2nd subject will be fetched\nhaxby_dataset = datasets.fetch_haxby()\n# 'func' is a list of filenames: one for each subject\nfmri_filename = haxby_dataset.func[0]\n\n# print basic information on the dataset\nprint('First subject functional nifti images (4D) are at: %s' %\n fmri_filename) # 4D data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Visualizing the fmri volume\n............................\n\nOne way to visualize a fmri volume is using :func:`nilearn.plotting.plot_epi`.\nWe will visualize the previously fecthed fmri data from Haxby dataset.\n\nBecause fmri data is 4D (it consists of many 3D EPI images), we cannot \nplot it directly using :func:`nilearn.plotting.plot_epi` (which accepts \njust 3D input). Here we are using :func:`nilearn.image.mean_img` to \nextract a single 3D EPI image from the fmri data.\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from nilearn import plotting\nfrom nilearn.image import mean_img\nplotting.view_img(mean_img(fmri_filename), threshold=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Feature extraction: from fMRI volumes to a data matrix\n.......................................................\n\nThese are some really lovely images, but for machine learning we need \nmatrices to work with the actual data. To transform our Nifti images into\nmatrices, we will use the :class:`nilearn.input_data.NiftiMasker` to \nextract the fMRI data on a mask and convert it to data series.\n\nA mask of the Ventral Temporal (VT) cortex coming from the\nHaxby study is available:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mask_filename = haxby_dataset.mask_vt[0]\n\n# Let's visualize it, using the subject's anatomical image as a\n# background\nplotting.plot_roi(mask_filename, bg_img=haxby_dataset.anat[0],\n cmap='Paired')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we use the NiftiMasker.\n\nWe first create a masker, and ask it to normalize the data to improve the\ndecoding. The masker will extract a 2D array ready for machine learning\nwith nilearn:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from nilearn.input_data import NiftiMasker\nmasker = NiftiMasker(mask_img=mask_filename, standardize=True)\nfmri_masked = masker.fit_transform(fmri_filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ".. seealso::\n\tYou can ask the NiftiMasker to derive a mask given the data. In\n\tthis case, it is interesting to have a look at a report to see the\n\tcomputed mask by using `masker.generate_report`.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "masker.generate_report()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable \"fmri_masked\" is a numpy array:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(fmri_masked)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Its shape corresponds to the number of time-points times the number of\nvoxels in the mask\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(fmri_masked.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way to think about what just happened is to look at it visually:\n\n![](/images/masking.jpg)\n\n\nEssentially, we can think about overlaying a 3D grid on an image. Then,\nour mask tells us which cubes or \"voxels\" (like 3D pixels) to sample from.\nSince our Nifti images are 4D files, we can't overlay a single grid --\ninstead, we use a series of 3D grids (one for each volume in the 4D file),\nso we can get a measurement for each voxel at each timepoint. These are\nreflected in the shape of the matrix ! You can check this by checking the\nnumber of non-negative voxels in our binary brain mask.\n\n.. seealso::\n\tThere are many other strategies in Nilearn `for masking data and for\n\tgenerating masks `\n\tI'd encourage you to spend some time exploring the documentation for these.\n\tWe can also `display this time series `sphx_glr_auto_examples_03_connectivity_plot_sphere_based_connectome.py` to get an intuition of how the\n\twhole brain signal is changing over time.\n\nWe'll display the first three voxels by sub-selecting values from the\nmatrix. You can also find more information on how to slice arrays `here\n`_.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nplt.plot(fmri_masked[5:150, :3])\n\nplt.title('Voxel Time Series')\nplt.xlabel('Scan number')\nplt.ylabel('Normalized signal')\nplt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the behavioral labels\n...........................\n\nNow that the brain images are converted to a data matrix, we can apply \nmachine-learning to them, for instance to predict the task that the subject \nwas doing. The behavioral labels are stored in a CSV file, separated by\nspaces.\n\nWe use pandas to load them in an array.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n# Load behavioral information\nbehavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')\nprint(behavioral)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The task was a visual-recognition task, and the labels denote the \nexperimental condition: the type of object that was presented to the \nsubject. This is what we are going to try to predict.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "conditions = behavioral['labels']\nconditions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Restrict the analysis to cats and faces\n........................................\n\nAs we can see from the targets above, the experiment contains many\nconditions. As a consequence the data is quite big:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(fmri_masked.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not all of this data has an interest to us for decoding, so we will keep\nonly fmri signals corresponding to faces or cats. We create a mask of\nthe samples belonging to the condition; this mask is then applied to the\nfmri data to restrict the classification to the face vs cat discrimination.\nAs a consequence, the input data is much small (i.e. fmri signal is shorter):\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "condition_mask = conditions.isin(['face', 'cat'])\nfmri_masked = fmri_masked[condition_mask]\nprint(fmri_masked.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We apply the same mask to the targets\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "conditions = conditions[condition_mask]\nprint(conditions.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Decoding with an SVM\n---------------------\n\nWe will now use the `scikit-learn `_\nmachine-learning toolbox on the fmri_masked data.\n\nAs a decoder, we use a Support Vector Classification, with a linear\nkernel.\n\nWe first create it:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.svm import SVC\nsvc = SVC(kernel='linear')\nprint(svc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The svc object is an object that can be fit (or trained) on data with\nlabels, and then predict labels on data without.\n\nWe first fit it on the data\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "svc.fit(fmri_masked, conditions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then predict the labels from the data\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "prediction = svc.predict(fmri_masked)\nprint(prediction)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's measure the error rate:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print((prediction == conditions).sum() / float(len(conditions)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This error rate is meaningless. Why?\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Measuring prediction scores using cross-validation\n---------------------------------------------------\n\nThe proper way to measure error rates or prediction accuracy is via\ncross-validation: leaving out some data and testing on it.\n\nManually leaving out data\n..........................\n\nLet's leave out the 30 last data points during training, and test the\nprediction on these 30 last points:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "svc.fit(fmri_masked[:-30], conditions[:-30])\n\nprediction = svc.predict(fmri_masked[-30:])\nprint((prediction == conditions[-30:]).sum() / float(len(conditions[-30:])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Implementing a KFold loop\n..........................\n\nWe can split the data in train and test set repetitively in a `KFold`\nstrategy:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import KFold\n\ncv = KFold(n_splits=5)\n\n# The \"cv\" object's split method can now accept data and create a\n# generator which can yield the splits.\nfor train, test in cv.split(X=fmri_masked):\n conditions_masked = conditions.values[train]\n svc.fit(fmri_masked[train], conditions_masked)\n prediction = svc.predict(fmri_masked[test])\n print((prediction == conditions.values[test]).sum()\n / float(len(conditions.values[test])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cross-validation with scikit-learn\n...................................\n\nScikit-learn has tools to perform cross-validation easier:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import cross_val_score\ncv_score = cross_val_score(svc, fmri_masked, conditions)\nprint(cv_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Note

\n\tWe can speed things up to use all the CPUs of our computer with the\n\tn_jobs parameter.\n\nThe best way to do cross-validation is to respect the structure of\nthe experiment, for instance by leaving out full sessions of\nacquisition.\n\nThe number of the session is stored in the CSV file giving the\nbehavioral data. We have to apply our session mask, to select only cats\nand faces.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "session_label = behavioral['chunks'][condition_mask]\n\n# By default, cross_val_score uses a 3-fold KFold. We can control this by\n# passing the \"cv\" object, here a 5-fold:\ncv_score = cross_val_score(svc, fmri_masked, conditions, cv=cv)\nprint(cv_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fMRI data is acquired by sessions, and the noise is autocorrelated\nin a given session. Hence, it is better to predict across sessions when\ndoing cross-validation. To leave a session out, pass it to the groups\nparameter of cross_val_score.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import LeaveOneGroupOut\ncv = LeaveOneGroupOut()\ncv_score = cross_val_score(svc,\n fmri_masked,\n conditions,\n cv=cv,\n groups=session_label,\n )\nprint(cv_score)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspecting the model weights\n-----------------------------\n\nFinally, it may be useful to inspect and display the model weights.\n\nTurning the weights into a nifti image\n.......................................\n\nWe retrieve the SVC discriminating weights\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "coef_ = svc.coef_\nprint(coef_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's a numpy array with only one coefficient per voxel:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(coef_.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to turn it back into a Nifti image, in essence, \"inverting\"\nwhat the NiftiMasker has done.\n\nFor this, we can call inverse_transform on the NiftiMasker:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "coef_img = masker.inverse_transform(coef_)\nprint(coef_img)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "coef_img is now a NiftiImage.\n\nWe can save the coefficients as a nii.gz file:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "coef_img.to_filename('haxby_svc_weights.nii.gz')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plotting the SVM weights\n.........................\n\nWe can plot the weights, using the subject's anatomical as a background\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from nilearn.plotting import plot_stat_map, show\n\nplot_stat_map(coef_img, bg_img=haxby_dataset.anat[0],\n title=\"SVM weights\", display_mode=\"yx\")\n\nshow()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Further reading\n----------------\n\n* The `section of the documentation on decoding `\n\n* `sphx_glr_auto_examples_02_decoding_plot_haxby_anova_svm.py`\n For decoding without a precomputed mask\n\n* `space_net`\n\n______________\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }