Example 1 Overview

GLMsingle is new tool that provides efficient, scalable, and accurate single-trial fMRI response estimates.

The purpose of this Example 1 notebook is to guide the user through basic calls to GLMsingle, using a representative, small-scale test dataset (in this case, an example session from a rapid event-related visual fMRI dataset - the Natural Scenes Dataset core experiment).

The goal is to examine the effect of GLMsingle on the reliability of single-trial fMRI response estimates. By default, the tool implements a set of optimizations that improve upon generic GLM approaches by: (1) identifying an optimal hemodynamic response function (HRF) at each voxel, (2) deriving a set of useful GLM nuisance regressors via "GLMdenoise" and picking an optimal number to include in the final GLM, and (3) applying a custom amount of ridge regularization at each voxel using an efficient technique called "fracridge". The output of GLMsingle are GLM betas reflecting the estimated percent signal change in each voxel in response to each experimental stimulus or condition being modeled.

Beyond directly improving the reliability of neural responses to repeated stimuli, these optimized techniques for signal estimation can have a range of desirable downstream effects such as: improving cross-subject representational similarity within and between datasets; improving the single-image decodability of evoked neural patterns via MVPA; and, decreasing the correlation in spatial patterns observed at neighboring timepoints in analysis of fMRI GLM outputs. See our video presentation at V-VSS 2020 for a summary of these phenomena as observed in recent massive-scale fMRI datasets (the Natural Scenes Dataset and BOLD5000): https://www.youtube.com/watch?v=yb3Nn7Han8o

Example 1 contains a full walkthrough of the process of loading an example dataset and design matrix, estimating neural responses using GLMsingle, estimating the reliability of responses at each voxel, and comparing those achieved via GLMsingle to those achieved using a baseline GLM. After loading and visualizing formatted fMRI time-series and their corresponding design matrices, we will describe the default behavior of GLMsingle and show how to modify hyperparameters if the user desires. Throughout the notebook we will highlight important metrics and outputs using figures, print statements, and comments.

Users encountering bugs, unexpected outputs, or other issues regarding GLMsingle shouldn't hesitate to raise an issue on GitHub: https://github.com/kendrickkay/GLMsingle/issues

Contents

Add dependencies and download the example dataset

% We will assume that the current working directory is the directory that
% contains this script.

% Add path to GLMsingle
addpath(genpath('../../matlab'));

% You also need fracridge repository to run this code. For example, you
% could do:
%   git clone https://github.com/nrdg/fracridge.git
% and then do:
%   addpath('fracridge')

% Start fresh
clear
clc
close all

% Name of directory to which outputs will be saved
outputdir = 'example1outputs';

% Download files to data directory
if ~exist('./data','dir')
    mkdir('data')
end

if ~exist('./data/nsdcoreexampledataset.mat','file')
    % download data with curl
    system('curl -L --output ./data/nsdcoreexampledataset.mat https://osf.io/k89b2/download')
end
load('./data/nsdcoreexampledataset.mat')
% Data comes from the NSD dataset (subj01, nsd01 scan session).
% https://www.biorxiv.org/content/10.1101/2021.02.22.432340v1.full.pdf

Data overview

clc
whos

% data -> consists of several runs of 4D volume files (x,y,z,t)  where
% (t)ime is the 4th dimention. In this example, data consists of only a
% single slice and has been prepared with a TR = 1s

% ROI -> manually defined region in the occipital cortex. It is a binary
% matrix where (x,y,z) = 1 corresponds to the cortical area that responded
% to visual stimuli used in the NSD project.

fprintf('There are %d runs in total.\n',length(design));
fprintf('The dimensions of the data for the first run are %s.\n',mat2str(size(data{1})));
fprintf('The stimulus duration is %.3f seconds.\n',stimdur);
fprintf('The sampling rate (TR) is %.3f seconds.\n',tr);
  Name             Size                 Bytes  Class     Attributes

  ROI            145x186               215760  double              
  data             1x12             388369344  cell                
  design           1x12                 69408  cell                
  outputdir        1x15                    30  char                
  stimdur          1x1                      8  double              
  tr               1x1                      8  double              

There are 12 runs in total.
The dimensions of the data for the first run are [145 186 1 300].
The stimulus duration is 3.000 seconds.
The sampling rate (TR) is 1.000 seconds.
figure(1);clf

%Show example design matrix.
for d = 1
    imagesc(design{d}); colormap gray; drawnow
    xlabel('Conditions')
    ylabel('TRs')
    title(sprintf('Design matrix for run%i',d))
    axis image
end

xticks(0:53:length(design{d}))
set(gcf,'Position',[418   412   782   605])

design -> Each run has a corresponding design matrix where each column describes a single condition (conditions are repeated across runs). Each design matrix is binary with 1 specfing the time (TR) when the stimulus is presented on the screen.

In this NSD scan session, there are a total of 750 trials, in which a total of 583 distinct images are shown. (Thus, some images were presented more than once.) In the design matrix shown, there are 583 predictor columns/conditions, one per distinct image. Notice that white rectangles are pseudo randomized and they indicate when the presentation of each image occurs. Note that in some runs not all images are shown; if a column does not have a white rectangle it means that this image is shown in a different run within the session.

% Show an example slice of the first fMRI volume.
figure(2);clf
imagesc(data{1}(:,:,:,1));
colormap(gray);
axis equal tight;
c=colorbar;
title('fMRI data (first volume)');
set(gcf,'Position',[418   412   782   605])
axis off
c.Label.String = 'T2*w intensity';
set(gca,'FontSize',15)

Call GLMestimatesingletrial with default parameters

% Outputs and figures will be stored in a folder (you can specify its name
% as the 5th input to GLMestimatesingletrial). Model estimates can be also
% saved to the 'results' variable which is the only output of
% GLMestimatesingletrial.

% Optional parameters below can be assigned to a structure, i.e., opt =
% struct('wantlibrary',1,'wantglmdenoise',1); Options are the 6th input to
% GLMestimatesingletrial.

% There are many options that can be specified; here, we comment on the
% main options that one might want to modify/set. Defaults for the options
% are indicated below.

% wantlibrary = 1 -> Fit HRF to each voxel
% wantglmdenoise = 1 -> Use GLMdenoise
% wantfracridge = 1  -> Use ridge regression to improve beta estimates
% chunknum = 50000 -> is the number of voxels that we will
%     process at the same time. For setups with lower memory, you may need to
%     decrease this number.

% wantmemoryoutputs is a logical vector [A B C D] indicating which of the
%     four model types to return in the output <results>. The user must be
%     careful with this, as large datasets can require a lot of RAM. If you
%     do not request the various model types, they will be cleared from
%     memory (but still potentially saved to disk). Default: [0 0 0 1]
%     which means return only the final type-D model.

% wantfileoutputs is a logical vector [A B C D] indicating which of the
%     four model types to save to disk (assuming that they are computed). A
%     = 0/1 for saving the results of the ONOFF model, B = 0/1 for saving
%     the results of the FITHRF model, C = 0/1 for saving the results of the
%     FITHRF_GLMdenoise model, D = 0/1 for saving the results of the
%     FITHRF_GLMdenoise_RR model. Default: [1 1 1 1] which means save all
%     computed results to disk.

% numpcstotry (optional) is a non-negative integer indicating the maximum
%     number of GLMdenoise PCs to enter into the model. Default: 10.

% fracs (optional) is a vector of fractions that are greater than 0
%     and less than or equal to 1. We automatically sort in descending
%     order and ensure the fractions are unique. These fractions indicate
%     the regularization levels to evaluate using fractional ridge
%     regression (fracridge) and cross-validation. Default:
%     fliplr(.05:.05:1). A special case is when <fracs> is specified as a
%     single scalar value. In this case, cross-validation is NOT performed
%     for the type-D model, and we instead blindly use the supplied
%     fractional value for the type-D model.

% For the purpose of this example, we will keep all outputs in the memory.
opt = struct('wantmemoryoutputs',[1 1 1 1]);

% This example saves output .mat files to the folder
% "example1outputs/GLMsingle". If these outputs don't already exist, we
% will perform the time-consuming call to GLMestimatesingletrial.m;
% otherwise, we will just load from disk.
if ~exist([outputdir '/GLMsingle'],'dir')

    [results] = GLMestimatesingletrial(design,data,stimdur,tr,[outputdir '/GLMsingle'],opt);

    % We assign outputs of GLMestimatesingletrial to "models" structure.
    % Note that results{1} contains GLM estimates from an ONOFF model,
    % where all images are treated as the same condition. These estimates
    % could be potentially used to find cortical areas that respond to
    % visual stimuli. We want to compare beta weights between conditions
    % therefore we are not going to store the ONOFF GLM results.

    clear models;
    models.FIT_HRF = results{2};
    models.FIT_HRF_GLMdenoise = results{3};
    models.FIT_HRF_GLMdenoise_RR = results{4};

else
    % Load existing file outputs if they exist
    results = load([outputdir '/GLMsingle/TYPEB_FITHRF.mat']);
    models.FIT_HRF = results;
    results = load([outputdir '/GLMsingle/TYPEC_FITHRF_GLMDENOISE.mat']);
    models.FIT_HRF_GLMdenoise = results;
    results = load([outputdir '/GLMsingle/TYPED_FITHRF_GLMDENOISE_RR.mat']);
    models.FIT_HRF_GLMdenoise_RR = results;

end

Summary of important outputs

% The outputs of GLMestimatesingletrial.m are formally documented in its
% header. Here, we highlight a few of the more important outputs:
%
% R2 -> is model accuracy expressed in terms of R^2 (percentage).
%
% modelmd -> is the full set of single-trial beta weights (X x Y x Z x
% TRIALS). Beta weights are arranged in chronological order.
%
% HRFindex -> is the 1-index of the best fit HRF. HRFs can be recovered
% with getcanonicalHRFlibrary(stimdur,tr)
%
% FRACvalue -> is the fractional ridge regression regularization level
% chosen for each voxel. Values closer to 1 mean less regularization.

Plot a slice of brain showing GLMsingle outputs

% We are going to plot several outputs from the FIT_HRF_GLMdenoise_RR GLM,
% which contains the full set of GLMsingle optimizations.

slice = 1;

% we will plot betas, R2, optimal HRF indices, and the voxel frac values
val2plot = {'modelmd';'R2';'HRFindex';'FRACvalue'};
cmaps = {cmapsign2;hot;jet;copper};

figure(3);clf

for v = 1 : length(val2plot)

    f=subplot(2,2,v);

    if contains('modelmd',val2plot{v})
        % When plotting betas, for simplicity just average across all image
        % presentations This will yield a summary of whether voxels tend to
        % increase or decrease their activity in response to the
        % experimental stimuli (similar to outputs from an ONOFF GLM)
        imagesc(nanmean(models.FIT_HRF_GLMdenoise_RR.(val2plot{v})(:,:,slice),4),[-5 5]); axis off image;
        title('Average GLM betas (750 stimuli)')

    else
        % Plot all other voxel-wise metrics as outputted from GLMsingle
        imagesc(models.FIT_HRF_GLMdenoise_RR.(val2plot{v})(:,:,slice)); axis off image;
        title(val2plot{v})

    end

    colormap(f,cmaps{v})
    colorbar
    set(gca,'FontSize',15)
end

set(gcf,'Position',[418   412   782   605])

Run a baseline GLM to compare with GLMsingle

% Additionally, for comparison purposes we are going to run a standard GLM
% without HRF fitting, GLMdenoise, or ridge regression regularization. We
% will change the default settings by using the "opt" structure.
opt.wantlibrary = 0; % switch off HRF fitting
opt.wantglmdenoise = 0; % switch off GLMdenoise
opt.wantfracridge = 0; % switch off ridge regression
opt.wantfileoutputs = [0 1 0 0];
opt.wantmemoryoutputs = [0 1 0 0];

% If these outputs don't already exist, we will perform the call to
% GLMestimatesingletrial.m; otherwise, we will just load from disk.
if ~exist([outputdir '/GLMbaseline'],'dir')

    [ASSUME_HRF] = GLMestimatesingletrial(design,data,stimdur,tr,[outputdir '/GLMbaseline'],opt);
    models.ASSUME_HRF = ASSUME_HRF{2};

else

    % Note that even though we are loading TYPEB_FITHRF betas, HRF fitting
    % has been turned off and this struct field will thus contain the
    % outputs of a GLM fit using the canonical HRF.
    results = load([outputdir '/GLMbaseline/TYPEB_FITHRF.mat']);
    models.ASSUME_HRF = results;

end

% We assign outputs from GLMestimatesingletrial to "models" structure.
% Again, results{1} contains GLM estimates from an ONOFF model so we are
% not going to extract it.
% Now, "models" variable holds solutions for 4 GLM models
disp(fieldnames(models))
    'FIT_HRF'
    'FIT_HRF_GLMdenoise'
    'FIT_HRF_GLMdenoise_RR'
    'ASSUME_HRF'

Get indices of repeated conditions to use for reliability calculations

% To compare the results of different GLMs we are going to calculate the
% voxel-wise split-half reliablity for each model. Reliability values
% reflect a correlation between beta weights for repeated presentations of
% the same conditions. In short, we are going to check how
% reliable/reproducible are the single trial responses to repeated
% conditions estimated with each GLM type.

% This NSD scan session has a large number of images that are just shown
% once during the session, some images that are shown twice, and a few that
% are shown three times. In the code below, we are attempting to locate the
% indices in the beta weight GLMsingle outputs modelmd(x,y,z,trials) that
% correspond to repeated images. Here we only consider stimuli that have
% been presented at least twice. For the purpose of the example we ignore
% the 3rd repetition of the stimulus.

% Consolidate design matrices
designALL = cat(1,design{:});

% Construct a vector containing 1-indexed condition numbers in
% chronological order.

corder = [];
for p=1:size(designALL,1)
    if any(designALL(p,:))
        corder = [corder find(designALL(p,:))];
    end
end
% Let's take a look at the first few entries
corder(1:3)

% Note that [375 497 8] means that the first stimulus trial involved
% presentation of the 375th condition, the second stimulus trial involved
% presentation of the 497th condition, and so on.
ans =

   375   497     8

% In order to compute split-half reliability, we have to do some indexing.
% we want to find images with least two repetitions and then prepare a
% useful matrix of indices that refer to when these occur.
repindices = [];  % 2 x images containing stimulus trial indices.

% The first row refers to the first presentation; the second row refers to
% the second presentation.
for p=1:size(designALL,2)  % loop over every condition
    temp = find(corder==p);
    if length(temp) >= 2
        repindices = cat(2,repindices,[temp(1); temp(2)]);  % note that for images with 3 presentations, we are simply ignoring the third trial
    end
end

% Let's take a look at a few entries
repindices(:,1:3)

% Notice that the first condition is presented on the 217th stimulus trial
% and the 486th stimulus trial, the second condition is presented on the
% 218th and 621st stimulus trials, and so on.

fprintf('There are %i repeated images in the experiment \n',length(repindices))

% Now, for each voxel we are going to correlate beta weights describing the
% response to images presented for the first time with beta weights
% describing the response from the repetition of the same image. With 136
% repeated conditions, the correlation for each voxel will reflect the
% relationship between two vectors with 136 beta weights each.
ans =

   217   218    19
   486   621   124

There are 136 repeated images in the experiment 

Compute median split-half reliability for each GLM version

% Finally, let's compute split-half reliability. We are going to loop
% through our 4 models and calculate split-half reliability for each of
% them.

% We first arrange models from least to most sophisticated (for
% visualization purposes)
model_names = fieldnames(models);
model_names = model_names([4 1 2 3]);

% Create output variable for reliability values
vox_reliabilities = cell(1,length(models));

% For each GLM...
for m = 1 : length(model_names)

    % Get the GLM betas
    betas = models.(model_names{m}).modelmd(:,:,:,repindices);  % use indexing to pull out the trials we want
    betas_reshaped = reshape(betas,size(betas,1),size(betas,2),size(betas,3),2,[]);  % reshape to X x Y x Z x 2 x CONDITIONS

    % compute reliabilities using an efficient (vectorized) utility
    % function
    vox_reliabilities{m} = calccorrelation(betas_reshaped(:,:,:,1,:),betas_reshaped(:,:,:,2,:),5);

    % Note that calccorrelation.m is a utility function that computes
    % correlations in a vectorized fashion (for optimal speed).

end

Compare visual voxel reliabilities between beta versions

figure(4);clf
subplot(1,2,1);
cmap = [0.2314    0.6039    0.6980
    0.8615    0.7890    0.2457
    0.8824    0.6863         0
    0.9490    0.1020         0];

% For each GLM type we calculate median reliability for voxels within the
% visual ROI and plot it as a bar plot.
for m = 1 : 4
    bar(m,nanmedian(vox_reliabilities{m}(ROI==1)),'FaceColor','None','Linewidth',3,'EdgeColor',cmap(m,:)); hold on
end
ylabel('Median reliability')
legend(model_names,'Interpreter','None','Location','NorthWest')
set(gca,'Fontsize',16)
set(gca,'TickLabelInterpreter','none')
xtickangle(0)
xticks([])
ylim([0.1 0.2])
set(gcf,'Position',[418   412   782   605])
title('Median voxel split-half reliability of GLM models')

subplot(1,2,1);

% Comparison is the final output (FIT_HRF_GLMDENOISE_RR) vs. the baseline
% GLM (ASSUME_HRF)
vox_reliability = vox_reliabilities{4} - vox_reliabilities{1};
underlay = data{1}(:,:,:,1);
ROI(ROI~=1) = NaN;
overlay = vox_reliability;

underlay_im = cmaplookup(underlay,min(underlay(:)),max(underlay(:)),[],gray(256));
overlay_im = cmaplookup(overlay,-0.3,0.3,[],cmapsign2);

mask = ROI==1;

subplot(1,2,2);
hold on
imagesc(underlay_im);
imagesc(overlay_im, 'AlphaData', mask);
hold off
axis image
colormap(cmapsign2)
c = colorbar;
c.Ticks = [0 0.5 1];
c.TickLabels = {'-0.3';'0';'0.3'};
title('change in nsdgeneral voxel reliability** due to GLMsingle (\Delta{\itr})')
set(gca,'Fontsize',16)
xlabel('**plotting (FITHRF_GLMDENOISE_RR - ASSUMEHRF) reliabilities','Interpreter','none','FontSize',12);
xticks([])
yticks([])

set(gcf,'Position',[36 343 1116 674])

% Notice that there is systematic increase in reliability moving from the
% first to the second to the third to the final fourth version of the GLM
% results. These increases reflect, respectively, the addition of HRF
% fitting, the derivation and use of data-driven nuisance regressors, and
% the use of ridge regression as a way to regularize the instability of
% closely spaced experimental trials. Depending on one's experimental
% goals, it is possible with setting of option flags to activate a subset
% of these analysis features.
%
% Also, keep in mind that in the above figure, we are simply showing the
% median as a metric of the central tendency (you may want to peruse
% individual voxels in scatter plots, for example).