import marimo

__generated_with = "0.23.2"
app = marimo.App(width="medium", app_title="Preprocessing")


@app.cell(hide_code=True)
def _(IMG_DIR, mo):
    mo.vstack([
        mo.md(r"""
# Preprocessing

*Written by Luke Chang*

Being able to study brain activity associated with cognitive processes in humans is an amazing achievement. However, there is an extraordinary amount of noise and very low levels of signal, which makes it difficult to make inferences about brain function using BOLD imaging. A critical step before any analysis is to remove as much noise as possible. The series of steps to remove noise comprise our *neuroimaging data **preprocessing** pipeline*.

See slides on our preprocessing lecture [here](../images/lectures/Preprocessing.pdf).
"""),
        mo.image(str(IMG_DIR / "preprocessing.png")),
        mo.md(r"""
In this lab, we will go over the basics of preprocessing fMRI data using the [fmriprep](https://fmriprep.org/) preprocessing pipeline. We will cover:

- **Image transformations** (rigid body and affine)
- **Cost functions** for image registration
- **Head motion correction** (realignment)
- **Spatial normalization**
- **Spatial smoothing**
- **fMRIPrep** automated preprocessing pipeline

There are other preprocessing steps that are also common but not necessarily performed by all labs, such as slice timing correction and distortion correction. We will not be discussing these in depth outside of the videos.

Let's start by watching a short video by Martin Lindquist to get a general overview of the main steps of preprocessing and the basics of how to transform images and register them to other images.
"""),
    ])
    return


@app.cell(hide_code=True)
def _():
    import marimo as mo
    import numpy as np
    import plotly.graph_objects as go
    from pathlib import Path

    from dartbrains_tools.mr_widgets import TransformCubeWidget, CostFunctionWidget, SmoothingWidget

    # Locate the repo root so we can find the sibling images/ directory.
    # Under MarimoIslandGenerator (WASM build), both __file__ and
    # mo.notebook_dir() resolve to marimo-internal paths (.venv/bin/),
    # so we walk up from cwd looking for book.yml. Falls back to cwd
    # in marimo-edit (where cwd is already the project root).
    def _find_root() -> Path:
        for candidate in (Path.cwd(), *Path.cwd().resolve().parents):
            if (candidate / "book.yml").exists():
                return candidate
        return Path.cwd()

    _ROOT = _find_root()
    IMG_DIR = _ROOT / "images" / "preprocessing"
    return (
        CostFunctionWidget,
        IMG_DIR,
        SmoothingWidget,
        TransformCubeWidget,
        mo,
    )


@app.cell(hide_code=True)
def _(mo):
    mo.md(r""" """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.Html(""" """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
---

## Image Transformations

Ok, now let's dive deeper into how we can transform images into different spaces using linear transformations. Recall from our introduction to neuroimaging data lab that neuroimaging data is typically stored in a nifti container, which contains a 3D or 4D matrix of the voxel intensities and also an **affine matrix** that maps voxel coordinates \((i, j, k)\) to world coordinates \((x, y, z)\).

$$\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \mathbf{A} \begin{bmatrix} i \\ j \\ k \\ 1 \end{bmatrix}$$

A **rigid body transformation** has 6 parameters:

- 3 **translations** (shift in x, y, z)
- 3 **rotations** (around x, y, z axes)

A full **affine transformation** adds 6 more (12 total):

- 3 **scale** factors
- 3 **shear** parameters

Let's create an interactive plot to get an intuition for how these affine matrices can be used to transform a 3D image. The ghost (wireframe) shows the original position; the solid red cube shows the transformed position. Move the sliders to apply rigid body transforms to the cube: translations in x, y, and z, and rotations around each of these axes.
The key thing to remember is that a rigid body transform cannot fundamentally change the image: it preserves size and shape, moving and rotating the image as a single solid object. A full 12-parameter affine transformation adds an additional 3 parameters each for scaling and shearing, which can change the shape of the cube.

Try moving some of the sliders around. Each time you move a slider, an affine transformation is applied to the cube's coordinates and the plot is redrawn. Translation moves the cube along the x, y, and z dimensions. We can also rotate the cube around the x, y, and z axes, where the origin is the center point. Continued rotation may carry the cube out of the current field of view, but it will return as you keep rotating.
""")
    return


@app.cell(hide_code=True)
def _(mo):
    tx_slider = mo.ui.slider(start=-15, stop=15, step=0.5, value=0, label="Translate X")
    ty_slider = mo.ui.slider(start=-15, stop=15, step=0.5, value=0, label="Translate Y")
    tz_slider = mo.ui.slider(start=-15, stop=15, step=0.5, value=0, label="Translate Z")
    rx_slider = mo.ui.slider(start=-180, stop=180, step=5, value=0, label="Rotate X (\u00b0)")
    ry_slider = mo.ui.slider(start=-180, stop=180, step=5, value=0, label="Rotate Y (\u00b0)")
    rz_slider = mo.ui.slider(start=-180, stop=180, step=5, value=0, label="Rotate Z (\u00b0)")
    sx_slider = mo.ui.slider(start=0.5, stop=2.0, step=0.1, value=1.0, label="Scale X")
    sy_slider = mo.ui.slider(start=0.5, stop=2.0, step=0.1, value=1.0, label="Scale Y")
    sz_slider = mo.ui.slider(start=0.5, stop=2.0, step=0.1, value=1.0, label="Scale Z")
    return (
        rx_slider,
        ry_slider,
        rz_slider,
        sx_slider,
        sy_slider,
        sz_slider,
        tx_slider,
        ty_slider,
        tz_slider,
    )


@app.cell(hide_code=True)
def _(
    TransformCubeWidget,
    mo,
    rx_slider,
    ry_slider,
    rz_slider,
    sx_slider,
    sy_slider,
    sz_slider,
    tx_slider,
    ty_slider,
    tz_slider,
):
    _widget = TransformCubeWidget(
        trans_x=float(tx_slider.value),
        trans_y=float(ty_slider.value),
        trans_z=float(tz_slider.value),
        rot_x=float(rx_slider.value),
        rot_y=float(ry_slider.value),
        rot_z=float(rz_slider.value),
        scale_x=float(sx_slider.value),
        scale_y=float(sy_slider.value),
        scale_z=float(sz_slider.value),
    )
    _wrapped = mo.ui.anywidget(_widget)
    mo.vstack([
        mo.md("**Translation:**"),
        mo.hstack([tx_slider, ty_slider, tz_slider], justify="start", gap=1),
        mo.md("**Rotation:**"),
        mo.hstack([rx_slider, ry_slider, rz_slider], justify="start", gap=1),
        mo.md("**Scale:**"),
        mo.hstack([sx_slider, sy_slider, sz_slider], justify="start", gap=1),
        _wrapped,
    ])
    return


@app.cell(hide_code=True)
def _(mo):
    mo.callout(
        mo.md(
            "**Try this:** Move translation sliders to shift the cube. "
            "Rotate around each axis. Change scale to stretch/compress "
            "(goes beyond rigid body!). The affine matrix on the right "
            "updates live. **Drag to orbit** the 3D view."
        ),
        kind="info",
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The affine matrix encodes all of these transformations compactly. The upper-left 3\(\times\)3 block contains rotation, scaling, and shear, while the rightmost column contains translation. The bottom row is always \([0, 0, 0, 1]\).

**Rotation matrices** for each axis:

$$R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$$

$$R_y(\phi) = \begin{bmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{bmatrix}$$

$$R_z(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The combined rotation is \(R = R_x \cdot R_y \cdot R_z\), and the full affine matrix is built by combining rotation, scaling, and translation.

Every time we apply a transformation to our images, the result is not a perfect representation of the original data -- we need **interpolation** to fill gaps, which introduces small errors. This is why minimizing the number of resampling steps matters in preprocessing.
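To make the equations above concrete, here is a minimal sketch using `numpy` (already imported in this notebook) of how a 4\(\times\)4 affine can be assembled from rotations, scales, and translations. The `make_affine` helper is written purely for illustration and is not part of any package:

```python
import numpy as np

def make_affine(rx=0.0, ry=0.0, rz=0.0, scale=(1, 1, 1), trans=(0, 0, 0)):
    # Build a 4x4 affine from rotations (radians), scales, and translations.
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    A = np.eye(4)
    A[:3, :3] = Rx @ Ry @ Rz @ np.diag(scale)  # rotation and scaling block
    A[:3, 3] = trans                           # translation column
    return A

# Map a voxel coordinate (i, j, k) to world space (x, y, z):
A = make_affine(rz=np.pi / 2, trans=(10, 0, 0))
voxel = np.array([1, 0, 0, 1])  # homogeneous coordinates
world = A @ voxel               # world is approximately [10, 1, 0, 1]
```

With no arguments, `make_affine()` returns the identity matrix -- the transformation that leaves the cube exactly where it started.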
""") return @app.cell(hide_code=True) def _(mo): mo.md(r""" --- ## Cost Functions Now that we understand affine transformations, how do we use them to **register** one brain image to another? We need a way to quantify how well two images are aligned. The key is to identify a way to quantify how aligned the two images are to each other. Our visual systems are very good at identifying when two images are aligned, however, we need to create a quantitative alignment measure. These measures are called **cost functions**. A common cost function is the **Sum of Squared Errors (SSE)**. You may remember that this same cost function is used by linear regression to find the best fitting line to data. This measure works best if the images are of the same type and have roughly equivalent signal intensities. $$SSE = \sum_{i} (I_{target}(i) - I_{reference}(i))^2$$ The goal is to find the transformation parameters that **minimize** the cost function. This process is called **optimization**. Let’s create another interactive plot and find the optimal X & Y translation parameters that minimize the difference between a two-dimensional target image to a reference image. Try to align the target image (blue square) with the reference image by adjusting the translation sliders. Watch the SSE drop to zero when perfectly aligned! """) return @app.cell(hide_code=True) def _(mo): cost_tx = mo.ui.slider(start=0, stop=20, step=1, value=0, label="Translate X") cost_ty = mo.ui.slider(start=0, stop=20, step=1, value=0, label="Translate Y") return cost_tx, cost_ty @app.cell(hide_code=True) def _(CostFunctionWidget, cost_tx, cost_ty, mo): _widget = CostFunctionWidget( trans_x=float(cost_tx.value), trans_y=float(cost_ty.value), ) _wrapped = mo.ui.anywidget(_widget) mo.vstack([ mo.hstack([cost_tx, cost_ty], justify="start", gap=2), _wrapped, mo.callout( mo.md( "**Goal:** Find the translation that makes SSE = 0 (perfect overlap, shown in green). 
" "This is exactly what registration algorithms do automatically -- they search the " "parameter space to minimize the cost function." ), kind="info", ), ]) return @app.cell(hide_code=True) def _(mo): mo.md(r""" You probably had to move the sliders around back and forth until you were able to reduce the sum of squared error to zero. This cost function increases exponentially the further you are away from your target. The process of minimizing (or sometimes maximizing) cost functions to identify the best fitting parameters is called optimization and is a concept that is core to fitting models to data across many different disciplines. Different cost functions work best for different image types: | Cost Function | Use Case | Example | |:---:|:---:|:---:| | Sum of Squared Error | Same modality & scaling | Two T2* images | | Normalized correlation | Same modality | Two T1 images | | Correlation ratio | Any modality | T1 and FLAIR | | Mutual information | Any modality | T1 and CT | | Boundary Based Registration | Contrast across boundaries | EPI and T1 | """) return @app.cell(hide_code=True) def _(IMG_DIR, mo): mo.vstack([ mo.md(r""" --- ## Realignment Now let's put everything we learned together to understand how we can correct for head motion in functional images that occurred during a scanning session. It is extremely important to make sure that a specific voxel has the same 3D coordinate across all time points to be able to model neural processes. This of course is made difficult by the fact that participants move during a scanning session and also in between runs. Realignment is the preprocessing step in which a rigid body transformation is applied to each volume to align them to a common space. One typically needs to choose a reference volume, which might be the first, middle, or last volume, or the mean of all volumes. Let's look at an example of the translation and rotation parameters after running realignment on our first subject. 
"""), mo.image(str(IMG_DIR / "realignment_parameters.png")), mo.md(r""" Don't forget that even though we can approximately put each volume into a similar position with realignment, head motion always distorts the magnetic field and can lead to nonlinear changes in signal intensity that will not be addressed by this procedure. In the resting-state literature, where many analyses are based on functional connectivity, head motion can lead to spurious correlations. Some researchers choose to exclude any subject that moved more than a certain amount. Others choose to remove the impact of these time points in their data through removing the volumes via *scrubbing* or modeling out the volume with a dummy code in the first level general linear models. """), ]) return @app.cell(hide_code=True) def _(mo): mo.md(r""" --- ## Spatial Normalization There are several other preprocessing steps that involve image registration. The main one is called *spatial normalization*, in which each subject's brain data is warped into a common stereotactic space. Talairach is an older space, that has been subsumed by various standards developed by the Montreal Neurological Institute. There are a variety of algorithms to warp subject data into stereotactic space. Linear 12 parameter affine transformations have increasingly been replaced by more complicated nonlinear normalizations that have hundreds to thousands of parameters. One nonlinear algorithm that has performed very well across comparison studies is *diffeomorphic registration*, which can also be inverted so that subject space can be transformed into stereotactic space and back to subject space. This is the core of the [ANTs](http://stnava.github.io/ANTs/) algorithm that is implemented in fmriprep. Let's watch another short video by Martin Lindquist and Tor Wager to learn more about the core preprocessing steps. 
""") return @app.cell(hide_code=True) def _(mo): mo.Html(""" """) return @app.cell(hide_code=True) def _(IMG_DIR, mo): mo.vstack([ mo.md(r""" There are many different steps involved in the spatial normalization process and these details vary widely across various imaging software packages. We will briefly discuss some of the steps involved in the anatomical preprocessing pipeline implemented by fMRIprep and will be showing example figures from the output generated by the pipeline. First, brains are extracted from the skull and surrounding dura mater. You can check and see how well the algorithm performed by examining the red outline. """), mo.image(str(IMG_DIR / "T1_normalization.png")), mo.md(r""" Next, the anatomical images are segmented into different tissue types. These tissue maps are used for various types of analyses, including providing a grey matter mask to reduce the computational time in estimating statistics. In addition, they provide masks to aid in extracting average activity in CSF, or white matter, which might be used as covariates in the statistical analyses to account for physiological noise. """), mo.image(str(IMG_DIR / "T1_segmentation.png")), ]) return @app.cell(hide_code=True) def _(mo): mo.md(r""" ### Spatial normalization of the anatomical T1w reference fmriprep uses [ANTs](http://stnava.github.io/ANTs/) to perform nonlinear spatial normalization. It is easy to check to see how well the algorithm performed by viewing the results of aligning the T1w reference to the stereotactic reference space. Hover on the panels with the mouse pointer to transition between both spaces. We are using the MNI152NLin2009cAsym template. """) return @app.cell(hide_code=True) def _(IMG_DIR, mo): mo.Html(f'