{ "metadata": { "name": "", "signature": "sha256:54bc7e57d511be3d7b7d7bf45bdbc1dab27d3cda6711ec340bd73e6e1eadbd93" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Nipype Concepts: Iteration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](http://nipy.sourceforge.net/nipype/_static/nipype-banner-bg.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The previous notebooks explained Interfaces and Workflows, which are the main building blocks of any processing stream implemented in Nipype. Here, we'll introduce two simple constructs that allow you to pipe multiple instances of similar data through the same workflow.\n", "\n", "In neuroimaging, it is common to collect multiple datasets a given experiment, perhaps one for each subject. It is often also the case that each subject will have multiple runs, or sessions, which should be processed identically. Nipype supports this kind of design with the constructs **iterables** and **MapNode**.\n", "\n", "There is a good page on the online documentation explaining in detail [how these work](http://nipy.sourceforge.net/nipype/users/mapnode_and_iterables.html). This notebook will hopefully make these concepts a bit more concrete." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Smooth and Mask Workflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start off with our simple brain extraction workflow from before. Now say that instead of processing only one brain, you had a set of subjects you needed to analyze. Of course, you'd probably turn to that old programming standby, the `for` loop:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import print_function" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "for f in data/T1-a data/T1-b\n", " do\n", " bet $f ${f}_brain -m\n", " fslmaths $f -s 2 ${f}_smooth\n", " fslmaths ${f}_smooth -mas ${f}_brain_mask ${f}_smooth_mask\n", "done" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "ls data/*_smooth_mask.nii.gz" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "data/T1-a_smooth_mask.nii.gz data/T1-b_smooth_mask.nii.gz\r\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "How would we translate this to Nipype? First, write the workflow as before." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from nipype.interfaces import fsl\n", "from nipype import Workflow, Node" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "/Users/mwaskom/anaconda/lib/python2.7/site-packages/nipy/labs/glm/glm.py:7: FutureWarning: Module nipy.labs.utils.routines deprecated, will be removed\n", " from ..utils import mahalanobis\n", "/Users/mwaskom/anaconda/lib/python2.7/site-packages/nipype/interfaces/nipy/model.py:17: FutureWarning: Module nipy.labs.glm deprecated, will be removed. 
Please use nipy.modalities.fmri.glm instead.\n", " import nipy.labs.glm.glm as GLM\n" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "skullstrip = Node(fsl.BET(mask=True), name=\"skullstrip\")\n", "smooth = Node(fsl.IsotropicSmooth(fwhm=4), name=\"smooth\")\n", "mask = Node(fsl.ApplyMask(), name=\"mask\")\n", "\n", "wf = Workflow(name=\"smoothflow\")\n", "wf.base_dir = \".\"\n", "wf.connect(skullstrip, \"mask_file\", mask, \"mask_file\")\n", "wf.connect(smooth, \"out_file\", mask, \"in_file\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we're going to use a special kind of Interface called an **IdentityInterface**. Basically, whatever this node receives as an input will get exposed as an output. I like to think of this as using a \"variable\" in the workflow graph, as it minimizes the extent to which you have to repeat yourself. We'll use this to make a single place where we specify the input file, which goes to both the skullstripping and smoothing nodes of the workflow." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from nipype import IdentityInterface" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "inputs = Node(IdentityInterface(fields=[\"mri_file\"]), name=\"inputs\")\n", "wf.connect(inputs, \"mri_file\", skullstrip, \"in_file\")\n", "wf.connect(inputs, \"mri_file\", smooth, \"in_file\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also set the logging config so the workflow output isn't so verbose in the notebook." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from nipype import config, logging\n", "config.set('logging', 'workflow_level', 'CRITICAL')\n", "config.set('logging', 'interface_level', 'CRITICAL')\n", "logging.update_logging(config)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Iterables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Say we had a list of files we wanted to skullstrip, smooth, and mask." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from os.path import abspath\n", "files = [\"data/T1-a.nii.gz\", \"data/T1-b.nii.gz\"]\n", "files = map(abspath, files)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "You could use a `for` loop to execute this workflow over each of these files:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for f in files:\n", " inputs.inputs.mri_file = f\n", " wf.run()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But a much better way to execute a workflow over a sequence of data is to use **iterables**. You configure iterables by assigning a tuple of `(field_name, [val_1, val_2, ...])` to a particular node." ] }, { "cell_type": "code", "collapsed": false, "input": [ "inputs.iterables = (\"mri_file\", files)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens here is that everything in the graph that is dependent on a Node with iterables will be duplicated for each value in the iterables list. 
Now when we run the workflow, both images will be processed.\n", "\n", "Of course, if you run a workflow with a distributed plugin, the multiple files will be processed in parallel. With a few lines of code, you can set up parallel processing over a group of subjects and ensure that each is processed in exactly the same way." ] }, { "cell_type": "code", "collapsed": false, "input": [ "wf.run(plugin=\"MultiProc\", plugin_args={\"n_procs\": 2})" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we look in the workflow directory, we see that the intermediate processing is separated for each iterable input. So, you could add more data and rerun without clobbering the old working files. You'll also see that the directory names are just built from the iterable value, which makes them very long when the value is a filename. In the next notebook, we'll show you a better way to inject data into a workflow." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ls smoothflow/_mri_file*/smooth/*.nii.gz" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "smoothflow/_mri_file_..Users..mwaskom..Dropbox..Projects..Nipype_Concepts..data..T1-a.nii.gz/smooth/T1-a_smooth.nii.gz\r\n", "smoothflow/_mri_file_..Users..mwaskom..Dropbox..Projects..Nipype_Concepts..data..T1-b.nii.gz/smooth/T1-b_smooth.nii.gz\r\n" ] } ], "prompt_number": 13 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "MapNode" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's a second way to iterate over a list of data in the context of a workflow. The `Node` class has a close cousin, called `MapNode`. A `MapNode` is quite similar to a `Node`, but it can take a list of inputs and operate over each input separately, ultimately returning a list of outputs. Let's demonstrate this with a simple function interface." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from nipype import Function\n", "def square_func(x):\n", " return x ** 2\n", "square = Function([\"x\"], [\"f_x\"], square_func)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that this function just takes a numeric input and returns its squared value." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print(square.run(x=2).outputs.f_x)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "4\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we wanted to square a list of numbers? We could set an iterable, as we saw above. But say we were making a simple workflow that squared a list of numbers and then summed them. The sum node would expect a list, but using an iterable would make a bunch of sum nodes, and each would get one number from the list. The solution here (and in similar problems) is to use a `MapNode`.\n", "\n", "The `MapNode` constructor has a field called `iterfield`, which tells it which inputs it should expect to be lists." 
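] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before stepping through `MapNode` on its own below, here is a minimal sketch of how that square-and-sum workflow might look. (The names `sum_func`, `sum_node`, and `square_and_sum` are just illustrative here; they aren't built anywhere else in these notebooks.)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Minimal sketch (hypothetical names): square each element with a MapNode,\n", "# then hand the resulting list of outputs to an ordinary Node that sums it.\n", "from nipype import Workflow, Node, MapNode, Function\n", "\n", "def sum_func(x_list):\n", "    return sum(x_list)\n", "\n", "# square_func was defined a couple of cells above\n", "square_list = MapNode(Function([\"x\"], [\"f_x\"], square_func),\n", "                      name=\"square_list\", iterfield=[\"x\"])\n", "sum_node = Node(Function([\"x_list\"], [\"total\"], sum_func), name=\"sum\")\n", "\n", "square_and_sum = Workflow(name=\"square_and_sum\", base_dir=\".\")\n", "# The MapNode's list of outputs becomes the single list input of the sum node\n", "square_and_sum.connect(square_list, \"f_x\", sum_node, \"x_list\")\n", "square_list.inputs.x = range(4)\n", "# square_and_sum.run()  # total would be 0 + 1 + 4 + 9 = 14" ], "language": "python", "metadata": {}, "outputs": [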
] }, { "cell_type": "code", "collapsed": false, "input": [ "from nipype import MapNode\n", "square_node = MapNode(square, name=\"square\", iterfield=[\"x\"])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "square_node.inputs.x = range(4)\n", "print(square_node.run().outputs.f_x)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0, 1, 4, 9]\n" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because `iterfield` can take a list of names, you can operate over multiple sets of data, as long as they're the same length. (The values in each list will be paired; it does not compute a combinatoric product of the lists)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def power_func(x, y):\n", " return x ** y" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "power = Function([\"x\", \"y\"], [\"f_xy\"], power_func)\n", "power_node = MapNode(power, name=\"power\", iterfield=[\"x\", \"y\"])\n", "power_node.inputs.x = range(4)\n", "power_node.inputs.y = range(4)\n", "print(power_node.run().outputs.f_xy)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1, 1, 4, 27]\n" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "But not every input needs to be an iterfield." ] }, { "cell_type": "code", "collapsed": false, "input": [ "power_node = MapNode(power, name=\"power\", iterfield=[\"x\"])\n", "power_node.inputs.x = range(4)\n", "power_node.inputs.y = 3\n", "print(power_node.run().outputs.f_xy)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0, 1, 8, 27]\n" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the case of `iterables`, each underlying `MapNode` execution can happen in parallel. Hopefully, you see how these tools allow you to write flexible, reusable workflows that will help you processes large amounts of data efficiently and reproducibly." ] }, { "cell_type": "code", "collapsed": false, "input": [ "!make clean" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }