MATLAB_PARALLEL is a directory of MATLAB programs which illustrate how to do parallel computing with MATLAB on a single multicore or multiprocessor machine.
This is called "local" parallel computing. Local parallel computing with MATLAB requires access to MATLAB's Parallel Computing Toolbox.
It is also possible to run a MATLAB program in parallel, taking advantage of a remote cluster of computers. This is called "remote" parallel processing, and it requires, in addition to the Parallel Computing Toolbox, the MATLAB Distributed Computing Server. Remote parallel computing will not be discussed in this article!
You must be running a copy of MATLAB that is recent enough to have the parallel computing features described here. In particular, you must be running MATLAB version 2008a or later. (Sadly, you may find that MATLAB has stopped issuing new releases for your slightly old computer!)
Your copy of MATLAB must include the Parallel Computing Toolbox. To get a list of all your toolboxes, type:
ver
Assuming you have the right version of MATLAB, and the Parallel Computing Toolbox, let's also assume that your computer has multiple processors or cores. If you don't, you can still run the Parallel Computing Toolbox - but you just won't ever see any speedup!
If you have access to the Parallel Computing Toolbox, then you can do "local" parallel computing directly on your machine.
Suppose for a moment that you have a MATLAB M-file which has been modified to compute in parallel (we will explain what this means momentarily).
The first thing you do, then, is to start up MATLAB in the regular way. This copy of MATLAB that you start with is called the "client" copy; the copies of MATLAB that will be created to assist in the computation are known as "workers".
The process of running your program in parallel now requires three steps: open a pool of workers with the matlabpool command, run your program, and close the pool.
Supposing that your machine has 8 cores, and that your M-file is named "samson.m", the commands you might actually issue could look like this:
        matlabpool open local 4
        samson
        matlabpool close
The number of workers you request can be any value from 0 up to 4. Of course, asking for 0 workers gets you no parallelism! You can't ask for more than 4 because that's a limitation imposed by the Parallel Computing Toolbox.
It is probably reasonable to request no more workers than you have cores on your machine. If you only have 2 cores, and you ask for 4 workers, MATLAB will create 4 workers, but the workers will have to share the cores, and you should not expect any additional speedup.
The simplest way of parallelizing a MATLAB program focuses on the for loops in the program. If a for loop is suitable for parallel execution, this can be indicated simply by replacing the word for by the word parfor. When the MATLAB program is run, and if workers have been made available by the matlabpool command, then the work in each parfor loop will be distributed among the workers.
What determines whether a for loop is "suitable" for parallelization? The crucial question that must be answered satisfactorily is this: Can the iterations of the loop be performed in any order without affecting the results? If the answer is "yes", then generally the loop can be parallelized.
If you have nested for loops, then generally it is not useful to replace these by nested parfor loops. If the outer loop can be parallelized, then that is the one that should be controlled by a parfor. If the outer loop cannot be parallelized, then you are free to try to parallelize some of the inner for loops.
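As a sketch of this advice (the variable names m, n and b are invented for illustration), only the outer loop gets the parfor; the inner loop stays an ordinary for. Note that each outer iteration assembles a complete row and stores it with a single sliced assignment, which keeps b a valid "sliced" variable under parfor's rules:

```matlab
parfor i = 1 : m
  row = zeros ( 1, n );
  for j = 1 : n
    row(j) = i + j;       % inner loop runs sequentially on each worker
  end
  b(i,:) = row;           % one sliced assignment per outer iteration
end
```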
In some parallelization systems, special care must be taken for what are called reduction operations. Typically, these occur when certain functions such as the maximum, minimum, or sum, are applied to an indexed loop value. MATLAB does not require any special treatment for such operations, and in general they are not an impediment to parallelization.
In this loop, for example, the variables total and big are reduction variables. MATLAB is able to handle this calculation in parallel without any special action from the user, who simply has to replace for by parfor:
        total = 0.0;
        big = - Inf;
        for i = 1 : n
          total = total + x(i);
          big = max ( big, x(i) );
        end
Another feature of some parallelization systems involves the treatment of temporary variables. In the simplest case, a temporary variable is a scalar that is redefined and then used within each iteration of a loop. It is often simply a convenience or shorthand for a cumbersome expression. Many parallelization systems require such variables to be declared as "private", so that each worker has its own copy of the variable. MATLAB does not require any such special treatment of temporary variables.
In this loop, for example, the variable angle is a temporary variable. MATLAB is able to handle this calculation in parallel without any special action from the user, who simply has to replace for by parfor:
        for i = 1 : n
          angle = ( i - 1 ) * pi / ( n - 1 );
          t(i) = cos ( angle );
        end
The most common programming practice that will make parallelization impossible occurs when the data used in one iteration of the loop is not available until a previous iteration has been computed. Sometimes there is another way of programming the loop that makes this difficulty disappear; other times, the problem cannot be fixed because the quantity being computed is inherently recursive.
As a simple example, you may be computing the X locations of a set of nodes this way:
        dx = 0.25;
        x = zeros ( 1, n );
        for i = 2 : n
          x(i) = x(i-1) + dx;
        end
It should be clear that MATLAB's approach to parallelization will fail if applied to this loop. Correct computation requires that the iterations be done in exactly the order that the standard for specifies. However, it's not hard to see alternative ways of doing this calculation, such as:
        dx = 0.25;
        parfor i = 1 : n
          x(i) = ( i - 1 ) * dx;
        end
(Of course, in this case, any MATLAB programmer can think of a one line solution, but that's not our point!)
On the other hand, suppose we are computing an approximation to the solution of a differential equation:
        dt = 0.25;
        u = zeros ( 1, n );
        for i = 2 : n
          u(i) = u(i-1) + dt * f ( t, u(i-1) );
        end
Here, there is generally no way to replace the sequential calculation by a parallel one. The value of u(i) cannot be computed until the value of u(i-1) is known, and that means the loop iterations cannot be executed in arbitrary order. Similar issues arise when a Newton iteration is being carried out.
Some loops include a break or return statement which is invoked if a certain condition is encountered. Such loops cannot be parallelized! Sometimes, the jump out of the loop was not actually necessary, but seemed more efficient, at least for sequential computation. It may be possible to execute the loop in parallel, at the cost of carrying out those "useless" final iterations.
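As a hedged illustration of this trade (the vector x and the search condition are invented for this sketch), a sequential search that breaks out at the first match can be recast as a full scan combined with a min-reduction, which parfor does support:

```matlab
% Sequential version: stops at the first negative entry.
%   first = 0;
%   for i = 1 : n
%     if ( x(i) < 0.0 )
%       first = i;
%       break
%     end
%   end
%
% Parallel version: every iteration is carried out, including the
% "useless" ones past the first match, and a min-reduction on the
% index recovers the same answer.
first = Inf;
parfor i = 1 : n
  if ( x(i) < 0.0 )
    first = min ( first, i );
  end
end
```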
Some temporary variables are used in ways that defeat parallelism. This usually occurs when the value of the temporary variable is assumed to be saved from one iteration to the next, containing information that is necessary for the computation. A simple example of this occurs when the variable is counting how many times some condition has occurred. (There are often ways of rewriting the loop so that parallelization can occur.)
In this loop, for example, we are looking for nonzero entries of the matrix A, and storing them in a compressed vector. The variable k, which counts how many such entries we have seen so far, is a "wraparound" temporary, whose value, set in one loop iteration, is needed in a later loop iteration. It is not possible to ask MATLAB to carry out this operation in parallel simply by replacing the for loop by a parfor. A better choice might be to explore the find command!
        k = 0;
        for i = 1 : m
          for j = 1 : n
            if ( a(i,j) ~= 0.0 )
              k = k + 1;
              a2(k) = a(i,j);
              i2(k) = i;
              j2(k) = j;
            end
          end
        end
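As the text suggests, a sketch of the find-based alternative, which eliminates the wraparound counter k entirely by letting one vectorized call collect all the nonzero entries:

```matlab
% find, called with three outputs, returns the row indices, column
% indices, and values of every nonzero entry of a in one call.
[ i2, j2, a2 ] = find ( a );
k = length ( a2 );   % how many nonzero entries were found
```

One caveat: find lists the entries in column-major order, whereas the double loop above visits them row by row, so the entries may appear in a different sequence.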
MATLAB's cputime function is designed to report the elapsed CPU time; but if your program is being run in parallel, this function might not capture the information you want. That's because cputime will only measure the work done by the particular worker (or the client) that invokes it. In some cases, the client does very little work, and so a CPU time measurement will be very misleading.
Instead, you should use the tic and toc functions to begin and end timing. The call to toc returns the number of seconds elapsed since tic was called. Here is an example of the use of both tic and toc when measuring performance of a parallel computation. Note that in this example, the parfor loops are not visible. They occur inside the "compute" and "update" functions.
        tic;
        for step = 1 : step_num
          [ force, potential, kinetic ] = compute ( np, nd, pos, vel, mass );
          pe(step) = potential;
          ke(step) = kinetic;
          ee(step) = ( potential + kinetic - e0 ) / e0;
          [ pos, vel, acc ] = update ( np, nd, pos, vel, force, acc, mass, dt );
        end
        wtime = toc;
        fprintf ( 1, '  Main computation:\n' );
        fprintf ( 1, '  Wall clock time = %f seconds.\n', wtime );
BIRTHDAY_REMOTE, a MATLAB program which runs a Monte Carlo simulation of the birthday paradox, and includes instructions on how to run the job, via MATLAB's BATCH facility, on a remote system such as Virginia Tech's ITHACA cluster.
CG_DISTRIBUTED, a MATLAB program which implements a version of the NAS CG conjugate gradient benchmark, using distributed memory.
COLLATZ_PARFOR, a MATLAB program which seeks the maximum Collatz sequence between 1 and N, running in parallel using MATLAB's "PARFOR" feature.
COLOR_REMOTE, a MATLAB program which carries out the color segmentation of an image in parallel, via SPMD commands; this includes instructions on how to run the job, via MATLAB's BATCH facility, on a remote system such as Virginia Tech's ITHACA cluster.
CONTRAST_SPMD, a MATLAB program which demonstrates the SPMD parallel programming feature for image operations; the client reads an image, the workers increase contrast over separate portions, and the client assembles and displays the results.
CONTRAST2_SPMD, a MATLAB program which demonstrates the SPMD parallel programming feature for image operations; this improves the contrast_spmd program by allowing the workers to share some data; this makes it possible to eliminate artificial "seams" in the processed image.
FD2D_HEAT_EXPLICIT_SPMD, a MATLAB program which uses the finite difference method and explicit time stepping to solve the time dependent heat equation in 2D. A black and white image is used as the "initial condition". MATLAB's SPMD facility is used to carry out the computation in parallel.
FMINCON_PARALLEL, a MATLAB program which demonstrates the use of MATLAB's FMINCON constrained minimization function, taking advantage of MATLAB's Parallel Computing Toolbox for faster execution.
IMAGE_DENOISE_SPMD, a MATLAB program which demonstrates the SPMD parallel programming feature for image operations; the client reads an image, the workers process portions of it, and the client assembles and displays the results.
KNAPSACK_TASKS, a MATLAB program which solves a knapsack problem by subdividing it into tasks, each of which is carried out as a separate program.
LINEAR_SOLVE_DISTRIBUTED, a MATLAB program which solves a linear system A*x=b using MATLAB's spmd facility, so that the matrix A is "distributed" across multiple MATLAB workers.
LYRICS_REMOTE, a MATLAB program which runs in parallel, using three workers which cooperate "systolically", that is, as though they were on an assembly line. The output from worker 1 is passed to worker 2 for further processing, and so on. This includes instructions on how to run the job, via MATLAB's BATCH facility, on a remote system such as Virginia Tech's ITHACA cluster.
MATLAB_COMMANDLINE, MATLAB programs which illustrate how MATLAB can be run from the UNIX command line, that is, not with the usual MATLAB command window.
MATLAB_REMOTE, MATLAB programs which illustrate the use of remote job execution, in which a desktop copy of MATLAB sends programs and data to a remote machine for execution. Included is information needed to properly configure the local machine.
MD_PARFOR, a MATLAB program which carries out a molecular dynamics simulation, running in parallel using MATLAB's "PARFOR" feature.
ODE_SWEEP_PARFOR, a MATLAB program which demonstrates how the PARFOR command can be used to parallelize the computation of a grid of solutions to a parameterized system of ODE's.
PLOT_SPMD, a MATLAB library which demonstrates the SPMD parallel programming feature, by having a number of labs compute parts of a sine plot, which is then displayed by the client process.
PRIME_PARFOR, a MATLAB program which counts the number of primes between 1 and N; running in parallel using MATLAB's "PARFOR" feature.
PRIME_SPMD, a MATLAB program which counts the number of primes between 1 and N; running in parallel using MATLAB's "SPMD" feature.
QUAD_PARFOR, a MATLAB program which estimates an integral using quadrature; running in parallel using MATLAB's "PARFOR" feature.
QUAD_SPMD, a MATLAB program which estimates an integral using quadrature; running in parallel using MATLAB's "SPMD" feature.
QUAD_TASKS, a MATLAB program which estimates an integral using quadrature; running in parallel using MATLAB's "TASK" feature.
RANDOM_WALK_2D_AVOID_TASKS, a MATLAB program which computes many self avoiding random walks in 2D by creating a job which defines each walk as a task, and then computes these independently using MATLAB's Parallel Computing Toolbox task computing capability.
SATISFY_PARFOR, a MATLAB program which demonstrates, for a particular circuit, an exhaustive search for solutions of the circuit satisfiability problem, running in parallel using MATLAB's "PARFOR" feature.
The User's Guide for the Parallel Computing Toolbox is available at http://www.mathworks.com/access/helpdesk/help/pdf_doc/distcomp/distcomp.pdf
You can go up one level to the MATLAB source codes.