---
title: Parallel Evaluations in R 
linkTitle: Parallel Evaluations in R 
type: docs
weight: 12
aliases:
    - /manuals_linux-cluster_parallelR.html
    - /manuals_linux-cluster_parallelR
---

# Overview
R provides a variety of packages for parallel computations. One of the most
comprehensive parallel computing environments for R is [`batchtools`](https://mllg.github.io/batchtools/articles/batchtools.html) 
(formerly `BatchJobs`). It supports both multi-core and multi-node computations with and
without schedulers. By making use of cluster template files, most schedulers
and queueing systems are also supported (e.g. Torque, Sun Grid Engine, Slurm). 
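
The "without schedulers" case can be tried on any machine. The minimal sketch below assumes only that `batchtools` is installed. `file.dir = NA` creates a throw-away registry in a temporary directory. `conf.file = NA` prevents a `.batchtools.conf.R` in the working directory from being read. `makeClusterFunctionsInteractive()` runs the jobs one after another in the current R session. The squaring function is a placeholder.

```r
library(batchtools)

# Throw-away registry in a temporary directory; ignore any local config file
reg_local <- makeRegistry(file.dir = NA, conf.file = NA, make.default = FALSE)
# Run jobs sequentially in the current R session (no scheduler involved)
reg_local$cluster.functions <- makeClusterFunctionsInteractive()

# One job per element of x; the function is only a placeholder
ids_local <- batchMap(function(x) x^2, x = 1:3, reg = reg_local)
submitJobs(ids_local, reg = reg_local)
waitForJobs(reg = reg_local)

# Collect the results of all finished jobs as a list, in job order
squares <- reduceResultsList(reg = reg_local)
unlist(squares)
```

On a cluster, only the cluster functions change. The registry, `batchMap()` and `submitJobs()` calls stay the same.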

## R code of this section

To make it easier to run the R code on this page, a plain-text version of it 
is available for download [here](https://bit.ly/3m5QmMU).

## Parallelization with batchtools 

The following sections show how to use `batchtools` on a computer cluster that uses SLURM as its scheduler (workload manager).

## Set up working directory for SLURM
First log in to your cluster account, open R and execute the following lines. They
create a test directory (here `mytestdir`), change R's working directory to it, and then download
the required files:

+ [`slurm.tmpl`](https://bit.ly/3Oh9dRO)
+ [`.batchtools.conf.R`](https://bit.ly/3KPBwou)

```r
dir.create("mytestdir")
setwd("mytestdir")
download.file("https://bit.ly/3Oh9dRO", "slurm.tmpl")
download.file("https://bit.ly/3KPBwou", ".batchtools.conf.R")
```
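
This page does not show the contents of the downloaded `.batchtools.conf.R`. The fragment below is a rough orientation only and assumes the file follows the usual `batchtools` convention. A minimal SLURM configuration file assigns a `cluster.functions` object that builds job scripts from the `slurm.tmpl` template in the working directory. The actual downloaded file may contain additional settings.

```r
# Hypothetical minimal .batchtools.conf.R; the downloaded file may differ
# Generate and submit SLURM job scripts from the template in the working directory
cluster.functions <- makeClusterFunctionsSlurm(template = "slurm.tmpl")
```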

## Load packages and define a custom function

The following toy function will be run on the cluster for demonstration purposes. It
subsets the `iris` data frame by rows and appends the host name and R version of the node
on which it was executed. The R version used on the compute nodes can be
specified in the `slurm.tmpl` file (under `module load`). Before defining the function, the
code loads the `slurm` environment module from within R via the `RenvModule` package and then loads `batchtools`.

```r
library('RenvModule')
module('load','slurm') # Loads slurm among other modules

library(batchtools)
# Toy function: returns rows x (columns 1-4) of iris plus host name and R version
myFct <- function(x) {
	cbind(iris[x, 1:4],
	      Node=system("hostname", intern=TRUE),
	      Rversion=paste(R.Version()[6:7], collapse=".")) # R version as major.minor
}
```
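
Before submitting any jobs, you can call the function once in the login session to check the shape of its output. The sketch below defines a variant under the hypothetical name `myFctLocal`. It uses `Sys.info()[["nodename"]]` instead of `system("hostname")` so that it does not depend on an external `hostname` command. Otherwise it follows `myFct` above.

```r
# Variant of myFct for a quick local check (hypothetical name myFctLocal)
myFctLocal <- function(x) {
  cbind(iris[x, 1:4],
        Node = Sys.info()[["nodename"]],                               # host name of this machine
        Rversion = paste(R.version$major, R.version$minor, sep = "."))  # R version as major.minor
}

local_check <- myFctLocal(1:3)  # rows 1 to 3 of iris
print(local_check)
```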

## Submit jobs from R to cluster

The following creates a `batchtools` registry, defines the number of jobs and resource requests, and then submits the jobs to the cluster
via SLURM.

```r
reg <- makeRegistry(file.dir="myregdir", conf.file=".batchtools.conf.R")
Njobs <- 1:4 # Define number of jobs (here 4); each job processes one row of iris
ids <- batchMap(fun=myFct, x=Njobs, reg=reg)
done <- submitJobs(ids, reg=reg, resources=list(partition="short", walltime=60, ntasks=1, ncpus=1, memory=1024))
waitForJobs(reg=reg) # Wait until jobs are completed
```
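
With many small jobs, submitting each one as its own SLURM job adds scheduler overhead. `batchtools` can group jobs into chunks that run together as one scheduler job. You do this with its `chunk()` helper and a `chunk` column in the job table passed to `submitJobs()`. The sketch below shows the pattern on a temporary registry with local (interactive) execution, so it runs without SLURM. It assumes `batchtools` is installed. The toy function and the choice of two chunks are only for illustration.

```r
library(batchtools)

# Temporary local registry as a stand-in for the SLURM registry
reg_chunk <- makeRegistry(file.dir = NA, conf.file = NA, make.default = FALSE)
reg_chunk$cluster.functions <- makeClusterFunctionsInteractive()

ids_chunk <- batchMap(function(x) x + 100, x = 1:6, reg = reg_chunk)
# Spread the 6 jobs over 2 chunks; on a cluster each chunk becomes one scheduler job
ids_chunk$chunk <- chunk(ids_chunk$job.id, n.chunks = 2)
print(ids_chunk)

submitJobs(ids_chunk, reg = reg_chunk)
waitForJobs(reg = reg_chunk)
chunk_results <- unlist(reduceResultsList(reg = reg_chunk))
```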

## Summarize job status 
After the jobs have completed, their status can be inspected as follows.

```r
getStatus() # Summarize job status
showLog(ids$job.id[1]) # Show log file of the first job
# killJobs(ids) # Possible from within R or outside with scancel
```
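
Besides `getStatus()`, `batchtools` provides `findDone()`, `findErrors()` and `getErrorMessages()` to look up jobs by state. The sketch below runs locally on a temporary registry, assuming `batchtools` is installed. One job fails on purpose, which is an assumption made only so that there is an error to inspect.

```r
library(batchtools)

reg_err <- makeRegistry(file.dir = NA, conf.file = NA, make.default = FALSE)
reg_err$cluster.functions <- makeClusterFunctionsInteractive()

# Job 2 fails on purpose so that there is something to inspect
ids_err <- batchMap(function(x) if (x == 2) stop("bad input") else x, x = 1:3, reg = reg_err)
submitJobs(ids_err, reg = reg_err)
waitForJobs(reg = reg_err)

getStatus(reg = reg_err)
done_ids  <- findDone(reg = reg_err)    # table of successfully finished jobs
error_ids <- findErrors(reg = reg_err)  # table of failed jobs
getErrorMessages(error_ids, reg = reg_err)
```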

## Access/assemble results

The results are stored as `.rds` files in the registry directory (here `myregdir`). One
can access them manually via `readRDS` or use various convenience utilities provided
by the `batchtools` package.

```r
readRDS("myregdir/results/1.rds") # Read result of job 1 directly from its rds file
loadResult(1) # Same via batchtools
lapply(Njobs, loadResult) # List of all results (job ids coincide with Njobs here)
reduceResults(rbind) # Assemble result chunks in single data.frame
do.call("rbind", lapply(Njobs, loadResult)) # Same as previous line
```
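
The manual route via `readRDS` comes down to reading one `.rds` file per job and binding the pieces together. The base-R sketch below imitates that layout in a temporary directory. It writes one hypothetical result file per job as `results/<job id>.rds`, matching the path used above, and then assembles the files in job order. No `batchtools` functions are involved.

```r
# Stand-in for a registry's results directory (temporary, for illustration only)
res_dir <- file.path(tempdir(), "myregdir_demo", "results")
dir.create(res_dir, recursive = TRUE, showWarnings = FALSE)

# Write one fake result chunk per job, mimicking results/<job id>.rds
for (id in 1:4) saveRDS(iris[id, 1:4], file.path(res_dir, paste0(id, ".rds")))

# Read the files back in numeric job order and stack them into one data.frame
files <- file.path(res_dir, paste0(1:4, ".rds"))
assembled <- do.call(rbind, lapply(files, readRDS))
print(assembled)
```

Building the file names from the job ids, rather than calling `list.files()`, keeps the numeric order. A plain sort of file names would put `10.rds` before `2.rds`.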

## Remove registry directory from file system

By default, `makeRegistry()` will not overwrite an existing registry directory. If required,
registries can be cleaned and deleted explicitly with the following functions.

```r
clearRegistry(reg=reg) # Remove all jobs and results, but keep the registry itself
removeRegistry(wait=0, reg=reg) # Delete registry directory (wait=0 skips the safety delay)
# unlink("myregdir", recursive=TRUE) # Same as previous line
```
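
To see what `removeRegistry()` does without touching `myregdir`, the sketch below creates a registry under a hypothetical directory name inside `tempdir()` and deletes it again right away. It assumes `batchtools` is installed.

```r
library(batchtools)

# Registry under a hypothetical directory name inside tempdir()
tmp_regdir <- file.path(tempdir(), "demo_registry")
reg_tmp <- makeRegistry(file.dir = tmp_regdir, conf.file = NA, make.default = FALSE)
dir.exists(tmp_regdir)  # the registry directory now exists

# wait = 0 skips the default safety delay before deleting
removeRegistry(wait = 0, reg = reg_tmp)
dir.exists(tmp_regdir)  # the directory has been removed
```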

## Load registry into R 

Loading an existing registry is useful for accessing the results at a later stage, for example
in a new R session or after copying the registry directory to a local system.

```r
from_file <- loadRegistry("myregdir", conf.file=".batchtools.conf.R")
reduceResults(rbind, reg=from_file) # Assemble results from the loaded registry
```
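
The sketch below imitates this workflow locally, assuming `batchtools` is installed. It creates a registry in a hypothetical directory under `tempdir()` and runs toy jobs in the current session. It then reopens the registry with `loadRegistry()`, as a later session would, and assembles the results. `conf.file = NA` keeps any local configuration file out of the example. By default, `loadRegistry()` opens a registry read-only. Pass `writeable = TRUE` if jobs need to be submitted or modified.

```r
library(batchtools)

# First session: create a registry, run jobs locally, and keep the directory
saved_dir <- file.path(tempdir(), "saved_registry")
reg_run <- makeRegistry(file.dir = saved_dir, conf.file = NA, make.default = FALSE)
reg_run$cluster.functions <- makeClusterFunctionsInteractive()
batchMap(function(x) data.frame(x = x, sq = x^2), x = 1:3, reg = reg_run)
submitJobs(reg = reg_run)
waitForJobs(reg = reg_run)

# Later session (or after copying saved_dir elsewhere): reopen it read-only
reg_loaded <- loadRegistry(saved_dir, conf.file = NA, make.default = FALSE)
combined <- reduceResults(rbind, reg = reg_loaded)
print(combined)
```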