Parallel Integration

mirai provides an alternative communications backend for R. This functionality was developed to fulfil a request by R Core at R Project Sprint 2023.

make_cluster() creates a cluster object of class ‘miraiCluster’, which is fully compatible with parallel cluster types.

  • Specify ‘n’ to launch nodes on the local machine.
  • Specify ‘url’ for receiving connections from remote nodes.
  • Optionally, specify ‘remote’ to launch remote daemons using a remote configuration generated by remote_config() or ssh_config().

Created clusters may be used with any function in the base package parallel, such as parallel::clusterApply() or parallel::parLapply(), or with load-balanced versions such as parallel::parLapplyLB().

library(mirai)

cl <- make_cluster(4)
cl
#> < miraiCluster | ID: `0` nodes: 4 active: TRUE >

parallel::parLapply(cl, iris, mean)
#> $Sepal.Length
#> [1] 5.843333
#> 
#> $Sepal.Width
#> [1] 3.057333
#> 
#> $Petal.Length
#> [1] 3.758
#> 
#> $Petal.Width
#> [1] 1.199333
#> 
#> $Species
#> [1] NA
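
The load-balanced variants follow the same calling pattern. The following is a minimal sketch rather than an example from the package itself: it assumes a separate two-node local cluster (named `cl_lb` here purely for illustration) and uses tasks of deliberately uneven duration. The `chunk.size = 1` argument is passed so that tasks are scheduled one at a time.

```r
library(mirai)

# a separate two-node local cluster, so the sketch stands alone
cl_lb <- make_cluster(2)

# illustrative task durations in seconds; the first is the slowest
durations <- c(0.2, 0.05, 0.05, 0.05)

# chunk.size = 1 makes each task its own scheduling unit
res <- parallel::parLapplyLB(cl_lb, durations, function(d) {
  Sys.sleep(d)
  d
}, chunk.size = 1)

stop_cluster(cl_lb)
```

Results are returned in the order of the input, regardless of which node finished first.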

status() may be called on a ‘miraiCluster’ to query the number of connected nodes at any time.

status(cl)
#> $connections
#> [1] 4
#> 
#> $daemons
#> [1] "abstract://bfdadd07dfad88c3149c2c97"

stop_cluster(cl)

Making a cluster specifying ‘url’ without ‘remote’ causes the shell commands for manual deployment of nodes to be printed to the console.

cl <- make_cluster(n = 2, url = host_url())
#> Shell commands for deployment on nodes:
#> 
#> [1]
#> Rscript -e "mirai::daemon('tcp://hostname:40081',rs=c(10407,-712929287,487124838,-1548361041,936929572,516359637,-964658030))"
#> 
#> [2]
#> Rscript -e "mirai::daemon('tcp://hostname:40081',rs=c(10407,716174157,753336392,1540784125,1343950952,-477221459,1025440810))"

stop_cluster(cl)
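
When nodes are deployed manually, they may take some time to dial in, and status() can be polled to find out when they have connected. The sketch below is one possible approach, not part of mirai: `wait_for_nodes()` is a hypothetical helper, and its `timeout` and `interval` arguments are assumptions chosen for illustration. It is demonstrated on a local cluster so that it runs without any remote setup.

```r
library(mirai)

# Hypothetical helper (not part of mirai): poll status() until `n` nodes
# have connected, or give up after `timeout` seconds.
wait_for_nodes <- function(cl, n, timeout = 10, interval = 0.1) {
  deadline <- Sys.time() + timeout
  while (status(cl)$connections < n) {
    if (Sys.time() > deadline) return(FALSE)
    Sys.sleep(interval)
  }
  TRUE
}

# demonstrated with local nodes; remote nodes would be polled the same way
cl <- make_cluster(2)
all_connected <- wait_for_nodes(cl, n = 2)

stop_cluster(cl)
```

The helper returns TRUE once the expected number of connections is reported and FALSE if the timeout expires first.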

Starting with R 4.4, the parallel package provides a new function, registerClusterType(), for registering alternative communications backends.

The function mirai::register_cluster() is a wrapper around this function that registers ‘miraiCluster’ as a cluster type and also sets it as the default. It needs to be inserted only once at the top of a script, and all subsequent calls to parallel::makeCluster() will default to ‘miraiCluster’.

library(parallel)

mirai::register_cluster()

cl <- makeCluster(2)
cl
#> < miraiCluster | ID: `2` nodes: 2 active: TRUE >

stopCluster(cl)

Foreach Integration

A ‘miraiCluster’ may also be registered by doParallel for use with the foreach package.

The following runs some parallel examples using the foreach() function:

library(foreach)
library(iterators)

cl <- make_cluster(4)
doParallel::registerDoParallel(cl)

# normalize the rows of a matrix
m <- matrix(rnorm(9), 3, 3)
foreach(i = 1:nrow(m), .combine = rbind) %dopar%
  (m[i, ] / mean(m[i, ]))
#>               [,1]        [,2]      [,3]
#> result.1  4.127729   3.3741203 -4.501849
#> result.2 -2.495306   0.9463084  4.548998
#> result.3 17.117011 -11.4476759 -2.669335

# simple parallel matrix multiply
a <- matrix(1:16, 4, 4)
b <- t(a)
foreach(b = iter(b, by='col'), .combine = cbind) %dopar%
  (a %*% b)
#>      [,1] [,2] [,3] [,4]
#> [1,]  276  304  332  360
#> [2,]  304  336  368  400
#> [3,]  332  368  404  440
#> [4,]  360  400  440  480
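
Once the parallel work is finished, the cluster can be stopped as in the earlier sections. The sketch below is a self-contained illustration under stated assumptions: it uses its own two-node cluster, and it calls foreach::registerDoSEQ() afterwards so that later %dopar% calls fall back to sequential execution rather than referring to a stopped cluster. That final step is a suggestion, not something the package requires.

```r
library(mirai)
library(foreach)

cl <- make_cluster(2)
doParallel::registerDoParallel(cl)

# square each element on the cluster and combine into a vector
squares <- foreach(i = 1:4, .combine = c) %dopar% (i^2)

# release the nodes, then register the sequential backend
stop_cluster(cl)
registerDoSEQ()
```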