r: R statistical computing scripting language

Contents

  1. Overview of package
    1. General usage
  2. Availability of package by cluster
  3. System installed R packages, by cluster/R version
  4. Installing Modules
  5. Running R in batch mode
  6. Using R and MPI

Overview of package

General information about package
Package: r
Description: R statistical computing scripting language
For more information: https://www.r-project.org
Categories:
License: OpenSource (GPL)

General usage information

R is a language and environment for statistical computing and graphics. It is similar to the S language and environment, and can be considered as an open-source implementation of S. There are some important differences, but much code written for S runs unaltered under R.

This module will add the R and Rscript commands to your path.

In case you need to link against this library in your code, the following environment variables have been defined:

  • $R_ROOT has been set to the root of the R installation
  • $R_LIBDIR points to the directory containing the libraries
  • $R_INCDIR points to the directory containing the header files

You will probably wish to use these by adding the following flags to your compilation command (e.g. to CFLAGS in your Makefile):

  • -I$R_INCDIR

and the following flags to your link command (e.g. LDFLAGS in your Makefile):

  • -L$R_LIBDIR -Wl,-rpath,$R_LIBDIR
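
As a rough sketch of how these variables might be used from a shell, the lines below assemble the compile and link flags and print the resulting commands instead of running them. The file names my_code.c and my_code, the gcc compiler, the -lR library flag and the fallback paths are assumptions made for illustration; the fallbacks only keep the sketch runnable when the module is not loaded.

```shell
# Fallback values are placeholders (assumptions) so the sketch runs without
# the module loaded; after loading the module the real values are already set.
R_INCDIR="${R_INCDIR:-/path/to/r/include}"
R_LIBDIR="${R_LIBDIR:-/path/to/r/lib}"

# Compile flags: where to find the R header files
CFLAGS="-I$R_INCDIR"
# Link flags: where to find the libraries at link time and at run time (rpath)
LDFLAGS="-L$R_LIBDIR -Wl,-rpath,$R_LIBDIR"

# Print the commands rather than running them.
# my_code.c, gcc and -lR are hypothetical choices for this example.
echo "gcc $CFLAGS -c my_code.c -o my_code.o"
echo "gcc my_code.o $LDFLAGS -lR -o my_code"
```

The -Wl,-rpath option records the library directory in the executable, so the program can find the libraries at run time without LD_LIBRARY_PATH being set.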

Available versions of the package r, by cluster

This section lists the available versions of the package r on the different clusters.

Available versions of r on the Deepthought2 cluster (RHEL8)

Version   Module tags   CPU(s) optimized for   GPU ready?
3.6.3     r/3.6.3       ivybridge, x86_64      Y

Available versions of r on the Juggernaut cluster

Version   Module tags   CPU(s) optimized for   GPU ready?
3.6.3     r/3.6.3       (none listed)          N
3.6.2     r/3.6.2       (none listed)          N

Available versions of r on the Deepthought2 cluster (RHEL6) [DEPRECATED]

Version   Module tags   CPU(s) optimized for   GPU ready?
3.5.1     r/3.5.1       ivybridge              N
3.3.2     r/3.3.2       ivybridge              N

Installing Modules

R's capabilities can be significantly enhanced through the addition of modules (which R itself calls packages). Code can then load the module with the library command. The supported R interpreters on the system have a selection of modules preinstalled; see the list of system installed packages on this page. If a module you are interested in is not in that list, you can either install a personal copy of the module for yourself, or request that it be installed system wide. We will make reasonable efforts to accommodate such requests as staffing resources allow.

Installing modules yourself

Installing R packages is usually fairly straightforward. Not all packages install in the same manner, but most follow the procedure below:

  1. module load R/X.Y.Z to select the version of R you wish to use
  2. Create the directory to hold your R modules, if you have not already done so. The default is in the directory R underneath your home directory, but you might wish to put it elsewhere; this will have subdirectories for R version and platform added.
  3. Unless you opted for the default directory ~/R, you need to tell R what directory you are using. To do this, you must set the environment variable R_LIBS_USER. Multiple directories can be listed; separate the paths with the colon (:) character. This needs to be set whenever you wish to use the modules in R, so you will generally want to set it in your .cshrc.mine or .Renviron files.
  4. There are two standard methods for installing a package, one from the command line, and one from inside R itself. Assuming you are installing the package foo into ~/myRpkgs, the commands would be:
    • From the command line, you will first need to download a tarball with the source code for the package. Many packages can be found at the Comprehensive R Archive Network (CRAN). Assuming you downloaded foo.tar.gz to the current directory, you could then install it with:
      R CMD INSTALL -l ~/myRpkgs foo.tar.gz
    • From within R, the install.packages function will connect to CRAN and download and install the package all in one step, with:
      install.packages("foo", lib="~/myRpkgs", repos="http://cran.r-project.org")

If all goes well, the package is now installed in the directory you specified and should be available for use by your R scripts.
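
Steps 2 and 3 of the procedure above can be sketched in shell as follows. This is a minimal illustration, assuming the ~/myRpkgs directory from the example and a Bourne-style shell like the one used by the job scripts on this page; pkg_dir is just a helper variable for the example.

```shell
# Directory for personal R packages; ~/myRpkgs matches the example above
pkg_dir="$HOME/myRpkgs"
mkdir -p "$pkg_dir"

# Put it first in R_LIBS_USER, keeping any directories that are already
# listed there (paths are separated by colons)
if [ -n "${R_LIBS_USER:-}" ]; then
    R_LIBS_USER="$pkg_dir:$R_LIBS_USER"
else
    R_LIBS_USER="$pkg_dir"
fi
export R_LIBS_USER

echo "R_LIBS_USER is now: $R_LIBS_USER"
```

In a csh-style startup file such as .cshrc.mine, the equivalent would be a setenv R_LIBS_USER line rather than the export shown here.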

Of course, not all packages install quite that easily. If you are comfortable building modules, the error messages will hopefully provide reasonable guidance as to how to proceed. Otherwise, you can request that Division of Information Technology staff install it, but that might take some time depending on staff availability.

Running R in batch mode

Although R's interactive mode is nice for certain things, when you are doing production runs with tried and true scripts, it is usually easier to use R's batch interface. This is especially useful when submitting jobs to an HPC cluster.

If you have some R code in a file test.R and you wish to run it from the command line (or equivalently, from a shell script), you can simply use the Rscript command. For example:

Rscript --no-save --no-restore test.R

The --no-save and --no-restore flags prevent the saving of the workspace at the end of the session and the restoring of saved objects at startup. These are typically what you want when running in batch mode. Older versions of R used R CMD BATCH instead of the Rscript command; the main difference is that R CMD BATCH optionally takes the name of an output file and writes the output there (by default a file named after the script with the extension .Rout), whereas Rscript writes to standard output. Both should work with currently installed versions of R.
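
The difference between the two commands can be seen with a small sketch like the one below. The contents of test.R and the output file names test.out and test_batch.Rout are made up for illustration, and the R commands only run if an R module has been loaded.

```shell
# Create a tiny test script (its contents are only for illustration)
cat > test.R <<'EOF'
x <- c(1, 2, 3)
print(mean(x))
EOF

if command -v Rscript >/dev/null 2>&1; then
    # Rscript writes the script's output to standard output,
    # so it is redirected to a file here
    Rscript --no-save --no-restore test.R > test.out
    # R CMD BATCH writes to a file itself: test.Rout unless a name is given
    R CMD BATCH --no-save --no-restore test.R test_batch.Rout
else
    echo "Rscript not found; load an R module first" >&2
fi
```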

For use on one of the HPC clusters, you will generally need to include the above in a job script, like:

#!/bin/bash
#Request 5 hours
#SBATCH -t 5:00:00
#Request 4 GiB per CPU-core
#SBATCH --mem-per-cpu=4096
#Request 1 core
#SBATCH -n 1

#Get our profile (and define module command)
. ~/.profile

#Load required modules
module load R/3.3.2

cd MY_WORK_DIRECTORY

#Make sure OpenMP is not "on"
OMP_NUM_THREADS=1
export OMP_NUM_THREADS

Rscript --no-save --no-restore my_R_code.R

Using R and MPI

Users of one of the high-performance computing (HPC) clusters will likely be interested in running R codes that span multiple processors, often over multiple nodes. This is generally done using MPI. There are a number of R packages that deal with MPI, including

  • Rmpi
  • snow
  • doSNOW: provides a dopar functionality via snow

Most users seem to prefer the snow package, which is higher level and therefore presumably easier to use than Rmpi. There are assorted guides to using R with the snow package on the web.

Below are just a few tips gleaned from such guides and elsewhere that users at UMD might find helpful.

  1. For best results, use the same version of compiler and MPI as used for building R and its MPI packages. The MPI libraries and compiler used for the different versions of R are listed in the version table at the top of this page. It is best to module load the compiler first (not needed for gcc/4.6.1) and then the OpenMPI library.
  2. We have also had reports of weird errors occurring when using Rmpi (and the packages depending on it) with InfiniBand: segfaults and other seemingly random errors when setting up connections. This appears to be related to complications with the use of pinned memory and forking within the R interpreter (see e.g. CRMDA blog and OpenMPI developers mailing list archives regarding this issue). As such, we strongly recommend that R users who wish to use MPI disable InfiniBand in their mpirun command by adding the arguments --mca btl tcp,self as shown in the example below.
  3. When using snow or one of its derivatives (e.g. doSNOW), you should launch your code with something like
    #!/bin/bash
    #Request 5 hours
    #SBATCH -t 5:00:00
    #Request 4 GiB per CPU-core
    #SBATCH --mem-per-cpu=4096
    #Request 40 cores
    #SBATCH -n 40
    
    #Get our profile (and define module command)
    . ~/.profile
    
    #Load required modules
    module load gcc/4.9.3
    module load openmpi/1.8.6
    module load R/3.3.2
    
    cd MY_WORK_DIRECTORY
    
    #Make sure OpenMP is not "on"
    OMP_NUM_THREADS=1
    export OMP_NUM_THREADS
    
    #NOTE THE -np 1 below!!!!
    #The --mca btl tcp,self arguments restrict communications to
    #TCP instead of InfiniBand.  We have seen issues with Rmpi and InfiniBand
    mpirun -np 1 --mca btl tcp,self R CMD BATCH --no-save --no-restore my_R_code.R
    

    NOTE the use of -np 1 in the above. Although that looks suspicious (telling mpirun to start only one MPI task when we asked for 40 cores), it is actually correct for most uses of the snow (and derivative) libraries. This is because snow typically spawns its own workers. If you request more than 1 MPI task to be launched via OpenMPI, or omit the -np 1 altogether (which effectively asks mpirun to launch the number of tasks given in the #SBATCH -n line, 40 in this case), you will end up running e.g. 40 copies of the same code, each of which will try to spawn about 40 workers via snow, resulting in a mess (at best very sluggish performance, and more likely weird errors).

  4. Most snow based R code will at some point invoke the makeCluster function. This takes a parameter specifying the size of the "cluster" to create. Typically, one wants this size to be one less than the number of cores requested from Slurm. This is because the process running the R code which spawns the workers is already consuming one CPU core, so if you try to spawn a number of workers equal to the number of cores requested from Slurm, one core will be oversubscribed, which causes issues. The usual symptom is an error about there being an insufficient number of "slots" available, after which the R script just hangs (doing nothing, but not dying until the job is killed for exceeding its walltime, thereby wasting a lot of SUs). It is generally better to do something like:
    cl<-makeCluster(mpi.universe.size()-1, type="MPI")
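
The "one less than the number of cores" arithmetic can also be checked in the job script itself. The sketch below assumes the SLURM_NTASKS variable that Slurm sets inside a job submitted with -n; the fallback of 40 and the names ntasks and nworkers are only there for illustration, so that the lines also run outside a job.

```shell
# SLURM_NTASKS is set by Slurm inside a job submitted with -n; 40 is a
# stand-in for the "#SBATCH -n 40" request when running outside a job
ntasks="${SLURM_NTASKS:-40}"

# Leave one core for the R process that spawns the workers
nworkers=$((ntasks - 1))

echo "Cores requested: $ntasks, snow workers to create: $nworkers"
```

Printing these values near the top of the job output makes it easier to spot a mismatch between the Slurm request and the cluster size passed to makeCluster.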

Installed packages

The following lists many of the system installed packages.

Packages/modules for R by cluster, version

Available extensions for R version 3.6.3 (built for gcc@8.4.0) on Deepthought2
Extension name Version
acepack 1.4.1
amap 0.8-17
annotate 1.62.0
annotationdbi 1.46.1
ape 5.3
argparse 2.0.1
askpass 1.1
assertthat 0.2.1
backports 1.1.4
base64enc 0.1-3
bayesm 3.1-3
bh 1.72.0-2
biasedurn 1.07
biobase 2.44.0
biocgenerics 0.30.0
biocparallel 1.18.1
biomart 2.40.5
biostrings 2.52.0
bit64 0.9-7
bit 1.1-14
bitops 1.0-6
blob 1.2.0
boot 1.3-23
brew 1.0-6
broom 0.5.2
callr 3.4.3
catools 1.17.1.2
cellranger 1.1.0
checkmate 1.9.4
class 7.3-15
cli 2.0.2
clipr 0.7.0
clisymbols 1.2.0
cluster 2.1.0
codetools 0.2-16
colorspace 1.4-1
commonmark 1.7
crayon 1.3.4
crosstalk 1.0.0
ctc 1.58.0
curl 4.3
data-table 1.12.2
dbi 1.1.0
dbplyr 1.4.2
delayedarray 0.10.0
desc 1.2.0
deseq2 1.24.0
devtools 2.1.0
digest 0.6.25
dplyr 0.8.3
edger 3.26.8
ellipsis 0.3.0
evaluate 0.14
fansi 0.4.0
fastcluster 1.1.25
findpython 1.0.5
float 0.2-4
forcats 0.4.0
forecast 8.8
foreign 0.8-72
formatr 1.7
formula 1.2-3
fracdiff 1.4-2
fs 1.3.1
futile-logger 1.4.3
futile-options 1.0.1
gdata 2.18.0
genefilter 1.66.0
genelendatabase 1.20.0
geneplotter 1.62.0
generics 0.0.2
genomeinfodb 1.20.0
genomeinfodbdata 1.2.1
genomicalignments 1.20.1
genomicfeatures 1.36.4
genomicranges 1.36.1
getopt 1.20.3
ggdendro 0.1-20
ggplot2 3.2.0
gh 1.0.1
git2r 0.27.1
glimma 1.12.0
glue 1.4.0
go-db 3.4.1
goplot 1.0.2
goseq 1.36.0
gplots 3.0.1.1
gridextra 2.3
gsl 2.1-6
gtable 0.3.0
gtools 3.8.1
haven 2.1.1
highr 0.8
hmisc 4.2-0
hms 0.5.0
htmltable 1.13.1
htmltools 0.3.6
htmlwidgets 1.3
httpuv 1.5.1
httr 1.4.1
ini 0.3.1
iranges 2.18.3
irdisplay 0.7.0
irkernel master
jsonlite 1.6.1
kernsmooth 2.23-15
knitr 1.28
labeling 0.3
lambda-r 1.2.3
later 0.8.0
lattice 0.20-38
latticeextra 0.6-28
lazyeval 0.2.2
limma 3.40.6
lmtest 0.9-37
locfit 1.5-9.1
lubridate 1.7.4
magrittr 1.5
manipulatewidget 0.10.0
markdown 1.1
mass 7.3-51.5
matrix 1.2-17
matrixstats 0.55.0
memoise 1.1.0
metaskat 0.81
mgcv 1.8-28
mime 0.7
miniui 0.1.1.1
modelr 0.1.5
munsell 0.5.0
ncdf4 1.16.1
nlme 3.1-141
nnet 7.3-12
openssl 1.4.1
optparse 1.6.2
pbdbase 0.5-0
pbddmat 0.5-0
pbdml 0.1-1
pbdmpi 0.4-3
pbdslap 0.2-4
pbdxgb 0.90.0.2
pbdzmq 0.3-3
pillar 1.4.2
pkgbuild 1.0.8
pkgconfig 2.0.2
pkgload 1.0.2
plogr 0.2.0
plyr 1.8.4
praise 1.0.0
prettyunits 1.0.2
processx 3.4.1
progress 1.2.2
promises 1.0.1
ps 1.3.0
purrr 0.3.4
quadprog 1.5-7
quantmod 0.4-15
qvalue 2.16.0
r6 2.4.0
randomforest 4.6-14
rcmdcheck 1.3.3
rcolorbrewer 1.1-2
rcpp 1.0.4.6
rcpparmadillo 0.9.600.4.0
rcppeigen 0.3.3.5.0
rcppparallel 4.4.3
rcurl 1.98-1.2
readr 1.3.1
readxl 1.3.1
rematch 1.0.1
remotes 2.1.1
repr 1.0.1
reprex 0.3.0
reshape2 1.4.3
rgdal 1.4-4
rgl 0.100.26
rhtslib 1.18.1
rlang 0.4.6
rlecuyer 0.3-5
rmarkdown 1.14
rmpi 0.6-9
rots 1.12.0
roxygen2 7.1.0
rpart 4.1-15
rprojroot 1.3-2
rsamtools 2.2.1
rsqlite 2.1.2
rstudioapi 0.11
rth master
rtracklayer 1.44.4
rvest 0.3.4
s4vectors 0.22.1
scales 1.0.0
selectr 0.4-1
sessioninfo 1.1.1
shiny 1.3.2
skat 2.0.1
sm 2.2-5.6
snow 0.4-3
sourcetools 0.1.7
sp 1.3-1
spatest 3.1.2
squarem 2021.1
stringi 1.4.3
stringr 1.4.0
summarizedexperiment 1.14.1
survival 2.44-1.1
sys 3.2
testthat 2.3.2
tibble 2.1.3
tidyr 0.8.3
tidyselect 0.2.5
tidyverse 1.2.1
timedate 3043.102
tinytex 0.15
tseries 0.10-47
ttr 0.23-4
urca 1.3-0
usethis 1.5.1
utf8 1.1.4
uuid 0.1-2
vctrs 0.2.0
viridis 0.5.1
viridislite 0.3.0
webshot 0.5.1
whisker 0.3-2
withr 2.2.0
xfun 0.8
xgboost 0.90.0.2
xml2 1.3.2
xml 3.98-1.20
xopen 1.0.0
xtable 1.8-4
xts 0.11-2
xvector 0.24.0
yaml 2.2.0
zeallot 0.1.0
zlibbioc 1.30.0
zoo 1.8-6