R and RStudio on the BMRC cluster
R and RStudio are important tools for many researchers. This guide explains how to use them to best effect on the BMRC cluster.
Quick Links
- Pre-installed R software
- Generating Graphics and Plots
- Installing your own R packages
- Using RStudio for remote R coding via a web browser
Preinstalled R Software
We provide up-to-date versions of the main R software. Available versions can be listed by running:
module avail R
When you have chosen your desired module, you can load it with module load <module_name>. For example, you can load R/4.1.0-foss-2021a by running:
module load R/4.1.0-foss-2021a
The preinstalled versions of R include a number of commonly used R libraries. You can see which libraries, and which versions of them, are included by running:
module whois R/4.1.0-foss-2021a
The included R libraries are listed as extensions.
In addition to the libraries included with the main R software, it is possible to load additional modules that provide even more libraries. A large subset of the packages from the Bioconductor project, for example, is available via the R-bundle-Bioconductor module. You can list available versions by running:
module avail R-bundle-Bioconductor
To ensure compatibility, you should choose a version of R-bundle-Bioconductor that has the same toolchain listed in its name as your main R module. For example, R/4.1.0-foss-2021a is compatible with R-bundle-Bioconductor/3.13-foss-2021a-R-4.1.0, meaning that they can be used together. You can see which libraries are provided by this Bioconductor module by running (as before):
module whois R-bundle-Bioconductor/3.13-foss-2021a-R-4.1.0
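The toolchain comparison described above can be sketched as a small shell helper. This is only an illustration: the function name same_toolchain is hypothetical, and it assumes that R module names follow the R/<version>-<toolchain> pattern seen in the examples in this guide.

```shell
# Hypothetical helper (not a BMRC-provided tool): succeed if the bundle
# module name contains the toolchain of the given R module.
# Assumes R module names look like R/<version>-<toolchain>.
same_toolchain() {
    r_module=$1
    bundle_module=$2
    # Remove the leading "R/<version>-" to leave the toolchain, e.g. "foss-2021a"
    toolchain=${r_module#R/*-}
    if [ "$toolchain" = "$r_module" ]; then
        echo "could not find a toolchain in: $r_module" >&2
        return 2
    fi
    case "$bundle_module" in
        *"-$toolchain"-*|*"-$toolchain") return 0 ;;
        *) return 1 ;;
    esac
}

if same_toolchain "R/4.1.0-foss-2021a" "R-bundle-Bioconductor/3.13-foss-2021a-R-4.1.0"; then
    echo "toolchains match"
fi
```

The helper only compares names. It does not check that either module actually exists, so module avail remains the way to confirm what is installed.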
Generating Graphics and Plots with R on the Cluster
To generate and save graphical outputs, we recommend setting R's graphical device to use cairo-png. It may be useful to set this option globally with:
options(bitmapType='cairo-png')
Alternatively, to save plots generated by ggplot2, for example, use:
library(ggplot2)
# Replace this example with the code that builds your own plot
my_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
ggsave("myplot.png", plot = my_plot, type = "cairo-png")
For further discussion, please refer to this external guide.
Installing Local R packages
The versions of R pre-installed on the cluster come with a variety of common packages built in, so when looking for a particular package, we recommend first double-checking whether it is included by default by running e.g. module whois R-bundle-Bioconductor/3.13-foss-2021a-R-4.1.0.
When a package is not available by default, you can install it for your personal use by using a folder on disk as a personal package library. You have full control over this personal library: you can install and uninstall packages there as desired, and these packages are available only to you. This short guide takes you through how to do this on the BMRC cluster.
NB The method described below only works for R >= 3.2 which was released in 2015. Users of earlier versions of R are strongly encouraged to make the transition to a more recent version.
The need for multiple local R Package Repositories
At any one time, the BMRC cluster comprises nodes with different generations of Intel CPU architecture. As of April 2020, for example, our cluster nodes fall into two broad Intel CPU families, ivybridge and skylake. Software built for one CPU family may not be compatible with another, and this is also true for R packages. If you try to use an R package built for one CPU family on a different CPU family, your code will abort with an "Illegal instruction" or "Illegal operand" error.
For this reason, it is necessary when installing your own R packages to ensure that you install each package TWICE - one version for ivybridge nodes and a second version for skylake nodes. You also need to ensure that the correct version is selected when running your jobs on the cluster. The instructions below explain how this can be achieved with minimal effort and maintenance.
Notes for existing R users
If you have already installed your own local R packages then you will already have an existing R user library. Where this is the case, we recommend starting afresh with a new user library location and following the method below to re-install your existing packages. This will ensure that your R packages are compatible with all of our cluster queues and nodes.
If you have an existing folder of R packages in your home folder at /users/<group>/R please ensure that this folder is either deleted or renamed e.g. to /users/<group>/R-old. Otherwise, R will attempt to load it by default and this may include loading packages for an incompatible CPU architecture.
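The renaming step can be sketched as a small shell helper. The function name archive_old_r_library is hypothetical, and the sketch assumes that the old package folder is the R directory in your home folder, as described above.

```shell
# Hypothetical helper (not a BMRC-provided tool): rename an old R package
# folder to <folder>-old so that R no longer loads it by default.
archive_old_r_library() {
    old_lib=${1%/}    # drop any trailing slash
    if [ ! -d "$old_lib" ]; then
        echo "nothing to do: $old_lib does not exist"
        return 0
    fi
    if [ -e "${old_lib}-old" ]; then
        echo "refusing to overwrite existing ${old_lib}-old" >&2
        return 1
    fi
    mv "$old_lib" "${old_lib}-old"
    echo "renamed $old_lib to ${old_lib}-old"
}
```

For example, archive_old_r_library "$HOME/R" would rename the folder to R-old. Renaming rather than deleting keeps the old packages available for reference while you re-install them.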
Users who have an existing ~/.Rprofile file should normally be able to add the extra code shown below to the beginning of their existing file. In the unlikely event of this causing a conflict, please contact us.
Please also note that R only ever sources one .Rprofile file for the user's local settings. So if you are making use of project-specific or directory-specific .Rprofile files, then you will need to copy the code below into each of them or place it in a central location and source it from each of your .Rprofile files.
Notes for users of "R --vanilla" including snakemake users
The method explained below involves adding some code to your ~/.Rprofile file, which R reads by default on startup. However, using the command R --vanilla explicitly tells R not to read any of its startup files - hence the method described below will not work in those cases. To solve this particular issue, you would need to add the code below to the R script file you are running with R --vanilla. There is no harm in running the code below multiple times, so you can include it anywhere it might be needed.
Please note that when calling an R script from within snakemake as an external script, your script is called with R --vanilla, so the advice above also applies.
How to Install Local R Packages
Note that the instructions below require you to run R at the command line on cluster1, cluster2 or rescomp3. RStudio sessions running on the cluster are not able to access the internet, so it is not possible to install packages from within the RStudio sessions described later in this guide.
- First choose a new folder to be the main repository directory where your locally installed R packages will live. We recommend a directory in your group's space e.g. /well/<group>/users/<username>/R. This should be a new folder, i.e. please do not re-use an existing folder of R packages.
- If it doesn't already exist, create the file ~/.Rprofile - i.e. this is a file named .Rprofile in your home directory. You can reach your home directory by running the cd command on its own. The leading dot in the filename is essential.
- At the beginning of your ~/.Rprofile file, add the following code:
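# Directory for your personal R package library (customise this; see NB2)
R_LIBS_BASE="/well/<group>/users/<username>/R"
BMRC_RPROFILE="/apps/misc/R/bmrc-r-user-tools/Rprofile"
# Source the BMRC Rprofile unless R is running inside a Singularity container
if (Sys.getenv("SINGULARITY_CONTAINER") == "") {
  source(BMRC_RPROFILE)
} else {
  print("[BMRC] Warning: The BMRC Rprofile has been de-activated because R is running inside a singularity container.")
}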
NB1 Please add one blank line to the end of your ~/.Rprofile file or it will not load.
NB2 You need to customise the first line by setting R_LIBS_BASE to the directory you chose in Step 1. You can choose any directory that you have access to but you need at least to change "<group>" and "<username>" to your real values. Everything up to but not including the last directory of R_LIBS_BASE must already exist.
NB3 The code above ensures that the BMRC-provided Rprofile is sourced when running R directly on the cluster. However, the BMRC Rprofile should not be sourced when running R inside a Singularity container such as the BMRC-provided SAIGE container. For this reason, the code above automatically de-activates the BMRC Rprofile when running inside a Singularity container.
- Now you are ready to test. Load a recent R module, e.g. module load R/3.6.2-foss-2019b, and then start R by running the R command. If everything is working, you should see output that looks like this:
[1] "[BMRC] You have sourced the BMRC Rprofile provided at /apps/misc/R/bmrc-r-user-tools/Rprofile"
[1] "[BMRC] Messages coming from this file (like this one) will be prefixed with [BMRC]"
[1] "[BMRC] You are running R on host <XXX> with CPU <YYY>"
[1] "[BMRC] While running on this host, local R packages will be sourced from and installed to /well/<group>/users/<username>/R"
The path in the final line should be an extension of your R_LIBS_BASE variable, with sub-directories for the R version and the CPU architecture, i.e. /<R_LIBS_BASE>/<R version>/<architecture>.
- As a final check, run the .libPaths() command from within R and check that the first entry shows the same directory as the last line of the startup output above.
- Once you have tested your setup using the instructions above, you are ready to start installing your own packages. The most important point to remember is that:
For each local package that you want to install, you will need to install the package TWICE. In particular, you will need to run the install procedure ONCE on either cluster1 or cluster2 (which both use the skylake architecture) and ONCE on rescomp3 (which uses the ivybridge architecture)
NB1 To connect to rescomp3 you must first connect as normal to cluster1 or cluster2 and then ssh to rescomp3 using: ssh rescomp3
NB2 R packages can be installed by running e.g. install.packages("zip") within R to install the zip package.
- After installing packages, you can safely submit R jobs to the cluster which use your locally installed packages. The above method will ensure that the correct CPU-specific version of each package is loaded. NB In order to avoid similar problems with our pre-installed software, we strongly recommend against using the -V parameter with qsub; see our software guide for further info.
- If something goes wrong in this setup, an informative error message should be printed with the "[BMRC] ..." prefix and R will automatically quit - it will quit immediately on startup and no R jobs will run. Please let us know if you are unable to resolve these errors yourself.
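Because each package has to be installed twice, it can be handy to confirm that both copies exist before submitting jobs. The sketch below is one possible check, not a BMRC-provided tool. The function name check_both_architectures is hypothetical, and the sketch assumes that the CPU-specific sub-directories are named skylake and ivybridge under /<R_LIBS_BASE>/<R version>/. It relies on the fact that every installed R package has a DESCRIPTION file in its own folder.

```shell
# Hypothetical helper: report whether a package is installed in each
# CPU-specific library below <R_LIBS_BASE>/<R version>.
# The directory names "skylake" and "ivybridge" are assumptions.
check_both_architectures() {
    version_dir=${1%/}   # e.g. /well/<group>/users/<username>/R/<R version>
    package=$2
    missing=0
    for arch in skylake ivybridge; do
        if [ -f "$version_dir/$arch/$package/DESCRIPTION" ]; then
            echo "$package: installed for $arch"
        else
            echo "$package: MISSING for $arch"
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns a non-zero status if either copy is missing, so it can be used in a job script to stop early rather than failing on a compute node.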
Troubleshooting Package Installation
Although the BMRC Rprofile should not interfere with package installation, package installs will occasionally fail. You can work around these cases as follows:
- Log in to cluster1 or cluster2 and load your desired R module via module load ...
- Start R using R --vanilla. This will ensure that your ~/.Rprofile file is not sourced.
- Now you need to manually add your local installation path. For example, while using cluster1 or cluster2, which use the Skylake architecture, you would add your local package path like this:
.libPaths("/well/group/users/R/4.0/skylake")
Note that you need to adjust the path above as necessary to ensure that it points to your package directory, to the correct subdirectory for your version of R, and to the correct subdirectory for the Skylake architecture.
- At this point you can install your packages.
- Remember to repeat the above process on rescomp3, taking note that rescomp3 uses the Ivybridge architecture, so you will need to adjust your .libPaths(...) command accordingly to point it to the correct subdirectory.
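As a rough illustration of how the .libPaths(...) line differs between hosts, the sketch below prints the line to type into R for a given host. The function name libpaths_command is hypothetical. The host-to-architecture mapping follows this guide (cluster1 and cluster2 are skylake, rescomp3 is ivybridge), and the lower-case directory names are an assumption based on the example path above.

```shell
# Hypothetical helper: print the .libPaths() line for a host.
# Arguments: <library base dir> <R version dir, e.g. 4.0> <hostname>
libpaths_command() {
    base=${1%/}
    r_version=$2
    host=${3%%.*}    # drop any domain part of the hostname
    case "$host" in
        cluster1|cluster2) arch=skylake ;;
        rescomp3) arch=ivybridge ;;
        *) echo "unknown host: $host" >&2; return 1 ;;
    esac
    printf '.libPaths("%s/%s/%s")\n' "$base" "$r_version" "$arch"
}

libpaths_command "/well/group/users/R" "4.0" "cluster1"
```

Run with the example path from this section, this prints .libPaths("/well/group/users/R/4.0/skylake"). Passing rescomp3 as the host gives the ivybridge sub-directory instead.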
Troubleshooting Rprofile
If you are using the BMRC-provided Rprofile file, then in certain circumstances your R session will refuse to load. This happens when you resume a saved R session on a different node with an incompatible CPU architecture. To overcome this problem, you can either move your R session to a node with a compatible CPU or delete your saved session by deleting the .RData file and then restarting your R session. The same troubleshooting steps apply for RStudio sessions (described below).
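If you would rather not delete a saved session outright, one cautious variation is to move the .RData file aside instead. The sketch below does that. The function name stash_saved_session is hypothetical, and it assumes the .RData file is in the directory where the R session was started.

```shell
# Hypothetical helper: move a saved R session out of the way instead of
# deleting it. Note that this renames the file rather than deleting it as
# the guide suggests, and it overwrites any earlier .RData.bak.
stash_saved_session() {
    session_dir=${1:-.}    # default: the current directory
    saved="$session_dir/.RData"
    if [ ! -f "$saved" ]; then
        echo "no saved session found in $session_dir"
        return 0
    fi
    mv "$saved" "$saved.bak"
    echo "moved $saved to $saved.bak"
}
```

After running stash_saved_session in the relevant directory, restart R. If the session then starts cleanly and you no longer need the old workspace, the .RData.bak file can be deleted.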
RStudio for Remote R Coding via a Web Browser
Using RStudio allows you to securely run an R session on the BMRC cluster through your web browser. This can help with developing and testing code as well as running code interactively ("live"). Follow the process below to set up an RStudio session:
- First log in to either cluster1 or cluster2 and then start an interactive session on the cluster. For example:
srun -p short --pty bash
- From within your interactive session, load a suitable version of RStudio:
module avail RStudio-Server
module load RStudio-Server/1.4.1717-foss-2021a-Java-11-R-4.1.0
- Start your RStudio service by running:
/apps/misc/R/bmrc-r-user-tools/rstudio/rserver.sh
- The script above will start RStudio on a suitable port and provide instructions for how to connect from your own computer.
- First, you need to create an SSH tunnel from your own computer to the RStudio service. NB Users of PuTTY on Windows and other custom SSH software will need to adapt the instructions provided to create a tunnel in their software.
- Once the SSH tunnel is created, navigate in your browser to the address shown and you should see an RStudio login page. Please log in with your BMRC credentials. Enter your username into the username box and, in the password field, type your password followed immediately by your two-factor code, i.e. add your two-factor code to the end of your password.
- Once logged in, an R coding window will appear. Your R session is now running remotely on the BMRC cluster.
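For readers unfamiliar with SSH tunnels, the sketch below shows the general shape of a local port forward using the standard ssh -L option. It is only an illustration: the function name tunnel_command is hypothetical, and the port, node name and login address are placeholders. The exact command to use is the one printed by the rserver.sh script, which may differ from this form.

```shell
# Hypothetical helper: print a generic SSH local port-forwarding command.
# Arguments: <local port> <compute node> <remote port> <user@login-host>
# All values are placeholders; use the instructions printed by rserver.sh.
tunnel_command() {
    local_port=$1
    compute_node=$2
    remote_port=$3
    login=$4
    for port in "$local_port" "$remote_port"; do
        case "$port" in
            ''|*[!0-9]*) echo "ports must be numeric: '$port'" >&2; return 1 ;;
        esac
    done
    printf 'ssh -L %s:%s:%s %s\n' "$local_port" "$compute_node" "$remote_port" "$login"
}
```

With a tunnel of this form open, traffic sent to the local port on your own computer is forwarded through the login host to the given port on the compute node, which is why the browser address uses a local port.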
Troubleshooting RStudio
If you are using the BMRC-provided Rprofile file, then in certain circumstances your RStudio session will refuse to load. This happens when you resume a saved R session on a different node with an incompatible CPU architecture. To overcome this problem, you can either restart your RStudio session on a different node with a compatible CPU or delete your saved session by deleting the .RData file and then restarting your RStudio session.