Cluster Software & Modules

Introduction

Scientific research computing often requires the use of specialist software. This guide discusses how to use the most commonly required standard software packages used by research scientists via the modules system (explained below).

Software modules are one excellent source of research software. Often, however, users will find they have a need for a piece of software that is not yet available on our system - perhaps because it is new or because they are the first to consider using it on our system. If you need to use software that is not currently installed, please email us with details.

Users of Python can manage their own package installations via virtual environments - see our Python guide for details.

Likewise, users of R can manage their own package installations via the instructions in our guide for R and RStudio.

Scientific Software Directory

Please see our Scientific Software Directory for a list of all available software, including software modules and other software available as Singularity images.

Why Software Modules Are Needed

Providing research-specific software to meet scientific needs on a cluster system raises a number of challenges.

For reasons of stability and reproducibility in your scientific work, different users may need different versions of specific software. For example, if you wish to verify an analysis run with version N of a particular piece of software, or if you wish to continue an analysis begun with version N, then version N needs to remain installed even if it is now several years old and the current version of that software has moved on.
For reasons of different project requirements, you may need to use different versions of a particular piece of software for different projects. For example, you may need to use version N with one project but version M with a different project.
For reasons of different computing environments, some users may need a particular piece of software to be compiled for a specific CPU or GPU while other users need it compiled for a different CPU or GPU.
For reasons of different software requirements, version N of software A may require version X of software B to be installed, while version N+1 of software A may require version X+1 of software B to be installed.
For reasons of software development, software developers and testers will often need to install new versions of software for testing purposes while keeping the existing installed versions so that users of the existing versions can continue their work.

For all of these reasons, users need to be able to choose between multiple versions of the same software and they also need a simple way - ideally, an automatic way! - to get the correct versions of any supporting software packages. The modules system is the standard way to provide multiple versions of the same software with automatic configuration of supporting software.

Modules for Different CPU Architectures

The BMRC cluster currently comprises computers with two different Intel CPU architectures, ivybridge and skylake. For this reason, we maintain separate versions of each piece of software for each CPU architecture.

Ivybridge software and modules are located in /apps/eb/ivybridge/...
Skylake software and modules are located in /apps/eb/skylake/...

In general, users need not worry about these details - all the relevant configuration happens automatically. However, when submitting jobs using qsub (see our cluster guide), using qsub -V can interfere with this process, so we ask users to avoid using that parameter and instead do all necessary environment configuration within their job submission script.

We also have a number of older software installs and modules installed on the system, but we encourage all users to switch to using the newer dedicated ivybridge/skylake modules wherever possible.

Using Modules

Listing and Choosing Available Modules

First, log in to the cluster.

Once logged in, run module avail to see the list of available modules. This will list all the modules available to you, grouped by their path on the file system. For example, on rescomp1 the output you see might look like this:

$ module avail
------------- /apps/eb/skylake/modules/all ------------
ANTs/2.3.1-foss-2018b-Python-3.6.6
Anaconda3/2019.10
Anaconda3/5.1.0
Anaconda3/5.3.0
Autoconf/2.69-GCCcore-6.4.0
Autoconf/2.69-GCCcore-7.3.0
Autoconf/2.69-GCCcore-8.2.0
[More packages here...]

The output here shows modules that are being sourced from our skylake repository at /apps/eb/skylake/...
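To check which module directories your own session is searching, one option is to print the module search path one entry per line. The sketch below assumes the standard MODULEPATH environment variable is set by the modules system; the helper name list_module_paths is hypothetical and only for illustration.

```shell
# Hypothetical helper: print each directory on the module search path
# on its own line. Assumes MODULEPATH is a colon-separated list, as is
# conventional for environment modules.
list_module_paths() {
    printf '%s\n' "${MODULEPATH:-}" | tr ':' '\n'
}

list_module_paths
```

The group headings printed by module avail should correspond to the directories listed here.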

Loading Modules

To load a module, use the module load or module add command with the full name of the package. For example, to load a recent version of the R software you can run:

$ module load R/3.6.2-foss-2019b

It is strongly recommended that you use the full name of the module that you wish to load in your module add or module load command. If you run simply module load R, then the version of R you get may be unpredictable over time and so pipelines that previously ran correctly may start to fail. Better to load modules using their full name.
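As a rough illustration of this advice, a pipeline could refuse to continue when a module is named without a version. The helper is_pinned below is hypothetical and simply checks for the name/version form; it does not check that the module actually exists.

```shell
# Hypothetical helper: succeed only if the module name has the form
# name/version (for example R/3.6.2-foss-2019b), fail for a bare name.
is_pinned() {
    case "$1" in
        ?*/?*) return 0 ;;
        *)     return 1 ;;
    esac
}

for m in R/3.6.2-foss-2019b R; do
    if is_pinned "$m"; then
        echo "$m: full name given"
    else
        echo "$m: no version given, the version loaded may change over time"
    fi
done
```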

Automatic Dependency Loading

When you load a module, the system automatically loads the correct versions of any dependencies by loading their modules. To see the full list of modules you have loaded, run module list. For example, let’s see what happens when we load the Python/3.7.4-GCCcore-8.3.0 module. The commands below first clear any currently loaded modules with module purge and then load the module:

$ module purge
$ module list
No Modulefiles Currently Loaded.
$ module load Python/3.7.4-GCCcore-8.3.0
$ module list
Currently Loaded Modulefiles:
 1) GCCcore/8.3.0 3) binutils/2.32-GCCcore-8.3.0 5) ncurses/6.1-GCCcore-8.3.0 7) Tcl/8.6.9-GCCcore-8.3.0 9) XZ/5.2.4-GCCcore-8.3.0 11) libffi/3.2.1-GCCcore-8.3.0
 2) zlib/1.2.11-GCCcore-8.3.0 4) bzip2/1.0.8-GCCcore-8.3.0 6) libreadline/8.0-GCCcore-8.3.0 8) SQLite/3.29.0-GCCcore-8.3.0 10) GMP/6.1.2-GCCcore-8.3.0 12) Python/3.7.4-GCCcore-8.3.0

As you can see, the Python/3.7.4-GCCcore-8.3.0 module required a number of other supporting pieces of software, each with specific version requirements. The modules system automatically takes care of this for you. In this example, a request to load the Python/3.7.4-GCCcore-8.3.0 module automatically loaded a total of twelve separate module files.

Removing/Unloading Individual Modules

To remove or unload a specific module, use the module rm or module unload command:

$ module unload Python/3.7.4-GCCcore-8.3.0
$ module list
Currently Loaded Modulefiles:
 1) GCCcore/8.3.0 3) binutils/2.32-GCCcore-8.3.0 5) ncurses/6.1-GCCcore-8.3.0 7) Tcl/8.6.9-GCCcore-8.3.0 9) XZ/5.2.4-GCCcore-8.3.0 11) libffi/3.2.1-GCCcore-8.3.0
 2) zlib/1.2.11-GCCcore-8.3.0 4) bzip2/1.0.8-GCCcore-8.3.0 6) libreadline/8.0-GCCcore-8.3.0 8) SQLite/3.29.0-GCCcore-8.3.0 10) GMP/6.1.2-GCCcore-8.3.0

This example demonstrates that while the module system takes care of automatically loading supporting software, it does not automatically unload supporting software. In the above example, running module load Python/3.7.4-GCCcore-8.3.0 automatically also loaded eleven other supporting modules. However, running module unload Python/3.7.4-GCCcore-8.3.0 unloads only that specific software module. If you wish to unload these other modules, you can do so either individually or by unloading all modules with the module purge command (see below).

Removing/Unloading All Loaded Modules

To unload all loaded modules, run:

$ module purge

Using Modules in Scripts and Cluster Jobs

You can use modules in scripts, including those you will submit as cluster jobs, exactly as you would use them at the command line. For example, you can simply add the line module load R/3.6.2-foss-2019b to your script and then proceed to use R in your script.
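A minimal sketch of such a job script is shown below. It does its environment configuration inside the script, in line with the earlier advice to avoid qsub -V. The file name example_job.sh, the job name, the R script my_analysis.R and the scheduler directives (written in Grid Engine style) are all assumptions for illustration, so please check them against our cluster guide before use.

```shell
# Write a hypothetical job script, example_job.sh, to the current directory.
cat > example_job.sh <<'EOF'
#!/bin/bash
# Scheduler directives: assumed Grid Engine syntax, adjust as needed.
#$ -cwd
#$ -N example_job

# Configure the environment inside the script rather than with qsub -V.
module purge
module load R/3.6.2-foss-2019b

# my_analysis.R is a placeholder for your own R script.
Rscript my_analysis.R
EOF

# Check the script for shell syntax errors without running it.
bash -n example_job.sh

# Submit it without the -V parameter (uncomment on the cluster):
# qsub example_job.sh
```

Starting from module purge means the job does not depend on whatever modules happened to be loaded in the session that submitted it.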

How Modules Work

Sometimes, it is helpful to know in detail how software modules work behind the scenes. Software modules are also called environment modules because they work by setting a number of options in the user’s terminal environment. Most modules, for example, alter the user’s executable search path (the PATH environment variable) in order to make the relevant software available. For example, if you compare the output of echo $PATH before and after running module load Python/3.7.4-GCCcore-8.3.0 you will see that loading this module involves adding /apps/eb/skylake/software/Python/3.7.4-GCCcore-8.3.0/bin to the beginning of your PATH. This ensures that your terminal knows where to find this specific version of python and uses it in preference to any other python version which may be in your path.
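The effect on PATH can be imitated in plain shell, without the module command. The sketch below is only a stand-in for what a real module does (modules typically set other variables as well); the helper prepend_to_path is hypothetical, and the directory is the one quoted above.

```shell
# Directory that the Python module is described as adding to PATH.
module_bin="/apps/eb/skylake/software/Python/3.7.4-GCCcore-8.3.0/bin"

# Hypothetical helper: print a search path with directory $1 placed first.
# The current PATH is left unchanged.
prepend_to_path() {
    printf '%s:%s\n' "$1" "$PATH"
}

new_path=$(prepend_to_path "$module_bin")

# ${var%%:*} keeps only the text before the first colon, i.e. the first entry.
echo "First entry before: ${PATH%%:*}"
echo "First entry after:  ${new_path%%:*}"
```

Because the shell searches the entries of PATH from left to right, an executable found in the first directory takes precedence over one of the same name in any later directory.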

Software in Testing Phase

Where software requires additional testing before being released to our main software environments, we release the software first to our testing environment. It is important to understand that software in the testing tree is not guaranteed to be stable, not guaranteed to remain unchanged and not guaranteed to be reproducible. In short, software in the testing environment should be used only for the purposes of confirming that the software works. In particular, note that use of the testing environment is not a way of getting more up-to-date software. If you need updated software, please contact us in the usual way to request it.

To use software in the testing environment, run the following command or include it in your job script:

module use -a /apps/eb/testing/${MODULE_CPU_TYPE}/modules/all

After running the above, use module avail to see what is available. Software in the testing trees will be listed under either /apps/eb/testing/skylake or /apps/eb/testing/ivybridge.
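The module use command above relies on the MODULE_CPU_TYPE variable. If that variable were empty, the path would silently point at the wrong place, so a job script may want to check it first. The sketch below assumes MODULE_CPU_TYPE is set by the cluster environment to ivybridge or skylake; the helper name testing_module_path is hypothetical.

```shell
# Hypothetical helper: print the testing module directory for the current
# CPU type, or fail with a message if MODULE_CPU_TYPE is unset or empty.
testing_module_path() {
    if [ -z "${MODULE_CPU_TYPE:-}" ]; then
        echo "MODULE_CPU_TYPE is not set" >&2
        return 1
    fi
    printf '/apps/eb/testing/%s/modules/all\n' "$MODULE_CPU_TYPE"
}

# On the cluster (commented out here because it needs the module command):
# testing_path=$(testing_module_path) && module use -a "$testing_path"
```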

If you have been involved in testing new software, please let BMRC know whether the software works as expected. When testing software, it is often appropriate to compare the output of new versions with the output from old versions to check that either there are no changes or only expected changes in output.
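One simple way to make such a comparison, assuming each version writes its results to a text file, is with the standard cmp and diff tools. The helper name compare_outputs and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: succeed if the two files are byte-for-byte identical;
# otherwise print a unified diff of the changes and fail.
compare_outputs() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        diff -u "$1" "$2"
        return 1
    fi
}

# Example usage, with placeholder file names for the old and new results:
# compare_outputs results_old_version.txt results_new_version.txt
```

Output that legitimately varies between runs, such as timestamps, may need to be filtered out before comparing.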
