Relion GPU
Running GPU accelerated Relion on the BMRC scheduled nodes
Jobs can be submitted to the UGE queues from the Relion GUI using a 'Standard submission script', which defines the BMRC-specific parameters for your jobs. I have created a template file which you can use directly or copy to your own space:
/well/strubi/relion/relion-gpu-gui-uge-template.sh
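For orientation, a UGE submission template for Relion typically combines `#$` scheduler directives with Relion's `XXX...XXX` placeholder variables, which the GUI substitutes at submission time. The sketch below is illustrative only and is not a copy of the template above; the queue and parallel environment names are taken from the settings described later on this page, and the exact directives in the real template may differ.

```shell
#!/bin/bash
# Illustrative sketch of a UGE submission template for Relion.
# Relion replaces the XXX...XXX placeholders when it submits the job.
#$ -N XXXnameXXX                  # job name, filled in by Relion
#$ -q rln.qg                      # the Relion GPU queue (see below)
#$ -pe XXXqueueXXX XXXmpinodesXXX # PE name from the GUI 'Queue name' field
                                  # (node_mpi) and slots = num_GPUs + 1
#$ -o XXXoutfileXXX               # stdout log
#$ -e XXXerrfileXXX               # stderr log
#$ -cwd                           # run from the submission directory

XXXcommandXXX                     # Relion substitutes the actual command here
```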
I have tested this process with Relion v3.1.3. You can load the GPU-accelerated version on rescomp1 or rescomp2 with:
module load RELION/3.1.3-fosscuda-2020b
and launch the GUI with:
relion
(Technical note: you must load a CUDA/GPU-enabled version of Relion on the login node so that the correct libraries are passed to the GPU node. The login nodes do not contain GPUs.)
In the Relion GUI (Compute tab) you need to define:
* Submit to queue?: Yes
* Standard submission script: e.g. /well/strubi/relion/relion-gpu-gui-uge-template.sh
* Number of GPUs: 1 or 2
* Number of MPI procs: number of GPUs + 1 (so 2 or 3)
* Queue name (this field carries the UGE parallel environment name): node_mpi
* Number of threads: 8
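For reference, with the values above (2 GPUs, 3 MPI ranks, 8 threads) the command that Relion generates and places into the submission script looks roughly like the following. This is illustrative only: the input/output paths and job name are made up, and the GUI assembles the exact command for you.

```shell
# Illustrative only; paths and job name are hypothetical.
# 3 MPI ranks = 1 master + 2 GPU workers; 8 threads per rank.
mpirun -n 3 relion_refine_mpi \
    --i particles.star --o Refine3D/job001/run \
    --gpu "" --j 8
```

Leaving `--gpu` empty lets Relion use the GPUs it finds on the allocated node rather than pinning specific device IDs.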
Notes:
You must set the 'Submit to queue?' flag and use the 'Standard submission script' file, or your job will attempt to run on the login node. As the login nodes do not have GPUs, the job will die. Furthermore, as the login nodes are the main landing point for most of our users, any Relion jobs running on the login nodes will be killed.
You should select 1 or 2 GPUs for your job. The number of MPI ranks/slots required is num_GPUs + 1 (so 2 or 3). It is possible to select an entire node (4 GPUs, 5 slots) for a job, but I strongly recommend that you use 1 or 2 GPUs. In testing, I found that using 4 GPUs was only marginally faster than using 2. More critically, if you select 4 GPUs you will need to wait for the entire node to be empty before your job can run. If you select 2 GPUs then two jobs can run simultaneously, which improves throughput for all users. I'm happy to discuss this further if it is demonstrated that particular jobs require 4 GPUs.
I have set up a new queue 'rln.qg' for running Relion on GPUs. The 'Standard submission script' above submits jobs to this queue. Currently there is a single node available to the queue (compg026) which is a quad-P100 (16GB) node. If utilisation is high we can look at adding other nodes to this queue.
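You can check how busy the queue and its node are with standard UGE commands before submitting, for example:

```shell
# Inspect the Relion GPU queue and its node from a login node.
qstat -f -q rln.qg     # queue instances with used/total slots
qhost -h compg026      # load and memory on the GPU node
qstat -u "$USER"       # your own pending and running jobs
```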
Many thanks to Pranav Shah for his help in developing and testing this process.