Introduction

The BMRC cluster includes a number of NVidia GPU-accelerated servers to support AI, image processing and other GPU-accelerated workloads. Our GPU-accelerated servers are all located within our G group and are named in the format compg...

Access to the scheduled GPU nodes via qsub is restricted by default. If you would like to submit jobs to these GPU nodes, please email us to request access.

In addition, we have arranged access for our users to the powerful NVidia DGX-1V (8 x V100) hosted by the University's ARC Facility. If you wish to sign up for an account, please email us to request access.

We have a dedicated mailing list for BMRC GPU users: rescomp-users-gpu@maillist.ox.ac.uk. If you wish to be added to the list, please email us.

Varieties of GPU node

In our regular (i.e. non-GPU) cluster, there are 4 separate groups of nodes (compc, compd, compe, compf) where the hardware varies between groups but is identical within each group. The situation is different within the compg GPU nodes. Because GPU hardware changes rapidly, there is considerable variation in the capabilities of the GPU nodes: they offer different combinations of CPU and RAM as well as different numbers and types of GPU card. Furthermore, each machine is configured to host only as many slots as it has GPU cards, on the assumption that every job will need at least one GPU card. In consequence, the RAM per slot on the GPU queue varies widely, from a minimum of 64GB up to 750GB.

Because of the variation in CPU, RAM, GPU card type and number of GPUs available per node, you may need to plan your job submissions carefully. The sections below provide full information on the nodes available in order to assist with your planning.

Interactive GPU Nodes

There are three nodes which are open to all users for logging in and running GPU-accelerated applications. These nodes are intended to allow you to develop and test your GPU code before submitting to the GPU queue. Information about the interactive nodes appears in the table below.

To connect to the interactive GPU nodes, log in to rescomp1-2 and then ssh to a node, e.g. ssh compg005.
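For example, a typical session might look like the following (the username and full hostname details are placeholders, since they are not given above; use your usual BMRC login details):

ssh username@rescomp1      # from your own machine, connect to a BMRC login node (replace username; the full hostname may differ)
ssh compg005               # then hop from the login node to an interactive GPU node
nvidia-smi                 # check what is already running on the GPUs before you start work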

The interactive GPU nodes can be quite busy, so please check whether somebody else is using the GPUs before setting your jobs running. Please see the Monitoring section below for notes on how to do this.

Node      GPU Type     Num GPU cards  GPU RAM per card (GB)  CPU Cores            Total RAM (GB)  CPU Compatibility
compg005  GTX 1080 Ti  4              11                     20                   256             Ivybridge
compg006  GTX 1080 Ti  4              11                     40 (hyperthreading)  256             Ivybridge
compg007  GTX 1080 Ti  4              11                     40 (hyperthreading)  256             Ivybridge

Scheduled GPU Cluster Nodes

There are four nodes which are open to all users who request access to run GPU-accelerated jobs through the Univa (UGE/SGE) scheduler.

We currently operate a single dedicated cluster queue for GPU resources, gpu8.q. The maximum job duration on this queue is 60 hours.
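If you want to confirm the queue's runtime limits for yourself, you can inspect the queue configuration from a login node. A quick sketch, assuming the standard Univa client tools are available on your path (this is not a BMRC-documented procedure, just standard UGE usage):

qconf -sq gpu8.q | grep -E 'h_rt|s_rt'   # show the hard and soft runtime limits configured for gpu8.q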

Jobs are submitted to gpu8.q using qsub in a similar way to submitting a non-GPU job; however, you must supply some extra parameters to indicate your GPU requirements, as follows:

qsub -q gpu8.q -l gpu=N,gputype=XYZ ...

The specification of GPU parameters is preceded by -l (a lowercase L); multiple resource requests after -l are separated by commas.

  • The gpu=N parameter is required and specifies how many GPU cards your job requires. If you do not specify this, your job will fail to run and report error code 100.
  • The gputype=XYZ parameter is optional and specifies what type of GPU card your job requires. For information on the available numbers and types of GPU cards, please see the table below. The principal constraint is that your job must be able to run on a single node - so you cannot e.g. request more GPU cards of a certain type than the maximum that is available on a single node.

When submitting jobs, the total memory requirement for your job should be equal to the compute memory plus the GPU memory, i.e. you will need to request a sufficient number of slots to cover this total memory requirement.
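As an illustration only (this is a sketch: the parallel environment name, module name, gputype value and program name are assumptions rather than BMRC-prescribed settings), a job script requesting two P100 cards might look like this:

#!/bin/bash
#$ -q gpu8.q                  # submit to the GPU queue
#$ -l gpu=2,gputype=p100      # request two GPU cards; the exact gputype string is an assumption, see the table below
#$ -pe shmem 2                # assumed parallel environment name, used here to request extra slots for RAM
#$ -cwd                       # run the job from the current working directory

module load <cuda-module>     # placeholder; see the GPU software section below
./my_gpu_program              # placeholder for your GPU-accelerated executable

Submit the script from a login node with qsub and check its progress with qstat.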

Node      GPU Type  Num GPU Cards  GPU RAM per card (GB)  Num UGE Slots  CPU Cores per Slot  RAM per Slot (GB)  CPU Compatibility
compg009  P100      4              16                     4              6                   96                 Skylake
compg010  P100      4              16                     4              6                   96                 Skylake
compg011  P100      4              16                     4              6                   96                 Skylake
compg013  P100      4              16                     4              6                   96                 Skylake

Dedicated nodes

We maintain a number of nodes which are dedicated to specific projects. Please email us with any questions regarding these dedicated nodes.

Monitoring


On the interactive nodes, use the nvidia-smi command to check which processes are running on the GPUs, and top to check what is running on the CPUs.
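For example (these are standard Linux/NVidia commands rather than anything BMRC-specific):

nvidia-smi               # one-off snapshot of GPU utilisation, memory use and the processes on each card
watch -n 5 nvidia-smi    # refresh that snapshot every 5 seconds; exit with Ctrl+C
top -u $USER             # show your own processes; run top with no arguments to see everyone's CPU usage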


On the scheduled nodes, from a login node you should run

qstat -u "*" -q gpu8.q

to see the jobs running and waiting in the GPU queue.

GPU software

The CUDA libraries are required to run applications on NVidia GPUs. Newer GPUs require later versions of the CUDA libraries; the CUDA page on Wikipedia has useful information about versions. Software packages typically need to be compiled for a particular version of CUDA.

Our pre-installed CUDA-related software is now made available, in the same way as the majority of our pre-installed software, via software modules. Use module avail XYZ to see which versions of software are available and module load XYZ to load your desired software modules.
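For example (the module name and version shown are assumptions; run module avail to see what is actually installed on the cluster):

module avail CUDA           # list the CUDA modules available on the cluster
module load CUDA/11.8.0     # load a specific version (name and version here are an assumption)
nvcc --version              # confirm which CUDA compiler the module has put on your PATH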

In addition to the main CUDA libraries themselves, we also have:

You can also install your own software via e.g. a Python virtualenv.
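A minimal sketch of that approach (the module name and package are placeholders, not BMRC-specific instructions):

module load Python/3.10          # assumed module name; check module avail Python for what is installed
python -m venv ~/my-gpu-env      # create a virtual environment in your home directory
source ~/my-gpu-env/bin/activate
pip install --upgrade pip
pip install cupy                 # example only; install whichever GPU-enabled packages your work needs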