
How to access the BMRC GPU resources.

Before running jobs on the BMRC GPU cluster resources you should know:

1. Whether your Slurm project has GPU shares
2. Which GPUs you want your jobs to run on
3. How long your jobs will take

 

Jump to 'Submitting Jobs' for example Slurm commands.

 

Overview for PIs

Since 2011, BMRC has sold access to the cluster based on the idea of a “share”. In its simplest form, you could imagine that each group was buying a share of the cluster platform. If one group bought twice the share of another group then, at any time, the scheduler would aim to run twice as many jobs for the first group as for the second. However, if only one group was submitting jobs at a certain time, then all resources would be handed to that group: use it or lose it, credits could not be stored up. While the details of the implementation have changed and BMRC has become immensely more complex, the basic approach remains the same. You can find out more about the BMRC share philosophy and the fairshare calculation at: https://www.medsci.ox.ac.uk/for-staff/resources/bmrc/cluster-shares.

When there were only a few GPUs in the cluster, and only a few groups using them, we were able simply to extend the existing share model to cover them without too many issues. However, over the past few years GPU methods have become mainstream and GPUs have become incredibly expensive: BMRC currently invests much more in GPU-accelerated servers than in CPU-only servers, but even so it has many fewer GPUs than it has CPU cores. This has led to severe scheduling challenges with the current approach, and means that BMRC cannot recover enough through its charging to replace GPUs as they age without unfairly charging CPU users.

BMRC is now charging separately for CPU shares and GPU shares. To run CPU-only jobs you will need CPU shares and to run GPU jobs you will need GPU shares – to run both types of compute you will need both types of shares. This has meant that CPU shares are now significantly cheaper than they were previously. Note that the cost of the GPU share includes the costs of the CPUs and memory of the servers hosting the GPUs: you don’t need CPU shares to run a GPU job.

Having looked at GPU usage patterns we have decided we can offer entry-level continuous access to GPUs for small-to-moderate use at just over £1000 per project per year. Groups with heavier usage will simply need to buy multiple shares. Just as with CPU shares and to avoid overcommitting, there is a cap on the total number of GPU shares that will be sold for the GPU partitions. The cap is related to the total number of physical GPUs that we have available.

Extending the share approach to GPUs has been complicated, since there are many different kinds of GPU. To address this, we have defined a weighting for each type of GPU depending on the cost of providing them (buying, powering and administering them). When accounting is performed, the runtime of a job is multiplied by the relevant GPU weight, meaning that users can use more time on cheaper GPUs for the same cost to the project. This also means that only one type of GPU share is needed and it covers all our GPU types, from the RTX6000 to the H200, dramatically simplifying the accounting.

Shares

BMRC sells both CPU and GPU shares to projects to enable use of the Slurm cluster. CPU and GPU shares are separate: CPU shares do not give access to GPU-accelerated nodes and vice versa. If you wish to enquire about GPU shares please send a request to bmrc-help@medsci.ox.ac.uk.

A GPU share is an abstract quantity which affects the scheduling priority of the job. Once a job is running it gets all the resources that it has requested and it is not competing with other jobs. More shares mean higher priority for jobs in the queue. Each type of GPU has been given a weighting depending on the cost of providing them (buying, powering and administering them). For reference, an 80GB A100 GPU is defined to have a weighting of 1.00.

Based on an analysis of previous usage, we set the GPU share price at the cost of continuous usage of up to 1/3 of a GPU. If there is no GPU usage in a particular quarter then there will be no charge for the GPU shares for that quarter.

 

Selecting the GPU type

BMRC supports a number of GPU types. There is a partition for each GPU type. You must select the partitions corresponding to the GPUs that you wish to run your jobs on.

 

Important factors are:

1. Billing weight
2. GPU memory
3. Numerical precision

 

Each GPU type has a billing weight reflecting the cost of running the GPU nodes, and the scheduler factors this weight into its usage calculations. More expensive GPUs have a higher weight than cheaper GPUs, and if the weight is <1 then less usage is accounted for. This affects how jobs are prioritised in the scheduler: less accounted usage leads to a higher priority. Note that Slurm calls this a 'billing' weight; it is *not* a financial weighting on the cost of a GPU share.
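As a sketch of the accounting arithmetic: accounted usage is runtime multiplied by the partition's billing weight, so the same wall time on a cheaper GPU counts for less. The `weighted_hours` helper below is purely illustrative, not a BMRC tool; the weights are taken from the partition table below.

```shell
# Illustrative only: accounted usage = runtime (hours) x billing weight.
weighted_hours() {
    # $1 = runtime in hours, $2 = billing weight of the GPU type
    awk -v t="$1" -v w="$2" 'BEGIN { printf "%.1f\n", t * w }'
}

weighted_hours 10 1.00   # 10h on an A100 80GB -> 10.0 accounted hours
weighted_hours 10 0.66   # 10h on a P100 16GB  -> 6.6 accounted hours
```

Ten hours on a P100 is therefore accounted the same as 6.6 hours on an A100 80GB, which is why cheaper GPUs stretch a project's share further.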

 

To maximise throughput of your jobs you should have an estimate of the GPU memory required by your job and select partitions that satisfy the memory requirement. You can find information about GPU memory in the tables below.
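For example, a job that needs roughly 40 GB of GPU memory could be offered to every partition whose GPUs satisfy that requirement; Slurm will start it on whichever has a free GPU first. This is a sketch: the account name and job script are placeholders.

```shell
# Offer the job to all partitions with >= 40 GB GPU memory.
# gpu_<X>.prj and <JOBSCRIPT> are placeholders for your own values.
sbatch -A gpu_<X>.prj \
       -p gpu_a100_40gb,gpu_rtx8000_48gb,gpu_a100_80gb \
       --gres gpu:1 <JOBSCRIPT>
```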

 

All GPUs provide strong FP32 performance (single precision, the default), and most applications work at this precision. If your jobs require double precision (FP64), a lower precision (e.g. FP16, FP8), or more sophisticated cores for e.g. AI workloads, then you should carefully select the appropriate GPUs for your task (A100, V100, P100).

 

There is a partition (gpu_inter) for users who need shell access to a GPU node for development or analysis.

 

Partition Table

PARTITION NUM_GPU  GPU_MEMORY(GB) WEIGHT MAX_RUNTIME(Hours) NUM_CPU_DEFAULT MEM(GB)_DEFAULT
Batch Partitions
gpu_a100_80gb 24 80 1 60 11 120
gpu_rtx8000_48gb 12 48 0.72 60 7 185
gpu_a100_40gb 16 40 0.89 60 7 90
gpu_v100_32gb 2 32 0.89 60 7 750
gpu_p100_16gb 12 16 0.66 60 5 90
gpu_v100_16gb 4 16 0.7 60 11 60
Interactive Partitions
gpu_inter 18 24 0.59 12 7 80

 

Selecting runtime

The maximum runtime for most of the GPU partitions is 60 hours; some are shorter. If you know your jobs will finish sooner than 60 hours, you can apply a Slurm QOS (Quality of Service) to your jobs, which will significantly boost their priority in the queue and apply an appropriate runtime limit. The priority boost is largest for jobs that run under a 4-hour time limit, followed by the 24-hour QOS, with 60-hour jobs getting no priority boost at all.

 

GPU QOS Table

QOS Name Runtime (hrs) Priority Boost
Partition QOS
gpu_bmrc_partition_limits 60 0
gpu_bmrc_interactive_limits 12 0
User selectable QOS
gpu_bmrc_4hr 4 20000000
gpu_bmrc_24hr 24 10000000

Partition QOS are applied automatically when you select a partition for your job.
User-selectable QOS can be applied at job submission, e.g. --qos gpu_bmrc_4hr.

 

Note about limits

As GPUs are a limited resource under considerable demand, we apply usage limits to ensure throughput for jobs from all projects and to allow essential regular maintenance to be completed.

We cannot extend the 60hr runtime for jobs in normal operation.

Checkpointing and increasing parallelisation by breaking work into smaller chunks are two common ways to complete your workloads within shorter runtimes. They will also improve the resilience of your workload to interruption.
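The chunking approach can be sketched as a Slurm job array in which each task handles one independent chunk under a short QOS. The script name and chunk logic here are hypothetical; substitute your own workload.

```shell
#!/bin/bash
#SBATCH --account=gpu_<X>.prj      # placeholder project account
#SBATCH --partition=gpu_a100_40gb
#SBATCH --gres=gpu:1
#SBATCH --qos=gpu_bmrc_4hr         # each chunk fits the 4-hour limit
#SBATCH --array=0-9                # ten independent chunks

# Each array task processes one chunk; process_chunk.py is a placeholder
# for your own per-chunk workload.
python process_chunk.py --chunk "${SLURM_ARRAY_TASK_ID}"
```

An interrupted array loses only the tasks that were running, so this pattern also improves resilience.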

The per-group limit to the number of GPUs that can be in use is 24. This applies across all partitions.

At least 1 GPU must be selected (--gres gpu:1) for a job to run.

 

For all jobs, the GPU limits are: 24 GPUs per project, minimum 1 GPU per job

For jobs on the batch partitions: 60 hours max runtime

For sessions on the interactive partition: 1 GPU per user max, 12 hours max runtime

  

If you require GPU resources for jobs that must run longer than 60 hours, need direct instant access to a GPU, or want to maintain persistent sessions over a long time period, BMRC also provides GPU-accelerated VMs in the BMRC private cloud. If you would like to discuss access to the cloud resources please send a request to bmrc-help@medsci.ox.ac.uk.

 

Hardware

NODE GPU_TYPE SLURM_FEATURES NUM_GPU_CARDS GPU_RAM(GB)_PER_CARD CPU_CORES_PER_GPU RAM(GB)_PER_GPU CPU_COMPATIBILITY
compg009 p100-sxm2-16gb flash 4 16 6 91.2 Skylake
compg010 p100-sxm2-16gb flash 4 16 6 91.2 Skylake
compg011 p100-sxm2-16gb flash 4 16 6 91.2 Skylake
compg013 p100-sxm2-16gb - 4 16 6 91.2 Skylake
compg016 v100-pcie-32gb flash 2 32 6 750 Skylake
compg019 quadro-rtx6000 flash 4 24 8 91.2 Skylake
compg020 quadro-rtx6000 flash 4 24 8 91.2 Skylake
compg021 quadro-rtx6000 flash 4 24 8 91.2 Skylake
compg026 p100-pcie-16gb flash 4 16 10 91.2 Skylake
compg027 v100-pcie-16gb - 4 16 12 60.8 Skylake
compg028 quadro-rtx8000 flash 4 48 8 187.2 Cascadelake
compg029 quadro-rtx8000 flash 4 48 8 187.2 Cascadelake
compg030 quadro-rtx8000 flash 4 48 8 187.2 Cascadelake
compg031 a100-pcie-40gb flash 4 40 8 91.2 Cascadelake
compg032 a100-pcie-40gb flash 4 40 8 91.2 Cascadelake
compg033 a100-pcie-40gb flash 4 40 8 91.2 Cascadelake
compg034 a100-pcie-40gb flash 4 40 8 91.2 Cascadelake
compg035 a100-pcie-80gb flash 4 80 8 91.2 Icelake
compg036 a100-pcie-80gb flash 4 80 8 91.2 Icelake
compg037 a100-pcie-80gb flash 2 80 24 256 Icelake
compg038 a100-pcie-80gb flash 2 80 24 256 Icelake
compg039 a100-pcie-80gb flash 4 80 12 128 Icelake
compg040 a100-pcie-80gb flash 4 80 12 128 Icelake
compg041 a100-pcie-80gb flash - 80 12 128 Icelake
compg042 a100-pcie-80gb flash 4 80 12 128 Icelake
compg047 l4 flash 6 24 10 80 Emerald Rapids

(A '-' indicates no value was listed for that node.)

 

Legacy dedicated hardware

We maintain a number of GPU nodes which are dedicated to specific projects and experimental instrument workflows. Please email us with any questions regarding these dedicated nodes.

 

There are a small number of partitions dedicated to specific projects or instrument workflows:

gpu_strubi
gpu_cryosparc

 

NODE GPU_TYPE NUM_GPU_CARDS GPU_RAM(GB)_PER_CARD CPU_CORES TOTAL_RAM(GB) CPU_COMPATIBILITY
compg017 v100-pcie-32gb 2 32 24 1500 Skylake
compg018 quadro-rtx6000 4 24 32 384 Skylake
compg022 v100-pcie-16gb 4 16 32 384 Skylake
compg024 quadro-rtx6000 4 24 32 384 Skylake
compg025 quadro-rtx8000 4 48 32 384 Skylake
compg043 l40s 4 48 12 128 Sapphire Rapids
compg044 l40s 4 48 12 128 Sapphire Rapids
compg045 l40s 4 48 12 128 Sapphire Rapids
compg046 l40s 4 48 12 128 Sapphire Rapids

 

 

Submitting jobs

Jobs are submitted using sbatch in a similar way to non-GPU jobs; however, you must supply some extra parameters to indicate your GPU requirements, as follows:

sbatch --account gpu_<X>.prj  --partition gpu_p100_16gb  --gres gpu:<N> <JOBSCRIPT>

gpu_<X>.prj is the name of the research group/project Slurm GPU account. 

<N> is the number of GPUs required for each job. 

 

The default number of CPU cores per GPU depends on the partition (see 'Selecting the GPU type'). You can request more (or fewer) CPU cores for your job with --cpus-per-gpu <N>. Alternatively, you can set the total number of cores required for the job with -c <N>, where <N> is the number of cores.

The default system memory available per GPU depends on the partition (see 'Selecting the GPU type'). You can request more (or less) system memory for your job with --mem-per-gpu <M>G. Alternatively, you can specify the total memory requirement for your job with --mem <M>G, where <M> is the number of GB of memory required.
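Putting those options together, e.g. a 2-GPU job requesting 8 cores and 32 GB of system memory per GPU (the values, account and script names are illustrative):

```shell
# Override the partition defaults for cores and memory per GPU.
# gpu_<X>.prj and <JOBSCRIPT> are placeholders.
sbatch -A gpu_<X>.prj -p gpu_a100_80gb --gres gpu:2 \
       --cpus-per-gpu 8 --mem-per-gpu 32G <JOBSCRIPT>
```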

 

Examples:

Submit a job requiring a single A100 80GB GPU:

sbatch -A gpu_<X>.prj -p gpu_a100_80gb --gres gpu:1 <SCRIPT>

Submit a job requiring 2 GPUs (either RTX8000 or A100 40GB) that will finish in under 24 hours:

sbatch -A gpu_<X>.prj -p gpu_rtx8000_48gb,gpu_a100_40gb --gres gpu:2 --qos gpu_bmrc_24hr <SCRIPT>

Submit a job for an interactive session:

srun -A gpu_<X>.prj -p gpu_inter --gres gpu:1 --pty bash

 

Using fast local scratch space

A number of nodes have fast local NVMe drives for jobs that require a lot of I/O. This space can be accessed from:

/flash/scratch

or from project-specific folders in /flash on the nodes.

It is the user's responsibility to create a project folder in scratch for their job.

In Slurm you can select nodes with a scratch folder with:

sbatch -A gpu_<X>.prj -p gpu_p100_16gb --gres gpu:1 --constraint "flash" <JOBSCRIPT>

The scratch folder is open to all jobs, so care should be taken to protect your data by placing it in subfolders with the correct permissions.

As the space on these drives is limited you should remove any data from the scratch space when the job is complete. A scheduled automatic deletion from /flash/scratch will be introduced.
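A minimal sketch of the create/protect/clean-up pattern for scratch space. The helper below is illustrative, not a BMRC tool; the scratch root is a parameter so the snippet is self-contained (on the GPU nodes it would be /flash/scratch).

```shell
# Create a private folder under an open scratch area, restrict its
# permissions, and print its path. Illustrative helper, not a BMRC tool.
make_job_scratch() {
    # $1 = scratch root, $2 = unique name (e.g. ${USER}_${SLURM_JOB_ID})
    dir="$1/$2"
    mkdir -p "$dir"
    chmod 700 "$dir"   # keep other users out of the shared scratch area
    printf '%s\n' "$dir"
}

# In a job script it might be used as:
#   DIR=$(make_job_scratch /flash/scratch "${USER}_${SLURM_JOB_ID}")
#   ... run the workload against $DIR ...
#   rm -rf "$DIR"      # free the limited space when the job completes
```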

 

Monitoring

In an interactive session you should use the nvidia-smi command to check what processes are running on the GPUs and top to check what is running on the CPUs.

 

You can attach an interactive session to a running job, in order to monitor it with nvidia-smi, top or ps, using:

srun --jobid <JOB_ID> --pty bash

 

 

For the scheduled partitions, from a login node you can run e.g.

squeue -p gpu_rtx8000_48gb,gpu_a100_40gb

to see the jobs running and waiting in those GPU partitions.

 

You can see the occupancy of the GPUs for a partition with:

sinfo -N -O "Nodelist:16,Partition,Available:6,Timelimit,CPUsState,StateCompact:8,Gres:32,GresUsed:32" -p gpu_a100_80gb