New Users - Welcome
A warm welcome to new users of the Biomedical Research Computing (BMRC) Facility. This short guide aims to help you orientate yourself and get started quickly.
Please join our mailing list
Please follow the instructions here to join our mailing list. All users are warmly encouraged to join. The mailing list is reserved for important news and service announcements (it is not a 'chat' list).
Your First Login
Once your account has been created, you will receive a username and temporary password for accessing our systems. The temporary password must be changed on first login - please follow the instructions in the welcome email. See our login guide for general information about how to connect, and note that you must be connected to the University of Oxford network, either via a physical connection or via VPN, in order to connect via ssh. If you have problems with your first login, please email us at email@example.com. If you have problems on subsequent logins, please first see our Frequently Asked Questions page for advice on self-diagnosing the issue.
Using the Linux Shell
BMRC systems use the BASH shell. If you are not familiar with shells, you can find numerous tutorials on the internet. The website HPC Carpentry offers a good introduction here.
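To give a flavour of what such tutorials cover, here are a few core Bash commands you will use constantly (a sketch only; the directory name demo_dir and the variable are just examples):

```shell
# Print the current working directory
pwd
# Create a directory (demo_dir is just an example name) and move into it
mkdir -p demo_dir
cd demo_dir
# Shell variables and quoting
name="world"
echo "hello, $name"    # prints: hello, world
# Go back up one level
cd ..
```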
Where should I put my files?
On the BMRC cluster, you have two folders for your dedicated personal use.
- Your home folder will be located at /users/<group>/<username> - with <group> and <username> being your group name and username from your welcome email. This folder is intentionally very small (max 10GB) - you should use it only for storing essential configuration files, which software often expects to find there, such as the Bash configuration file .bashrc.
- Your group home folder will be located at /well/<group>/users/<username>. This folder draws on your group's disk-space allocation, so please use it to store all your data, code and other files.
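As a quick sanity check, you can see how much of your 10GB home allocation you are currently using with du (a sketch; the commented path uses the <group> and <username> placeholders from your welcome email):

```shell
# Summarise total usage of your home folder (the 10GB-capped one)
du -sh "$HOME"
# Day-to-day data, code and results belong in your group home folder, e.g.:
# cd /well/<group>/users/<username>
```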
Note for Conda/Anaconda users
By default, conda will store your environments and downloaded packages in your home directory under ~/.conda - this will quickly cause your home directory to run out of space. To prevent this from happening we recommend the following:
- Create a dedicated conda folder in your group home folder with subdirectories for packages and environments e.g.
cd /well/<group>/users/<username>
mkdir -p conda/pkgs conda/envs
- Create the file ~/.condarc containing the following configuration (NB indented lines are indented two spaces):
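A minimal ~/.condarc along these lines would redirect conda's package cache and environments to the folders created above (a sketch using the <group> and <username> placeholders from your welcome email; pkgs_dirs and envs_dirs are conda's standard configuration keys):

```yaml
pkgs_dirs:
  - /well/<group>/users/<username>/conda/pkgs
envs_dirs:
  - /well/<group>/users/<username>/conda/envs
```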
Using the Cluster
To learn how to submit your code to the cluster, please read our guide to Using the Cluster. This guide introduces the concepts of cluster computing so that you can understand what the cluster is, how to submit your jobs, and why you should not run your code directly on rescomp1-3.
When using the BMRC cluster, you do not run your code directly. In this respect, using the BMRC cluster is different to using your own computer and it is important that you understand why this is.
After logging in to the BMRC cluster, you will arrive on one of the computers named rescomp1-3. The purpose of rescomp1-3 is not to run your code - they exist to let you submit your code to the computing cluster, either via a pre-written script using qsub or via an interactive cluster session using qlogin, as explained in the guide mentioned above.
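As an illustration, a qsub submission might look like the following minimal job script (a sketch, not a tested site configuration: the job name, filename and echoed message are placeholders, and the #$ directive lines follow the Grid Engine convention - see the Using the Cluster guide for the options appropriate to your work):

```shell
# Write a minimal Grid Engine-style job script (placeholder content)
cat > hello_job.sh <<'EOF'
#!/bin/bash
#$ -N hello    # job name (placeholder)
#$ -cwd        # run the job from the submission directory
echo "Running on $(hostname)"
EOF
# On rescomp1-3 you would then submit it with:
# qsub hello_job.sh
```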
All CPU-intensive, RAM-intensive or disk-intensive code run directly on rescomp1-3 is considered misuse and is liable to be terminated without warning in order to prevent adverse effects on other users.
Monitoring your cluster jobs
You can check the status of your currently queued and running jobs with qstat, and the status of completed jobs with the qacct command. It is VITAL that these commands are not overused.
Overusing qstat or qacct can overload the scheduling software. This would make the cluster unusable for everyone - hundreds and potentially thousands of other users, including yourself - a catastrophic result.
Using qstat or qacct manually (i.e. typing the command yourself every so often) is harmless. Problems are likely to arise, however, if these commands are repeatedly called in an automated way. In order to prevent catastrophe, please ensure that any software you use (scripts, pipelining tools, etc.) runs these commands with a delay of at least 100 seconds between calls.
If you must use the watch command, then use e.g. watch -n 100 qstat (i.e. set the value of n to at least 100). However, the circumstances where it would make sense to use watch with qstat or qacct are rare. If you want to start one job only when another has finished, or if you want a notification when a job starts or ends, there are better ways to achieve that - please email us for advice.
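If you do script around these commands yourself, the pattern is simply to sleep between queries. A sketch (the function name poll_jobs is ours, the loop is deliberately endless like watch, and the function is only defined here, not called, since qstat exists only on the cluster):

```shell
# Query the scheduler at most once every 100 seconds.
# Defined but not invoked: qstat/qacct are only available on BMRC systems.
poll_jobs() {
  while true; do
    qstat              # or: qacct -j <jobid> for a finished job
    sleep 100          # at least 100 s between scheduler queries
  done
}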
Users of snakemake, a Python pipelining tool, should note that the default settings of snakemake will cause a catastrophic incident, because by default snakemake runs the monitoring commands ten times per second. To use snakemake safely, you MUST therefore set --max-status-checks-per-second 0.01, which limits these commands to at most one call every 100 seconds.
Accessing the Internet
Please note that internet access is possible only from rescomp1-3. By design, internet access is not available from the cluster nodes themselves, and by extension, it is not available to your code when running as a cluster job. For this reason, all data required for your cluster jobs must be downloaded to disk in advance of submitting your jobs.
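In practice this means staging any downloads from rescomp1-3 before you submit. A sketch (the directory name, URL and archive name are all placeholders):

```shell
# Create a directory for input data in advance ('data' is just an example name)
mkdir -p data
# While on rescomp1-3, fetch the inputs your jobs will need, e.g.:
# wget -P data https://example.org/dataset.tar.gz
# Cluster jobs then read the staged files from disk only:
ls data
```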
Installing your own R packages
When you need to install your own R packages, please follow our dedicated guide.