USING PYTHON ON THE BMRC CLUSTER

The principal method for using Python on the BMRC cluster is to load one of our pre-installed software modules. To see which versions of Python are available run (noting the capital letter):

module avail Python

Our pre-installed Python modules include a number of common packages. To see which packages are included run e.g.:

module whatis Python/3.7.4-GCCcore-8.3.0

and then check the Extensions list.
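For example, to check whether a particular package ships with a module, you can filter the whatis output for it. This is a sketch: the exact output format depends on the version of the module system, and module commands typically print to stderr, hence the redirection.

```shell
# Show the module's metadata (including its Extensions list) and
# filter for a package of interest, e.g. numpy
module whatis Python/3.7.4-GCCcore-8.3.0 2>&1 | grep -i numpy
```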

In addition to Python itself, we have a number of auxiliary Python modules which can be loaded in order to access other widely used packages. For example, scipy, numpy, and pandas are available through the SciPy-bundle-... modules. To see which versions of SciPy-bundle-... are available run:


module avail SciPy-bundle

To find a SciPy-bundle module that is compatible with your chosen Python module, check the Python version noted in the name and the toolchain. For example, SciPy-bundle/2019.10-foss-2019b-Python-3.7.4 is compatible with Python/3.7.4-GCCcore-8.3.0, because

  1. they use the same version of Python, and
  2. they have compatible toolchains because the GCCcore-8.3.0 toolchain is part of the foss-2019b toolchain (to verify this, use module show foss/2019b).

If in doubt, simply try to load both modules together - if they are incompatible, an error will be reported.
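For example, the compatibility check can be done directly on the command line, using the module names above. A quick import test then confirms that the bundled packages are usable:

```shell
# Load a Python module and a matching SciPy-bundle; an error here
# means the two modules are incompatible
module load Python/3.7.4-GCCcore-8.3.0
module load SciPy-bundle/2019.10-foss-2019b-Python-3.7.4

# Verify that the bundled packages import cleanly
python -c "import numpy, scipy, pandas; print(numpy.__version__)"
```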

Jupyter Notebook for Remote Python Coding

Jupyter Notebook is a popular Python-based application that lets you edit and run Python code remotely through a web browser. Here's how to use it.

  1. Login to cluster1 or cluster2.
  2. If you plan to use Jupyter Notebook only with software available through our modules system, then proceed to step 3. Otherwise, if you need to install your own Python packages, first set up your own Python virtual environments as described in the section below. You will need to create two Python virtual environments, one for Skylake and one for Ivybridge (as explained in the section below). Inside each virtual environment, manually install Jupyter Notebook using pip install --force-reinstall notebook.
  3. While logged in to cluster1 (aka rescomp1) or cluster2 (aka rescomp2), start an interactive cluster session using e.g. srun -p short --pty bash. Make a note of which node is running your interactive session by using the hostname -s command or checking your prompt.
  4. If you are using only modules, then at this point you can load the IPython module:
    module load IPython/7.9.0-foss-2019b-Python-3.7.4

    Alternatively, if you are using your own Python virtual environment, you need to load the same Python module that you used to create your virtual environment and then activate your virtual environment. Run echo $MODULE_CPU_TYPE to see whether you are currently on a skylake or ivybridge host, then activate the appropriate virtual environment.
  5. Start jupyter notebook: jupyter notebook --no-browser --ip=*

    After running this command, several lines of text will appear on screen. The last few lines will look like the example below - you only need the line which begins http://127.0.0.1...

    To access the notebook, open this file in a browser:
    file:///[....].local/share/jupyter/runtime/nbserver-16902-open.html 
    Or copy and paste one of these URLs:
    http://<interactive_host_name>:8888/?token=59836e245b9bc3ba915d3d7ab31f3fc15f257972ed5c5ea3
    or http://127.0.0.1:8888/?token=59836e245b9bc3ba915d3d7ab31f3fc15f257972ed5c5ea3

    Note that your own port number and token may differ from those shown here, and <interactive_host_name> will be the full version of your interactive host name, e.g. compc001.hpc.in.ox.ac.uk, that you discovered above. In the following instructions, make sure to use the information shown on your own screen.
  6. At this point, you need to create a tunnelled connection from your own computer to your interactive session. First take note of the port number, which could be 8888 as shown above or another number.
  7. Then open a new terminal window ON YOUR OWN COMPUTER (i.e. not on rescomp1 or rescomp2) and create an SSH tunnel following this template: 
     
    ssh -L 8888:interactivehostname:8888 username@cluster1.bmrc.ox.ac.uk
     
    Remember to use your own port number in place of 8888, as well as your own interactive host name (the short version is sufficient) and your own username.
    NB1 If you are also running Jupyter Notebook locally on your own computer, it is likely that port 8888 on your own computer is already in use. If so, change the first port number (before interactivehostname) to e.g. 9999. Make sure that the second port number (after interactivehostname) matches what you saw in Step 5.
    NB2 If you have configured your local SSH client to re-use a single SSH connection (i.e. if you have configured "ControlMaster auto" or similar in your ~/.ssh/config) then you should create your SSH tunnel via cluster2 instead of cluster1.
  8. After running the tunnel command, your terminal will appear to be logged into cluster1 and (invisibly to you) an additional connection now exists between your computer and your interactive host. Now open a web browser on your own computer and copy the line from your own terminal corresponding to the http://127.0.0.1... line above.
    NB If you changed your local port number when creating the SSH tunnel, you will also need to change the port number in the http://127.0.0.1... URL.
  9. Paste the newly copied line into your web browser and Jupyter Notebook will appear.
  10. To close down, click the Quit button in the top right of Jupyter notebook and then close all your terminal windows. 
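In summary, the cluster-side and local-side commands from the steps above can be sketched as follows, assuming the module-only route and the default port 8888; substitute your own interactive host name, port and username.

```shell
# --- On the cluster (steps 3-5) ---
srun -p short --pty bash            # start an interactive session
hostname -s                         # note the interactive host name
module load IPython/7.9.0-foss-2019b-Python-3.7.4
jupyter notebook --no-browser --ip=*

# --- On your own computer (step 7) ---
# forward local port 8888 to port 8888 on the interactive host
ssh -L 8888:interactivehostname:8888 username@cluster1.bmrc.ox.ac.uk
```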

 

Python Virtual Environments and Local Packages

In most cases, if you require software or Python packages which are not yet installed on the cluster, it is best to email us to request them. When sending software requests, please ensure that you send us sufficient information, including the software name, its homepage or download page, and whether you wish to use it in conjunction with any other particular software modules.

In some cases, however, you may wish to try out software packages or install them for testing purposes. In these cases, installing your own packages via a Python virtual environment may be the best way.

On the BMRC cluster, we recommend the use of Python virtual environments in preference to other ways of handling multiple Python installations. A Python virtual environment provides you with a local copy of Python over which you have full control, including which packages to install.

The need for dual virtual environments

At any one time the BMRC cluster comprises computers with different generations of CPU architecture. Currently, these fall into two groups. Our C and D nodes, as well as rescomp3, use Ivybridge-compatible CPUs, while our E and F nodes, as well as cluster1 and cluster2, use Skylake CPUs. Software built for Skylake will not run on Ivybridge, while software built for Ivybridge will run on Skylake but will not take advantage of the newer capabilities. For this reason, we maintain two separate libraries of pre-installed software - one for Ivybridge and one for Skylake - although this is normally invisible to the user because our system automatically chooses which software version to make available when you load something. When creating and managing your own environments, however, you will need to handle this yourself.

Creating and managing your own python virtual environments

Here is an example of how to create and manage your own python virtual environments. Using this method, you create local package libraries on disk. Once configured, you can then install or remove packages using e.g. pip as you wish.

In order to ensure that your code will work across all cluster nodes (whether those nodes use Ivybridge or Skylake CPUs), the overall goal is to create two near-identical local package libraries, one for Skylake CPUs and one for Ivybridge CPUs, and to select the correct one automatically when needed.

  1. First login to either rescomp1 or rescomp2, which use skylake CPUs. Use module avail Python to list and choose a suitable version of Python e.g. Python/3.7.4-GCCcore-8.3.0 and then module load Python/3.7.4-GCCcore-8.3.0 to load it.
  2. We will assume you wish to create a python virtual environment called projectA. First, find a suitable place on disk to store all your python virtual environments e.g. /well/<group>/users/<username>/python/ . Create this directory before continuing and then cd into it.
  3. Once inside your python directory, run

    python -m venv projectA-skylake

    This will create a new python virtual environment in the projectA-skylake sub-folder. Once this is created, you must activate it before using it by running

    source projectA-skylake/bin/activate

    Notice that your shell prompt changes to reflect the virtual environment. Once it is activated, you can proceed to install software using pip install XYZ (note that pip search has been disabled by PyPI, so browse pypi.org to find packages instead). Repeat the process to install all the packages you need.
  4. Once you have installed all the packages you need in projectA-skylake run pip freeze > requirements.txt . This will put a list of all your installed packages and their versions into the file requirements.txt . We will use this file to recreate this environment for Ivybridge.
  5. Run deactivate to deactivate your projectA-skylake environment and then ssh to rescomp3. Note you can only reach rescomp3 by first logging into rescomp1-2 and then typing ssh rescomp3 .
  6. Once logged into rescomp3, you should load the same Python module your previously loaded on rescomp1-2 e.g. module load Python/3.7.4-GCCcore-8.3.0 . Note that our system automatically takes care to load the Ivybridge version of this software now that you are on rescomp3.
  7. cd to your python folder (i.e. the parent folder in which projectA-skylake is located) and create a second virtual environment by running

    python -m venv projectA-ivybridge

    Once this is created, activate it by running

    source projectA-ivybridge/bin/activate
  8. With the projectA-ivybridge environment activated, you can copy all the same packages that were previously installed into the skylake repository by running pip install -r /path/to/requirements.txt i.e. using the requirements.txt file you created earlier. Once python has finished installing all the packages from requirements.txt, run deactivate to deactivate your current python environment.
  9. You now have two identical Python virtual environments, one built for Skylake and the other built for Ivybridge.

Now that you have two identical environments, one for ivybridge and one for skylake, it only remains to choose the correct one to activate in your job submission scripts. To do that, you can copy or adapt the following sample submission script:

#!/bin/bash

# note that you must load whichever main Python module you used to create your virtual environments before activating the virtual environment
module load Python/3.7.4-GCCcore-8.3.0


# Activate the ivybridge or skylake version of your python virtual environment
# NB The environment variable MODULE_CPU_TYPE will evaluate to ivybridge or skylake as appropriate
source /path/to/projectA-${MODULE_CPU_TYPE}/bin/activate


# continue to use your python venv as normal
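The key line is the ${MODULE_CPU_TYPE} expansion, which selects the environment matching the CPU type of the node the job lands on. A minimal local illustration of that expansion follows; here the variable is set by hand, whereas on the cluster the module system sets it for you.

```shell
# Stand-in value; on BMRC this variable is set automatically
# by the module system on each node
MODULE_CPU_TYPE="skylake"

# The same expansion used in the sample submission script above
echo "/path/to/projectA-${MODULE_CPU_TYPE}/bin/activate"
```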

 

Conda, Anaconda and Miniconda

For all software requirements, we strongly recommend that you make use of our software modules and the methods for handling Python virtual environments described above. While conda is often useful on personal machines, it is frequently a poorer fit for a cluster environment.

Where use of conda is preferred, we recommend that you make use of the supplied Anaconda modules rather than installing conda yourself. You can see which versions are available by running

module avail Anaconda

As with python virtual environments, special handling is required for conda as further described below.

Conda Configuration

Using conda without configuration is likely to run into problems immediately; these can be solved with the configuration below.

By default, conda will store your environments and downloaded packages in your home directory under ~/.conda - this will quickly cause your home directory to run out of space. So conda needs to be configured to store your files in your group folder.

As with python virtual environments (described above), there will also be issues of CPU compatibility (ivybridge vs skylake).

The configuration described below is intended to address both of the above problems.

 

  1. Login to cluster1 and create a dedicated conda folder in your group home folder (conda will create the package and environment subdirectories configured below as needed). NB replace group and username with your own group and username:
    mkdir -p /well/group/users/username/conda

  2. Create the file ~/.condarc containing the following configuration.
    NB1 indented lines are indented two spaces
    NB2 Replace group and username with your own group and username

    channels:
      - conda-forge
      - bioconda
      - defaults
     
    pkgs_dirs:
      - /well/group/users/username/conda/${MODULE_CPU_TYPE}/pkgs
    envs_dirs:
      - /well/group/users/username/conda/${MODULE_CPU_TYPE}/envs
  3. Before activating and using a conda environment, you must initialise conda itself. You can do this either in the shell or in your job submission scripts as follows (using the Anaconda3/2022.05 module as an example):

    module load Anaconda3/2022.05
    eval "$(conda shell.bash hook)"

  4. In order to allow your conda environments to work well on both Ivybridge and Skylake CPUs, you must create identical conda environments using both cluster1 or cluster2 (for Skylake) and rescomp3 (for Ivybridge). The ~/.condarc file above allows you to re-use the same environment name, but it is essential that you create one environment for Skylake and one for Ivybridge. The basic workflow would be:

    Starting on cluster1 or cluster2...
    module load Anaconda3/2022.05
    eval "$(conda shell.bash hook)"
    conda create -n myproject python=3
    conda install....

    Then repeat the above on rescomp3.
  5. When submitting job scripts that rely on conda, remember to load and activate conda within your script using:

    module load Anaconda3/2022.05
    eval "$(conda shell.bash hook)"
    conda activate..
    python ...
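Putting these pieces together, a complete conda-based job script might look like the sketch below. It assumes a Slurm cluster (as suggested by the srun command used earlier), the environment name myproject from step 4, and a hypothetical script name myscript.py.

```shell
#!/bin/bash
# Load conda and initialise it for this non-interactive shell
module load Anaconda3/2022.05
eval "$(conda shell.bash hook)"

# Activate the environment; thanks to the ~/.condarc above, the
# ivybridge or skylake copy is selected automatically because the
# envs_dirs path contains ${MODULE_CPU_TYPE}
conda activate myproject

# Run your analysis
python myscript.py
```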