Submitting a SLURM Job to a Cluster: A Guide
Running Python scripts for machine learning tasks on a computing cluster managed by SLURM (Simple Linux Utility for Resource Management) is an effective way to tackle complex, resource-intensive projects. This guide helps users run basic Python scripts on a powerful cluster, as demonstrated during a research internship at Cambridge University.
To begin, prepare a SLURM batch script that specifies the job requirements and runs your Python code. Key SLURM directives include:
1. `#SBATCH --account=your_account` to specify your account.
2. `#SBATCH --partition=gpu_partition` to select a partition with GPUs (or another relevant partition).
3. `#SBATCH --gres=gpu:number` to request GPU resources.
4. `#SBATCH --mem=memory_amount` to allocate memory.
5. `#SBATCH --time=HH:MM:SS` to specify the maximum run time.
6. `#SBATCH -o output_file` and `#SBATCH -e error_file` for logs.
Next, load Python and any other software your ML environment needs inside the script using environment modules, for example `module load python` and `module load cuda`.
Once the environment is set up, run the Python script inside the SLURM job script, for example:
```bash
python your_ml_script.py
```
For multiple jobs on GPUs, you can use GNU Parallel combined with `CUDA_VISIBLE_DEVICES` to run several single-GPU Python jobs concurrently without GPU conflicts.
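A minimal sketch of this pattern, assuming a four-GPU node; the `--config` flag and the `cfg*.yaml` filenames are illustrative, not part of any real script:

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:4           # Four GPUs on one node
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00

module load python/3.x

# {%} is GNU Parallel's 1-based job-slot number; slots 1-4 are pinned
# to GPUs 0-3, so the four Python processes never share a GPU.
parallel -j 4 'CUDA_VISIBLE_DEVICES=$(( {%} - 1 )) python your_ml_script.py --config {}' \
    ::: cfg1.yaml cfg2.yaml cfg3.yaml cfg4.yaml
```

Pinning each job slot to a fixed GPU index via `CUDA_VISIBLE_DEVICES` is what prevents two processes from landing on the same device.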
Here's an example minimal SLURM batch script to run a Python ML task on one GPU:
```bash
#!/bin/bash
#SBATCH --account=def-myuser   # Your project/account
#SBATCH --partition=gpu        # GPU partition
#SBATCH --gres=gpu:1           # Request 1 GPU
#SBATCH --mem=16G              # Memory
#SBATCH --time=02:00:00        # Max run time 2 hours
#SBATCH -o myjob_%j.out        # Stdout with job ID
#SBATCH -e myjob_%j.err        # Stderr with job ID

module load python/3.x         # Load Python module
module load cuda               # Load CUDA if needed

python train_model.py          # Run your ML Python script
```
When working with SLURM, remember to use `salloc` or `srun` for an interactive session with GPU(s) to test code before batch submission, and check cluster documentation for GPU types, available partitions, and environment modules.
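For instance, an interactive test session might look like this (account and partition names are placeholders; substitute your cluster's values):

```bash
# Request an interactive allocation with one GPU for 30 minutes:
salloc --account=def-myuser --partition=gpu --gres=gpu:1 --time=00:30:00

# Inside the allocation, run commands on the compute node:
srun python your_ml_script.py
```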
To monitor the job's progress in the queue, use the `squeue` command. For detailed information about a specific job, use `scontrol show job <jobid>`.
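A few common monitoring commands (the job ID `123456` is illustrative):

```bash
squeue -u $USER            # your pending and running jobs
squeue -j 123456           # queue status of one job
scontrol show job 123456   # full details: resources, node, state, file paths
sacct -j 123456            # accounting info once the job has finished
```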
It's important to note that while SLURM copies the batch script itself at submission, any files it references, such as your Python source, are read when the job actually starts: if you modify the source code while the job is queued, the latest version will be used at execution time, not the one from the time of submission. Be patient and keep learning to optimize your workflows.
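If you want a queued job to be immune to later edits, one workaround is to snapshot the code at submission time. A minimal sketch, where the directory layout and filenames are illustrative:

```bash
# Copy the source into a per-submission directory, then submit from there,
# so further edits to the working copy no longer affect the queued job.
SNAP="$HOME/job_snapshots/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$SNAP"
cp train_model.py myjob.sh "$SNAP/"
sbatch --chdir="$SNAP" "$SNAP/myjob.sh"
```

`--chdir` makes the job run inside the snapshot directory, so the copied `train_model.py` is the one executed.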
SLURM manages, schedules, and supervises the execution of jobs, ensuring each job gets its required resources without conflicts or wasted capacity. For large distributed ML training, you may integrate MPI or use frameworks like PyTorch's distributed API inside your SLURM job.
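As one hedged sketch of the PyTorch route, a two-node job can combine `srun` with the `torchrun` launcher; this assumes four GPUs per node and that `train_model.py` initializes `torch.distributed`:

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1    # One torchrun launcher per node
#SBATCH --gres=gpu:4           # Four GPUs per node
#SBATCH --cpus-per-task=16
#SBATCH --time=04:00:00

# Use the first allocated node as the rendezvous host.
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_PORT=29500

# torchrun spawns one worker process per GPU on each node.
srun torchrun \
    --nnodes="$SLURM_NNODES" \
    --nproc_per_node=4 \
    --rdzv_backend=c10d \
    --rdzv_endpoint="$MASTER_ADDR:$MASTER_PORT" \
    train_model.py
```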
Lastly, always respect the cluster's guidelines and resource limits. To list all jobs with a given name, use `squeue --name=<jobname>` (the `-j` flag filters by job ID instead).
In short, SLURM gives you a reliable way to manage, schedule, and supervise resource-intensive Python ML work on a shared cluster: SLURM directives declare the resources a job needs, and environment modules load Python and the required ML libraries inside the job script.