Job Manager

When users log in to the system, they land on the frontends, which give access to the system resources. Users should not run any work on these computers: all users share them, and processes executed there slow down the work of everyone else.
All HPC jobs running on the system must be executed on the calculation nodes by sending a script to the SCAYLE job manager.
The job manager, or queue manager, is a system that sends jobs to the calculation nodes each user has access to, controls their execution, and prevents several jobs from sharing the same resources, which would increase their execution times. The manager used by SCAYLE is SLURM.

In the case of SLURM, the most commonly used commands are:

  • sbatch: this Slurm command is used to send the job manager the script for the job we want to run. For example:

    [user@frontend1 ~]$ sbatch

    sbatch takes the path to the submission script as its argument; in this example it sends the execution script for the OpenFOAM fluid dynamics software to the job manager.

  • squeue: shows the status of submitted jobs.

    [user@frontend1 ~]$ squeue
             JOBID PARTITION    NAME     USER    ST      TIME  NODES  NODELIST(REASON)
             94631 cascadelake  JOB_OF   user     R   3:47:29  2      cn[5008-5009]

    The output lists all the jobs that our user has in the queue. For security and privacy reasons, users can only see information about their own jobs and cannot access any information about the jobs of other users.
    In the previous example, the user has only one job, with a JOBID value of 94631. The JOBID is a unique number that the job manager assigns to each job it manages. PARTITION shows which group of servers the job is running on. NAME is the name of the job as defined in the submission script. USER is the owner of the job. ST is the status of the job, in this case R (running); the other common status is PD (pending), which indicates that the job is waiting to be executed. TIME is the time the job has been running. Finally, the NODES and NODELIST columns give the number of servers used and their names.

  • scancel: cancels the execution of a job. For example, to cancel the job from the example above, with JOBID 94631:

    [user@frontend1 ~]$ scancel 94631

  • salloc: creates an interactive session on the compute nodes. The primary use of this command is to compile code on nodes with the same characteristics as those on which it will later run. For example, by running the following command:

    [user@frontend1 ~]$ salloc --ntasks=16 --time=60 --partition=cascadelake

    The job manager will grant access to one of the calculation nodes of the partition called cascadelake (--partition=cascadelake) for a maximum of 60 minutes (--time=60), and will allow us to use 16 cores (--ntasks=16) for compiling or testing our code.
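
The commands above are often chained together in small wrapper scripts. The following is a minimal sketch, not part of the SCAYLE tooling: job_state is a hypothetical helper that reads squeue-style output on standard input and prints the ST column for one JOBID, and job.sh is a placeholder script name. It assumes the default squeue column order described above.

```shell
#!/bin/bash
# job_state JOBID: read squeue-style output on stdin and print the ST
# column (R, PD, ...) of the matching job. Hypothetical helper; it assumes
# the column order JOBID PARTITION NAME USER ST TIME NODES NODELIST.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Typical use on a frontend (needs Slurm, so it is left commented out):
#   jobid=$(sbatch --parsable job.sh)   # prints the job ID instead of the usual message
#   squeue | job_state "$jobid"         # PD while waiting, R once running
#   scancel "$jobid"                    # cancel the job if it is no longer needed
```

If the job is not in the queue (for example because it has already finished), the helper prints nothing.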

The first step in submitting a job to the job manager is to write a submission script containing two types of lines: directives for the job manager and Linux commands.

The latter are the commands interpreted by the Linux shell defined in the first line of the script (#!/bin/bash). The directives for the job manager are placed at the beginning of the script; in the case of SLURM they are lines that begin with the string "#SBATCH" followed by one of the available options. The manager processes these directives when the script is sent with the sbatch command, and uses them to run the job on the execution nodes as the user requested. For example, the following batch script:

#!/bin/bash
#SBATCH --ntasks=32
#SBATCH --job-name=hello_world
#SBATCH --mail-user=<email address>
#SBATCH --mail-type=ALL
#SBATCH --output=hello_world_%A_%a.out
#SBATCH --error=hello_world_%A_%a.err
#SBATCH --partition=haswell
#SBATCH --qos=normal
#SBATCH --time=0-00:05:00
source /soft/calendula2/intel/ipsxe_2018_u4/parallel_studio_xe_2018/
srun -n $SLURM_NTASKS hello_world.ex

runs a basic program on 32 cores. Line 1, as already detailed, specifies the shell that will execute the Linux commands of the script. All lines beginning with #SBATCH are directives that the job manager will interpret.
In this example:

#SBATCH --ntasks=32; sets the number of cores requested for the execution of the script.
#SBATCH --job-name=hello_world; name assigned to the job.
#SBATCH --mail-user=<email address>; email address to which job-related notifications will be sent.
#SBATCH --mail-type=ALL; defines in which circumstances an email will be sent to the user. In this case, "ALL" means at the beginning of the execution, at the end of the execution, and if the job fails or is cancelled.
#SBATCH --output=hello_world_%A_%a.out; the standard output file. If no error file is defined, the standard output and the error output are written to this single file by default.
#SBATCH --error=hello_world_%A_%a.err; the error output file.
#SBATCH --partition=haswell; partition to which the job is sent.
#SBATCH --qos=normal; QOS (Quality of Service) with which the job is sent.
#SBATCH --time=0-00:05:00; time limit for the job (D-HH:MM:SS).

IMPORTANT: For the job to work properly, it is mandatory to add the parameter #SBATCH --time=D-HH:MM:SS to the script, where D is days, HH is hours, MM is minutes and SS is seconds.
The time defined with the --time parameter never takes precedence over the maximum execution time associated with the QOS of the job: the QOS limit always prevails.
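
To make the D-HH:MM:SS format concrete, the following sketch converts such a string to seconds and compares a requested time against a maximum. The function names to_seconds and within_limit are hypothetical helpers written for this illustration, not Slurm commands, and only the D-HH:MM:SS form is handled.

```shell
#!/bin/bash
# to_seconds D-HH:MM:SS: print the duration in seconds.
# Inputs that do not split into four fields yield a non-zero exit status.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; ok = 1 }
        END     { exit !ok }'
}

# within_limit REQUESTED MAX: succeed if REQUESTED does not exceed MAX.
within_limit() {
    req=$(to_seconds "$1") || return 2
    max=$(to_seconds "$2") || return 2
    [ "$req" -le "$max" ]
}
```

For instance, to_seconds 0-00:05:00 prints 300, and within_limit 0-00:05:00 5-00:00:00 succeeds because 300 seconds is below 5 days (432000 seconds).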

As previously indicated, the status of the jobs submitted by the user can be checked with the command:

$ squeue

Each QOS (Quality of Service) allows various parameters to be customized, such as the maximum time a job can run, the maximum number of cores that a user can request, or which users can send jobs to that partition. If nothing is specified, jobs use the normal QOS.
By default, users have access to certain limits. To request access to a particular QOS, the user must contact the support staff.

These are the limits of the available QOS, where:

  • MaxWall: the maximum time that can be requested when sending a job (days-hours:minutes:seconds).
  • MaxTRESPU: the maximum number of cores that a user can have reserved simultaneously.
  • MaxJobsPU: the maximum number of jobs that a user can have running simultaneously.

  Name     Priority   MaxWall       MaxTRESPU   MaxJobsPU
  normal   100        5-00:00:00    cpu=512     50
  long     100        15-00:00:00   cpu=256
  xlong    100        30-00:00:00   cpu=128

These QOS may change depending on the needs of the system.
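
As an illustration of how the MaxTRESPU limit interacts with --ntasks, the sketch below checks a requested core count against the values listed above. The helpers qos_max_cpus and cores_allowed are hypothetical, and the numbers are simply copied from the table, so they would need updating whenever the QOS change.

```shell
#!/bin/bash
# qos_max_cpus QOS: print the per-user core limit (MaxTRESPU) of a QOS.
# Values copied from the table above; they may change on the real system.
# (The live limits can presumably be listed with "sacctmgr show qos".)
qos_max_cpus() {
    case $1 in
        normal) echo 512 ;;
        long)   echo 256 ;;
        xlong)  echo 128 ;;
        *)      return 1 ;;   # unknown QOS name
    esac
}

# cores_allowed QOS NTASKS: succeed if NTASKS fits within the QOS limit.
cores_allowed() {
    limit=$(qos_max_cpus "$1") || return 2
    [ "$2" -le "$limit" ]
}
```

With these values, cores_allowed normal 32 succeeds, while cores_allowed xlong 256 fails because xlong allows at most 128 cores per user.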

When the same job has to be repeated a number of times, varying only the value of some parameter, the job manager can perform this task automatically. Jobs of this type are called array jobs.
To send an array job, use the --array option of the sbatch command, for example from the command line:

 frontend> sbatch ... --array 1-20 ...

would send 20 simultaneous executions of the program. To include the option in the script itself, add it alongside the rest of the job manager directives:

  #SBATCH --output=hello_world_%A_%a.out
  #SBATCH --error=hello_world_%A_%a.err
  #SBATCH --partition=haswell
  #SBATCH --qos=normal
  #SBATCH --array=1-20

Given the limits of the queuing system, Slurm also offers the option of setting how many jobs of the array may be running at the same time.

  #SBATCH --array=1-20%4

With the previous line we indicate that we want to launch an array job of 20 jobs, of which at most 4 run simultaneously.

This does not guarantee that the jobs start one right after the other; that depends on the load of the machine and on job priorities.
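
Inside an array job, each element can pick its own parameter from the SLURM_ARRAY_TASK_ID environment variable that Slurm sets. The script below is a sketch under assumptions: params.txt is a hypothetical file with one parameter per line, param_for_task is a helper defined here for illustration, and passing the parameter as an argument to hello_world.ex is assumed rather than taken from the example program.

```shell
#!/bin/bash
#SBATCH --job-name=hello_array
#SBATCH --ntasks=1
#SBATCH --output=hello_world_%A_%a.out
#SBATCH --error=hello_world_%A_%a.err
#SBATCH --partition=haswell
#SBATCH --qos=normal
#SBATCH --time=0-00:05:00
#SBATCH --array=1-20%4

# param_for_task N FILE: print line N of FILE (one parameter per line).
param_for_task() {
    sed -n "${1}p" "$2"
}

# Slurm sets SLURM_ARRAY_TASK_ID (1 to 20 here) for each element of the array.
# The guard keeps the script from launching anything outside an array job.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    # params.txt (hypothetical) holds 20 lines, one per array element.
    param=$(param_for_task "$SLURM_ARRAY_TASK_ID" params.txt)
    srun hello_world.ex "$param"
fi
```

In the output and error file names, %A is replaced by the job ID of the array as a whole and %a by the index of the element, so each of the 20 executions writes to its own files.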

A number of environment variables are defined in the job environment when the script is executed through the job manager, and they can be used in the script. The most useful ones for everyday use are the following:

  • $SLURM_JOB_ID: job identifier.
  • $SLURM_JOB_NAME: job name.
  • $SLURM_SUBMIT_DIR: directory from which the job was submitted.
  • $SLURM_JOB_NUM_NODES: number of nodes assigned to the job.
  • $SLURM_CPUS_ON_NODE: number of cores on the node.
  • $SLURM_NTASKS: total number of tasks in the job (one core per task unless configured otherwise).
  • $SLURM_NODEID: index of the current node relative to the nodes assigned to the job.
  • $SLURM_PROCID: index of the current task relative to the job.
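
As a small illustration of how these variables can be used, the sketch below writes a one-line summary at the start of a job. The job_summary function is a hypothetical helper, and the fallback values are only there so that the snippet also runs outside Slurm, where the variables are not set.

```shell
#!/bin/bash
# job_summary: print a one-line description of the current job using the
# environment variables that Slurm defines. Fallbacks apply outside Slurm.
job_summary() {
    printf 'job %s (%s): %s tasks on %s nodes, submitted from %s\n' \
        "${SLURM_JOB_ID:-unknown}" "${SLURM_JOB_NAME:-unknown}" \
        "${SLURM_NTASKS:-?}" "${SLURM_JOB_NUM_NODES:-?}" \
        "${SLURM_SUBMIT_DIR:-$PWD}"
}

# Called near the top of a submission script, the line ends up in the
# job's standard output file.
job_summary
```

For the hello_world example running as job 94631 on two nodes, the summary would read "job 94631 (hello_world): 32 tasks on 2 nodes, submitted from" followed by the submission directory.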