As of February 2022, running jobs on the compute nodes of the Kraken cluster is only possible through the queuing system ([[https://slurm.schedmd.com/documentation.html|SLURM]]). Compute nodes are not directly accessible from the network. The special administrative node "kraken" is available for connecting to the cluster and for preparing and submitting jobs to the queuing system. The administrative node is not intended for running computations; use it primarily for data manipulation, submitting jobs to the queues, and compiling custom programs. Parallel jobs cannot be run outside the queues. The "NoCompute" queue is intended for computationally undemanding programs (e.g. Paraview); this queue runs only on the administrative node (whose RAM has been increased to 320 GB to allow processing of large data).

In addition to the [[computing:cluster:fronty:start#zakladni_prikazy|SLURM commands]], the ''freenodes'' command shows the current cluster load:

{{:computing:cluster:fronty:freenodes.png?direct&400|}}

**Users queue a job and let the queuing system run the computation. The job is queued according to the system's internal priorities and waits for execution. The queuing system runs the job as soon as the requested compute resources become available. Users do not need to monitor the availability of compute capacity themselves; they can log out of the cluster and wait for the computation to complete. A notification email can be sent upon completion of the job, for details see [[computing:cluster:fronty:start#parametry_prikazu_srun_a_sbatch|Command Overview]].**

A basic description of how to work with the queuing system (SLURM) follows below; specifics on starting individual applications are on separate pages:
  * [[computing:cluster:fronty:abaqus|Abaqus]]
  * [[computing:cluster:fronty:ansys|Ansys (Fluent)]]
  * [[computing:cluster:fronty:comsol|Comsol]]
  * [[computing:cluster:fronty:matlab|Matlab]]
  * [[computing:cluster:fronty:openfoam|OpenFOAM]]
  * [[computing:cluster:fronty:paraview|Paraview]]
  * [[computing:cluster:fronty:pmd|PMD]]

====== SLURM queuing system ======

The queuing system takes care of optimal cluster utilization and provides a number of tools for job submission, job control and parallelization. All tasks are performed after logging into the administrative node "kraken" (''ssh username@kraken''). Full documentation can be found at [[https://slurm.schedmd.com|slurm.schedmd.com]].

==== Basic commands ====

=== Running jobs ===

There are two commands for queuing a job, ''srun'' and ''sbatch'':

  srun
//Key command for queuing a job.// For parallel jobs it replaces the ''mpirun'' command (the MPI libraries provided as modules therefore do not offer the ''mpirun'' command either).\\
''srun'' used in this way requests resources according to the specified options and runs the program on them. If you are running a **non-**parallel job, leave the parameter ''-n 1'' (the default); if you choose a higher value, the non-parallel program will run n times!

  sbatch
//Submits the job to the queue according to a prepared script, see the examples below. The script for a parallel job usually contains a line with the ''srun'' command (commercial codes are usually run without ''srun'', see the individual application pages above).// The most common way to queue a job is ''sbatch'' + script.
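For illustration, a minimal ''sbatch'' startup script might look like the following sketch (the module name ''openmpi'' and the program name ''my_parallel_program'' are placeholders, adjust them to your own job):

<code bash>
#!/bin/bash
#SBATCH --job-name=my_first_job   # job name shown in the squeue output
#SBATCH --partition=Mshort        # queue (partition), see the table of queues below
#SBATCH --ntasks=8                # number of cores (tasks) requested
#SBATCH --output=out.txt          # file for standard output
#SBATCH --error=err.txt           # file for standard error

# load the MPI library from the modules (illustrative module name)
module load openmpi

# srun starts the program on the allocated resources (it replaces mpirun)
srun ./my_parallel_program
</code>

Saved e.g. as ''job.sh'', the script is submitted with ''sbatch job.sh''.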
=== Task management ===

  sinfo
//Lists the queues (partitions) and their current usage.//

  squeue
//Lists information about jobs currently in the queuing system.//\\
Meaning of the abbreviations in the //squeue// output ([[https://curc.readthedocs.io/en/latest/running-jobs/squeue-status-codes.html|complete list here]]):
  - In the "ST" (status) column: **R** - running, **PD** - pending (waiting for allocation of resources), **CG** - completing (some processes have finished, but some are still active), ...
  - In the "REASON" column: **Priority** - a task with higher priority is in the queue, **Dependency** - the task is waiting for the completion of the task it depends on and will be started afterwards, **Resources** - the task is waiting for the required resources to be released, ...

  scancel
//Cancels a queued or running task.//

=== User information ===

  sacct
//Lists information about the user's jobs (including history).//

{{:computing:cluster:fronty:slurm_command_summary.pdf|List of commands with parameters in a PDF document.}}

===== Running the tasks =====

Tasks can run on multiple nodes, but always within a single part of the cluster:
  * part **M** - machines kraken-m1 to m10 (all users)
  * part **L** - machines kraken-l1 to l4 (limited access)

Tasks can be started:
  * directly from the command line with the ''srun'' command
  * using a script with the ''sbatch'' command

==== Guidelines for running jobs ====

  * A job must always run under a queue (partition). If no queue is specified, ''Mexpress'' is used. A list of the defined queues is given below.
  * The chosen queue defines the run time limit.
  * Tasks in the express and short queues cannot be given a longer run time using ''-''''-time''. The default time in the long queues is set to 1 week, but they allow runs of up to 2 weeks, e.g. 9 days and 5 hours by specifying ''-p Llong'' ''-''''-time=9-05:00:0''.
  * When ordering pending tasks, Slurm gives higher priority to tasks and users that use the cluster less. It is therefore not advantageous to request a longer computation time than strictly necessary.

==== Predefined queues and time limits ====

There are 6 queues ("partitions") on the Kraken cluster, divided by job run length (express, short, long) and by cluster part ("Mxxx" and "Lxxx"). If the user does not specify a queue with the ''-''''-partition'' switch, the default value (Mexpress) is used:

^ cluster part ^ partition ^ nodes ^ time limit ^
| M (nodes kraken-m[1-10]) | **Mexpress** | kraken-m[1-10] | 6 hours |
| ::: | Mshort | kraken-m[1-10] | **2 days** |
| ::: | ::: | ::: | 3 days |
| ::: | Mlong | kraken-m[3-6], kraken-m8 | **1 week** |
| ::: | ::: | ::: | 2 weeks |
| L (nodes kraken-l[1-4]) | Lexpress | kraken-l[1-4] | 6 hours |
| ::: | Lshort | kraken-l[1-4] | 2 days |
| ::: | Llong | kraken-l[1-4] | **1 week** |
| ::: | ::: | ::: | 2 months (max) |
| admin node only | NoCompute | kraken | **1 hour** |
| ::: | ::: | ::: | 8 hours |
//bold = default time limit, the second value in a group is the maximum//

Details of the settings can also be viewed using the command:

  scontrol show partition [partition_name]

==== Parameters for the ''srun'' and ''sbatch'' commands ====

The program run is controlled by parameters. For the ''srun'' command they are entered directly on the command line; for the ''sbatch'' command they are written into the startup script, where each parameter is preceded by the identifier ''#SBATCH''. Options can be entered in two forms, either the full form ''-''''-ntasks=5'' (two hyphens and an equals sign) or the abbreviated form ''-n 5'' (one hyphen and a space).
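As an example, the following two equivalent requests ask for 4 cores in the ''Mshort'' queue, once directly on the command line with ''srun'' (abbreviated form) and once as ''#SBATCH'' lines in a startup script (full form); ''./my_program'' is only a placeholder:

<code bash>
# direct run from the command line, abbreviated option form (one hyphen, space)
srun -p Mshort -n 4 ./my_program

# the same request written in an sbatch script, full option form (two hyphens, equals sign)
#SBATCH --partition=Mshort
#SBATCH --ntasks=4
</code>

An overview of the most common options is given in the following table: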
^ option ^ description ^ example ^
| ``-J``, ``-````-job-name=`` | Job name, shown e.g. in the output of //squeue// | ``-J my_first_job`` |
| ``-p``, ``-````-partition=`` | Request a specific partition (queue) for the resource allocation | ``-p Mshort`` |
| ``-n``, ``-````-ntasks=`` | Number of resources (~cores) to be allocated for the task | ``-n 50`` |
| ``-N``, ``-````-nodes=`` | Number of nodes to be used | ``-N 3`` |
| ``-````-mem`` | Job memory request | ``-````-mem=1gb`` |
| ``-o``, ``-````-output=`` | File to which slurm writes the standard output | ``-o out.txt`` |
| ``-e``, ``-````-error=`` | File to which slurm writes the standard error | ``-e err.txt`` |
| ``-````-mail-user=`` | User to receive email notifications of state changes as defined by ``-````-mail-type`` | ``-````-mail-user=my@email`` |
| ``-````-mail-type=`` | Send email on BEGIN, END, FAIL, ALL, ... | ``-````-mail-type=BEGIN,END`` |
| ``-````-ntasks-per-node=`` | Request that this many tasks be invoked on each node | |
| ``-t``, ``-````-time=