
How do I interact with Jobs in Real Time?

Interactive Jobs

Batch jobs are submitted to the Slurm queuing system and run when the requested resources become available. However, this workflow is not suitable when users need to test and troubleshoot code in real time. Interactive jobs allow users to interact with applications in real time: they can run graphical user interface (GUI) applications, execute scripts, or run other commands directly on a compute node.

Using the srun command:

srun submits your resource request to the queue. When the resources become available, a new bash session starts on the reserved compute node. srun accepts the same Slurm flags as batch scripts.

Example:

CODE
## For KUACC 
srun -N 1 -n 4 -A users -p short --qos=users --gres=gpu:1 --mem=64G --time 1:00:00 --constraint=tesla_v100 --pty bash 

With this command, Slurm reserves 1 node, 4 cores, 64GB RAM, and 1 GPU on the short queue with a 1-hour time limit; the constraint flag restricts the GPU type to tesla_v100. Slurm then opens a terminal on the compute node. If that terminal is closed, the job is killed in the queue.
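
Once the shell opens on the compute node, a quick way to confirm the allocation is to check the hostname, the Slurm job ID, and the GPU assigned to the job (a minimal check; nvidia-smi assumes the node has NVIDIA drivers installed):

CODE
## Run inside the interactive shell on the compute node
hostname                # name of the reserved compute node
echo $SLURM_JOB_ID      # job ID assigned by Slurm
nvidia-smi              # GPU(s) visible to the job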

CODE
## For VALAR 
srun -p ai --gres=gpu:1 --mem=20G --cpus-per-task=4 --time=02:00:00 --pty bash 

With this command, Slurm reserves resources on the ai partition with 1 GPU of any type (T4 nodes by default, if available), 4 CPU cores, and 20GB RAM for 2 hours, then opens a terminal on a compute node. If the terminal is closed, the job is killed in the queue.
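
If you are not sure which GPU types are currently available on the ai partition, sinfo can list the nodes with their state and generic resources (the exact columns and GRES names depend on the site's configuration):

CODE
## List nodes in the ai partition with their state and GPU resources
sinfo -p ai -o "%20N %10t %G"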

CODE
## For VALAR with constraint
srun -p ai --gres=gpu:tesla_v100:1 --mem=30G --cpus-per-task=6 --time=04:00:00 --pty bash

With this command, Slurm reserves resources on the ai partition with 1 Tesla V100 GPU, 6 CPU cores, and 30GB RAM for 4 hours, then opens a terminal on a compute node. Closing the terminal will terminate the job.
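
To end an interactive session cleanly, simply exit the shell, or cancel the job from the login node with scancel (the job ID below is only a placeholder; use the one reported by squeue):

CODE
## Inside the interactive shell: leave the node and release the allocation
exit
## Or from the login node, cancel by job ID (placeholder ID shown)
scancel 123456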

Using the salloc command:

salloc works in a similar way to srun: it submits your resource request to the queue. When the resources become available, the allocation starts but your terminal remains on the login node. From there you can ssh to the reserved node, or start a shell on it with:

CODE
srun --pty bash

Example (same flags as with srun):

CODE
## For KUACC 
salloc -N 1 -n 4 -A users -p short --qos=users --gres=gpu:1 --mem=64G --time 1:00:00 --constraint=tesla_v100 

By this command, Slurm reserves 1 node on the short partition with 4 CPU tasks, 64GB RAM, and 1 GPU. The --constraint=tesla_v100 flag restricts the allocation to nodes with Tesla V100 GPUs. The --qos=users and -A users options ensure the job runs under the users account and QoS. The time limit is set to 1 hour. After allocation, the user can start processes on the reserved node (for example with srun --pty bash). If the session is closed, the job is terminated in the queue.
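
After salloc returns, commands launched through srun run inside the allocation; for example, you can open an interactive shell on the reserved node or run a single command on it (a sketch; nvidia-smi assumes an NVIDIA GPU node):

CODE
## Start a shell on the allocated compute node
srun --pty bash
## ...or run a single command on it without opening a shell
srun nvidia-smi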

CODE
## For VALAR 
salloc -p ai --gres=gpu:ampere_a40:1 --cpus-per-task=4 --mem=20G --time=03:00:00 

With this command, Slurm allocates resources on the ai partition with 1 NVIDIA A40 GPU, 4 CPU cores, and 20GB RAM for 3 hours. After allocation, the user can start a shell on the compute node with srun --pty bash.

To find the name of the compute node reserved for your job, check the queue:

CODE
squeue -u username

or

CODE
kuacc-queue | grep username

Then connect to the reserved node over ssh:

CODE
ssh username@computenode_name
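
Putting the salloc workflow together, a typical session might look like this (the node name below is only a placeholder; use the one shown by squeue):

CODE
salloc -p ai --gres=gpu:1 --cpus-per-task=4 --mem=20G --time=01:00:00
squeue -u username      # note the NODELIST column for your job
ssh username@dy01       # connect to the reserved node (placeholder name)
nvidia-smi              # confirm the GPU is visible
exit                    # leave the node; scancel the job when finished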