scontrol show node

The command scontrol show node <node_name> displays detailed information about a specific compute node in the SLURM cluster. If <node_name> is omitted, it lists every node.
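For scripting, the Key=Value output of this command can be read into a dictionary. The following is a minimal Python sketch, not an official SLURM interface: the helper names parse_node_record and show_node are hypothetical, and SAMPLE is a made-up, trimmed record (only the field names, ai28, x86_64 and gpu:lovelace_l40s:4 come from this page). The regular expression assumes that each value ends where the next whitespace-separated Key= begins, which can misfire on free-text fields such as Reason.

```python
import re
import subprocess

# A value runs until the next whitespace-separated "Key=" or the end of the text.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|\s*$)", re.S)

# Made-up, trimmed sample in the same Key=Value layout (values are illustrative).
SAMPLE = """
NodeName=ai28 Arch=x86_64 CoresPerSocket=32
   CPUAlloc=16 CPUEfctv=64 CPUTot=64 CPULoad=3.20
   Gres=gpu:lovelace_l40s:4
   OS=Linux 5.14.0 #1 SMP
   RealMemory=512000 AllocMem=128000 FreeMem=300000
   State=MIXED
   CfgTRES=cpu=64,mem=512000M,gres/gpu=4
"""


def parse_node_record(text: str) -> dict:
    """Turn one node record into a {field: value} dict of strings."""
    return {key: value.strip() for key, value in _PAIR.findall(text.strip())}


def show_node(node_name: str) -> dict:
    """Run `scontrol show node <node_name>` and parse its output.

    Needs scontrol on PATH, so it only works on a cluster login or compute node.
    """
    result = subprocess.run(
        ["scontrol", "show", "node", node_name],
        capture_output=True, text=True, check=True,
    )
    return parse_node_record(result.stdout)


if __name__ == "__main__":
    record = parse_node_record(SAMPLE)
    print(record["NodeName"], record["State"], record["Gres"])
```

On a cluster, show_node("ai28") would return the same kind of dictionary for the live node. All values are strings, so numeric fields such as CPUTot need int() before doing arithmetic.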

Example Output:

[Image: screenshot of example scontrol show node output]

Explanation of Fields

Field

Description

NodeName

The name of the node (e.g., ai28).

Arch

CPU architecture (e.g., x86_64).

CoresPerSocket

Number of CPU cores per socket.

CPUAlloc / CPUEfctv / CPUTot / CPULoad

CPUAlloc: Number of CPUs currently allocated to jobs. CPUEfctv: Effective CPUs available for scheduling, that is, CPUTot minus any CPUs reserved for system use (usually the same as CPUTot). CPUTot: Total number of CPUs configured on the node. CPULoad: Current CPU load (1-minute load average reported by the operating system).

AvailableFeatures / ActiveFeatures

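Node-specific tags (features) defined in the SLURM configuration, used as constraints in job submissions. AvailableFeatures lists every feature the node can offer; ActiveFeatures lists those currently active. (null) means no custom features are defined. Example usage: sbatch --constraint=tesla_v100 job.sh.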

Gres

Generic resources (GRES) configured on the node, written as name:type:count. Here it shows gpu:lovelace_l40s:4, meaning 4 NVIDIA L40S GPUs.

NodeAddr / NodeHostName

The network address and hostname of the node.

Version

SLURM version running on this node.

OS

Operating system and kernel version.

RealMemory / AllocMem / FreeMem

RealMemory: Total memory configured for the node (in MB). AllocMem: Memory currently allocated to running jobs (in MB). FreeMem: Free memory as reported by the operating system (in MB).

Sockets / Boards / ThreadsPerCore

Hardware topology — number of sockets, boards, and threads per core.

State

Node status. IDLE: Node is free. ALLOCATED: Fully used by jobs. MIXED: Partially used. DOWN: Unavailable for jobs. DRAIN: Not accepting new jobs (running jobs may still finish).

TmpDisk

Temporary local disk size (in MB).

Weight

Scheduling weight. All else being equal, SLURM allocates the nodes with the lowest weight first, so higher values make a node less likely to be selected.

Partitions

List of partitions this node belongs to (e.g., avg).

BootTime / SlurmdStartTime

When the node was last booted and when the SLURM daemon (slurmd) started.

LastBusyTime

Timestamp of the last activity (when the node last ran a job).

ResumeAfterTime

If set, the time at which the node is automatically returned to service after having been put into a DOWN or DRAIN state with the ResumeAfter option.

CfgTRES

Configured “Trackable Resources” for this node — includes CPUs, memory, GPUs, etc.

AllocTRES

Currently allocated resources (to active jobs).

CurrentWatts / AveWatts

Power consumption metrics (if sensors are available). Here, both are 0, meaning no power monitoring data.
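Several of the fields above are easier to interpret once combined. The sketch below is one possible way to do that in Python and rests on stated assumptions: the node record is already available as a dictionary of strings keyed by field name, all helper names are hypothetical, the numbers in the sample record are made up, and the GRES and State handling covers only the common forms (name:type:count or name:count, and BASE+FLAG states).

```python
def parse_gres(gres: str) -> list:
    """'gpu:lovelace_l40s:4' -> [('gpu', 'lovelace_l40s', 4)]; '(null)' -> []."""
    entries = []
    if gres in ("", "(null)"):
        return entries
    for item in gres.split(","):
        item = item.split("(", 1)[0]          # drop a suffix such as "(S:0-1)"
        parts = item.split(":")
        if len(parts) == 3:                   # name:type:count
            entries.append((parts[0], parts[1], int(parts[2])))
        elif len(parts) == 2 and parts[1].isdigit():   # name:count
            entries.append((parts[0], None, int(parts[1])))
        else:                                 # bare name, or name:type without a count
            entries.append((parts[0], parts[1] if len(parts) > 1 else None, 1))
    return entries


def parse_tres(tres: str) -> dict:
    """'cpu=64,mem=512000M,gres/gpu=4' -> {'cpu': '64', 'mem': '512000M', ...}."""
    if not tres:
        return {}
    return dict(item.split("=", 1) for item in tres.split(","))


def idle_cpus(node: dict) -> int:
    """CPUs still schedulable: CPUEfctv (or CPUTot if absent) minus CPUAlloc."""
    usable = int(node.get("CPUEfctv", node["CPUTot"]))
    return usable - int(node["CPUAlloc"])


def free_gpus(node: dict) -> int:
    """Configured minus allocated GPUs, read from CfgTRES and AllocTRES."""
    cfg = parse_tres(node.get("CfgTRES", ""))
    alloc = parse_tres(node.get("AllocTRES", ""))
    return int(cfg.get("gres/gpu", 0)) - int(alloc.get("gres/gpu", 0))


def accepts_new_jobs(state: str) -> bool:
    """True for IDLE or MIXED nodes that carry no DRAIN or DOWN flag."""
    flags = [flag.strip("*").upper() for flag in state.split("+")]
    blocked = {"DRAIN", "DOWN", "NOT_RESPONDING"}
    return flags[0] in {"IDLE", "MIXED"} and not blocked.intersection(flags[1:])


# Made-up sample record; values are illustrative only.
node = {
    "State": "MIXED",
    "CPUAlloc": "16", "CPUEfctv": "64", "CPUTot": "64",
    "Gres": "gpu:lovelace_l40s:4",
    "CfgTRES": "cpu=64,mem=512000M,gres/gpu=4",
    "AllocTRES": "cpu=16,mem=128000M,gres/gpu=1",
}

if __name__ == "__main__":
    print("GRES:", parse_gres(node["Gres"]))
    print("Idle CPUs:", idle_cpus(node))
    print("Free GPUs:", free_gpus(node))
    print("Accepts new jobs:", accepts_new_jobs(node["State"]))
```

Treat the results as a quick estimate: the scheduler also takes memory, partition limits and reservations into account before placing a job on the node.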