IT Help

jtop

jtop is a monitoring tool used to track the real-time resource usage of jobs running on the cluster, including CPU load, memory consumption, and GPU status. It is particularly useful for analyzing the efficiency of jobs across different nodes.

Usage

jtop [options] 

Options

The following flags can be used to filter or modify the output:

  • -n <node>: Queries jobs on a specific target node.

  • -u <user>: Filters the job list by a specific username.

  • -p <partition>: Filters jobs by a specific partition or queue.

  • -l: Shows only the jobs running on the local node.

  • -h: Displays the help message and usage details.

Output Columns

When you run jtop, the output contains the following information:

  • JOBID: The unique ID assigned to the job by Slurm.

  • USER: The owner of the running job.

  • ELAPSED: The total wall-clock time the job has been running (Format: Days-Hours:Minutes:Seconds).

  • CPU: The total number of CPU cores allocated to the job.

  • RUN: The current CPU utilization rate (e.g., 0.99 indicates 99% usage of a core).

  • D: Processes in the uninterruptible sleep state (D), which usually means they are blocked waiting on disk I/O.

  • RSS(MB): The Resident Set Size, i.e. the physical memory the job is currently using, in megabytes.

  • GPU: The allocated GPU resources and their specific types.

  • NODE: The specific compute node where the job is executing.
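The ELAPSED and CPU columns can be combined to estimate how many core-hours a job has consumed so far. The sketch below is a hypothetical Bash helper, not part of jtop (the function names `elapsed_to_seconds` and `core_hours` are made up for this example). It assumes ELAPSED uses the Days-Hours:Minutes:Seconds layout described above and also accepts a bare HH:MM:SS or MM:SS value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of jtop): convert an ELAPSED value such as
# "1-02:03:04", "02:03:04" or "03:04" into a number of seconds.
elapsed_to_seconds() {
    local value=$1 days=0 h=0 m=0 s=0
    # Split off an optional "D-" day prefix.
    if [[ $value == *-* ]]; then
        days=${value%%-*}
        value=${value#*-}
    fi
    # Split the remaining fields on ':'.
    local -a parts
    IFS=: read -r -a parts <<< "$value"
    case ${#parts[@]} in
        3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
        2) m=${parts[0]}; s=${parts[1]} ;;
        *) echo "unrecognized ELAPSED value: $1" >&2; return 1 ;;
    esac
    # Force base 10 so zero-padded values such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Hypothetical helper: core-hours = allocated cores (CPU) x elapsed hours.
# Usage: core_hours <CPU> <ELAPSED>
core_hours() {
    local secs
    secs=$(elapsed_to_seconds "$2") || return 1
    awk -v c="$1" -v s="$secs" 'BEGIN { printf "%.2f\n", c * s / 3600 }'
}
```

For example, `core_hours 8 1-02:00:00` prints `208.00` (8 cores for 26 hours). This is only a rough figure for your own tracking; it is not necessarily how the cluster's accounting charges usage.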

Examples

List all jobs in the 'ai' partition:

/usr/bin/jtop -p ai 

List all jobs for a specific user:

/usr/bin/jtop -u valar 

List jobs for a specific user on a specific node:

/usr/bin/jtop -n ai01 -u valar 

Show only the jobs running on the local node:

/usr/bin/jtop -l 


Note: jtop shows a point-in-time snapshot of active processes. Check the RUN and RSS(MB) columns regularly to confirm that your jobs are actually using the CPU cores they requested and are not approaching their memory limits.
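The RUN check described in the note can be scripted. The sketch below is a hypothetical Bash filter (`flag_idle_jobs` is not part of jtop) that reads jtop output on standard input and lists jobs whose RUN/CPU ratio falls below a threshold. It rests on several assumptions that should be checked against real jtop output first: the output has a single header row using the column names listed under Output Columns, columns are whitespace-separated with every row filling every column, and RUN is the job's total utilization summed over its cores, so that RUN divided by CPU approximates per-core efficiency.

```shell
#!/usr/bin/env bash
# Hypothetical filter (not part of jtop): print jobs whose RUN/CPU ratio is
# below a threshold (default 0.5). ASSUMES one header row, whitespace-separated
# columns, and RUN = total utilization across all allocated cores.
flag_idle_jobs() {
    local threshold=${1:-0.5}
    awk -v t="$threshold" '
        NR == 1 {
            # Locate columns by header name instead of by fixed position.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("CPU" in col) || !("RUN" in col)) {
                print "flag_idle_jobs: CPU/RUN columns not found in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        NF == 0 { next }    # skip blank lines
        {
            cpu = $(col["CPU"]); run = $(col["RUN"])
            if (cpu > 0 && run / cpu < t)
                printf "%s %s RUN=%s CPU=%s RSS(MB)=%s\n", $(col["JOBID"]), $(col["USER"]), run, cpu, $(col["RSS(MB)"])
        }'
}
```

A possible usage is `/usr/bin/jtop -u "$USER" | flag_idle_jobs 0.5`. Because jtop only shows a snapshot, the standard `watch` utility (where installed) can refresh the view periodically, for example `watch -n 60 /usr/bin/jtop -u "$USER"`.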