The squeue command lists the jobs currently in the queue, e.g.
login-1: squeue
  JOBID PARTITION      NAME     USER ST       TIME  NODES NODELIST(REASON)
1243530  standard  test2.sh  payerle  R   18:47:23      2 compute-b18-[2-3]
1244127  standard  slurm.sh    kevin  R    1:15:47      1 compute-b18-4
1230562  standard  test1.sh  payerle PD       0:00      1 (Resources)
1244242  standard  test1.sh  payerle PD       0:00      2 (Resources)
1244095  standard slurm2.sh    kevin PD       0:00      1 (ReqNodeNotAvail)
The ST column gives the state of the job. Common state codes include R (running), PD (pending), CG (completing), CA (cancelled), and F (failed).
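For instance, to list only your own running jobs, you can filter on user and state (a sketch; replace username with your actual login id):
login-1> squeue -u username -t R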
The NODELIST(REASON) field tells you which nodes a currently running job is running on. If the job is pending (i.e. not running), it instead gives a short explanation of why the job is not running (as of the last time the scheduler examined the job). Typically one might see something like:
(Resources): the scheduler is unable to find sufficient idle resources to run your job, i.e. the cluster is too busy to run your job at this time. The job should run once resources become available (i.e. some currently running jobs complete, freeing resources).
(Priority): there are other jobs with higher priority ahead of yours in the queue. The job should run once the jobs ahead of it get scheduled.
(AssocGrpCPUMinsLimit) or (AssociationJobLimit): these generally mean that your allocation account has insufficient funds available to complete this job and all currently running jobs charging against that allocation account. See the relevant FAQ entry for more information. This job will only run if the currently running jobs complete using much fewer SUs than predicted (based on their wall time limits) and/or if the allocation account gets replenished.
(QOSResourceLimit): this generally occurs only if you have submitted a large number of jobs. Some of those jobs will be held in a pending state to prevent adverse impact on the rest of the cluster. These jobs will typically run once the job count is reduced (by currently running jobs completing). See the relevant FAQ entry for more information.
Typically, if you see something not in the above list, there is a problem and you will want to contact systems staff for assistance.
The squeue
command also takes a wide range of options, including
options to control what is output and how. See the
man page (man squeue
)
for more information.
For example: if you add the following to your ~/.aliases
file (assuming you are using a C-shell variant):
alias sqp 'squeue -S -Q -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %Q %R"'
sqp
will list jobs in the
queue in order of descending priority.
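If you use bash rather than a C-shell variant, the equivalent line for your ~/.bashrc file would be:
alias sqp='squeue -S -Q -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %Q %R"'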
The scheduler tries to schedule all jobs as quickly as possible, subject to cluster policies, available hardware, allocation priority (contributors to the cluster get higher priority allocations), etc. Typically jobs run within a day or so, but this can vary, and usage of the cluster can fluctuate widely at times.
The squeue command, with the appropriate arguments, can show you the scheduler's
estimate of when a pending/idle job will start running. It is, of course,
just the scheduler's best estimate given current conditions; the actual
time a job starts might be earlier or later than that, depending on factors such
as the behavior of currently running jobs, the submission of new jobs, and
hardware issues.
To see this, you need to request that squeue
show
the %S
field in the output format option, e.g.
login-1> squeue -o "%.9i %.9P %.8j %.8u %.2t %.10M %.6D %S"
    JOBID PARTITION     NAME     USER ST       TIME  NODES START_TIME
      473  standard test1.sh  payerle PD       0:00      4 2014-05-08T12:44:34
      479  standard test1.sh    kevin PD       0:00      4 N/A
      489  standard tptest1.  payerle PD       0:00      2 N/A
Obviously, the times given are estimates. The job could start earlier if other jobs ahead of it in the queue do not use their full walltime, or could get delayed if jobs with a higher priority than yours are submitted before your start time.
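Alternatively, squeue provides a --start option which reports the expected start times of pending jobs without your having to construct the format string yourself (username below is a placeholder for your login id):
login-1> squeue --start -u username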
To get more detailed information about a job that is currently running or in
the queue, you can use the
scontrol show job JOBNUMBER
command. This command
provides much detail about your job, e.g.
login-2> scontrol show job 486
JobId=486 Name=test1.sh
UserId=payerle(34676) GroupId=glue-staff(8675)
Priority=33 Account=test QOS=normal
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:00 TimeLimit=00:03:00 TimeMin=N/A
SubmitTime=2014-05-06T11:20:20 EligibleTime=2014-05-06T11:20:20
StartTime=Unknown EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=standard AllocNode:Sid=pippin:31236
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=2 NumCPUs=8 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/export/home/pippin/payerle/slurm-tests/test1.sh
WorkDir=/home/pippin/payerle/slurm-tests
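Since the output is rather verbose, it can be convenient to pipe it through grep to pull out just the fields of interest, e.g. for the job above:
login-2> scontrol show job 486 | grep -E 'JobState|StartTime'
JobState=PENDING Reason=Priority Dependency=(null)
StartTime=Unknown EndTime=Unknown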
The scontrol command above will only display information about jobs which are either running or still in the queue. Often it is useful to get information about jobs which have already finished. The sacct command allows one to inspect jobs which have already completed (you can look at queued/running jobs with sacct as well, but most of the interesting information is not available until after the job completes).
A full listing of the options to the sacct command can be found using
the man sacct
command. We discuss only a small subset
of the options here, but hopefully the more commonly used ones.
Options can be divided into two broad categories: filtering which
jobs/job steps to display, and controlling what information is displayed.
The following options are useful for filtering which jobs sacct displays: -j/--jobs to restrict output to the specified job id(s), -S/--starttime and -E/--endtime to restrict output to jobs within a given time window, -u/--user to select jobs belonging to a particular user, and -s/--state to restrict output to jobs in the given state(s) (e.g. COMPLETED, FAILED).
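For example, the following (with a placeholder username and illustrative dates) would list that user's jobs which completed or failed during the first week of May:
login-1: sacct -u username -S 2014-05-01 -E 2014-05-08 -s COMPLETED,FAILED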
There are a lot of fields that the sacct command is able to display for jobs, so generally you will wish to specify which ones are shown. Use the flag -o FIELDS or --format FIELDS, where FIELDS is a comma-delimited list of field names; commonly used field names include JobID, JobName, State, ExitCode, Elapsed, MaxRSS, and AllocTRES.
billing
is also of note; it indicates
the hourly SU cost of the job.
You can see the maximum memory needed by the job on any of the nodes
assigned to the job by looking at the MaxRSS
field. You
will need to look for the largest value among all of the records displayed
for the job (Note: You should not use
the -X
or --allocations
flag in the sacct
command since the maximum memory usage typically occurs in one of
the child job steps and is larger than the number displayed for the
main job step). For example, if you do
login-1: sacct -j 1004702 -o JobID,MaxRSS -p
JobID|MaxRSS|
1004702||
1004702.batch|1021268K|
1004702.extern|0|
We see that the maximum value of MaxRSS for job 1004702 is 1021268 KiB, or 1021268 KiB * (1 MiB / 1024 KiB) * (1 GiB / 1024 MiB) ≈ 0.97 GiB.
You can compute the SU cost for the job with the Elapsed
and AllocTRES
fields. In the AllocTRES field, look for
the value associated with the TRES named billing
--- this is
the SU cost per hour of walltime for the job. Then multiply by the
elapsed walltime as given in the Elapsed field. You do not
need to do this for every line/job step for the job but just for the main
job step, as the main step includes the resources and walltime for the other
job steps (i.e., for this purpose you can use the -X
flag).
E.g., If elapsed is 2:45:00 and billing shows 16, the SU cost of the
job is 2.75 hours * 16 SU/hour = 44 SU.
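As a sketch, reusing the job id from the MaxRSS example and the numbers from the worked example above (the output shown is illustrative only; the actual AllocTRES contents depend on what the job requested):
login-1: sacct -j 1004702 -X -o JobID,Elapsed,AllocTRES -p
JobID|Elapsed|AllocTRES|
1004702|02:45:00|billing=16,cpu=16,mem=64G,node=1|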
Slurm outputs the stdout
and stderr
streams
for your job to the files you specified on the shared filesystem
in real time. There is no need for an extra command like the qpeek
command used under the PBS/Moab/Torque environment.
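This means you can simply watch the output file while the job runs, e.g. with tail (assuming the default output file name of slurm-JOBID.out and the job id from the squeue example above):
login-1> tail -f slurm-1244127.out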
Sometimes one needs to kill a job. To kill/cancel a job that is
waiting in the queue, or is already running, use the scancel
command. The -i flag in the example below asks for confirmation before cancelling:
login-1> scancel -i 122488
Cancel job_id=122488 name=test1.sh partition=standard [y/n]? y
login-1>
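scancel can also act on groups of jobs; for example, to cancel all of your pending jobs at once (username is a placeholder for your login id):
login-1> scancel -u username -t PENDING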
Notices of scheduled and unscheduled outages, issues, etc. on the clusters will be announced on the appropriate mailing lists (e.g. HPCC Announce for the Deepthought* clusters) --- users are automatically subscribed to these lists when they get access to the cluster.
Sometimes you want a broader overview of the cluster. The
squeue
command can give you information on what jobs are
running on the cluster. The sinfo -N
command can show
you attributes of the nodes on the cluster. But both of these use
a text-oriented display which, while providing a fairly dense amount of
information, is often difficult to digest.
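For example (the node names and states shown below are purely illustrative):
login-1> sinfo -N
NODELIST       NODES PARTITION STATE
compute-b18-2      1  standard alloc
compute-b18-4      1  standard idle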
The sview
command uses real
(not text mode) graphics to show the status of the cluster. As such
it requires an X server running on the computer
you are sitting at. This will present a graphical overview of the nodes
in the cluster and their state, as well as the job queue.
PLEASE SET THE REFRESH INTERVAL to something like 300 seconds (5 minutes).
Select Options > Set Refresh Interval. The application default
is far too frequent and causes performance issues.
For an even prettier view, there are online pages for monitoring the clusters at: