As a user of the cluster you have access to at least one allocation
account, the one belonging to the group which requested your access to the
HPC cluster. Some groups have normal and high priority allocations, and
some users have access to allocations from more than one group. You
can see which allocations you have access to with the mybalance command.
All jobs that are submitted are associated with an allocation. The allocation
can be specified with the -A flag when the job is submitted; otherwise the job
will use the submitter's default allocation (typically their normal priority
allocation if they have multiple allocations). With the exception of jobs in
the serial queue, the CPU time for running your job (multiplied
by the number of processor cores consumed) will be charged to that allocation.
(Because serial queue jobs are ultra-low priority and can be
preempted, we do not charge for CPU time on that queue.) The allocation
gets debited at the start of the run for the estimated cost of the job
(based on the maximum walltime specified for the job); when the job actually
completes, any needed adjustment is made between the estimated and actual
charges. If there are insufficient funds in the specified allocation to
cover the job when the job is about to be scheduled, the job will not be
scheduled and will become deferred until funds are available. NOTE
that the scheduler is NOT smart enough to try another account you
have access to if the initial allocation is depleted.
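The charging rule above can be sketched in a few lines of Python. This is an illustration only, not the scheduler's actual code: the function names are hypothetical, and the unit is assumed to be CPU-seconds (cores multiplied by walltime in seconds).

```python
def estimated_charge(cores: int, max_walltime_s: int) -> int:
    """CPU-seconds debited when the job starts (assumed unit: CPU-seconds)."""
    return cores * max_walltime_s


def settle(balance: int, cores: int, max_walltime_s: int, actual_walltime_s: int) -> int:
    """Return the allocation balance after the job has completed.

    The estimate is debited at the start of the run; the difference between
    the estimate and the actual usage is credited back at completion.
    """
    estimate = estimated_charge(cores, max_walltime_s)
    if estimate > balance:
        # In this case the real scheduler defers the job instead of running it.
        raise ValueError("insufficient funds: job would be deferred")
    balance_during_run = balance - estimate
    actual = cores * actual_walltime_s
    return balance_during_run + (estimate - actual)


# Hypothetical job: 8 cores, 2 hour walltime limit, finished after 30 minutes.
print(estimated_charge(8, 2 * 3600))       # 57600
print(settle(100_000, 8, 2 * 3600, 1800))  # 85600
```

Note that while the job is running, the full estimate is unavailable to other jobs, so a generous walltime limit can temporarily tie up more funds than the job finally costs.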
Groups can get allocations in one of two ways:
Paid allocations come in pairs; for a group named GROUP there will
be a normal priority allocation GROUP and a high priority
allocation GROUP-hi. As the name implies, jobs submitted with
the high priority allocation will be preferentially scheduled over jobs
using the normal priority allocation (unless they are using the serial
queue, which ignores the allocation and runs all jobs with ultra-low priority).
Certain queues which can potentially
tie up much of the cluster will only accept jobs submitted with high priority
allocations.
The HPC Allocations and
Advisory Committee can grant one-time unpaid allocations to faculty
and students for small projects, classes, feasibility tests, etc. Jobs
submitted with these allocations run at normal priority (unless submitted
to the serial queue, which ignores allocations and runs all jobs
at ultra-low priority).
If you only have a single allocation (check with the mybalance
command), you can skip this section. You only have the one allocation, so
there is nothing to choose.
If you have multiple allocations due to your membership in multiple groups, you may wish to choose which allocation you use based on your job. For example, if the job is doing something for group A, you probably should only submit it using one of the group A allocations, even if you also have access to group B allocations. If the research areas of the two groups overlap, you will need to follow whatever group-specific policies may exist (contact your colleagues).
If you have access to normal and high priority allocations, you probably want to submit the job to the high priority allocation. It is replenished monthly, and its funds do not carry over, so you might as well use them.
Of course, you need to ensure that the allocation you choose has sufficient funds for your job. If there are not sufficient funds to cover the expected cost of your job (based on the specified or queue-specific maximum walltime and the number of cores requested) when it is about to start running, your job will not run and will instead be deferred until such funds are available. Note that the queuing system will NOT automatically select another allocation if, for example, your high priority allocation is depleted but funds exist in your normal priority allocation. The job will just get deferred.
Note also that others in your group may have access to the same allocation. Even if funds were there when you submitted a job, someone else's jobs may have started since then and reduced the funds in the allocation.
To specify the allocation to be used by a job, use the
-A option with qsub. E.g., if you have access to the
clfshpc-hi allocation and wish to submit a job
myjob.csh using that
allocation, the command would look something like
qsub -A clfshpc-hi myjob.csh. Of course, you may need to include
additional arguments as well. You can also add the line
#PBS -A clfshpc-hi near the top of your
myjob.csh script instead of giving the
-A option on the command line.
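Because the scheduler will not fall back to another allocation by itself, a submission wrapper can make that choice before calling qsub. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the balances are made-up CPU-second figures, and the normal priority allocation is assumed to be named clfshpc (following the GROUP / GROUP-hi naming described earlier).

```python
from typing import List, Mapping, Optional


def pick_allocation(balances: Mapping[str, int],
                    preferred: List[str],
                    cost: int) -> Optional[str]:
    """Return the first allocation in preference order that can cover the cost."""
    for name in preferred:
        if balances.get(name, 0) >= cost:
            return name
    return None


def qsub_command(allocation: str, script: str) -> List[str]:
    """Build the qsub argument list that selects the allocation with -A."""
    return ["qsub", "-A", allocation, script]


# Hypothetical balances in CPU-seconds; the high priority allocation is nearly empty.
balances = {"clfshpc": 72_000_000, "clfshpc-hi": 40_000}
cost = 8 * 2 * 3600  # 8 cores for a 2 hour walltime limit = 57600 CPU-seconds

allocation = pick_allocation(balances, ["clfshpc-hi", "clfshpc"], cost)
if allocation is None:
    print("no allocation can cover this job; it would be deferred")
else:
    print(qsub_command(allocation, "myjob.csh"))  # ['qsub', '-A', 'clfshpc', 'myjob.csh']
```

The balances could change between this check and the moment the job is scheduled, since other group members may be charging the same allocation, so the check reduces the chance of a deferred job but does not eliminate it.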
Unpaid allocations do not get automatically replenished. Jobs will deplete funds in the allocation until it runs out of funds, or until the time limit of the project (or class, etc.) for which the HPC Allocations and Advisory Committee granted the allocation expires and the allocation is deleted.
Paid allocations are replenished automatically: the normal priority allocation every quarter and the high priority allocation every month. For each group which contributed equipment to the cluster, a raw quarterly value equal to the amount of computation that can be done on that equipment in a quarter is computed (currently just the number of cores times the number of hours in a quarter; no adjustment for CPU speed is currently made). From this, 20% is removed for OIT overhead; this covers administrative and other downtime, and some may be used for unpaid allocations. The remainder is the group's quarterly allotment.
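As a worked example of this arithmetic, the sketch below computes an allotment for a made-up group. The function name and the figures (64 cores, a 92-day quarter) are illustrative assumptions, not values taken from any real group.

```python
def quarterly_allotment(cores: int, days_in_quarter: int,
                        overhead_percent: int = 20) -> float:
    """CPU-hours available to a group per quarter after the overhead is removed."""
    raw_cpu_hours = cores * days_in_quarter * 24  # cores x hours in the quarter
    return raw_cpu_hours * (100 - overhead_percent) / 100


# Hypothetical group that contributed 64 cores; the quarter has 92 days.
allotment = quarterly_allotment(cores=64, days_in_quarter=92)
print(f"quarterly allotment: {allotment:.1f} CPU-hours")      # 113049.6
print(f"monthly allotment:   {allotment / 3:.1f} CPU-hours")  # 37683.2
```

Since quarters differ in length, the allotment for the same equipment varies slightly from quarter to quarter under this formula.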
Every quarter, on the first day of the quarter (i.e. 1 Jan, 1 Apr, 1 Jul, 1 Oct), the normal priority allocation for each group is reset to the quarterly allotment. Any amount left over from the previous quarter is lost.
On the first of every month (after the quarterly reset is done, if it is also the start of the quarter), the high priority allocation for the group is replenished by transferring funds from the normal priority allocation. It will be brought up to one third of the quarterly allotment (i.e. a monthly allotment) provided that there are sufficient funds in the normal allocation. If there are not sufficient funds in the normal allocation, whatever amount is left in the normal allocation is moved to the high priority allocation.
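If your group uses up exactly its high priority allocation every month, and does not directly use its normal priority allocation, then by the rules above the balances at the beginning of each month in a quarter should be as follows: in the first month, two thirds of the quarterly allotment in the normal priority allocation and one third in the high priority allocation; in the second month, one third in each; in the third month, nothing in the normal priority allocation and one third in the high priority allocation.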
In practice, you will see some variation, due to the high priority allocation not being completely depleted at the end of the month (so less than a full monthly allotment is transferred out of the normal priority allocation, leaving it with more funds), and due to jobs running against the normal priority allocation, which reduce its funds. Note: there is no rollover of unused funds from quarter to quarter in the normal priority allocation, or from month to month in the high priority allocation. (However, unused funds in the high priority allocation mean that fewer funds are transferred out of the normal priority allocation to refresh it, resulting in extra normal priority funds.)
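The replenishment rules can be checked with a small simulation. This is a sketch of the rules as described in this section, not the actual accounting code; the function name is hypothetical, funds are in arbitrary integer units, and the allotment is assumed to be divisible by three to keep the arithmetic exact.

```python
from typing import List, Sequence, Tuple


def simulate_quarter(quarterly_allotment: int,
                     hi_usage: Sequence[int],
                     normal_usage: Sequence[int] = (0, 0, 0),
                     hi_carried_in: int = 0) -> List[Tuple[int, int]]:
    """Return (normal, hi) balances at the start of each month of one quarter."""
    monthly = quarterly_allotment // 3
    normal, hi = 0, hi_carried_in
    snapshots = []
    for month in range(3):
        if month == 0:
            # Quarterly reset: anything left from the previous quarter is lost.
            normal = quarterly_allotment
        # Bring hi up to the monthly allotment, limited by what normal holds.
        transfer = min(max(monthly - hi, 0), normal)
        normal -= transfer
        hi += transfer
        snapshots.append((normal, hi))
        # Usage during the month cannot exceed the available funds.
        hi -= min(hi_usage[month], hi)
        normal -= min(normal_usage[month], normal)
    return snapshots


# Group uses exactly its high priority allocation and nothing else.
print(simulate_quarter(600, hi_usage=(200, 200, 200)))
# [(400, 200), (200, 200), (0, 200)]

# Group also burns the whole normal allocation in the first month.
print(simulate_quarter(600, hi_usage=(200, 0, 0), normal_usage=(400, 0, 0)))
# [(400, 200), (0, 0), (0, 0)]
```

The second run shows how heavy use in the first month leaves both allocations empty for the rest of the quarter.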
You and your research group are responsible for ensuring proper rationing of your allocations. Excessive use of funds in the first month of a quarter could result in no funds at all for the next two months in either allocation. This can be desirable: if you have important deadlines at the end of the first month of the quarter, an advantage of the Deepthought HPCC model is that you can use nearly 3 times the power of the computers you purchased in a single month to rush out computations, at the cost of having very limited usage the following two months (but since that is after the deadlines, it may not be important). But if this happens because some junior member of the group is submitting an excessive number of very expensive jobs, it can be quite problematic, especially as you might not notice the impact of the errant user until too late.
OIT cannot tell which jobs are important and which are not, or what is good usage of your allocation funds and what is not. If we notice seriously problematic usage (e.g. a job reserving 10 nodes but only running processes on 1 node), we will do our best to notify and instruct the relevant users. But you are responsible for monitoring your own jobs, and it behooves you to monitor the jobs of other users of your allocations. We will provide the necessary tools to do so, but we strongly advise all research groups to have at least one person monitor the usage of their allocations' funds regularly to ensure there are no problems, or at least to catch any problems early.
The first level of monitoring of your allocations is with the
mybalance command, or the very similar gbalance command, e.g.
payerle:f20-l1:~>mybalance
Project   Machines Balance
--------- -------- ------------
test      ANY      72000000
test-hi   ANY      14399061
payerle:f20-l1:~>gbalance -u payerle
Id Name      Amount       Reserved Balance      CreditLimit Available
-- --------- ------------ -------- ------------ ----------- ----------
33 test      72000000     0        72000000     0           72000000
34 test-hi   14399061     0        14399061     0           14399061
By default, both commands return balances in CPU-seconds. You can give the
-h flag to get the balances in more tractable CPU-hours.
All allocations you have access to and their balances are listed. The
numbers listed under
Reserved, if any, are for jobs currently
running (an amount equal to the expected cost, based on the specified or
queue-limit walltime, is reserved when a job starts; when the job finishes the
reservation is lifted and the actual usage is charged).
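If you want to post-process the default CPU-second output yourself, the conversion is a division by 3600. The sketch below parses text shaped like the mybalance sample shown in this section; the parser is hypothetical and assumes the three-column layout of that sample, which may differ from what the command prints on your system.

```python
from typing import Dict


def parse_mybalance(text: str) -> Dict[str, int]:
    """Map allocation name to balance in CPU-seconds (layout assumed from the sample)."""
    balances = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have three fields and a numeric balance; this skips the
        # header and separator rows.
        if len(fields) == 3 and fields[2].isdigit():
            balances[fields[0]] = int(fields[2])
    return balances


sample = """\
Project   Machines Balance
--------- -------- ------------
test      ANY      72000000
test-hi   ANY      14399061
"""

for name, cpu_seconds in parse_mybalance(sample).items():
    print(f"{name}: {cpu_seconds / 3600:.2f} CPU-hours")
# test: 20000.00 CPU-hours
# test-hi: 3999.74 CPU-hours
```

A negative or non-integer balance would be skipped by this simple filter, so treat it as a starting point rather than a robust parser.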
For a history of usage by you and others in your group, in either tabular or graphical form, there is a web form you can use to query the jobs database; you can access it via http://deepthought.umd.edu/stats. There are many options available.
To view the combined normal/high priority usage for the quarter for a
group, the script
quarterly_project_usage is available, e.g.
payerle:f20-l1:~>quarterly_project_usage -p myproject
Quarterly usage summary for allocations myproject/myproject-hi
For quarter beginning Jul-2010
Quarterly allocation is 616.70 kSU or 205.57 kSU per month
User        kSU used (number of jobs)
            Quarterly Total     Jul-2010            Aug-2010            Sep-2010
user001     294.30 kSU (4308)   160.05 kSU (2529)   121.08 kSU (1499)   13.17 kSU (280)
user002     27.88 kSU (874)     5.04 kSU (181)      21.31 kSU (636)     1.53 kSU (57)
user003     21.20 kSU (620)     7.66 kSU (193)      13.55 kSU (427)     0.00 kSU (0)
user004     14.27 kSU (1585)    0.00 kSU (0)        5.09 kSU (784)      9.17 kSU (801)
user005     12.13 kSU (720)     9.00 kSU (517)      2.49 kSU (182)      0.64 kSU (21)
user006     9.34 kSU (254)      0.00 kSU (0)        9.34 kSU (254)      0.00 kSU (0)
user007     1.74 kSU (207)      1.42 kSU (196)      0.00 kSU (0)        0.32 kSU (11)
user008     0.54 kSU (9)        0.54 kSU (9)        0.00 kSU (0)        0.00 kSU (0)
TOTALS      381.41 kSU (8577)   183.71 kSU (3625)   172.87 kSU (3782)   24.83 kSU (1170)
% of alloc  61.85 %             89.37 %             84.09 %             12.08 %
As indicated in the sample output, the usage is reported in kSU (1000 CPU-hour) units, and is compared to the monthly/quarterly allocation. You should be concerned if the percent used for the month is significantly in excess of the portion of the month that has passed, e.g. if your monthly allocation is 40% used and it is only 1 week into the month. Similarly, if the percent of the quarterly allocation consumed is significantly in excess of the fraction of the quarter that has passed, e.g. if the quarterly allocation is 40% consumed and you are only halfway through the first month of the quarter, there are likely to be problems.
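This rule of thumb amounts to comparing the fraction of funds used with the fraction of the period elapsed. A minimal sketch follows; the function name is hypothetical, and what counts as "significantly in excess" is left to your group's judgment.

```python
def usage_pace(percent_used: float, elapsed_fraction: float) -> float:
    """Ratio of funds used to time elapsed; values well above 1 suggest overspending."""
    if not 0 < elapsed_fraction <= 1:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    return (percent_used / 100) / elapsed_fraction


# 40% of the monthly allocation used one week (7 of 30 days) into the month.
print(f"{usage_pace(40, 7 / 30):.2f}")   # 1.71

# 40% of the quarterly allocation used halfway through the first of three months.
print(f"{usage_pace(40, 0.5 / 3):.2f}")  # 2.40
```

A pace of 2.40 means that, at the current rate, the quarterly allocation would be exhausted well before the middle of the quarter.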
For generating reports of which members of your group used how much of the
allocation, the script
usage_report may prove more useful. It takes a project (-p), a start time (-s), and an end time (-e).
Times should be given as
YYYY-MM-DD. This will give a summary
of fund usage during the time period, e.g.
f20-l1:~: usage_report -p myProj -s 2010-02-01 -e 2010-03-01
# Statement for project myProj
# Generated on Thu Feb 25 11:00:19 2010.
# Reporting account activity from 2010-02-01 to now.
############################### Debit Summary ##################################
Object  Action   Project  User    Machine     Amount    Count
------- -------- -------- ------- ----------- --------- -----
Job     Charge   myProj   user1   deepthought -21142.45 6
Job     Charge   myProj   user2   deepthought -0.09     2
Job     Charge   myProj   user3   deepthought -964.37   4
Total Debits: -22106.91 Total Jobs: 12
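To see each user's share of the total at a glance, the debit summary can be post-processed. The sketch below is a hypothetical helper that assumes the seven-column row layout of the sample report above; the real report may contain other row types that this filter would ignore.

```python
from typing import Dict, Tuple


def parse_debits(text: str) -> Dict[str, Tuple[float, int]]:
    """Map user to (amount charged, job count), using the layout of the sample report."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Only "Job Charge" rows carry per-user debits in the sample.
        if len(fields) == 7 and fields[0] == "Job" and fields[1] == "Charge":
            usage[fields[3]] = (-float(fields[5]), int(fields[6]))
    return usage


sample = """\
Object  Action   Project  User    Machine     Amount    Count
------- -------- -------- ------- ----------- --------- -----
Job     Charge   myProj   user1   deepthought -21142.45 6
Job     Charge   myProj   user2   deepthought -0.09     2
Job     Charge   myProj   user3   deepthought -964.37   4
"""

usage = parse_debits(sample)
total = sum(amount for amount, _ in usage.values())
for user, (amount, jobs) in usage.items():
    print(f"{user}: {100 * amount / total:.1f}% of debits in {jobs} jobs")
# user1: 95.6% of debits in 6 jobs
# user2: 0.0% of debits in 2 jobs
# user3: 4.4% of debits in 4 jobs
```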
If even more detail is desired, the
gstatement command can be used,