National Institute for Computational Sciences is a UT/ORNL Partnership

Job Accounting

PBS allocates cores to batch jobs in whole-node units, that is, in multiples of the number of cores available per node. A node cannot be shared among multiple jobs, so a job is charged for every core on each node it is allocated, whether or not it uses them all. The PBS -l size option specifies the number of cores to allocate to a job. For example, on Kraken the requested size must be a multiple of 12, and on Athena it must be a multiple of 4.

The service unit charge for each job is:

PBS 'size' * walltime

where walltime is the number of wall clock hours used by the job.
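
As a rough illustration of the charge formula, the following Python sketch rounds a desired core count up to a valid PBS 'size' and computes the resulting charge. The function names and the CORES_PER_NODE table are hypothetical helpers written for this example, not part of PBS or any NICS tool; the per-node core counts are the ones quoted above for Kraken and Athena.

```python
import math

# Cores per node as stated in the text above (assumed unchanged).
CORES_PER_NODE = {"kraken": 12, "athena": 4}


def round_up_size(cores_needed: int, machine: str) -> int:
    """Return the smallest valid PBS 'size' that covers cores_needed."""
    per_node = CORES_PER_NODE[machine]
    nodes = math.ceil(cores_needed / per_node)
    return nodes * per_node


def service_units(size: int, walltime_hours: float) -> float:
    """Charge = PBS 'size' * walltime, with walltime in wall clock hours."""
    return size * walltime_hours


# A job that needs 100 cores on Kraken occupies 9 whole nodes.
size = round_up_size(100, "kraken")   # 9 nodes * 12 cores = 108
charge = service_units(size, 2.5)     # 108 * 2.5 = 270.0
print(size, charge)
```

Note that the job is charged for 108 cores even though it uses only 100, because the eight idle cores sit on a node that no other job can use.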

The showusage utility can be used to view your project allocation and overall usage through the last job accounting posting (usually the previous night). For example,

>showusage
 Usage on kraken:
                                  Project Totals          <userid>
 Project      Allocation        Usage    Remaining          Usage
_________________________|___________________________|_____________
 <YourProj>    2000000   |   123456.78   1876543.22  |     1560.80

The -h option will list more usage details.
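
If you want to track usage in a script, a project row can be split into its fields. The sketch below assumes the row layout matches the sample output above exactly (three groups separated by '|'); the real showusage layout may differ between versions, and ProjectUsage and parse_usage_row are hypothetical names used only for this example.

```python
from typing import NamedTuple


class ProjectUsage(NamedTuple):
    project: str
    allocation: float
    usage: float
    remaining: float
    user_usage: float


def parse_usage_row(row: str) -> ProjectUsage:
    """Parse one project row, assuming the layout of the sample output."""
    # Columns: project and allocation | project usage and remaining | user usage
    left, middle, right = (part.split() for part in row.split("|"))
    project, allocation = left
    usage, remaining = middle
    (user_usage,) = right
    return ProjectUsage(
        project,
        float(allocation),
        float(usage),
        float(remaining),
        float(user_usage),
    )


# Sample row with a made-up project name in place of <YourProj>.
row = " MY-PROJECT01    2000000   |   123456.78   1876543.22  |     1560.80"
record = parse_usage_row(row)
print(record.project, record.remaining)
```

In the sample row, the remaining balance equals the allocation minus the project usage, which gives a simple consistency check on parsed values.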

More detailed accounting information can be obtained using the glsjob command:

glsjob -u janeuser
    Prints current accounting information for all of janeuser's jobs.
glsjob -p MY-PROJECT01
    Prints current accounting information for all jobs charged to account MY-PROJECT01.
glsjob --man
    Displays documentation for glsjob.

Job Refund Policy

NICS will provide refunds for user jobs that are adversely affected by system issues beyond the user's control. Refund requests must be made within two calendar weeks of a job’s completion date by submitting a ticket to help@teragrid.org. Please include your username, the machine name, the job ID, and the reason for the refund request.

Examples of refund requests that will not be approved include: jobs run on projects that have a negative balance, jobs that started and completed after the project’s end date, and jobs that failed because they reached the user-specified wall clock limit.

NICS strongly encourages the use of application checkpoint/restart files. Users should request refunds only for the time lost since the last successful checkpoint. The refund limit for eligible jobs is six hours. Exceptions to the maximum refund will be considered only where, given the nature of the underlying machine problem, appropriate checkpointing cannot effectively mitigate the loss.
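
The timing rules above can be sketched in Python as a quick self-check before filing a ticket. This is only one reading of the policy, not an official NICS calculation: it assumes "two calendar weeks" means 14 days counted inclusively from the completion date, and that the six-hour limit caps the time lost since the last checkpoint. The names request_is_timely and refundable_hours are hypothetical.

```python
from datetime import date, timedelta

REFUND_WINDOW = timedelta(weeks=2)   # "two calendar weeks" (assumed inclusive)
MAX_REFUND_HOURS = 6.0               # refund limit for eligible jobs


def request_is_timely(completed: date, requested: date) -> bool:
    """True if the request falls within the assumed 14-day window."""
    return completed <= requested <= completed + REFUND_WINDOW


def refundable_hours(hours_since_checkpoint: float) -> float:
    """Hours lost since the last successful checkpoint, capped at the limit."""
    return min(max(hours_since_checkpoint, 0.0), MAX_REFUND_HOURS)


completed = date(2011, 3, 1)
print(request_is_timely(completed, date(2011, 3, 15)))  # day 14 of the window
print(request_is_timely(completed, date(2011, 3, 16)))  # one day too late
print(refundable_hours(9.5))                            # capped at six hours
```

NICS staff make the final decision on every request, so treat the result as a guide to what to ask for rather than as a guarantee.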