
Re: [HTCondor-users] question about accounting groups



Hi Jeff,

On 29/04/21 13:47, Jeff Templon wrote:
Hi Greg, all

On 28 Apr 2021, at 16:09, Greg Thain via HTCondor-users wrote:

Jeff:

The accounting information is stored in the "AccountantNew.log" file, which is maintained by the condor_negotiator. This file is written to in a transaction-log style, where all changes to the state are appended to the file, and periodically the file is rewritten with the current state.

-greg

Thanks Greg. It seems like HTCondor has a different idea about what "accounting" is than I do, and that's what's throwing me off track. What I see in the AccountantNew.log file is lots of stuff about the current state of slots, and lots of stuff about the current and historical state of aggregate usage for each user. Is this a correct assessment?

The thing I was looking for when I say "accounting" is something like this:

2020-10-22 16:47:22 some-job-unique-id user=templon group=pdp cput=07:22:01 wall=07:24:44 ncores=2 physmem=2700 vmem=4321 exstat=0
[ ... ]
This is something similar to the above:
[root@ce06-htc 2021-4]# condor_q -jobads history.2411281.0 -af:j Owner AcctGroup 'Interval(RemoteSysCpu+RemoteUserCpu)' 'Interval(RemoteWallClockTime)' 'OriginalCpus ?: (CpusProvisioned ?: RequestCpus)' 'ResidentSetSize_RAW' 'ImageSize_RAW' exitstatus

2411281.0 atlasprd011 atlas 22:24:46 3:16:05 8 6863744 30293384 0

For this to work:
on each schedd, PER_JOB_HISTORY_DIR must be set to an existing directory;
in that directory you will then find one file per finished job (such as history.2411281.0).
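
If you would rather post-process those per-job files yourself than go through condor_q, something along these lines might do as a starting point. It is only a minimal sketch: it assumes each history.* file is plain text with one "Name = value" attribute per line, it keeps every value as a raw string, and the function names are made up for illustration.

```python
from pathlib import Path


def read_job_ad(path):
    """Parse one per-job history file into a dict of raw attribute strings.

    Assumption: the file holds one 'Name = value' pair per line.
    Values are not evaluated as ClassAd expressions; surrounding double
    quotes on plain string values are simply dropped.
    """
    ad = {}
    for line in Path(path).read_text().splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip blank lines and anything not shaped like a pair
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        ad[name.strip()] = value
    return ad


def read_job_ads(directory):
    """Yield (file name, attribute dict) for every history.* file found."""
    for path in sorted(Path(directory).glob("history.*")):
        if path.is_file():
            yield path.name, read_job_ad(path)
```

Calling read_job_ads() on whatever directory PER_JOB_HISTORY_DIR points to then gives one dictionary per finished job, from which Owner, AcctGroup and the other attributes can be picked.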

Alternatively, run condor_history on each schedd in almost the same way:

[root@ce06-htc 2021-4]# condor_history -lim 3 -af:j Owner AcctGroup 'Interval(RemoteSysCpu+RemoteUserCpu)' 'Interval(RemoteWallClockTime)' 'OriginalCpus ?: (CpusProvisioned ?: RequestCpus)' 'ResidentSetSize_RAW' 'ImageSize_RAW' exitstatus
2477965.0 pilatlas030 atlas 13:16 1:07:58 1 812544 4117724 0
2468261.0 belleprd belle 12:39:23 13:08:26 1 1586904 4231892 0
2475954.0 pillhcb031 lhcb 8:41:22 8:43:30 1 1461276 5068740 0
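
To get from lines like these to the "wall * ncores per user" sum you asked about, a small script could be enough. The sketch below assumes exactly the nine whitespace-separated columns printed above, and that the time columns look like "mm:ss", "h:mm:ss" or "d+hh:mm:ss"; both helper names are invented for this example, and lines that do not have nine columns are skipped.

```python
from collections import defaultdict


def interval_to_seconds(text):
    """Convert 'mm:ss', 'h:mm:ss' or 'd+hh:mm:ss' into seconds."""
    days, _, clock = text.rpartition("+")
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return int(days or 0) * 86400 + seconds


def core_seconds_by_owner(lines):
    """Sum wall-clock seconds times cores for each owner.

    Assumed column order (as in the command above):
    job id, owner, group, cpu time, wall time, cores, RSS, image size, exit status.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # not a complete record
        owner, wall, ncores = fields[1], fields[4], fields[5]
        totals[owner] += interval_to_seconds(wall) * int(ncores)
    return dict(totals)


sample = [
    "2477965.0 pilatlas030 atlas 13:16 1:07:58 1 812544 4117724 0",
    "2468261.0 belleprd belle 12:39:23 13:08:26 1 1586904 4231892 0",
    "2475954.0 pillhcb031 lhcb 8:41:22 8:43:30 1 1461276 5068740 0",
]
usage = core_seconds_by_owner(sample)
# usage == {"pilatlas030": 4078, "belleprd": 47306, "pillhcb031": 31410}
```

Grouping on the third column instead of the second would give the same sum per accounting group.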

Two points:

1) condor_history only "remembers" jobs for as long as it can: when space is freed, the older records are no longer available to it. Since we want to keep the job history for later checks if needed, we configured PER_JOB_HISTORY_DIR and take care to
store old history files somewhere off the local disk.

2) We have local jobs submitted by users with: condor_submit [...] -spool myjob.sub. In that case finished jobs remain in the schedd queue for 864000 seconds (10 days) after they are done, so that the user can retrieve their output sandbox.
During that time they are not seen by condor_history (you can see them with condor_q).
The history file for those jobs is created only once those 10 days have passed.
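
For point 1, moving old per-job files off the local disk can be as simple as the following. This is a sketch of one possible approach, not a description of our actual setup: the function name and the 30-day default are arbitrary choices, and the file age is judged by modification time.

```python
import shutil
import time
from pathlib import Path


def archive_old_history(history_dir, archive_dir, max_age_days=30):
    """Move history.* files older than max_age_days into archive_dir.

    Returns the names of the files that were moved.
    """
    cutoff = time.time() - max_age_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(history_dir).glob("history.*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved
```

Run periodically with history_dir set to the PER_JOB_HISTORY_DIR directory and archive_dir on some other storage, this keeps the local directory small while the old records stay available.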

Stefano





One line for each job that has run on the system. So if I want to know how much "templon" has run over the past month, I can select all the records for the past month and add the wall*ncores. What do HTCondor folk call this (not accounting I guess) and where is it stored and how is it accessed?

Thanks,

JT