Re: [HTCondor-users] wall clock time in condor_q

> From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
> Hi Michael,

> Unfortunately, I don't think it's possible to do a
> "RemoteCpuUtilizationPercent" attribute - at least, I failed miserably at
> doing this last time I tried (I suppose there could be new attributes?).

> Brian

Here's a version I've been using in one of my pools:

RemoteCpuUtilizationPercent = ifThenElse(JobStatus == 2 \
        && CurrentTime > JobCurrentStartDate, \
        ((RemoteSysCpu + RemoteUserCpu) / RequestCpus) \
        / (CurrentTime - JobCurrentStartDate) * 100, UNDEFINED)
SUBMIT_EXPRS = $(SUBMIT_EXPRS) RemoteCpuUtilizationPercent

This is an earlier, slightly crummy version. It's fine for identifying errored-out MATLAB workers that sit waiting for input from /dev/null, their utilization percentage dwindling towards zero, thanks to users insufficiently educated about the "-r" option. But when I went to bring it into another pool, I realized that to get historical averages I needed the value to stay defined after the job finished running.

You'll note that it's quite quick-and-dirty: it doesn't take suspension time into account (this pool doesn't suspend jobs), and it doesn't work on standard universe jobs that restart from a checkpoint. The way I read the manual, RemoteSysCpu and RemoteUserCpu count goodput run time and go to zero following an uncheckpointed eviction, which means every eviction in the vanilla universe. So the time scales match in vanilla, but they wouldn't match in standard. This pool doesn't run standard universe jobs anyway.

I don't have the newer version for historical figures from the other pool on hand at the moment; I'll check in with someone who has access to it and send another message. It basically just switches from using CurrentTime to CompletionDate depending on the JobStatus. That version really gave us what we needed: based on the data, they put in a funding proposal to triple the number of GPUs in each compute node.
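
Until then, here is a rough, untested sketch of that switch. It is not the actual version from the other pool. It assumes that completed jobs have JobStatus == 4 and a nonzero CompletionDate, and that the job ran only once, so JobCurrentStartDate marks the start of the run being measured. It also assumes the intent is to add the two CPU counters together before dividing by RequestCpus. It would go in place of the RemoteCpuUtilizationPercent definition above; the SUBMIT_EXPRS line stays the same.

# Sketch only, untested: not the version from the other pool.
# Assumes JobStatus 2 = running, 4 = completed, and a single run per job.
# Running jobs are measured against CurrentTime, completed jobs
# against CompletionDate; anything else stays UNDEFINED.
RemoteCpuUtilizationPercent = ifThenElse(JobStatus == 2 \
        && CurrentTime > JobCurrentStartDate, \
        ((RemoteSysCpu + RemoteUserCpu) / RequestCpus) \
        / (CurrentTime - JobCurrentStartDate) * 100, \
        ifThenElse(JobStatus == 4 \
        && CompletionDate > JobCurrentStartDate, \
        ((RemoteSysCpu + RemoteUserCpu) / RequestCpus) \
        / (CompletionDate - JobCurrentStartDate) * 100, UNDEFINED))

Like the version above, this ignores suspension time and only makes sense for vanilla universe jobs. To look at the result, something along the lines of "condor_q -af ClusterId ProcId RemoteCpuUtilizationPercent" should print the evaluated value for each job that has the attribute.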


Michael V. Pelletier
IT Program Execution
Principal Engineer
978.858.9681 (5-9681) NOTE NEW NUMBER
339.293.9149 cell
339.645.8614 fax