Re: [HTCondor-users] Question about condor_history, Run_Time, and RemoteWallClockTime.
- Date: Thu, 6 Nov 2014 12:26:05 -0500
- From: Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Question about condor_history, Run_Time, and RemoteWallClockTime.
On Tue, Oct 28, 2014 at 10:05 PM, Stub <spamrefuse@xxxxxxxxx> wrote:
> As follow up on my own email:
I think your address confused my spam filter. I just saw your messages
in my junk mail. :-)
> What time does condor_history report as "RUN_TIME"? "4hrs" or "6hrs" ?
I *think* it reports RemoteWallClockTime (or perhaps
CumulativeSlotTime, which is the same value if you're not using slot
weights). So in your example it would report 4 hours.
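If you want to check this against your own history, a rough sketch like the one below might help. It assumes you have already dumped one job per line as "job id, RemoteWallClockTime, CumulativeSlotTime" (both in seconds); that input layout, the function names, and the sample numbers are made up for illustration and are not real condor_history output.

```python
def parse_job_times(text):
    """Parse lines of '<job id> <RemoteWallClockTime> <CumulativeSlotTime>'.

    The three-column layout is an assumption for this sketch.
    Returns {job_id: (wall_seconds, slot_seconds)}.
    """
    jobs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        job_id, wall, slot = fields
        jobs[job_id] = (float(wall), float(slot))
    return jobs


def hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


def attributes_agree(wall_seconds, slot_seconds):
    """True when the two values match, as expected without slot weights."""
    return wall_seconds == slot_seconds


# Hypothetical sample data: job 101.0 has matching values,
# job 102.0 looks as if a slot weight of 2 had been applied.
SAMPLE = """\
101.0 14400 14400
102.0 7200 14400
"""

for job_id, (wall, slot) in parse_job_times(SAMPLE).items():
    print(job_id, hours(wall), hours(slot), attributes_agree(wall, slot))
```

If the two columns differ for your jobs, slot weights are probably in play, and which one RUN_TIME shows starts to matter.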
> So occasionally HTCondor is running a job, when the PC is suddenly switched off, without giving HTCondor the time to gracefully handle the situation.
> It takes the HTCondor master a while of waiting time to realize that it is wiser to give up on that dangling job and restart it elsewhere. In this case the RUN_TIME parameter is muddled up, for which I guess HTCondor has no blame.....but it also means that in this setup the RUN_TIME parameter should not be used for accounting and/or billing users....
It depends on what you want to accomplish. If you only want to bill
them for the successful runs, then you can use CommittedSlotTime. If
they get charged for badput, too, well, then the sudden disappearance
of a machine is a risk they take. I'm sure you can shorten the time it
takes the schedd to realize the startd has disappeared, but I don't
remember the exact settings offhand.
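To make the two billing policies concrete, here is a minimal sketch. It assumes each job is available as a plain dict of ClassAd attributes in seconds; the helper names and the sample job are hypothetical. It compares CommittedSlotTime with CumulativeSlotTime rather than RemoteWallClockTime because, as far as I understand, both of those are slot-weighted, so their difference is a fair estimate of badput.

```python
def billable_seconds(job, charge_badput):
    """Return the seconds to bill for one job.

    job: dict of job attributes (seconds), an assumed input shape.
    charge_badput: if True, bill all slot time, including runs that
    were lost; if False, bill only the committed (successful) time.
    """
    committed = float(job.get("CommittedSlotTime", 0))
    cumulative = float(job.get("CumulativeSlotTime", 0))
    return cumulative if charge_badput else committed


def badput_seconds(job):
    """Slot time that was consumed but never committed."""
    committed = float(job.get("CommittedSlotTime", 0))
    cumulative = float(job.get("CumulativeSlotTime", 0))
    return max(0.0, cumulative - committed)


# Hypothetical job: 5 hours of slot time in total, of which only the
# final 3-hour run completed (the rest was lost when a machine vanished).
job = {"CumulativeSlotTime": 5 * 3600, "CommittedSlotTime": 3 * 3600}

print(billable_seconds(job, charge_badput=False) / 3600)
print(billable_seconds(job, charge_badput=True) / 3600)
print(badput_seconds(job) / 3600)
```

Either policy is defensible; the point is just to pick the attribute that matches what you intend to charge for.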