Re: [HTCondor-users] File last modification time or job last write() attribute?



> On May 25, 2016, at 1:36 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
> 
> On 5/25/2016 1:07 PM, Iain Bradford Steers wrote:
>> There’s also a delayed set available via the condor_chirp utility.
>> 
>> Gets back to the schedd with a maximum of 15-minutes delay IIRC.
>> 
>> Cheers, Iain
> 
> +1
> 
> The chirp command is set_job_attr_delayed.  From the manual ( see https://is.gd/PyO8Bh ) :
> 
> set_job_attr_delayed JobAttributeName AttributeValue
>    Sets the named job ClassAd attribute with the given attribute value, but does not immediately synchronize the value with the submit side. It can take 15 minutes before the synchronization occurs. This has much less overhead than the non delayed version. With this option, jobs do not need ClassAd attribute WantIOProxy set. With this option, job attribute names are restricted to begin with the case sensitive substring Chirp.
> 

As an example:

CMS’s physics application invokes something like:

condor_chirp set_job_attr_delayed ChirpCMSSWLastEvent $UNIX_TIMESTAMP

whenever an event has finished, provided it has been at least 5 minutes since the last update (otherwise, we might be doing this at 50 Hz).
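
A rough sketch of that throttling, in Python rather than the actual CMS code (which is not shown here). The class name ChirpHeartbeat and its parameters are made up for illustration, and it assumes condor_chirp is on the job's PATH, which may not hold on every execute node:

```python
import subprocess
import time

MIN_INTERVAL_SECONDS = 5 * 60  # at least 5 minutes between updates


class ChirpHeartbeat:
    """Hypothetical helper: rate-limited set_job_attr_delayed updates."""

    def __init__(self, attr="ChirpCMSSWLastEvent",
                 min_interval=MIN_INTERVAL_SECONDS,
                 run=subprocess.call, clock=time.time):
        # Per the manual text quoted above, delayed attributes must
        # begin with the case-sensitive substring "Chirp".
        if not attr.startswith("Chirp"):
            raise ValueError("attribute name must begin with 'Chirp'")
        self._attr = attr
        self._min_interval = min_interval
        self._run = run        # injectable so the logic can be tested
        self._clock = clock    # without a real condor_chirp binary
        self._last_sent = None

    def event_finished(self):
        """Call after each event; returns True if an update was sent."""
        now = self._clock()
        if (self._last_sent is not None
                and now - self._last_sent < self._min_interval):
            return False  # too soon since the last update, skip it
        self._run(["condor_chirp", "set_job_attr_delayed",
                   self._attr, str(int(now))])
        # The send time is recorded whether or not the command succeeded.
        self._last_sent = now
        return True
```

The event loop would then just call event_finished() after every event, and the helper decides whether an update actually goes out.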

Taking into account synchronization delays, abnormally long events, and whatnot, if the ChirpCMSSWLastEvent attribute is more than 30 minutes old, we start to suspect a job is stuck.
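
The check itself is simple; here is a minimal sketch, not our production monitoring. The function names are hypothetical, and the input format is an assumption: one line per job with ClusterId, ProcId and the attribute value separated by whitespace, roughly what something like "condor_q -af ClusterId ProcId ChirpCMSSWLastEvent" would print.

```python
import time

STALE_AFTER_SECONDS = 30 * 60  # the 30-minute threshold mentioned above


def is_suspect(last_event, now=None, stale_after=STALE_AFTER_SECONDS):
    """True if the last reported event is older than the threshold."""
    if now is None:
        now = time.time()
    return now - last_event > stale_after


def suspect_jobs(lines, now=None):
    """Return job IDs ("cluster.proc") whose heartbeat looks stale.

    Each line is assumed to hold: ClusterId ProcId <unix timestamp>.
    """
    suspects = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # not in the assumed format, ignore
        cluster, proc, value = fields
        try:
            last_event = int(value)
        except ValueError:
            continue  # attribute not set yet (non-numeric value)
        if is_suspect(last_event, now):
            suspects.append(f"{cluster}.{proc}")
    return suspects
```

Jobs that have not reported any event yet are skipped here rather than flagged; whether that is the right call depends on how long your jobs take to reach their first event.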

Additionally, you can use “condor_tail” to peek at files in the output sandbox (stdout, stderr, and anything listed in transfer_output_files) if you identify the job as suspiciously slow.

Brian