
Re: [HTCondor-users] Is a JobStatus==5 check required in a periodic_release expression?



I'd be in favor of that. My first attempts to identify the problem centered on looking for any indication in the SchedLog file. Even something as simple as logging the release reason when a release is done would have been useful.

Michael V Pelletier
Principal Engineer

Raytheon Technologies
Digital Technology
HPC Support Team
 


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Todd L Miller
Sent: Thursday, March 4, 2021 5:36 PM
To: Michael Pelletier via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [External] Re: [HTCondor-users] Is a JobStatus==5 check required in a periodic_release expression?

> Thanks to the ReleaseReason tip, I was able to determine that the 
> unexpectedly released job was being condor_released, and so I wrote a 
> wrapper script to log the process tree and environment each time that 
> condor_release was run, which led back to another housekeeping job 
> that was releasing held jobs if a given mountpoint was operational, 
> apparently assuming that any held job was held because of a mountpoint 
> problem.

I'm sufficiently impressed by this sleuthing that I wonder if this sort of audit log ought to be something that HTCondor offers as a knob for its admins...

- ToddM
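
For readers who want to try the wrapper approach quoted above, here is a rough sketch of what such a script might look like. It is not the script used in this thread. It assumes a Linux host with /proc, and it assumes the real binary has been moved aside to a hypothetical path (REAL_BINARY) with this script installed under the name condor_release. The log path and the function names are likewise made up for illustration.

```python
#!/usr/bin/env python3
"""Audit wrapper sketch: record who ran condor_release, then run the real tool."""
import os
import sys
import time

# Hypothetical locations, not taken from the thread; adjust for your site.
REAL_BINARY = "/usr/bin/condor_release.real"
LOG_PATH = "/var/log/condor_release_audit.log"


def process_ancestry(pid, max_depth=32):
    """Walk parent links through /proc (Linux only); return [(pid, cmdline), ...]."""
    chain = []
    while pid > 0 and len(chain) < max_depth:
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                raw = f.read()
            with open(f"/proc/{pid}/stat", "rb") as f:
                stat = f.read().decode(errors="replace")
        except OSError:
            break  # process exited or is not readable; stop walking
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        chain.append((pid, cmdline))
        # The comm field is parenthesised and may contain spaces, so split
        # after the last ')'. The fields that follow are: state, ppid, ...
        pid = int(stat.rsplit(")", 1)[1].split()[1])
    return chain


def format_record(argv, chain, environ):
    """Build one log record: timestamp and arguments, ancestry, environment."""
    lines = [f"=== {time.strftime('%Y-%m-%d %H:%M:%S')} args={argv!r}"]
    lines += [f"  pid={pid} cmd={cmd}" for pid, cmd in chain]
    lines += [f"  env {key}={value}" for key, value in sorted(environ.items())]
    return "\n".join(lines) + "\n"


def main():
    record = format_record(sys.argv, process_ancestry(os.getpid()), dict(os.environ))
    try:
        with open(LOG_PATH, "a") as log:
            log.write(record)
    except OSError:
        pass  # never block the release just because logging failed
    # Replace this process with the real tool, passing the arguments through.
    os.execv(REAL_BINARY, [REAL_BINARY] + sys.argv[1:])


# Only act when installed under the condor_release name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "condor_release":
    main()
```

The first entry in each record's ancestry is the wrapper itself, and the entries after it are its parent, grandparent, and so on. That chain is what would point back at a housekeeping job like the one described above. Note that the environment dump may contain sensitive values, so the log file should be readable only by administrators.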