
Re: [HTCondor-users] Howto set a reasonable SYSTEM_PERIODIC_REMOVE_REASON



Hi all,

one question would be whether the new-style syntax can also be used
beyond job transforms. Maybe with something like [1] as a rough sketch?
Since [1] is not a ClassAd (and is not a transform but just sets an
ad), it is not clear to me whether it could be applied, or whether it
has to be in ClassAd form, `SOMEAD = @=end [ ...code... ] @end`?

Cheers,
  Thomas


[1]

SOMEAD = @=end

COPY RemoveMultipleRunJobs   OrgRemoveMultipleRunJobs
COPY SYSTEM_PERIODIC_REMOVE  OrgSYSTEM_PERIODIC_REMOVE

If RemoveMultipleRunJobs
  SET SYSTEM_PERIODIC_REMOVE_REASON = "too many jobs"
endif

If RemoveHeldJobs
  SET SYSTEM_PERIODIC_REMOVE_REASON = "being in hold for too long"
endif

@end



On 01/10/2020 09.59, Beyer, Christoph wrote:
> Hi,
> 
> this bothered us for a while and maybe it could end up in the recipes somehow (?)
> 
> Our system_periodic_remove string looks like this: 
> 
> RemoveMultipleRunJobs = ( NumJobStarts >= 3 )
> RemoveReadyJobs = (( JobStatus == 2 ) && ( ( CurrentTime - EnteredCurrentStatus ) > MaxJobRetirementTime ))
> RemoveHeldJobs = ( (JobStatus==5 && (CurrentTime - EnteredCurrentStatus) > 14 * 24 * 3600) )
> SYSTEM_PERIODIC_REMOVE = $(RemoveHeldJobs)           || \
>                          $(RemoveMultipleRunJobs)    || \
>                          $(RemoveReadyJobs)
> 
> The default SYSTEM_PERIODIC_REMOVE_REASON looks like this: 
> ShadowLog.old:09/30/20 07:25:58 (9862228.0) (1574665): Job 9862228.0 is being removed: The system macro SYSTEM_PERIODIC_REMOVE expression '((JobStatus == 5 && (CurrentTime - EnteredCurrentStatus) > 14 * 24 * 3600)) || (NumJobStarts >= 3) || ((JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > MaxJobRetirementTime))' evaluated to TRUE
> 
> This does not really mean anything to the user, and even as an admin you need to recheck the job ClassAds to find the actual removal reason.
> 
> This sets SYSTEM_PERIODIC_REMOVE_REASON according to the actual removal reason (who would have thought):
> 
> SYSTEM_PERIODIC_REMOVE_REASON = strcat("Job removed by SYSTEM_PERIODIC_REMOVE due to ", \
> ifThenElse(JobStatus == 2 && CurrentTime - EnteredCurrentStatus > 3600*24*9, \
> "runtime being longer than 9 days", \
> ifThenElse(JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*24*6, \
> "being in hold state for more than 6 days", \
> "more than 3 failed jobstarts") \
> ) )
> 
> (Of course, the syntax is similar for SYSTEM_PERIODIC_HOLD_REASON.)
> 
> Tested in $CondorVersion: 8.9.3
> 
> Best
> christoph
> 
> 
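
One possible variation on the quoted reason expression, as a minimal sketch only: the thresholds it tests (9 days running, 6 days held) differ from those in the SYSTEM_PERIODIC_REMOVE sub-expressions (MaxJobRetirementTime, 14 days), so the reported reason may not match the clause that actually fired. Reusing the same configuration macros in both places would keep them in sync. The reason strings below are illustrative wording, not taken from the original setup, and the sketch assumes that every attribute referenced (for example MaxJobRetirementTime) is defined in the job ad; otherwise ifThenElse() may evaluate to UNDEFINED.

```
# Sub-expressions as in the quoted configuration.
RemoveMultipleRunJobs = ( NumJobStarts >= 3 )
RemoveReadyJobs = ( ( JobStatus == 2 ) && ( ( CurrentTime - EnteredCurrentStatus ) > MaxJobRetirementTime ) )
RemoveHeldJobs = ( ( JobStatus == 5 ) && ( ( CurrentTime - EnteredCurrentStatus ) > 14 * 24 * 3600 ) )

SYSTEM_PERIODIC_REMOVE = $(RemoveHeldJobs) || $(RemoveMultipleRunJobs) || $(RemoveReadyJobs)

# The reason reuses the same macros, so condition and message cannot drift apart.
# The last branch is the fallback: if neither of the first two clauses matched,
# the removal must have been triggered by RemoveMultipleRunJobs.
SYSTEM_PERIODIC_REMOVE_REASON = strcat("Job removed by SYSTEM_PERIODIC_REMOVE due to ", \
    ifThenElse($(RemoveReadyJobs), "running longer than MaxJobRetirementTime", \
    ifThenElse($(RemoveHeldJobs), "being held for more than 14 days", \
    "3 or more job starts")))
```

If more than one clause is true at the same time, only the first matching reason is reported.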
