
Re: [HTCondor-users] Howto set a reasonable SYSTEM_PERIODIC_REMOVE_REASON



oh, 

and rereading my own post here: make sure the remove reason text actually matches the real reason, unlike in my example ;)

I hope you get the idea anyway !

best
christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Christoph Beyer" <christoph.beyer@xxxxxxx>
To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Sent: Thursday, 1 October 2020 09:59:11
Subject: [HTCondor-users] Howto set a reasonable SYSTEM_PERIODIC_REMOVE_REASON

Hi,

this bothered us for a while and maybe it could end up in the recipes somehow (?)

Our system_periodic_remove string looks like this: 

RemoveMultipleRunJobs = ( NumJobStarts >= 3 )
RemoveReadyJobs = (( JobStatus == 2 ) && ( ( CurrentTime - EnteredCurrentStatus ) > MaxJobRetirementTime ))
RemoveHeldJobs = ( (JobStatus==5 && (CurrentTime - EnteredCurrentStatus) > 14 * 24 * 3600) )
SYSTEM_PERIODIC_REMOVE = $(RemoveHeldJobs)           || \
                         $(RemoveMultipleRunJobs)    || \
                         $(RemoveReadyJobs)

The default SYSTEM_PERIODIC_REMOVE_REASON looks like this: 
ShadowLog.old:09/30/20 07:25:58 (9862228.0) (1574665): Job 9862228.0 is being removed: The system macro SYSTEM_PERIODIC_REMOVE expression '((JobStatus == 5 && (CurrentTime - EnteredCurrentStatus) > 14 * 24 * 3600)) || (NumJobStarts >= 3) || ((JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > MaxJobRetirementTime))' evaluated to TRUE

This does not really mean anything to the user, and even as an admin you need to recheck the job ClassAd to find the actual remove reason.

This sets the SYSTEM_PERIODIC_REMOVE_REASON according to the remove-reason (who would have thought) 

SYSTEM_PERIODIC_REMOVE_REASON = strcat("Job removed by SYSTEM_PERIODIC_REMOVE due to ", \
ifThenElse(JobStatus == 2 && CurrentTime - EnteredCurrentStatus > 3600*24*9, \
"runtime being longer than 9 days", \
ifThenElse(JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*24*6, \
"being in hold state for 7 days", \
"more than 3 failed jobstarts") \
) )
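
A variant that keeps the reason text in step with the remove conditions can be sketched by reusing the same macros in both expressions. This is an untested sketch, not the configuration from the post above: it assumes the $(Remove...) macros are expanded inside SYSTEM_PERIODIC_REMOVE_REASON the same way they are inside SYSTEM_PERIODIC_REMOVE, and the message wording is only illustrative.

```
# The three conditions, defined once and reused below.
RemoveMultipleRunJobs = ( NumJobStarts >= 3 )
RemoveReadyJobs = ( ( JobStatus == 2 ) && ( ( CurrentTime - EnteredCurrentStatus ) > MaxJobRetirementTime ) )
RemoveHeldJobs = ( ( JobStatus == 5 ) && ( ( CurrentTime - EnteredCurrentStatus ) > 14 * 24 * 3600 ) )

SYSTEM_PERIODIC_REMOVE = $(RemoveHeldJobs)           || \
                         $(RemoveMultipleRunJobs)    || \
                         $(RemoveReadyJobs)

# The reason tests the same macros in the same order as the remove
# expression; the last message is the fallback for $(RemoveReadyJobs).
SYSTEM_PERIODIC_REMOVE_REASON = strcat("Job removed by SYSTEM_PERIODIC_REMOVE due to ", \
    ifThenElse($(RemoveHeldJobs), \
        "being in hold state for more than 14 days", \
    ifThenElse($(RemoveMultipleRunJobs), \
        "3 or more job starts", \
        "running longer than MaxJobRetirementTime")))
```

Because each threshold lives in exactly one macro, changing a limit only requires updating the matching message text, and the condition and the reason cannot drift apart silently.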

(of course the syntax is similar for SYSTEM_PERIODIC_HOLD_REASON)

Tested in $CondorVersion: 8.9.3

Best
christoph

