
Re: [HTCondor-users] SYSTEM_PERIODIC_REMOVE question



Hmmm…

 

According to the ShadowLog, the job is in fact killed, but it is then put back in the queue and restarted:

06/26/14 12:04:04 (106.0) (2973): Updating Job Queue: SetAttribute(NumJobStarts = 180)

06/26/14 12:04:04 (106.0) (2973): Updating Job Queue: SetAttribute(RecentBlockReadKbytes = 0)

06/26/14 12:04:04 (106.0) (2973): Updating Job Queue: SetAttribute(RecentBlockReads = 0)

 

It looks like I am missing something needed to really kill the job and remove it from the queue. Any ideas?

 

Thanks

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of SCHAER Frederic
Sent: Thursday, 26 June 2014 11:33
To: htcondor-users@xxxxxxxxxxx
Subject: [PROVENANCE INTERNET] [HTCondor-users] SYSTEM_PERIODIC_REMOVE question

 

Hi,

 

I am testing the SYSTEM_PERIODIC_REMOVE macro, and for test purposes I have set it to remove jobs that have been running for more than 30 seconds.

I’ve wrapped the expression in debug(), and I can see that it evaluates to TRUE.

But the job is not removed:

 

/var/log/condor/ShadowLog:06/26/14 11:18:03 (106.0) (18625): Started timer to evaluate periodic user policy expressions every 300 seconds

/var/log/condor/ShadowLog:06/26/14 11:18:11 (106.0) (18625): Classad debug: [0.00095ms] JobStatus --> 2

/var/log/condor/ShadowLog:06/26/14 11:18:11 (106.0) (18625): Classad debug: [0.00596ms] time() --> 1403774291

/var/log/condor/ShadowLog:06/26/14 11:18:11 (106.0) (18625): Classad debug: [0.00286ms] EnteredCurrentStatus --> 1403774281

/var/log/condor/ShadowLog:06/26/14 11:18:11 (106.0) (18625): Classad debug: [0.25296ms] JobStatus == 2 && ( time() - EnteredCurrentStatus > 30 ) --> FALSE

/var/log/condor/ShadowLog:06/26/14 11:23:03 (106.0) (18625): Classad debug: [0.00095ms] JobStatus --> 2

/var/log/condor/ShadowLog:06/26/14 11:23:03 (106.0) (18625): Classad debug: [0.00095ms] time() --> 1403774583

/var/log/condor/ShadowLog:06/26/14 11:23:03 (106.0) (18625): Classad debug: [0.00310ms] EnteredCurrentStatus --> 1403774281

/var/log/condor/ShadowLog:06/26/14 11:23:03 (106.0) (18625): Classad debug: [0.39101ms] JobStatus == 2 && ( time() - EnteredCurrentStatus > 30 ) --> TRUE

/var/log/condor/ShadowLog:06/26/14 11:23:03 (106.0) (18625): Job 106.0 is being removed: The system macro SYSTEM_PERIODIC_REMOVE expression 'debug(JobStatus == 2 && (time() - EnteredCurrentStatus > 30))' evaluated to TRUE

/var/log/condor/ShadowLog:06/26/14 11:23:03 (106.0) (18625): Updating Job Queue: SetAttribute(RemoveReason = "The system macro SYSTEM_PERIODIC_REMOVE expression 'debug(JobStatus == 2 && (time() - EnteredCurrentStatus > 30))' evaluated to TRUE")

/var/log/condor/ShadowLog:06/26/14 11:23:03 Daemon Log is logging: D_FULLDEBUG D_ALWAYS D_ERROR
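
To double-check the arithmetic in the debug output above, here is a minimal Python sketch that re-evaluates the same condition with the logged values. It is not HTCondor code: the function name periodic_remove and the constants are hypothetical, introduced only for illustration. It suggests the expression first becomes TRUE at the second check simply because the shadow re-evaluates it on its 300-second timer, by which point the job has been running for about 302 seconds.

```python
# Values copied from the ShadowLog debug lines above.
ENTERED_CURRENT_STATUS = 1403774281
FIRST_CHECK_TIME = 1403774291   # time() at 11:18:11
SECOND_CHECK_TIME = 1403774583  # time() at 11:23:03

RUNNING = 2             # JobStatus value for a running job
THRESHOLD_SECONDS = 30  # limit used in the test expression


def periodic_remove(job_status: int, now: int, entered: int) -> bool:
    """Mirror of: JobStatus == 2 && (time() - EnteredCurrentStatus > 30)."""
    return job_status == RUNNING and (now - entered) > THRESHOLD_SECONDS


first = periodic_remove(RUNNING, FIRST_CHECK_TIME, ENTERED_CURRENT_STATUS)
second = periodic_remove(RUNNING, SECOND_CHECK_TIME, ENTERED_CURRENT_STATUS)

# Elapsed seconds in the current status at each check.
first_elapsed = FIRST_CHECK_TIME - ENTERED_CURRENT_STATUS
second_elapsed = SECOND_CHECK_TIME - ENTERED_CURRENT_STATUS

print(first_elapsed, first)
print(second_elapsed, second)
```

So the expression itself behaves as intended; the open question is only why the removal does not stick.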

 

[root@dev7246 condor]# condor_q

 

 

-- Submitter: dev7246.xx : <xxx:40562> : dev7246.xx

ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

106.0   irf030          6/25 16:30   0+14:41:02 R  0   415.0 (gridjob          )

 

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

 

My question is: what am I doing wrong?

Is the job not being deleted because I wrapped the expression in the debug() function?

This specific test job is doing a “sleep 3600000000”, so it’s not a very misbehaving job…

 

What could prevent HTCondor from deleting the job?

 

Thanks && regards