Re: [HTCondor-users] Condor held jobs should retry/release after certain configured timeout automatically



> On Apr 8, 2015, at 8:19 AM, Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx> wrote:
> 
> On Wed, Apr 8, 2015 at 7:26 AM, Sridhar Thumma <deadman.den@xxxxxxxxx> wrote:
> 
>> SYSTEM_PERIODIC_RELEASE=((NumSystemHolds < 5 && (time() -
>> EnteredCurrentStatus) > 30) &&
>> (HoldReason.substr("InvalidAMIID.NotFound",0)!=""))
>> 
> That's not how substr is called. I'm not sure substr would be all that
> helpful here anyway.
> 
>> SYSTEM_PERIODIC_RELEASE=((NumSystemHolds < 5 && (time() -
>> EnteredCurrentStatus) > 30) && regexp("^.+InvalidAMIID.+$",HoldReason))
>> 
> It looks like the regexp parsing doesn't like the use of ^ and $. You
> might try dropping that. I did a similar test for sleep jobs in my
> history (version 8.3.2):
> 
> -bash-3.2$ condor_history -const 'regexp("^.+sleep.+$", Cmd)' | wc -l
> 1
> -bash-3.2$ condor_history -const 'regexp("sleep", Cmd)' | wc -l
> 5146
> -bash-3.2$
> 
> Since you have a held job in the queue, you can use condor_q with a
> constraint to test your SYSTEM_PERIODIC_RELEASE expression before you
> set it.
> 

Nah, the regexp is fine; the ^ and $ anchors are handled correctly. See:

$ python
Python 2.6.6 (r266:84292, Jan 23 2014, 10:39:35) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import classad
>>> classad.ExprTree('regexp("^.+InvalidAMIID.+$",HoldReason)').eval({'HoldReason': 'I am an InvalidAMIID!'})
True

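(Note that in your test, the pattern "^.+sleep.+$" requires at least one character both before and after the 'sleep' string, so a Cmd that begins or ends with 'sleep' does not match. That would explain why it matched 1 job while the bare "sleep" pattern matched 5146.)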
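
To make the difference between the two patterns concrete, here is a small sketch using Python's standard `re` module as a stand-in for the ClassAd regexp() function. That substitution is an assumption on my part: it should behave the same for these simple patterns, but it is not the ClassAd implementation. The Cmd values are made-up examples, not taken from anyone's history file.

```python
import re

# The two patterns from the condor_history test above.
ANCHORED = r"^.+sleep.+$"   # needs >= 1 character before AND after "sleep"
BARE = r"sleep"             # matches "sleep" anywhere in the string


def matches(pattern, value):
    """Return True if pattern is found anywhere in value (search semantics)."""
    return re.search(pattern, value) is not None


# Hypothetical Cmd values, for illustration only.
samples = ["/bin/sleep", "sleep", "/bin/sleep.sh"]

for cmd in samples:
    print(cmd, matches(ANCHORED, cmd), matches(BARE, cmd))
```

Only the last sample satisfies the anchored pattern, because it is the only one with text on both sides of 'sleep'. The HoldReason in the classad session above matches for the same reason: there is text before 'InvalidAMIID' and a '!' after it.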

Are you sure you don't have a hold/release loop?
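
To reason about what a loop would look like, here is a rough model of the release expression in plain Python. It is only a sketch: the `should_release` helper and the dictionary standing in for the job ad are hypothetical, `re.search` stands in for the ClassAd regexp(), and the attribute names are the ones used in the expression quoted above.

```python
import re


def should_release(ad, now):
    """Plain-Python model of the SYSTEM_PERIODIC_RELEASE expression."""
    return (
        ad["NumSystemHolds"] < 5
        and (now - ad["EnteredCurrentStatus"]) > 30
        and re.search(r"^.+InvalidAMIID.+$", ad["HoldReason"]) is not None
    )


# Hypothetical held job; times are in seconds, as returned by time().
job = {
    "NumSystemHolds": 1,
    "EnteredCurrentStatus": 1000,
    "HoldReason": "I am an InvalidAMIID!",
}

print(should_release(job, now=1020))  # held for 20 s, under the 30 s threshold
print(should_release(job, now=1040))  # held for 40 s, eligible for release

job["NumSystemHolds"] = 5
print(should_release(job, now=1040))  # hold limit reached, stays held
```

If the cause of the hold is still present, each release would presumably be followed by another hold, so the job would bounce between held and idle until NumSystemHolds reaches 5. From the outside that can look as if the release expression never fired.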

Brian