Re: [Condor-users] Jobs running forever...



On Wed, 19 Jul 2006 13:26:16 +0200
ucarlino@xxxxxxxxxx wrote:

> Hello,
> sometimes a job stays running without ever terminating, and the only
> thing that can be done is to kill it with 'condor_rm'.
>  
> It is possible to avoid this by placing the following directives in the submit file:
> # Limit runtime to 30 minutes (30*60=1800 seconds)
> #
> maxRunTime = 1800
> #
> # Limit total time in queue to 12 hours (60*60*12=43200 seconds)
> #
> maxQueueTime = 43200
> #
> # Remove jobs exceeding maxRunTime or maxQueueTime
> #
> periodic_remove = (RemoteWallClockTime > $(maxRunTime)) || ((CurrentTime - QDate) > $(maxQueueTime))
> 
>  
> I was wondering if this configuration could be defined at pool level,
> avoiding the need to put it in every submit file.

Hello,

I was wondering the same thing, and I found a solution (not a very elegant
one, but it works...): I've defined the following two lines in the local
configuration file of my schedd host:

PeriodicRemove=(RemoteWallClockTime > 10)
SUBMIT_EXPRS = PeriodicRemove

This way, the attribute PeriodicRemove (which is the classad counterpart of the
keyword periodic_remove in the submit file) gets appended to the job classad...
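
As an untested sketch, the 30-minute and 12-hour limits from the quoted submit
file might be expressed the same way at pool level. The values are simply
copied from above, and I am assuming that CurrentTime and QDate evaluate in a
config-defined expression just as they do in a submit file:

```
# Local config file on the schedd host (sketch, not tested)
# Remove jobs after 1800 s of wall-clock run time or 43200 s in the queue
PeriodicRemove = (RemoteWallClockTime > 1800) || ((CurrentTime - QDate) > 43200)
SUBMIT_EXPRS = PeriodicRemove
```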

According to the documentation, this will not prevent users from overriding this
attribute by putting a line containing +PeriodicRemove=False in the submit file,
but it will at least provide a default value...

Hope this helps...

Pascal