
Re: [Condor-users] Jobs lingering in queue if target shuts down mid-job



Hi Thomas,

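I think that decreasing the values of the configuration variables
MAX_CLAIM_ALIVES_MISSED and ALIVE_INTERVAL will help you. They control
how often keepalive messages are sent for a claim and how many of them
may be missed before the claim is considered dead.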

Details in manual:
http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:AliveInterval
http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxClaimAlivesMissed
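
As a rough sketch, the change could look something like the fragment
below in your Condor configuration. The numbers are only illustrative
assumptions, not recommended values. If I remember the manual correctly,
the defaults are 300 seconds and 6 missed keepalives, so please check the
pages above before picking your own values.

```
# Illustrative values only (assumptions, not taken from the manual).
# Seconds between keepalive messages sent for a claim.
ALIVE_INTERVAL = 60

# Number of keepalives that may be missed before the claim
# is treated as timed out.
MAX_CLAIM_ALIVES_MISSED = 3
```

I am not certain which daemon reads which setting, so the safest option
is probably to put both in the configuration shared by the submit and
execute machines and then run condor_reconfig. Keep in mind that lower
values mean more keepalive traffic and a greater chance that a short
network glitch is mistaken for a dead machine.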

Regards, 
Lukas

On Tue, Nov 22, 2011 at 01:59:01PM +0000, Thomas Luff wrote:
> If a target machine shuts down or crashes while a job is running on it, the job hangs around in the queue with the status 'Running'.
> 
> Even if the machine is shut down and left off, the job still acts as if it's running, and it has been like this for over an hour now.
> 
> Is it possible to make these jobs automatically fail or requeue if the target machine goes down?
> 
> Thanks