
Re: [HTCondor-users] Problems with power outage etc



Peter Ellevseth <Peter.Ellevseth@...> writes:

> 
> Hello all
> 
> We have had a few incidents with power outages, etc. When that happens, our
> jobs are usually restarted. This is not something we generally want. Our jobs
> usually run for weeks, and we would rather have the job exit than restart,
> because all result files are usually overwritten in such an event. What is
> the best approach to avoid this?
> 
> This morning we also had a problem when a domain controller went down for a
> while and the starter wasn't able to see the schedd, even though both were
> alive. At some point the lease expired and the job restarted. We want to
> avoid this as well.
> 
> From my standpoint, it would be better if the jobs just kept running even
> though the schedd is out of reach. Our cluster is small enough that if a
> couple of errant jobs keep running, we can fix that manually.
> 
> Regards Peter
> 
> 

Sorry, but what is the link to my post? I think your post isn't in the
right place.