
Re: [Condor-users] quick question: is periodic vacate possible



Are you using Condor's checkpointing mechanism, or your own?  If it's Condor's, then PERIODIC_CHECKPOINT will do the trick (http://www.cs.wisc.edu/condor/manual/v7.5/7_2Setting_up.html#47702); otherwise, how is your executable told to write its checkpoint file out?  Via a signal?
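
For the Condor-checkpointing case, a minimal sketch of the startd policy might look like the fragment below. The three-hour interval is an arbitrary choice for illustration, and the helper macros (MINUTE, HOUR, LastCkpt) are spelled out only so the fragment stands alone; check them against what your own condor_config already defines. As far as I know this expression only affects standard universe jobs, so it would not help vanilla jobs that write their own checkpoint files.

```
# Sketch only: the interval and helper macros are assumptions, adjust to taste.
MINUTE   = 60
HOUR     = (60 * $(MINUTE))

# Seconds since the startd last took a periodic checkpoint of the running job.
LastCkpt = (CurrentTime - LastPeriodicCheckpoint)

# Ask for a periodic checkpoint once at least three hours have passed.
PERIODIC_CHECKPOINT = $(LastCkpt) > (3 * $(HOUR))
```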

-B
 
On 2010-06-17, at 4:35 AM, Smith, Ian wrote:

> Dear All,
> 
> Just a very quick question that I can't seem to find an answer for
> anywhere:
> 
> Is it possible to periodically vacate jobs in the same way as
> they can be periodically held and removed ?
> 
> The reason I ask is that I've been building checkpointing
> into some of our vanilla universe jobs and it would
> be useful if these could be vacated say once every
> few hours so that the checkpoint file gets stored in
> the $(SPOOL). Some of the jobs can run for days
> and with few students around the campus at present
> they are unlikely to get evicted by user logins. This
> means that the output can get lost if the startd
> crashes for some reason*, losing several days'
> work.
> 
> regards,
> 
> -ian.
> 
> * I've noticed several connection failures with long running jobs 
>  and I'm still not sure of the reason although someone turning
>  off an execute host running a job is obviously one !
> 
> --------------------------------------------
> Dr Ian C. Smith,
> Advanced Research Computing (e-Science) Team,
> The University of Liverpool
> Computing Services Department
> 