
Re: [Condor-users] quick question: is periodic vacate possible



I'm using my own checkpointing mechanism, which is written into the
code. The code (an R script) periodically saves its workspace to a
file, and this file gets written to $(SPOOL) when the job is evicted.
When the job restarts, the workspace is restored.
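
For anyone wanting to try the same save-and-restore pattern, here is a
rough sketch of the idea. It is not my actual R script: it is written
in Python purely for illustration, and the file name checkpoint.pkl,
the function names and the toy running-total loop are all made up for
the example. It assumes the checkpoint file sits in the job's working
directory.

```python
import os
import pickle
import tempfile

CHECKPOINT = "checkpoint.pkl"  # hypothetical file name, not from the real job


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write to a temporary file first, then rename it into place, so an
    eviction in the middle of a write cannot leave a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, every=100, path=CHECKPOINT):
    """Toy long-running loop: resume from the checkpoint if there is one,
    and save the state every `every` steps."""
    state = load_state(path)
    while state["step"] < n_steps:
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_state(state, path)
    save_state(state, path)  # final save once the work is complete
    return state
```

If the job is restarted with the checkpoint file present, run() picks
up from the last saved step rather than from zero. The write-then-rename
step is a design choice in this sketch, not something the thread
prescribes.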

regards,

-ian.


> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-
> bounces@xxxxxxxxxxx] On Behalf Of Burnett, Ben
> Sent: 17 June 2010 15:05
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] quick question: is periodic vacate possible
> 
> Are you using Condor's checkpointing mechanism, or your own?  If it's Condor's, then
> PERIODIC_CHECKPOINT will do the trick
> (http://www.cs.wisc.edu/condor/manual/v7.5/7_2Setting_up.html#47702); otherwise,
> how is your executable told to write its checkpoint file out?  Via a signal?
> 
> -B
> 
> On 2010-06-17, at 4:35 AM, Smith, Ian wrote:
> 
> > Dear All,
> >
> > Just a very quick question that I can't seem to find an answer for
> > anywhere:
> >
> > Is it possible to periodically vacate jobs in the same way as
> > they can be periodically held and removed ?
> >
> > The reason I ask is that I've been building checkpointing
> > into some of our vanilla universe jobs and it would
> > be useful if these could be vacated, say, once every
> > few hours so that the checkpoint file gets stored in
> > $(SPOOL). Some of the jobs can run for days
> > and with few students around the campus at present
> > they are unlikely to get evicted by user logins. This
> > means that the output can get lost if the startd
> > crashes for some reason*, losing several days'
> > work.
> >
> > regards,
> >
> > -ian.
> >
> > * I've noticed several connection failures with long running jobs
> >  and I'm still not sure of the reason although someone turning
> >  off an execute host running a job is obviously one !
> >
> > --------------------------------------------
> > Dr Ian C. Smith,
> > Advanced Research Computing (e-Science) Team,
> > The University of Liverpool
> > Computing Services Department
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
> 