
Re: [Condor-users] quick question: is periodic vacate possible



OK, I think I see how to go about this now. How would I write the
PREEMPT expression? Presumably it would need to include a
WANT_VACATE == True term (so that only jobs that save their own
checkpoints are vacated) and some way of determining whether the
run time has exceeded a given periodic checkpoint interval (I guess
this value could be supplied via a job ClassAd attribute?).
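
To make the question concrete, here is a rough, untested sketch of what I
have in mind. CheckpointJob is the submit-time tag Ian Chesal suggests in
the quoted reply below; CheckpointInterval is a hypothetical attribute name
I've made up for the per-job interval in seconds (14400 s = 4 hours is only
an example). The PREEMPT line assumes PREEMPT is already defined earlier in
the configuration, as in the stock example config, so that $(PREEMPT)
expands to the existing policy; if it isn't, that term would need to be
dropped.

```
# --- submit description file (fragment only: executable, queue etc. omitted) ---
universe                = vanilla
+CheckpointJob          = True
# Hypothetical attribute: seconds of run time before a forced vacate.
+CheckpointInterval     = 14400
should_transfer_files   = YES
# Transfer output files (including the checkpoint) back to the submit
# side on eviction as well as on exit; on restart they are sent to the
# new execute host.
when_to_transfer_output = ON_EXIT_OR_EVICT

# --- execute host configuration ---
# Soft-kill (vacate) only tagged jobs, so they get a chance to save state.
WANT_VACATE = (TARGET.CheckpointJob =?= True)

# Keep the existing policy (assumes PREEMPT was defined earlier) and also
# preempt tagged jobs once they have been in their current activity
# (normally Busy) for longer than their own CheckpointInterval.
PREEMPT = ($(PREEMPT)) || \
          ( (TARGET.CheckpointJob =?= True) && \
            ((CurrentTime - EnteredCurrentActivity) > TARGET.CheckpointInterval) )
```

Rather than a separate WANT_VACATE term, the sketch tests the same job
attribute that WANT_VACATE uses, so the two stay in step. Two things I'd
want to double-check: I believe EnteredCurrentActivity is reset whenever
the slot changes activity (for example after a suspend and resume), so
the timer measures time since the job last entered Busy rather than total
run time; and once vacating starts, the job only has until the KILL
expression becomes true to write its checkpoint, so that limit needs to
be long enough.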

many thanks,

-ian.

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
> Sent: 21 June 2010 13:39
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] quick question: is periodic vacate possible
> 
> On 2010-06-21, at 5:06 AM, "Smith, Ian" <I.C.Smith@xxxxxxxxxxxxxxx>
> wrote:
> >
> > I've set WANT_VACATE = TRUE on all of the execute hosts. Is
> > it possible to set this on a per-job basis?
> 
> Certainly. Tag your jobs on submission:
> 
> +CheckpointJob = True
> 
> And then, in the execute hosts' configuration:
> 
> WANT_VACATE = CheckpointJob =?= True
> 
> - Ian
> 
> 
> 
> >
> > thanks,
> >
> > -ian.
> >
> >> -----Original Message-----
> >> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> >> Sent: 17 June 2010 16:43
> >> To: Condor-Users Mail List
> >> Subject: Re: [Condor-users] quick question: is periodic vacate possible
> >>
> >> Ian,
> >>
> >> The machine's PREEMPT expression could be used to periodically
> >> checkpoint vanilla universe jobs that implement some kind of
> >> self-checkpointing. You would just want to make sure that
> >> WANT_VACATE is true for the jobs that get preempted, or they will
> >> be booted off without any chance to save state.
> >>
> >> --Dan
> >>
> >> Smith, Ian wrote:
> >>> Dear All,
> >>>
> >>> Just a very quick question that I can't seem to find an answer for
> >>> anywhere:
> >>>
> >>> Is it possible to periodically vacate jobs in the same way as
> >>> they can be periodically held and removed?
> >>>
> >>> The reason I ask is that I've been building checkpointing
> >>> into some of our vanilla universe jobs, and it would
> >>> be useful if these could be vacated, say, once every
> >>> few hours so that the checkpoint files get stored in
> >>> $(SPOOL). Some of the jobs can run for days,
> >>> and with few students around the campus at present
> >>> they are unlikely to be evicted by user logins. This
> >>> means that the output can be lost if the startd
> >>> crashes for some reason*, losing several days'
> >>> work.
> >>>
> >>> regards,
> >>>
> >>> -ian.
> >>>
> >>> * I've noticed several connection failures with long-running jobs,
> >>>  and I'm still not sure of the reason, although someone turning
> >>>  off an execute host that is running a job is obviously one!
> >>>
> >>> --------------------------------------------
> >>> Dr Ian C. Smith,
> >>> Advanced Research Computing (e-Science) Team,
> >>> The University of Liverpool
> >>> Computing Services Department
> >>>
> >>> _______________________________________________
> >>> Condor-users mailing list
> >>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> >>> with a
> >>> subject: Unsubscribe
> >>> You can also unsubscribe by visiting
> >>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>>
> >>> The archives can be found at:
> >>> https://lists.cs.wisc.edu/archive/condor-users/
> >>>