Re: [Condor-users] quick question: is periodic vacate possible



Ian,

The machine's PREEMPT expression could be used to periodically checkpoint vanilla universe jobs that implement some kind of self-checkpointing. You would just want to make sure that WANT_VACATE is true for the jobs that get preempted or they will be booted off without any chance to save state.
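
As a rough sketch only (not a tested policy), the startd configuration on the execute machines might look something like the fragment below. CHECKPOINT_INTERVAL is a made-up macro name and the three-hour value is just an example. The fragment assumes a release whose ClassAd language provides time(); older releases would use the CurrentTime attribute instead. It also replaces any PREEMPT policy the pool already has, so in practice you would probably want to combine it with the existing expression.

```
# Hypothetical macro: how long a job may run before being vacated (seconds).
CHECKPOINT_INTERVAL = (3 * 60 * 60)

# Preempt vanilla universe jobs (JobUniverse == 5) once the machine has
# been in its current activity for longer than the interval.
PREEMPT = (TARGET.JobUniverse == 5) && \
          ((time() - MY.EnteredCurrentActivity) > $(CHECKPOINT_INTERVAL))

# Vacate gracefully (soft-kill signal first) so the job can save its state.
WANT_VACATE = True
```

For the checkpoint file to be brought back from the execute machine on eviction, the job would presumably also need when_to_transfer_output = ON_EXIT_OR_EVICT in its submit description file, and the job has to finish writing its checkpoint before the machine's KILL policy cuts the vacate short.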

--Dan

Smith, Ian wrote:
Dear All,

Just a very quick question that I can't seem to find an answer for
anywhere:

Is it possible to periodically vacate jobs in the same way that
they can be periodically held and removed?

The reason I ask is that I've been building checkpointing
into some of our vanilla universe jobs, and it would
be useful if these could be vacated, say, once every
few hours so that the checkpoint file gets stored in
$(SPOOL). Some of the jobs can run for days,
and with few students around the campus at present
they are unlikely to get evicted by user logins. This
means that the output can get lost if the startd crashes
for some reason*, losing several days' work.

regards,

-ian.

* I've noticed several connection failures with long-running jobs and
  I'm still not sure of the reason, although someone turning off an
  execute host that is running a job is obviously one!

--------------------------------------------
Dr Ian C. Smith,
Advanced Research Computing (e-Science) Team,
The University of Liverpool
Computing Services Department
