
Re: [Condor-users] Condor on Windows - failover and checkpointing



On 4/25/06, Shaun J. O'Callaghan <Shaun.OCallaghan@xxxxxxxxxxxx> wrote:
> When implementing a Condor based system on a Windows network, as
> checkpointing functionality is missing, does this simply mean when a job is
> interrupted it is either suspended or terminated altogether?

By default, yes.

You have the option of getting fancy and trapping the eviction signal,
responding to it in time by doing your own checkpointing and then exiting.
See previous posts on the list about this.
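
As a rough illustration of the do-it-yourself approach, here is a minimal
Python sketch of a job that saves its own progress when asked to stop and
resumes from that point when restarted. Everything in it is an assumption
for illustration: the file name checkpoint.json, the helper names, the toy
workload, and above all the idea that the eviction notice reaches the
process as SIGTERM. How the notice is actually delivered on Windows depends
on your Condor version and job type, so check that before relying on this.

```python
import json
import os
import signal

CHECKPOINT = "checkpoint.json"  # hypothetical file name, not a Condor convention

stop_requested = False


def request_stop(signum, frame):
    """Signal handler: only set a flag; the main loop does the real work."""
    global stop_requested
    stop_requested = True


def load_state(path=CHECKPOINT):
    """Resume from a previous checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write to a temp file, then rename, so a kill mid-write leaves the old file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_items, path=CHECKPOINT, should_stop=lambda: stop_requested):
    """Toy workload: sum 0..n_items-1, checking for a stop request each step.

    Returns True if the work finished, False if it stopped early after
    saving a checkpoint.
    """
    state = load_state(path)
    while state["next_item"] < n_items:
        if should_stop():
            save_state(state, path)
            return False
        state["total"] += state["next_item"]
        state["next_item"] += 1
    save_state(state, path)
    return True


def main():
    # Assumption: the eviction notice arrives as SIGTERM.
    signal.signal(signal.SIGTERM, request_stop)
    finished = run(1_000_000)
    raise SystemExit(0 if finished else 1)

# main() would be called from the job's entry point.
```

The handler only sets a flag so that the checkpoint is written from the main
loop at a point where the state is consistent. For the checkpoint to be of
any use on a different machine, the file also has to travel with the job; the
submit-file setting when_to_transfer_output = ON_EXIT_OR_EVICT is intended for
that, but check the manual for your version.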

> Also, does this mean that there's no transaction-style failover in the event
> of a job failure?

Your job is, by default, terminated and then becomes eligible to run
on another machine (or indeed the same one if it becomes free again).
It restarts from the beginning unless you do your own checkpointing.

>
> Any light that could be shed on these two issues would be greatly
> appreciated.

Searching the list will provide more info. My answers are, I'm
afraid, brief at the moment, but the question has been answered before.

Matt