Re: [Condor-users] CM Failover with submits from CM
- Date: Tue, 14 Jul 2009 10:35:29 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] CM Failover with submits from CM
Janzen Brewer wrote:
> Dan Bradley wrote:
>> Condor supports fail-over of the submit node.
>
> I understand that the submit node can be failed over, but I'm curious as
> to what happens to the output of a completed job if the submit node from
> which it was submitted failed during its execution. Does the execute
> node keep the output until the secondary submit node undergoes failback?
> Or does it attempt to write it to the same directory on the secondary
> submit node?
I don't know much about schedd failover.
I think the directories where output is to be stored would all need to
be on a shared disk accessible to both submit nodes. Jobs that are
running when the primary submit node fails will wait for up to the job
lease duration (default 20 minutes) for the secondary submit node to
take over. When the job finishes, whether it finishes during that time
or after that time, the output would get copied back to the functioning
submit node onto the shared disk.
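
As a rough sketch of the arrangement described above, a submit description
file might look something like the following. The /shared paths and the
executable name are hypothetical placeholders for a disk mounted on both
submit nodes, and job_lease_duration is set explicitly to 1200 seconds only
to make the 20 minute default visible. This covers the job side only; the
schedd failover configuration itself is a separate matter.

```
# Hypothetical example: /shared is assumed to be mounted on both submit nodes.
universe   = vanilla
executable = /shared/jobs/analyze.sh

# Output, error, and log paths are relative to initialdir, so they land on
# the shared disk regardless of which submit node is active.
initialdir = /shared/jobs/run01
output     = job.$(Cluster).$(Process).out
error      = job.$(Cluster).$(Process).err
log        = job.$(Cluster).log

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# How long a running job waits for a submit node to come back, in seconds.
# 1200 seconds is 20 minutes.
job_lease_duration = 1200

queue
```

Raising job_lease_duration gives the secondary submit node more time to
take over, at the cost of the execute node holding on to the job longer if
no submit node ever returns.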
Of course, if you do all this only to make the shared filesystem into a
single point of failure, you've probably only made things slightly worse.