Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Antwort: Re: Fault Behaviour of Condor

Date: Thu, 3 Aug 2006 09:32:01 +0100
From: "Matt Hope" <matthew.hope@xxxxxxxxx>
Subject: Re: [Condor-users] Antwort: Re: Fault Behaviour of Condor

On 8/3/06, thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx
<thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx> wrote:



Hi Matt,

For 1.) and 2.) the behaviour is just fine! -- I've also followed the discussion
regarding disk failure.
Maybe the documentation should state more clearly that
Condor's default behaviour is to restart a job in case of a fault
(I might have overlooked that).


I guess that is kind of perceived as the 'proper' default behaviour
for a job queue system.
Note that you can change this by using the periodic_* and on_exit_*
expressions at submission.
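
As a rough illustration of those expressions, here is a minimal submit
file sketch. The executable and file names are hypothetical, the
thresholds are arbitrary assumptions, and the exact policy you want will
differ; check the condor_submit manual for your version before relying
on any of it.

```
# Sketch only: names and thresholds are illustrative assumptions.
# Comments are kept on their own lines, since a trailing "# ..." on a
# command line would be read as part of the value.
universe   = vanilla
executable = my_job
output     = my_job.out
error      = my_job.err
log        = my_job.log

# Leave the queue only after a normal exit with code 0;
# any other exit keeps the job queued so it can run again.
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)

# Put the job on hold instead if it was killed by a signal.
on_exit_hold = (ExitBySignal == True)

# Remove the job if it has been in the running state (JobStatus == 2)
# for more than an hour.
periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 3600)

queue
```

The on_exit_* expressions are evaluated when the job exits, while the
periodic_* ones are evaluated at intervals while the job is in the queue.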

Regarding 3.)
I gave it over an hour, I think.


What is your job lease duration (if you are using it)?
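
For reference, the lease is normally set per job with the
job_lease_duration submit command, given in seconds. The value below is
only an example, not a recommendation.

```
# Sketch only: 1200 seconds is an arbitrary example value.
# Roughly, the lease bounds how long the submit and execute sides
# can be out of contact before the job is given up on.
job_lease_duration = 1200
```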

I've updated my Executors to 6.8 but the behaviour persists.
Do you think moving the central manager to 6.8 can resolve this?


Shadows failing to die when their starter is not talking to them
anymore is not something an upgrade to the collector/negotiator can
solve.

If your executors are on 6.8 you probably want your submitters to be
6.8 as well...

Matt

Follow-Ups:
- Re: [Condor-users] Antwort: Re: Fault Behaviour of Condor
  - From: Nomura Kohei

References:
- Re: [Condor-users] Fault Behaviour of Condor
  - From: Matt Hope
- [Condor-users] Antwort: Re: Fault Behaviour of Condor
  - From: thomas . t . hoppe
