Subject: [Condor-users] Antwort: Re: Fault Behaviour of Condor
For 1.) and 2.) the behaviour is just
fine! -- I've also followed the discussion
regarding disk failure.
Maybe the documentation should state
more clearly that
Condor's default behaviour is to restart
a job in case of a fault
(I might have overlooked that).
I gave it over an hour, I think.
I've updated my Executors to 6.8 but
the behaviour persists.
Do you think moving the central manager
to 6.8 can resolve this?
On 8/2/06, thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx
> I'm currently running a small Condor 6.7.19 Pool with GT4 Gram as
> Interface for testing.
> I wanted to test Condor's behaviour in several fault scenarios.
> Here are my results:
> 1.) Killing the job on the executor machine
> Outcome: Condor returned an exit code of 1
This is the desired behaviour
> 2.) Shutting down the Condor daemons on the executor
> Outcome: Condor restarted the job on another machine -- WOW, is this
> standard behaviour of Condor?!
> I never saw that.
This is again the desired behaviour. With the notable exception of
disks dying (see recent post), Condor copes very well with an
execute machine shutting itself down cleanly.
> 3.) Shutting down the NIC on the executor (I assume same as pulling
> Outcome: Condor hangs, and a shadow process keeps running the whole time
> I cannot even remove the job with condor_rm!
> Maybe a bug? What can I do?
condor_rm -forcex may get rid of it (you may need to kill off the
shadow by hand; it should eventually time out though, how long did you
Some older versions in the 6.6 series, at least on Windows, were very
poor at cleaning up dead jobs when the execute machine stopped
responding altogether.
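
For reference, the removal sequence suggested above could be scripted
roughly as follows. This is only a sketch: the job id "42.0" and the
helper names removal_commands and force_remove are made up for
illustration, and it assumes condor_rm is on the PATH of the submit
machine. As far as I know, -forcex only acts on jobs that are already
marked for removal, which is why the plain condor_rm comes first.

```python
import shutil
import subprocess


def removal_commands(job_id):
    """Return the two condor_rm invocations, in order, for one job.

    The first marks the job for removal; the second asks the schedd to
    drop it locally even if the execute machine never answers.
    """
    return [
        ["condor_rm", job_id],
        ["condor_rm", "-forcex", job_id],
    ]


def force_remove(job_id):
    """Run both commands (hypothetical helper, needs a Condor install)."""
    if shutil.which("condor_rm") is None:
        raise RuntimeError("condor_rm not found on PATH")
    for cmd in removal_commands(job_id):
        # check=False: the second call is still worth trying if the
        # first one reports an error.
        subprocess.run(cmd, check=False)
```

Calling force_remove("42.0") on the submit machine would run both
commands in turn. A leftover shadow process may still have to be killed
by hand afterwards.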