Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Fault Behaviour of Condor

Date: Wed, 2 Aug 2006 16:57:50 +0100
From: "Matt Hope" <matthew.hope@xxxxxxxxx>
Subject: Re: [Condor-users] Fault Behaviour of Condor

On 8/2/06, thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx
<thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx> wrote:


Hi,

I'm currently running a small Condor 6.7.19 pool with GT4 GRAM as the submit
interface for testing.
I wanted to test Condor's behaviour in several fault scenarios.
Here are my results:

1.) Killing the job on the executor machine
Outcome: Condor returned an exit code of 1


This is the desired behaviour

2.) Shutting down the Condor daemons on the executor
Outcome: Condor restarted the job on another machine -- WOW, is this
standard behaviour of Condor?!
I had never seen that before.


This is again the desired behaviour. With the notable exception of
disks dying (see recent post), Condor is very well behaved when an
execute machine stops itself nicely.

3.) Shutting down the NIC on the executor (I assume this is the same as
pulling the plug)
Outcome: Condor hangs and a shadow process stays around indefinitely.
I cannot even remove the job with condor_rm!
Maybe a bug? What can I do?


condor_rm -forcex may get rid of it. You may need to kill off the
shadow by hand, though it should eventually time out. How long did you
give it?
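
As a rough illustration of that escalation (ordinary removal first, forced
removal second), here is a minimal Python sketch. It only builds the two
command lines and, by default, does not run them. The helper names and the
job id "42.0" are made up for the example; the only Condor pieces assumed
are the condor_rm command and its -forcex option mentioned above.

```python
import subprocess


def removal_commands(job_id):
    """Return the escalation sequence for a stuck job.

    job_id is a cluster.proc string; "42.0" below is a hypothetical id.
    """
    return [
        ["condor_rm", job_id],             # ordinary removal request
        ["condor_rm", "-forcex", job_id],  # forced removal if the job lingers
    ]


def remove_job(job_id, dry_run=True):
    """Run each step in order; with dry_run=True only return the commands."""
    commands = removal_commands(job_id)
    if not dry_run:
        for cmd in commands:
            # check=False: a failure of the first step should not stop the second
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    for cmd in remove_job("42.0"):
        print(" ".join(cmd))
```

In practice you would wait between the two steps and check the queue before
forcing anything, since the shadow may time out on its own.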

Some older versions in the 6.6 series, at least on Windows, were very
poor at cleaning up dead jobs when the execute machine stopped
responding altogether.

Matt
