Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] Antwort: Re: Fault Behaviour of Condor

Date: Thu, 03 Aug 2006 09:55:17 +0200
From: thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx
Subject: [Condor-users] Antwort: Re: Fault Behaviour of Condor

Hi Matt,

For 1.) and 2.) the behaviour is just fine! -- I've also followed the discussion
regarding disk failure.
Maybe the documentation should state more clearly that
Condor's default behaviour is to restart a job in case of a fault
(I might have overlooked that).

Regarding 3.)
I gave it over an hour I think.
I've updated my Executors to 6.8 but the behaviour persists.
Do you think moving the central manager to 6.8 can resolve this?

thanks, Thomas

matthew.hope@xxxxxxxxx
Sent by: condor-users-bounces@xxxxxxxxxxx

02.08.2006 17:57

Please reply to
condor-users@xxxxxxxxxxx

To	condor-users@xxxxxxxxxxx
Cc
Subject	Re: [Condor-users] Fault Behaviour of Condor

On 8/2/06, thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx <thomas.t.hoppe@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I'm currently running a small Condor 6.7.19 Pool with GT4 Gram as Submit
> Interface for testing.
> I wanted to test Condor's behaviour in case of several fault scenarios.
> Here are my results:
>
> 1.) Killing the job on the executor machine
> Outcome: Condor returned an exit code of 1

This is the desired behaviour.

> 2.) Shutting down the Condor daemons on the executor
> Outcome: Condor restarted the job on another machine -- WOW, is this
> standard behaviour of Condor?!
> I never saw that.

This is again the desired behaviour. With the notable exception of disks dying (see recent post), Condor is very well behaved when an execute machine stops itself nicely.

> 3.) Shutting down the NIC on the executor (I assume same as pulling the
> plug)
> Outcome: Condor hangs, a shadow process remains the whole time
> I cannot even remove the job with condor_rm!
> Maybe a bug? What can I do?

condor_rm -forcex may get rid of it (you may need to kill off the shadow by hand; it should eventually time out though. How long did you give it?). Some older versions in the 6.6 series, at least on Windows, were very poor at cleaning up dead jobs when the execute machine stopped responding at all.

Matt
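The removal steps Matt describes could be sketched roughly as follows. This is only a dry-run illustration: it builds the command lines and prints them without running anything. The job id `42.0` and the helper name `removal_commands` are hypothetical, and the ordering (check the queue, try a normal removal, then force) is an assumption about how one might escalate, not something stated in the thread.

```python
import shlex


def removal_commands(job_id):
    """Return the escalation steps as argv lists; nothing is executed here."""
    return [
        ["condor_q", job_id],               # check whether the job is still queued
        ["condor_rm", job_id],              # normal removal request
        ["condor_rm", "-forcex", job_id],   # forced removal, as suggested in the reply
    ]


if __name__ == "__main__":
    # "42.0" is a made-up cluster.proc id used purely for illustration.
    for argv in removal_commands("42.0"):
        print(shlex.join(argv))
```

Killing a leftover shadow process by hand, which the reply also mentions, is left out here because the thread gives no details on how to identify it.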

Follow-Ups:
- Re: [Condor-users] Antwort: Re: Fault Behaviour of Condor
  - From: Matt Hope

References:
- Re: [Condor-users] Fault Behaviour of Condor
  - From: Matt Hope
