Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] negotiating with schedds when a client has FW

Date: Fri, 17 Jun 2005 15:50:52 -0400
From: Dan Christensen <jdc@xxxxxx>
Subject: Re: [Condor-users] negotiating with schedds when a client has FW

Andrey Kaliazin <A.Kaliazin@xxxxxxxxxxx> writes:

> Schedd is fine here, it provides the string of jobs to run and just waits
> patiently, while Negotiator dispatches them. If Start daemons respond
> properly everything is fine. But, if one of the compute nodes which appears
> on top of the matched list fails for various reasons (mainly networking
> problems in our case) then Negotiator would not just dismiss it and get the
> next best node, but halts the whole cycle. And couple of minutes later, in
> the next cycle the story repeats itself, because this faulty node is still
> on top of the list.

This sounds like exactly the same problem we run into frequently here.
Our machines are administered by various individuals, and firewalls
are often accidentally closed or other problems happen, and until they
are fixed the cluster is barely usable.  Sometimes the admin is away
and I don't have the power to fix the problem or even to turn off the
machine!  It would be nice if Condor handled such situations gracefully.
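
To make "gracefully" concrete, here is a minimal sketch in Python of the
behaviour described above: walk the ranked match list and skip any node that
cannot be reached, rather than halting the whole cycle. This is only an
illustration, not Condor's negotiator code; the names first_responsive,
fake_claim, and the node names are hypothetical, and treating a network
failure as an OSError is an assumption of the sketch.

```python
from typing import Callable, Iterable, Optional


def first_responsive(
    ranked_nodes: Iterable[str],
    claim: Callable[[str], bool],
) -> Optional[str]:
    """Return the first node, in rank order, that accepts the claim.

    A node whose claim attempt returns False or raises OSError (for
    example, because a firewall is closed) is skipped instead of
    aborting the whole pass over the list.
    """
    for node in ranked_nodes:
        try:
            if claim(node):
                return node
        except OSError:
            # Network-level failure: move on to the next-best node.
            continue
    return None


def fake_claim(node: str) -> bool:
    # Hypothetical stand-in for contacting a compute node.
    # "node-a" plays the firewalled machine at the top of the list.
    if node == "node-a":
        raise ConnectionRefusedError("firewall closed")
    return True


print(first_responsive(["node-a", "node-b", "node-c"], fake_claim))  # node-b
```

With this kind of loop the faulty node at the top of the list costs one
failed attempt per cycle instead of blocking every node ranked below it.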

Dan
