
[Condor-users] Negotiator gets stuck



Dear all,

I hope somebody can clear up this situation for me. 
Execute nodes in our environment die quite often, for various reasons.
But Condor is designed to cope with exactly this kind of environment, right?

Then why does the Negotiator fail to bypass a single node that it cannot
communicate with?
Instead, it stops the submission process altogether until the node in
question is dropped completely from the pool. I can only suppose that my
config settings are not quite right somewhere. Is there anything I can
change on the server to make the Negotiator disregard nodes it is having
difficulty communicating with?

Negotiator log entries -
...
2/18 10:28:37     Request 00005.00014:
2/18 10:31:46 Can't connect to <134.151.151.168:9554>:0, errno = 110
2/18 10:31:46 Will keep trying for 10 seconds...
2/18 10:31:47 Connect failed for 10 seconds; returning FALSE
2/18 10:31:47 ERROR:
SECMAN:2003:TCP connection to <134.151.151.168:9554> failed

2/18 10:31:47 condor_write(): Socket closed when trying to write buffer
2/18 10:31:47 Buf::write(): condor_write() failed
2/18 10:31:47       Could not send PERMISSION
2/18 10:31:47   Error: Ignoring schedd for this cycle
2/18 10:31:47 ---------- Finished Negotiation Cycle ----------
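
To make the stall visible, here is a minimal Python sketch that scans log
lines like the ones above and reports how long the Negotiator sat idle
before each "Can't connect" message. The function names, the regular
expressions and the placeholder year are assumptions made for illustration
(the log itself carries no year); this is not part of Condor. On Linux,
errno 110 is ETIMEDOUT, which would fit a long wait on a dead host.

```python
import re
from datetime import datetime

# Sample lines copied from the Negotiator log excerpt above.
LOG = """\
2/18 10:28:37     Request 00005.00014:
2/18 10:31:46 Can't connect to <134.151.151.168:9554>:0, errno = 110
2/18 10:31:46 Will keep trying for 10 seconds...
2/18 10:31:47 Connect failed for 10 seconds; returning FALSE
"""

# Assumed line layout: "M/D HH:MM:SS <message>".
STAMP = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2})\s+(.*)$")
FAILED = re.compile(r"Can't connect to <([\d.]+):(\d+)>")


def parse(line, year=2000):
    """Split a log line into (timestamp, message); None if it has no stamp.

    The year is a placeholder, since the log does not record one.
    """
    m = STAMP.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year}/{m.group(1)}", "%Y/%m/%d %H:%M:%S")
    return ts, m.group(2)


def stalls(text):
    """Return (host, port, seconds since the previous stamped line)
    for every failed connection attempt found in the text."""
    found = []
    previous = None
    for line in text.splitlines():
        parsed = parse(line)
        if parsed is None:
            continue
        ts, message = parsed
        hit = FAILED.search(message)
        if hit and previous is not None:
            gap = (ts - previous).total_seconds()
            found.append((hit.group(1), int(hit.group(2)), gap))
        previous = ts
    return found


if __name__ == "__main__":
    for host, port, gap in stalls(LOG):
        print(f"{host}:{port} unreachable after a {gap:.0f} s wait")
```

For the excerpt above, the gap between the request line (10:28:37) and the
failed connect (10:31:46) comes out at 189 seconds, which is the time the
negotiation cycle spent blocked on that one address.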


Andrey