
Re: [Condor-users] Is it negotiator problem?



On Feb 7, 2006, at 10:09 PM, Srinivas Malyala wrote:

I have Condor on two machines. Jobs can be submitted on both machines, but
Condor does not distribute them properly. What can cause this?

The negotiator log file looks like this:
...
2/8 08:37:30 ---------- Started Negotiation Cycle ----------
2/8 08:37:30 Phase 1:  Obtaining ads from collector ...
2/8 08:37:30   Getting all public ads ...
2/8 08:37:30 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
2/8 08:37:30   Sorting 0 ads ...
2/8 08:37:30   Getting startd private ads ...
2/8 08:37:30 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
2/8 08:37:30 condor_read(): recv() returned -1, errno = 104, assuming failure.
2/8 08:37:30 Couldn't fetch ads: communication error
2/8 08:37:30 Aborting negotiation cycle
2/8 08:37:31 DaemonCore: in SendAliveToParent()
2/8 08:37:31 DaemonCore: attempting to connect to '<172.16.16.42:32781>'
2/8 08:37:31 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
2/8 08:42:30 ---------- Started Negotiation Cycle ----------

It looks like the negotiator is having trouble talking to one of your schedd daemons. Does the schedd log on any of your machines contain information about connection errors?
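
If it helps with checking the logs, here is a minimal Python sketch that picks out lines that look like connection problems and decodes any errno value it finds. On Linux, errno 104 is ECONNRESET (connection reset by peer), but the numbering is platform dependent. The function names and the list of search strings are my own assumptions for illustration, not anything Condor provides.

```python
import errno
import os
import re

# Substrings that suggest a network problem. This list is an assumption
# and is not exhaustive; adjust it to whatever your logs actually contain.
PATTERNS = ("errno", "communication error", "failed to connect", "can't connect")

# Matches text such as "errno = 104" and captures the number.
ERRNO_RE = re.compile(r"errno\s*=\s*(\d+)")


def find_connection_errors(lines):
    """Return (line, errno_number_or_None) for each suspicious log line."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(pattern in lowered for pattern in PATTERNS):
            match = ERRNO_RE.search(line)
            number = int(match.group(1)) if match else None
            hits.append((line.rstrip("\n"), number))
    return hits


def describe_errno(number):
    """Map an errno number to its symbolic name and message on this platform."""
    name = errno.errorcode.get(number, "unknown")
    return f"{name}: {os.strerror(number)}"


if __name__ == "__main__":
    # Two lines copied from the negotiator log quoted above.
    sample = [
        "2/8 08:37:30 condor_read(): recv() returned -1, errno = 104, assuming failure.",
        "2/8 08:37:30 Couldn't fetch ads: communication error",
    ]
    for line, number in find_connection_errors(sample):
        print(line)
        if number is not None:
            print("    ->", describe_errno(number))
```

To run it against a real log, pass an open file object to find_connection_errors() in place of the sample list. Where the schedd log lives depends on how Condor was installed on each machine.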

+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+