
Re: [Condor-users] 6.9.2 startup error



Hi Dan,

Yes, gadget is the scheduler, and the log was produced on that machine.
I looked at the negotiator's log for traces of this communication problem and found this:

5/24 14:52:47 ---------- Started Negotiation Cycle ----------
5/24 14:52:47 Phase 1:  Obtaining ads from collector ...
...
5/24 14:52:47 Negotiating with szabolcs@xxxxxxxxxxxxxxxxxxx at <192.168.0.50:3661>
5/24 14:52:47 0 seconds so far
5/24 14:52:47 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <192.168.0.50:3661>.
5/24 14:52:47     Failed to get reply from schedd
5/24 14:52:47   Error: Ignoring schedd for this cycle
5/24 14:52:47 ---------- Finished Negotiation Cycle ----------
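
For reference, here is a small Python sketch that pulls the interesting fields out of the condor_read() line above. It assumes the negotiator runs on Windows, where error 10054 is the Winsock code WSAECONNRESET (connection reset by the peer). The function name parse_read_failure and the short error table are made up for illustration and are not part of Condor.

```python
import re

# Assumption: the errno values are Winsock codes (negotiator on Windows).
# Only a handful of common codes are listed here.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",
    10061: "WSAECONNREFUSED",
}

# Matches the condor_read() failure line as it appears in the log excerpt.
READ_FAILURE = re.compile(
    r"condor_read\(\): recv\(\) returned (?P<ret>-?\d+), "
    r"errno = (?P<errno>\d+), .*?reading (?P<nbytes>\d+) bytes "
    r"from <(?P<host>[\d.]+):(?P<port>\d+)>"
)


def parse_read_failure(line):
    """Return the errno, its symbolic name, and the peer, or None if no match."""
    match = READ_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group("errno"))
    return {
        "errno": code,
        "name": WINSOCK_ERRORS.get(code, "unknown"),
        "bytes_expected": int(match.group("nbytes")),
        "peer": (match.group("host"), int(match.group("port"))),
    }


if __name__ == "__main__":
    sample = (
        "5/24 14:52:47 condor_read(): recv() returned -1, errno = 10054, "
        "assuming failure reading 5 bytes from <192.168.0.50:3661>."
    )
    print(parse_read_failure(sample))
```

If that assumption holds, a reset would mean the TCP connection to the schedd was opened and then closed from the other side before the 5-byte reply arrived.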

I guess that if the negotiator can start negotiating with the machine at 192.168.0.50, then it must have been able to connect to it somehow.
Then what might cause the problem while it waits for the reply?

Cheers,
Szabolcs



Is gadget.digicpictures.local the name of the host that this SchedLog was produced on? If so, this sounds to me like the schedd trying to directly claim its "local" startd because it has not communicated successfully with the negotiator for a long time. The length of that period is controlled by SCHEDD_ASSUME_NEGOTIATOR_GONE, which defaults to 1200 seconds.

--Dan