
Re: [Condor-users] Condor 6.8.0 and Solaris 10





Curtis W. Hillegas wrote:

Condor Users,

I have a Condor 6.8.0 installation that was working fine on a Solaris 9
system (and is still working on all my other Solaris 9 systems) before
upgrading the system to Solaris 10.  The system in question is a
submit and compute server (not the master).  Now I get errors like the
following for both the schedd and startd:

8/9 11:05:38 ERROR: SECMAN:2003:TCP connection to <a.b.c.d:9618> failed
8/9 11:05:38 Failed to start non-blocking update to <e.f.g.h:47014>.

Where a.b.c.d is the IP address of our master (RHEL4, Condor 6.8.0) and
e.f.g.h is the IP address of the Solaris 10 system.


The mismatch between a.b.c.d and e.f.g.h in those two messages is a known bug in 6.8.0 that is fixed in the upcoming 6.8.1 release. The e.f.g.h address in the second message is incorrect.

I do not know of any reason why the TCP connection to the collector would fail under Solaris 10. However, there are some known problems with the non-blocking collector update process (being fixed for 6.8.1), so please try the following configuration setting and let us know whether it makes a difference:

NONBLOCKING_COLLECTOR_UPDATE = False

If you have time, I would also like to see your full MasterLog (from the Solaris 10 box), starting at startup, from a run in which things are not working. Please add the following to your configuration before starting Condor:

MASTER_DEBUG = D_COMMAND D_NETWORK D_FULLDEBUG
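For convenience, both settings above can be placed together in the local configuration file on the Solaris 10 machine. This is only a sketch: which file holds your local settings (often condor_config.local) is an assumption about your installation, and the comments are added for illustration.

```
# Local Condor configuration on the Solaris 10 submit/compute machine
# (assumed location; use whichever file holds your local settings).

# Work around the known non-blocking collector update problems in 6.8.0.
NONBLOCKING_COLLECTOR_UPDATE = False

# Extra master logging, so that the MasterLog shows command and
# network activity in full detail from startup.
MASTER_DEBUG = D_COMMAND D_NETWORK D_FULLDEBUG
```

Since the log should start from startup, the simplest approach is to make these changes while Condor is stopped on that machine and then start it again.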

You can send the log to condor-admin@xxxxxxxxxxxx

Thanks in advance, and sorry you ran into trouble,
--Dan