
[Condor-users] condor grid environment



I am setting up a grid environment and am having trouble getting the Condor compute nodes (on a private cluster network) to connect to the central manager. The central manager is on the same switch as the compute nodes.

Running Condor 7.0.5 on the central manager, through vdt-control.
Running Condor 7.2.2 on the compute nodes. To rule out an incompatibility between releases, I also set up 7.0.5 on the compute nodes; the behavior was no different.

I have condor_host set to the central manager's internal address, 10.1.1.1.
The condor_config on the central manager is set up to listen on the internal NIC.

Running condor_config_val with the various options returns the values I expect to see. I currently even have read/write access set to *.
 
On the compute node, the MasterLog continually shows:

4/15 13:55:10 AUTHENTICATE: no available authentication methods succeeded, failing!
4/15 13:55:10 ERROR: SECMAN:2004:Failed to create security session to <10.1.1.1:9618> with TCP.|AUTHENTICATE:1003:Failed to authenticate with any method
4/15 13:55:10 Failed to start non-blocking update to <10.1.1.1:9618>
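
For anyone comparing these entries across logs, here is a minimal Python sketch that splits the error line into its parts. It assumes only the layout visible in the lines above: a "M/D HH:MM:SS" timestamp, an optional "ERROR: " prefix, and NAME:code:message entries separated by "|". The function name parse_error_stack is hypothetical and not part of Condor.

```python
import re

# One entry such as "SECMAN:2004:Failed to ..." (name, numeric code, message).
# This layout is inferred from the log lines quoted above, not from Condor docs.
ENTRY_RE = re.compile(r"^([A-Z_]+):(\d+):(.*)$")


def parse_error_stack(line):
    """Return a list of (name, code, message) tuples found in one log line."""
    # Drop the leading "M/D HH:MM:SS " timestamp, then an "ERROR: " prefix.
    body = re.sub(r"^\s*\d+/\d+ \d+:\d+:\d+ ", "", line)
    body = re.sub(r"^ERROR: ", "", body)
    entries = []
    for part in body.split("|"):
        match = ENTRY_RE.match(part.strip())
        if match:
            name, code, message = match.groups()
            entries.append((name, int(code), message.strip()))
    return entries
```

Applied to the SECMAN line above, this yields one entry for SECMAN code 2004 and one for AUTHENTICATE code 1003; lines without a numeric code, such as the first AUTHENTICATE line, yield an empty list.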

condor_status, run from either a compute node or the central manager, returns the processors on the central manager but none of the compute nodes.
I can telnet to both machines and connect successfully.
Port 9618 is open on both machines for UDP and TCP.
Each time I make changes to the config files I run condor_reconfig -all to propagate the changes.
After doing that, the CollectorLog shows:

4/15 14:01:06 AUTHENTICATE: no available authentication methods succeeded, failing!
4/15 14:01:06 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1003:Failed to authenticate with any method

(this is repeated 6 times), and the MasterLog on the compute node shows:

4/15 14:00:10 ERROR: SECMAN:2004:Failed to create security session to <10.1.1.1:9618> with TCP.|AUTHENTICATE:1003:Failed to authenticate with any method
4/15 14:00:10 Failed to start non-blocking update to <10.1.1.1:9618>

This seems to indicate that the two machines are communicating at some level, but are unable to exchange the information needed for status updates to be propagated.
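
The telnet check mentioned above can be reproduced with a short Python sketch. The helper name tcp_port_open and the 3 second timeout are assumptions chosen for illustration; the address and port are the ones from the log lines. A successful connect only shows that something accepts TCP connections on that port. It says nothing about UDP, and nothing about whether Condor authentication will succeed afterwards.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call from a compute node (address taken from the logs above):
#   tcp_port_open("10.1.1.1", 9618)
```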

I cannot seem to locate where the issue with my setup might be.


Any help or ideas about where to look would be appreciated.

Thank you,
JD