
Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.



I presume x.x.x.x is the correct IP for your Linux central manager machine?

 

The error in the Master log looks like it might be an authorization problem: the collector isn't allowing the Windows node to send updates.

 

Check the ALLOW_WRITE configuration knob in the Collector: does it permit the IP of the Windows node?

 

At the same timestamp as the error in the Master log (plus or minus a few seconds in case of clock mismatch), is there a message in the Collector log about refusing an attempt to send updates?
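To line the two logs up, a small filter over the Collector log can help. This is a minimal Python sketch, assuming the `MM/DD/YY HH:MM:SS` timestamp prefix seen in the log excerpts in this thread; the helper name `lines_near` and the 5-second default window are illustrative choices, not anything HTCondor provides:

```python
from datetime import datetime, timedelta

# Timestamp prefix used by the daemon logs quoted in this thread, e.g. "09/24/18 09:46:01".
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S"
STAMP_LEN = 17  # len("09/24/18 09:46:01")

def lines_near(log_lines, error_time, window_seconds=5):
    """Return the lines whose timestamp is within +/- window_seconds of error_time."""
    target = datetime.strptime(error_time, LOG_TIME_FORMAT)
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in log_lines:
        try:
            when = datetime.strptime(line[:STAMP_LEN], LOG_TIME_FORMAT)
        except ValueError:
            continue  # no leading timestamp (e.g. a wrapped continuation line)
        if abs(when - target) <= window:
            matches.append(line.rstrip("\n"))
    return matches

# Lines taken from the Collector log excerpt in this thread; the error time is hypothetical.
collector_log = [
    "09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS",
    "127.0.0.1:27363>; projection={}",
    "09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)",
]
for line in lines_near(collector_log, "09/24/18 09:46:03"):
    print(line)
```

To scan the real file, pass `open(path)` instead of the list, with `path` being wherever the collector's log actually lives on the central manager.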

 

This error

 

09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)

 

is a bit more puzzling to me. I don't see how a request from a Windows node to the collector could result in a peer address of 127.0.0.1.
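One quick way to see which hosts the collector thinks it is talking to is to pull the `peer=<ip:port>` field out of each "Query info" line. A rough Python sketch, assuming IPv4 peers and the log format quoted in this thread; `query_peers` is a hypothetical helper name:

```python
import ipaddress
import re

# "Query info" lines in the Collector log carry a field like peer=<127.0.0.1:25381>.
PEER_RE = re.compile(r"peer=<([0-9.]+):(\d+)>")

def query_peers(log_lines):
    """Return (ip, port, is_loopback) for every IPv4 peer field found in the given lines."""
    peers = []
    for line in log_lines:
        for ip, port in PEER_RE.findall(line):
            try:
                addr = ipaddress.ip_address(ip)
            except ValueError:
                continue  # not a valid IPv4 address; skip it
            peers.append((ip, int(port), addr.is_loopback))
    return peers

# The query line quoted in this reply, copied from the posted Collector log.
sample = [
    '09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; '
    'send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || '
    '( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; '
    'peer=<127.0.0.1:25381>; projection={}',
]
for ip, port, loopback in query_peers(sample):
    print(ip, port, "loopback" if loopback else "remote")
```

Every peer in the excerpt posted in this thread is a loopback address. A loopback peer normally means the connection came from a process on the central manager itself rather than over the network from the Windows node.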

 

Does the config on the Windows machine have this?

 

NETWORK_INTERFACE = 127.0.0.1

 

If so, remove that line.

 

If not, try running

 

   condor_config_val -write:upgrade  config.log

 

and sending me the config.log file. I'll check whether anything in that config could cause the peer address to be set incorrectly.

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi
Sent: Monday, September 24, 2018 12:12 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.

 

Dear all,

I am trying to get a Linux (latest) HTCondor pool running with a Windows node. On Linux I can submit jobs and they get processed with no problems, but I can't figure out what's wrong when adding a Windows machine to the pool.

 

This is the error that I see in the MasterLog (Windows client):

 

ERROR: SECMAN:2003:TCP connection to collector x.x.x.x failed.

Failed to start non-blocking update to <x.x.x.x:9618>.

 

And this is the content of the CollectorLog on the Linux server, just after I ran condor_status -master on the Windows machine:

 

09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS
09/24/18 09:46:01 Number of Active Workers 0
09/24/18 09:46:01 (Sending 4 ads in response to query)
09/24/18 09:46:01 Query info: matched=4; skipped=0; query_time=0.000839; send_time=0.000619; type=MachinePrivate; requirements={true}; peer=<127.0.0.1:27363>; projection={}
09/24/18 09:46:01 Number of Active Workers 0
09/24/18 09:46:01 (Sending 6 ads in response to query)
09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)

 

 

P.S. I am sure both the Windows and Linux machines have port 9618 open.
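Having port 9618 open on each host is not quite the same as the Windows node being able to reach the collector across the network, since a firewall on either host or in between can still block the connection. A minimal Python check to run on the Windows node, assuming Python is installed there; `can_connect` is a hypothetical helper and the 3-second timeout is an arbitrary choice:

```python
import socket

def can_connect(host, port=9618, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves host and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failure.
        return False
```

Call it as `can_connect("<central manager IP>")`. A `True` result only shows that the TCP path to port 9618 is open; the collector can still reject the update afterwards if ALLOW_WRITE does not cover the Windows node's address.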

 

Thanks for any suggestions!