
Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.



This looks ok to me.

 

Your ALLOW_WRITE line allows everything on the 10.* subnet, which should be sufficient to give your Windows machine permission to send ads to the Collector.  (I'm assuming your Windows machine is in that subnet?)
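
One quick way to test that assumption is the minimal Python sketch below. It treats the 10.* pattern as the 10.0.0.0/8 network, which is an assumption about how the wildcard is interpreted, and the address 10.6.10.20 is a made-up placeholder for the Windows node, not something taken from this thread.

```python
import ipaddress

# Assumption: the "10.*" pattern in ALLOW_WRITE is modelled here as 10.0.0.0/8.
ALLOWED_NET = ipaddress.ip_network("10.0.0.0/8")


def in_allowed_subnet(addr: str) -> bool:
    """Return True if addr lies inside ALLOWED_NET."""
    return ipaddress.ip_address(addr) in ALLOWED_NET


print(in_allowed_subnet("10.6.10.15"))  # central manager IP quoted in this thread -> True
print(in_allowed_subnet("10.6.10.20"))  # hypothetical Windows node IP -> True
print(in_allowed_subnet("127.0.0.1"))   # loopback is outside 10.* -> False
```

If the real address of the Windows node prints False here, the ALLOW_WRITE line would not cover it.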

 

Could I also see the configuration of your Windows machine?  Perhaps the problem is there.

 

-tj

 

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi - Mathematical Sciences Dept
Sent: Tuesday, September 25, 2018 12:14 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.

 

Thanks, please find my answers inline and the config file attached.

> On Sep 25, 2018, at 11:57 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>
> I presume x.x.x.x is the correct IP for your Linux central manager machine?
yes 10.6.10.15


> The error in the Master log looks like it might be an authorization problem: the collector isn't allowing the Windows node to send updates.
right, but I can't figure out the issue.


> Check the ALLOW_WRITE configuration knob in the Collector: does it permit the IP of the Windows node?

> At the same timestamp as the error from the master log (plus or minus a few seconds in case of clock mismatch), is there a message in the Collector log about refusing an attempt to send updates?
yes, basically the error you describe as puzzling appears each time the Windows node attempts to connect.
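
For anyone repeating this check, matching entries across the two logs "plus or minus a few seconds" can be sketched in Python as below. This is only an illustration: the helper names and the 5-second slack are my own choices, and the MasterLog error time of 09:46:03 is a made-up value, since the MasterLog excerpt in this thread carries no timestamp.

```python
from datetime import datetime, timedelta

# Matches the leading timestamp of lines such as "09/24/18 09:46:01 ..."
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:17], STAMP_FORMAT)
    except ValueError:
        return None


def lines_near(lines, when, slack_seconds=5):
    """Return the lines whose timestamp is within slack_seconds of `when`."""
    slack = timedelta(seconds=slack_seconds)
    hits = []
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and abs(stamp - when) <= slack:
            hits.append(line)
    return hits


# Two CollectorLog lines copied from this thread.
collector_log = [
    "09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS",
    "09/24/18 09:46:01 DaemonCore: Can't receive command request "
    "from 127.0.0.1 (perhaps a timeout?)",
]

# Hypothetical time of the MasterLog error on the Windows node.
master_error_time = parse_stamp("09/24/18 09:46:03")

for hit in lines_near(collector_log, master_error_time):
    print(hit)
```

Both collector lines fall inside the window here; a larger clock offset between the two machines would need a larger slack_seconds.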


> This error

> 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
> 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)

> is a bit more puzzling to me.  I don't see how a request from a Windows node to the collector could result in a peer address of 127.0.0.1.

> Does the config on the Windows machine have this?
this file c:\windows\system32\driver\etc\host does not contain 127.0.0.1; it contains just "10.6.10.15   mastercondor" (I added this for convenience)

> NETWORK_INTERFACE = 127.0.0.1

> If so, remove that line.

> If not try running

>    condor_config_val -write:upgrade  config.log
OK, done; the file is attached.

> and sending me the config.log file.  I'll see if I can see anything in that config that could cause the peer address to be set incorrectly.

thank you very much for your help and support!


> -tj

> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi
> Sent: Monday, September 24, 2018 12:12 PM
> To: htcondor-users@xxxxxxxxxxx
> Subject: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.

> Dear all,
> I am trying to run HTCondor (latest version) on Linux together with a Windows node. On Linux I can submit jobs and they get processed with no problems, but I can't figure out what's going wrong when I add a Windows machine to the pool.

> This is the error that I see on the MasterLog (windows client):

> ERROR: SECMAN:2003:TCP connection to collector x.x.x.x failed.
> Failed to start non-blocking update to <x.x.x.x:9618>.

> And this is the content of the CollectorLog on the Linux server, just after I ran condor_status -master on the Windows machine:

> 09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS
> 09/24/18 09:46:01 Number of Active Workers 0
> 09/24/18 09:46:01 (Sending 4 ads in response to query)
> 09/24/18 09:46:01 Query info: matched=4; skipped=0; query_time=0.000839; send_time=0.000619; type=MachinePrivate; requirements={true}; peer=<127.0.0.1:27363>; projection={}
> 09/24/18 09:46:01 Number of Active Workers 0
> 09/24/18 09:46:01 (Sending 6 ads in response to query)
> 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
> 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)


> P.S. I am sure both Windows and Linux have port 9618 open.

> Thanks for any suggestions!
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
