
Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.



To be complete: I see that loopback IP 127.0.0.1 even when I simply run condor_status on the Linux master. What do you suggest to debug further?

StefanoC

> On Sep 26, 2018, at 12:21 PM, Stefano Colafranceschi - Mathematical Sciences Dept <stefano.colafranceschi@xxxxxxx> wrote:
> 
> To debug, I completely disabled ufw (along with SELinux) and the Windows firewall (and Defender).
> 
>> On Sep 26, 2018, at 12:08 PM, Grassia, Philippe M. (Philippe) <pgrassia@xxxxxxxxxxx> wrote:
>> 
>> This should be sufficient. Short of a firewall configuration problem (either on the Windows host or the CONDOR_HOST), I'm at wit's end.
>> 
>> Philippe
>> 
>> 
>> 
>> On 9/26/18 7:44 AM, Stefano Colafranceschi wrote:
>>> I can see the Condor processes running on the Windows client (condor_master, condor_procd, condor_schedd, condor_shared_port, condor_startd); they start when the computer boots up, because the Condor MSI package installed Condor as a service. Is this sufficient, or do you think I am missing something that could cause the malfunction I am reporting?
>>> 
>>> StefanoC
>>> 
>>> From: Grassia, Philippe M. (Philippe)
>>> Sent: Wednesday, September 26, 2018 10:27 AM
>>> To: htcondor-users@xxxxxxxxxxx
>>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>> 
>>> WSL does not have an init/service management system. How do you start and maintain the daemons on the Windows host? nssm? PowerShell scripts?
>>> 
>>> 
>>> On 09/26/2018 07:02 AM, Stefano Colafranceschi wrote:
>>> Attached is the Condor config file from my Windows client (which is on 10.x.x.x). Any further suggestions?
>>> 
>>> Thanks!
>>> 
>>> StefanoC
>>> 
>>> From: John M Knoeller
>>> Sent: Tuesday, September 25, 2018 5:19 PM
>>> To: HTCondor-Users Mail List
>>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>> 
>>> This looks ok to me.
>>> 
>>> Your ALLOW_WRITE line allows everything on the 10.* subnet; that should be sufficient to give your Windows machine permission to send ads to the Collector. (I'm assuming your Windows machine is in that subnet?)
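A rough way to check which addresses a trailing-wildcard entry such as `10.*` covers is sketched below. The helper names `matches_allow_entry` and `allowed` are hypothetical, and this is a simplified model for illustration, not HTCondor's own matcher: it ignores hostnames, netmask/CIDR forms and the optional `user/` prefix that HTCondor authorization entries can carry.

```python
def matches_allow_entry(ip: str, entry: str) -> bool:
    """Approximate an HTCondor-style trailing-wildcard IPv4 entry such as '10.*'.

    Hypothetical, simplified helper: only literal IPv4 addresses and entries
    ending in '*' are handled; hostnames, netmasks and 'user/' prefixes are not.
    """
    entry = entry.strip()
    if entry == "*":
        return True
    if entry.endswith("*"):
        # '10.*' -> prefix '10.'; the address must start with that prefix,
        # so '10.6.10.15' matches but '100.1.1.1' does not.
        return ip.startswith(entry[:-1])
    return ip == entry


def allowed(ip: str, allow_write: str) -> bool:
    """True if any comma- or space-separated entry of an ALLOW_WRITE value matches ip."""
    entries = allow_write.replace(",", " ").split()
    return any(matches_allow_entry(ip, e) for e in entries)


if __name__ == "__main__":
    allow_write = "10.*"  # assumed value, based on the description in this thread
    for ip in ("10.6.10.15", "127.0.0.1", "192.168.1.5"):
        print(ip, allowed(ip, allow_write))
```

Note that a loopback peer such as 127.0.0.1 would not be covered by `10.*` in this model.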
>>> 
>>> 
>>> Could I also see the configuration of your Windows machine?  Perhaps the problem is there.
>>> 
>>> -tj
>>> 
>>> 
>>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi - Mathematical Sciences Dept
>>> Sent: Tuesday, September 25, 2018 12:14 PM
>>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>> 
>>> Thanks; my answers are inline and the config file is attached.
>>> 
>>>> On Sep 25, 2018, at 11:57 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>>>> 
>>>> I presume x.x.x.x is the correct IP for your Linux central manager machine?
>>> Yes, 10.6.10.15.
>>> 
>>>> 
>>>> The error in the Master log looks like it might be an authorization problem: the collector isn't allowing the Windows node to send updates.
>>> Right, but I can't figure out the issue.
>>> 
>>>> 
>>>> Check the ALLOW_WRITE configuration knob in the Collector: does it permit the IP of the Windows node?
>>>> 
>>>> At the same timestamp as the error in the master log (plus or minus a few seconds in case of clock mismatch), is there a message in the Collector log about refusing an attempt to send updates?
>>> Yes; the error you describe as puzzling appears at the same time as the Windows node's connection attempt.
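Lining up the master-log error with the Collector log by time, within a few seconds of slack, can be scripted. A minimal sketch, assuming the `MM/DD/YY HH:MM:SS` prefix seen in the log lines quoted in this thread; the function name `lines_near` is hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp prefix of the log lines quoted in this thread,
# e.g. "09/24/18 09:46:01" (month/day/2-digit year, 24-hour time).
LOG_TS_FORMAT = "%m/%d/%y %H:%M:%S"
TS_LEN = 17  # "MM/DD/YY HH:MM:SS" is 17 characters


def lines_near(log_lines, target: str, slack_seconds: int = 5):
    """Yield log lines whose leading timestamp is within +/- slack_seconds of target.

    Lines that do not start with a parsable timestamp (e.g. continuation
    lines) are skipped.
    """
    center = datetime.strptime(target, LOG_TS_FORMAT)
    window = timedelta(seconds=slack_seconds)
    for line in log_lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], LOG_TS_FORMAT)
        except ValueError:
            continue
        if abs(ts - center) <= window:
            yield line.rstrip("\n")
```

For example, iterating over `lines_near(open(path), "09/24/18 09:46:01", slack_seconds=10)` yields the CollectorLog lines within ten seconds of that timestamp; `path` is wherever your LOG setting places the CollectorLog.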
>>> 
>>>> 
>>>> This error
>>>> 
>>>> 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
>>>> 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)
>>>> 
>>>> is a bit more puzzling to me. I don't see how a request from a Windows node to the collector could result in a peer address of 127.0.0.1.
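To see which addresses the collector actually logged queries from, the `peer=<ip:port>` fields can be pulled out of the "Query info" lines. A peer in 127.0.0.0/8 is a connection made over the loopback interface, i.e. from the collector host itself. A small sketch; the helper names `query_peers` and `non_loopback` are hypothetical.

```python
import ipaddress
import re

# Matches the "peer=<ip:port>" field in collector "Query info" lines,
# e.g. "peer=<127.0.0.1:25381>" (IPv4 only in this sketch).
PEER_RE = re.compile(r"peer=<([0-9.]+):(\d+)>")


def query_peers(log_lines):
    """Return (ip, port) tuples for every peer= field found in the log lines."""
    peers = []
    for line in log_lines:
        m = PEER_RE.search(line)
        if m:
            peers.append((m.group(1), int(m.group(2))))
    return peers


def non_loopback(peers):
    """Keep only peers whose address is not a loopback address."""
    return [(ip, port) for ip, port in peers
            if not ipaddress.ip_address(ip).is_loopback]
```

If `non_loopback(query_peers(lines))` comes back empty for the relevant time window, none of the logged queries reached the collector from another machine.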
>>>> 
>>>> Does the config on the Windows machine have this?
>>> This file, C:\Windows\System32\drivers\etc\hosts, does not contain 127.0.0.1; it contains just "10.6.10.15   mastercondor" (I added this for convenience).
>>>> 
>>>> NETWORK_INTERFACE = 127.0.0.1
>>>> 
>>>> If so, remove that line.
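A quick offline check for this setting, as a simplified sketch: it scans a config file's text for the last `NETWORK_INTERFACE = ...` assignment and reports whether the value is a literal loopback address. The helpers are hypothetical and ignore include files, line continuations and conditionals, so running `condor_config_val NETWORK_INTERFACE` on the node itself remains the authoritative answer.

```python
import ipaddress


def network_interface_setting(config_text: str):
    """Return the last NETWORK_INTERFACE value in an HTCondor config text, or None.

    Simplified parser: handles 'NAME = value' lines and '#' comments only.
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, rhs = line.partition("=")
        # HTCondor configuration names are case-insensitive
        if name.strip().upper() == "NETWORK_INTERFACE":
            value = rhs.strip()  # later definitions override earlier ones
    return value


def is_loopback_setting(value) -> bool:
    """True if value is a literal loopback address such as 127.0.0.1."""
    if not value:
        return False
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:
        # wildcards such as '10.*' are not literal addresses
        return False
```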
>>>> 
>>>> If not try running
>>>> 
>>>>   condor_config_val -write:upgrade  config.log
>>> OK, done; it's attached.
>>>> 
>>>> and sending me the config.log file. I'll see if I can spot anything in that config that could cause the peer address to be set incorrectly.
>>> 
>>> thank you very much for your help and support!
>>> 
>>>> 
>>>> -tj
>>>> 
>>>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi
>>>> Sent: Monday, September 24, 2018 12:12 PM
>>>> To: htcondor-users@xxxxxxxxxxx
>>>> Subject: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>>> 
>>>> Dear all,
>>>> I am trying to get a Linux (latest) HTCondor pool running with a Windows node. On Linux I can submit jobs and they get processed with no problems, but I can't figure out what's wrong when adding a Windows machine to the pool.
>>>> 
>>>> This is the error that I see in the MasterLog (Windows client):
>>>> 
>>>> ERROR: SECMAN:2003:TCP connection to collector x.x.x.x failed.
>>>> Failed to start non-blocking update to <x.x.x.x:9618>.
>>>> 
>>>> And this is the content of the CollectorLog on the Linux server, just after I ran condor_status -master on the Windows machine:
>>>> 
>>>> 09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS
>>>> 09/24/18 09:46:01 Number of Active Workers 0
>>>> 09/24/18 09:46:01 (Sending 4 ads in response to query)
>>>> 09/24/18 09:46:01 Query info: matched=4; skipped=0; query_time=0.000839; send_time=0.000619; type=MachinePrivate; requirements={true}; peer=<127.0.0.1:27363>; projection={}
>>>> 09/24/18 09:46:01 Number of Active Workers 0
>>>> 09/24/18 09:46:01 (Sending 6 ads in response to query)
>>>> 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
>>>> 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)
>>>> 
>>>> 
>>>> P.S. I am sure both the Windows and the Linux machines have port 9618 open.
>>>> 
>>>> Thanks for any suggestions!
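Since SECMAN:2003 reports that the TCP connection to the collector failed, it can help to test raw reachability of the collector port from the Windows node, independently of HTCondor. A minimal sketch using only Python's standard library; `can_connect` is a hypothetical name.

```python
import socket


def can_connect(host: str, port: int = 9618, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection to host:port; True if the handshake succeeds.

    This only shows that the port is reachable; it says nothing about whether
    HTCondor's security layer will then accept the update.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable network, name resolution failure, ...
        return False
```

Run on the Windows node, `can_connect("10.6.10.15")` (the central manager address given elsewhere in this thread) returning `False` would point at routing or a firewall rather than at ALLOW_WRITE; `True` only shows the port is reachable.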
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>> 
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>