
Re: [HTCondor-users] Access problems with Windows 10 machine



So you are saying that you see the master ad for the Win10 box in the collector,

and yet the MasterLog shows that it was unable to write to the collector?

 

If the master is failing to write to the collector, we would not expect condor_status -master

to show that ad.

 

I think we need more context. What are the messages in the MasterLog before the failure?

 

Similarly, the failure messages from the Collector may be a consequence of an earlier failure.

What are the messages in the CollectorLog before the failure message?

 

Do the timestamps of the MasterLog failure messages and the CollectorLog messages line up?
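
One way to check this is to pair up entries from the two logs whose timestamps fall within a few seconds of each other. The following is only a rough sketch in Python, not an HTCondor tool: the function names and the window_seconds parameter are made up for illustration, the timestamp format (MM/DD/YY HH:MM:SS at the start of each line) is assumed from the excerpts quoted below, and the comparison only means something if the clocks on both machines agree.

```python
from datetime import datetime, timedelta

# Assumed from the quoted excerpts: lines start with "MM/DD/YY HH:MM:SS".
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S"


def parse_log_time(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    # Drop a "CollectorLog:" style file-name prefix (as added by grep), if present.
    if ":" in line and not line[:2].isdigit():
        line = line.split(":", 1)[1]
    try:
        return datetime.strptime(line[:17], LOG_TIME_FORMAT)
    except ValueError:
        return None


def matching_pairs(master_lines, collector_lines, window_seconds=5):
    """Pair each MasterLog line with CollectorLog lines logged within the window."""
    window = timedelta(seconds=window_seconds)
    collector_times = [(parse_log_time(c), c) for c in collector_lines]
    pairs = []
    for m in master_lines:
        master_time = parse_log_time(m)
        if master_time is None:
            continue  # skip lines without a timestamp
        for collector_time, c in collector_times:
            if collector_time is not None and abs(collector_time - master_time) <= window:
                pairs.append((m, c))
    return pairs
```

Feeding it the lines of the MasterLog from the Win10 box and the CollectorLog from the head node (for example via open(...).read().splitlines()) returns the pairs that are close in time. An empty result means that the two sets of messages do not line up within the chosen window.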

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Peter Ellevseth
Sent: Thursday, September 13, 2018 7:55 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Access problems with Windows 10 machine

 

Hello all

 

We are running an HTCondor cluster of Linux machines (CentOS 7). I have one Windows 10 machine that I want to add. I was able to install HTCondor on it successfully, and I can see it via condor_status -master, but the startd is not able to connect.

 

I get the following error on the Win10 machine:

09/13/18 11:27:56 condor_write(): Socket closed when trying to write 1269 bytes to collector [address of condor-host], fd is 1312, errno=10054

09/13/18 11:27:56 Buf::write(): condor_write() failed

 

That was from the MasterLog, and I see a similar message in the StartLog.

 

On my condor-head node I see errors in the CollectorLog like:

 

CollectorLog:09/13/18 14:24:21 DaemonCore: Can't receive command request from [IP of Win10] (perhaps a timeout?)

CollectorLog:09/13/18 14:28:04 Deadline expired after 301.038s waiting for <[IP of Win10]:51106> to send payload for command 0 UPDATE_STARTD_AD.

CollectorLog:09/13/18 14:33:04 Deadline expired after 301.040s waiting for <[IP of Win10]:51124> to send payload for command 0 UPDATE_STARTD_AD.

CollectorLog:09/13/18 14:34:21 condor_read() failed: recv(fd=31) returned -1, errno = 110 Connection timed out, reading 5 bytes from <[IP of Win10]:51123>.

CollectorLog:09/13/18 14:34:21 condor_read(): UNEXPECTED read timeout after 0s during non-blocking read from <[IP of Win10]:51123> (desired timeout=1s)

CollectorLog:09/13/18 14:34:21 DaemonCore: Can't receive command request from [IP of Win10] (perhaps a timeout?)

CollectorLog:09/13/18 14:38:04 Deadline expired after 301.040s waiting for <[IP of Win10]:51135> to send payload for

 

Does anyone have any suggestions?

 

Regards,

Peter

 


Peter Ellevseth

Senior Safety Engineer / Senior sikkerhetsingeniør
Dir: +47 93 43 56 01 / Tel: +47 51 93 92 20 (Stavanger)
peter.ellevseth@xxxxxxxxxx
www.safetec.no



 
