
Re: [HTCondor-users] Windows Exec Nodes Falling Out of Cluster



If you turn on D_FULLDEBUG in the collector, you will see whether the update ads are ever arriving.  Likely they are not.

 

By default in 8.2, the initial update from the startd to the collector is sent over TCP, but later updates are sent as UDP packets. So if your network is dropping UDP packets, the startds will only show up for an hour or so and then go away.
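To make the failure mode concrete, here is a simplified model (not HTCondor code) of how an ad disappears when only the first update gets through. It assumes the startd sends an update every `update_interval` seconds, the collector's housekeeper runs every `housekeeping_interval` seconds, and it drops an ad that has gone more than `lifetime` seconds without a fresh update. The parameter names loosely mirror HTCondor's UPDATE_INTERVAL, HOUSEKEEPING_INTERVAL and CLASSAD_LIFETIME knobs, but the values used below are illustrative; check your own configuration for the real ones.

```python
def ad_removal_time(update_interval, lifetime, housekeeping_interval,
                    delivered, horizon):
    """Simplified model: return the second at which the housekeeper would
    drop a startd ad, or None if the ad survives until `horizon`.

    `delivered(n)` reports whether update number n (0, 1, 2, ...) reaches
    the collector; this is where a lossy UDP path is modelled.
    """
    last_seen = None   # time of the most recent update the collector received
    n = 0              # index of the next update the startd sends
    for t in range(horizon + 1):
        # The startd sends an update on every multiple of the interval.
        if t % update_interval == 0:
            if delivered(n):
                last_seen = t
            n += 1
        # The housekeeper periodically removes ads that have gone stale.
        if t % housekeeping_interval == 0 and last_seen is not None:
            if t - last_seen > lifetime:
                return t
    return None


# Only update 0 (the initial TCP update) arrives; every later UDP update is lost.
only_first = lambda n: n == 0
print(ad_removal_time(300, 900, 300, only_first, horizon=7200))
```

With these illustrative values the ad is removed at t = 1200 seconds; if every update is delivered, the function returns None. The exact time in a real pool depends on the actual settings, but the shape is the same: the node appears once and then silently ages out.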

 

If you add this to the configuration of the execute nodes, they will always use TCP to send updates to the collector.

 

UPDATE_COLLECTOR_WITH_TCP = true

 

With this knob set, the startd logs might have something useful to say.  (With UDP updates, the startds have no way of knowing whether the update succeeded.)
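The point about UDP giving the sender no feedback can be seen with plain sockets. This is a generic Python illustration, not how HTCondor itself sends updates; the helper names are hypothetical. A UDP `sendto` to a port where nothing is listening still reports success, while a TCP connection to the same port fails with an error the sender can log.

```python
import socket


def unused_local_port():
    """Pick a localhost port that (very likely) has nothing listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def udp_send(host, port, payload):
    """Send one datagram and return the byte count the OS accepted.
    A successful return says nothing about whether anyone received it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(payload, (host, port))


def tcp_send(host, port, payload, timeout=5.0):
    """Connect and send; raises OSError if the peer cannot be reached."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(payload)
        return len(payload)


port = unused_local_port()
print("UDP bytes 'sent':", udp_send("127.0.0.1", port, b"update"))
try:
    tcp_send("127.0.0.1", port, b"update")
except OSError as exc:
    print("TCP send failed:", exc)
```

The UDP call returns normally even though the datagram goes nowhere, which mirrors why a startd sending UDP updates cannot log a failure, whereas a failed TCP update surfaces as an error.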

 

-tj

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Deck, William
Sent: Tuesday, November 24, 2015 2:11 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Windows Exec Nodes Falling Out of Cluster

 

All,

 

We are currently running into an issue where all of our Windows Server 2012 R2 boxes running Condor 8.2.8 are falling out of the cluster.  We can see the collector's Housekeeper removing the stale ads; however, network traces and Condor's D_FULLDEBUG output (for the startd and master) suggest that the collector should be receiving updates.  Does anyone have any ideas on why this is happening, or how to better debug this issue?

 

 

HOUSEKEEPER excerpt  

11/24/15 13:37:32 Housekeeper:  Ready to clean old ads

11/24/15 13:37:32       Cleaning StartdAds ...

11/24/15 13:37:32               **** Removing stale ad: "< slot1@xxxxxxxxxxxxxxxxxx , 10.12.XXX.XXX >"

11/24/15 13:37:32               **** Removing stale ad: "< slot1@ XXXXXX.ds.susq.com , 10.12.XXX.XXX >"

11/24/15 13:37:32       Cleaning StartdPrivateAds ...

11/24/15 13:37:32               **** Removing stale ad: "< slot1@ XXXXXX.ds.susq.com , 10.12.XXX.XXX >"

11/24/15 13:37:32               **** Removing stale ad: "< slot1@ XXXXXX.ds.susq.com , 10.12.XXX.XXX >"

11/24/15 13:37:32       Cleaning ScheddAds ...

11/24/15 13:37:32       Cleaning SubmittorAds ...

11/24/15 13:37:32       Cleaning LicenseAds ...

11/24/15 13:37:32       Cleaning MasterAds ...

11/24/15 13:37:32       Cleaning CkptServerAds ...

11/24/15 13:37:32       Cleaning CollectorAds ...

11/24/15 13:37:32       Cleaning StorageAds ...

11/24/15 13:37:32       Cleaning NegotiatorAds ...

11/24/15 13:37:32       Cleaning HadAds ...

11/24/15 13:37:32       Cleaning GridAds ...

11/24/15 13:37:32       Cleaning XferServiceAds ...

11/24/15 13:37:32       Cleaning LeaseManagerAds ...

11/24/15 13:37:32       Cleaning Generic Ads ...

11/24/15 13:37:32 Housekeeper:  Done cleaning
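When the Housekeeper output is long, a small script can summarize which ad types lost entries on each pass. This is a minimal Python sketch that assumes the line format shown in the excerpt above ("Cleaning <Type> ..." headings followed by "Removing stale ad:" lines); the function name is hypothetical and the format may differ in other HTCondor versions.

```python
import re
from collections import Counter

# Patterns taken from the Housekeeper excerpt; adjust if your log differs.
SECTION = re.compile(r"Cleaning (.+?) \.\.\.")
REMOVED = re.compile(r'Removing stale ad: "')


def stale_ads_by_section(lines):
    """Count 'Removing stale ad' lines under each 'Cleaning ...' heading."""
    counts = Counter()
    section = None
    for line in lines:
        m = SECTION.search(line)
        if m:
            section = m.group(1)  # e.g. "StartdAds", "Generic Ads"
            continue
        if section is not None and REMOVED.search(line):
            counts[section] += 1
    return counts


# Typical use: feed it the collector log file line by line.
# with open("CollectorLog") as f:
#     print(stale_ads_by_section(f))
```

A steady count under StartdAds on every pass, with no matching count under MasterAds, would be consistent with startd updates (but not master updates) failing to arrive.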

 

--

Will Deck

 

 


