
Re: [HTCondor-users] protocol error in collector after housekeeping

I’d just found that and tested it as your message came in.


[root@xxxxxxxxx condor]# condor_config_val -master CONDOR_DEVELOPERS_COLLECTOR

Not defined


Setting that to NONE stopped it crashing. 


It resolves to  Does it use a library to look that up? The machine is a minimal CentOS 7 install, so maybe there’s a library missing.


These machines don't have any access to the outside world anyway, so it’ll never connect.
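The workaround described above, setting CONDOR_DEVELOPERS_COLLECTOR to NONE, might look like this as a local configuration fragment. This is a minimal sketch. It assumes the line goes into the same /etc/condor/config.d/condor_config.local file that already sets CONDOR_HOST on this machine. The comment reflects a reading of the collector log in this thread, not confirmed HTCondor documentation:

```
# Assumption: this stops the collector from sending its update to the
# developers' collector (condor.cs.wisc.edu), which is the send that
# aborted in Sock::bind() in the log below.
CONDOR_DEVELOPERS_COLLECTOR = NONE
```

After editing the file, run condor_reconfig so the daemons re-read it, or restart the collector if reconfig turns out not to be enough. Then condor_config_val -v CONDOR_DEVELOPERS_COLLECTOR should report NONE, along with the file and line it came from.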





From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Monday, 27 June 2016 6:36 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] protocol error in collector after housekeeping


Hi Klint,


It looks like your collector machine has something bogus set up in /etc/hosts or DNS for resolving "condor.cs.wisc.edu". Could you investigate that for us? 


Meanwhile as an immediate workaround, perhaps you could avoid the problem if you put in the condor_config file on your central manager machine:


Hope this helps,




On Jun 27, 2016, at 2:38 AM, Klint Gore <kgore4@xxxxxxxxxx> wrote:

Just in case

[root@xxxxxxxxx condor]# condor_config_val -v COLLECTOR_HOST
# at: <Default>

-----Original Message-----
From: Klint Gore
Sent: Monday, 27 June 2016 5:40 PM
To: HTCondor-Users Mail List
Subject: RE: protocol error in collector after housekeeping

[root@xxxxxxxxx condor]# condor_config_val -master CONDOR_HOST
[root@xxxxxxxxx condor]# condor_config_val -v CONDOR_HOST
CONDOR_HOST =
# at: /etc/condor/config.d/condor_config.local, line 1
# raw: CONDOR_HOST =

Jobs do run during the 15 minutes after the collector restarts, until the housekeeper kicks in.

------ collector log with D_FULLDEBUG

06/27/16 17:22:41 Housekeeper:  Ready to clean old ads
06/27/16 17:22:41       Cleaning StartdAds ...
06/27/16 17:22:41       Cleaning StartdPrivateAds ...
06/27/16 17:22:41       Cleaning ScheddAds ...
06/27/16 17:22:41       Cleaning SubmittorAds ...
06/27/16 17:22:41       Cleaning LicenseAds ...
06/27/16 17:22:41       Cleaning MasterAds ...
06/27/16 17:22:41       Cleaning CkptServerAds ...
06/27/16 17:22:41       Cleaning CollectorAds ...
06/27/16 17:22:41       Cleaning StorageAds ...
06/27/16 17:22:41       Cleaning NegotiatorAds ...
06/27/16 17:22:41       Cleaning HadAds ...
06/27/16 17:22:41       Cleaning GridAds ...
06/27/16 17:22:41       Cleaning XferServiceAds ...
06/27/16 17:22:41       Cleaning LeaseManagerAds ...
06/27/16 17:22:41       Cleaning Generic Ads ...
06/27/16 17:22:41 Housekeeper:  Done cleaning
06/27/16 17:22:42 ScheddAd     : Updating ... "< 10-1-1-61.agbu.localdomain , >"
06/27/16 17:22:42 In OfflineCollectorPlugin::update ( 1 )
06/27/16 17:22:42 CollectorAd  : Updating ... "< AGBU@xxxxxxxxxxxxxxxxxxxxxxxxxx >"
06/27/16 17:22:42 Attempting to send update via UDP to collector condor.cs.wisc.edu <:9618>
06/27/16 17:22:42 ERROR "Unknown protocol (1) in Sock::bind(); aborting." at line 741 in file /slots/01/dir_1114870/userdir/.tmpthm9vL/BUILD/condor-8.4.

It looks like the address is blank in that "Attempting to send update" line.


-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Iain Bradford Steers
Sent: Monday, 27 June 2016 4:35 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] protocol error in collector after housekeeping

Hi Klint,

I've seen this type of error message in the past when I accidentally appended the port to the address a second time.

However, your CONDOR_HOST variable seems okay.

Could you run the following:

condor_config_val -master CONDOR_HOST

condor_config_val -v CONDOR_HOST

I think we can ignore the "connection refused" error for the moment. I suspect the master doesn't know the collector is dead, so it is still trying to send it an update. (That sounds like a bug in itself, really.)

Could you bump up the debugging?


Cheers, Iain
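For reference, a collector log at the D_FULLDEBUG level, like the one shown above, can be requested with a fragment like the following. This is a sketch that assumes the line is added to the central manager's local config file. COLLECTOR_DEBUG follows HTCondor's usual <SUBSYS>_DEBUG naming:

```
# Raise the collector's log verbosity; by default the output goes to
# $(LOG)/CollectorLog on the central manager.
COLLECTOR_DEBUG = D_FULLDEBUG
```

Run condor_reconfig afterwards. Because D_FULLDEBUG output is verbose, it is probably worth removing the line again once the problem is understood.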