I’d just found that and tested it as your message came in.
[root@xxxxxxxxx condor]# condor_config_val -master CONDOR_DEVELOPERS_COLLECTOR
Not defined
Setting that to NONE stopped it crashing.
It resolves to 128.105.19.35. Does it use a library to look that up?
The machine is a minimal CentOS 7 install, so maybe there’s a library
missing.
These machines don't have any access to the outside world anyway so
it’ll never connect.
Klint.
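On the library question: on a CentOS 7 box, name lookups normally go through the system resolver (glibc's getaddrinfo), which consults /etc/hosts and DNS as configured in /etc/nsswitch.conf. Whether the collector calls it in exactly this way is an assumption here, not something confirmed in this thread. A minimal Python sketch for checking what the resolver on the central manager hands back for a name; `resolve_udp` is a hypothetical helper, and port 9618 is the one shown in the collector log.

```python
import socket

def resolve_udp(host, port=9618):
    """Return sorted (address_family, address) pairs from the system resolver.

    Uses getaddrinfo restricted to UDP sockets, since the collector log
    shows the update being sent via UDP. Raises socket.gaierror if the
    lookup fails.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
    return sorted(
        {(socket.AddressFamily(family).name, sockaddr[0])
         for family, _type, _proto, _canon, sockaddr in infos}
    )
```

Calling `resolve_udp("condor.cs.wisc.edu")` on the collector machine lists each address family and address the resolver returns, which can be compared against what /etc/hosts and DNS are expected to give.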
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
Behalf Of Todd Tannenbaum
Sent: Monday, 27 June 2016 6:36 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] protocol error in collector after
housekeeping
Hi Klint,
Looks like your collector machine has something bogus set up in the
/etc/hosts file or DNS when resolving "condor.cs.wisc.edu".
Could you investigate that for us?
Meanwhile as an immediate workaround, perhaps you could avoid the
problem if you put in the condor_config file on your central manager
machine:
CONDOR_DEVELOPERS_COLLECTOR = NONE
Hope this helps,
Todd
Sent from my iPhone
On Jun 27, 2016, at 2:38 AM, Klint Gore <kgore4@xxxxxxxxxx> wrote:
Just in case
[root@xxxxxxxxx condor]# condor_config_val -v COLLECTOR_HOST
COLLECTOR_HOST = 10.1.1.55
# at: <Default>
# raw: COLLECTOR_HOST = $(CONDOR_HOST)
-----Original Message-----
From: Klint Gore
Sent: Monday, 27 June 2016 5:40 PM
To: HTCondor-Users Mail List
Subject: RE: protocol error in collector after housekeeping
[root@xxxxxxxxx condor]# condor_config_val -master CONDOR_HOST
10.1.1.55
[root@xxxxxxxxx condor]# condor_config_val -v CONDOR_HOST
CONDOR_HOST = 10.1.1.55
# at: /etc/condor/config.d/condor_config.local, line 1
# raw: CONDOR_HOST = 10.1.1.55
Jobs do get run in the 15 minutes after the collector restarts until
the housekeeper kicks in.
------ collector log with D_FULLDEBUG
06/27/16 17:22:41 Housekeeper: Ready to clean old ads
06/27/16 17:22:41 Cleaning StartdAds ...
06/27/16 17:22:41 Cleaning StartdPrivateAds ...
06/27/16 17:22:41 Cleaning ScheddAds ...
06/27/16 17:22:41 Cleaning SubmittorAds ...
06/27/16 17:22:41 Cleaning LicenseAds ...
06/27/16 17:22:41 Cleaning MasterAds ...
06/27/16 17:22:41 Cleaning CkptServerAds ...
06/27/16 17:22:41 Cleaning CollectorAds ...
06/27/16 17:22:41 Cleaning StorageAds ...
06/27/16 17:22:41 Cleaning NegotiatorAds ...
06/27/16 17:22:41 Cleaning HadAds ...
06/27/16 17:22:41 Cleaning GridAds ...
06/27/16 17:22:41 Cleaning XferServiceAds ...
06/27/16 17:22:41 Cleaning LeaseManagerAds ...
06/27/16 17:22:41 Cleaning Generic Ads ...
06/27/16 17:22:41 Housekeeper: Done cleaning
06/27/16 17:22:42 ScheddAd : Updating ... "< 10-1-1-61.agbu.localdomain , 10.1.1.61 >"
06/27/16 17:22:42 In OfflineCollectorPlugin::update ( 1 )
06/27/16 17:22:42 CollectorAd : Updating ... "< AGBU@xxxxxxxxxxxxxxxxxxxxxxxxxx >"
06/27/16 17:22:42 Attempting to send update via UDP to collector condor.cs.wisc.edu <:9618>
06/27/16 17:22:42 ERROR "Unknown protocol (1) in Sock::bind(); aborting." at line 741 in file /slots/01/dir_1114870/userdir/.tmpthm9vL/BUILD/condor-8.4.7/src/condor_io/sock.cpp
------
Looks like the address is blank in that "Attempting to send update" line.
Klint.
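The blank address shows up as `<:9618>` in the log: there is nothing before the colon. A small Python sketch for spotting such lines in a collector log; the helper names and the regular expression are illustrative assumptions based only on the line format shown above, not on any documented HTCondor log format.

```python
import re

# Matches an "<address:port>" token such as "<:9618>" or "<10.1.1.55:9618>".
ADDR_PORT_RE = re.compile(r"<([^<>\s]*):(\d+)>")

def update_target(log_line):
    """Return (address, port) from the first <address:port> token, or None."""
    match = ADDR_PORT_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def has_blank_address(log_line):
    """True if the line carries an <address:port> token with an empty address."""
    target = update_target(log_line)
    return target is not None and target[0] == ""
```

Running `has_blank_address` over each line of the collector log would flag every update attempt that was issued without a resolved address.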
-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
Behalf Of Iain Bradford Steers
Sent: Monday, 27 June 2016 4:35 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] protocol error in collector after
housekeeping
Hi Klint,
I've seen this error message type in the past when I've accidentally
appended the port to the address a second time.
However your CONDOR_HOST var seems okay.
Could you run the following:
condor_config_val -master CONDOR_HOST
condor_config_val -v CONDOR_HOST
I think we can ignore the connection refused error for the moment.
The master doesn't know the collector is dead, so it is trying to send
an update, I think. (Sounds like a bug in itself, really.)
Could you bump up the debugging?
MASTER_DEBUG = D_FULLDEBUG
COLLECTOR_DEBUG = D_FULLDEBUG
Cheers, Iain
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/