
[HTCondor-users] upgrading central manager from condor 7.6.6 to 8.2.10



Hi all,


I recently attempted to upgrade my central manager node from 7.6.6 to 8.2.10 on a CentOS 5 machine.


I took the hot-upgrade route: after backing up the old binaries, I copied the new bin, include, sbin, libexec and lib folders from the release directory over the corresponding folders in the current installation. Condor restarted automatically once it noticed the new binaries in place.
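For anyone wanting to reproduce the copy step, here is a minimal Python sketch of roughly what I did by hand. The function name backup_and_copy and the three directory arguments are made up for illustration, it assumes Python 3.8 or later, and it is not an official Condor upgrade procedure.

```python
import shutil
from pathlib import Path

# The folders named above; everything else in the install is left untouched.
SUBDIRS = ("bin", "include", "sbin", "libexec", "lib")


def backup_and_copy(release_dir, install_dir, backup_dir, subdirs=SUBDIRS):
    """Back up each existing folder, then copy the new one over it.

    Hypothetical helper: returns the names of the folders that were copied.
    """
    release = Path(release_dir)
    install = Path(install_dir)
    backup = Path(backup_dir)
    copied = []
    for name in subdirs:
        src = release / name
        dst = install / name
        if not src.is_dir():
            # The release does not ship this folder, so there is nothing to copy.
            continue
        if dst.is_dir():
            # Keep a copy of the old files before they are overwritten.
            shutil.copytree(dst, backup / name, dirs_exist_ok=True)
        # Merge the new files into the existing folder (Python 3.8+).
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

Calling backup_and_copy with the release directory, the install directory and an empty backup directory leaves the old files under the backup directory and the new ones in place.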


I kept my existing condor_config.local file.


I got some initial errors while running condor_status, which I was able to resolve by adding the following setting to the config file:

SEC_DEFAULT_NEGOTIATION = NEVER

After the upgrade I submitted a test job, but it did not even make it into the queue, so I checked the logs to see if anything interesting was going on.


I found the following error repeated in my collector log. NOTE: nothing changed on my network or firewall, and the firewall is disabled on my execute nodes.


05/20/16 16:07:22 Failed to send DC_INVALIDATE_KEY to daemon at <192.168.122.1:35985>: SECMAN:2003:TCP connection to daemon at <192.168.122.1:35985> failed.
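To rule out a plain connectivity problem, a small Python sketch like the one below can pull the address out of such a log line and try a TCP connection to it from the central manager. The helper names parse_sinful and can_connect are hypothetical, the regular expression only handles the simple <ip:port> form seen in my log, and a successful connection would only show that something is listening on that port, not that Condor security negotiation works.

```python
import re
import socket

# Matches the simple IPv4 <ip:port> form that appears in the log line above.
SINFUL_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def parse_sinful(text):
    """Return (host, port) for the first <ip:port> address in text, or None."""
    match = SINFUL_RE.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example use (run on the central manager):
#   addr = parse_sinful(log_line)
#   if addr is not None:
#       print(addr, "reachable" if can_connect(*addr) else "not reachable")
```

Since the daemon port in the log is likely to change when the daemons restart, the check is only meaningful against an address taken from a recent log line.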


I did some digging around the interwebs and was wondering whether there is some network-related setting that needs to be in my config file as of version 8.2.10, since I barely modified my original config file.


I checked the example config file for 8.2.10 and noticed the following lines. Is it possible that these settings are mandatory in this version of Condor? Maybe other settings from the example config file are mandatory for 8.2.10 as well.


UPDATE_COLLECTOR_WITH_TCP = TRUE 

COLLECTOR_SOCKET_CACHE_SIZE = 300 


I'd like to know whether others running 8.2.x versions have these values in the config on their central manager node. Since my error was TCP-related, I wondered whether it had something to do with this.


I appreciate any and all feedback, as I'm hoping this upgrade will help troubleshoot or even fix a persistent issue I have with parallel universe jobs constantly getting stuck in the queue.


Thanks for any feedback in advance!