
[HTCondor-users] condor_master does not start



I have a Condor cluster running on a group of iMacs, with a Mac Pro server acting as the central manager. A few weeks ago, my central manager automatically updated to OS X 10.9 (someone before me must have misconfigured it; I wouldn't intentionally have it auto-update). However, the cluster seemed to keep working after I updated NFS Manager to a version compatible with the new OS. Now, with no other changes, a user reports that the cluster isn't working and that he gets this error:

codytrey@metis:~$ condor_status
Error: communication error
CEDAR:6001:Failed to connect to <128.194.151.191:9618>
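To confirm that nothing is accepting connections on the collector port, independently of condor_status, here is a minimal sketch. It assumes Python 3 is available on a submit machine such as metis, and the helper name can_connect is hypothetical, not part of HTCondor. It simply attempts a plain TCP connection to the address and port from the error.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability and
    does not speak HTCondor's CEDAR protocol.
    """
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: no usable listener on that port.
        return False
```

For example, can_connect("128.194.151.191", 9618) returning False is consistent with no daemon listening on port 9618, although a firewall silently dropping the traffic would look the same from the client side.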


Okay, easy enough: the master isn't running on the central manager. So I run condor_master, and it returns with no output as if it worked fine, but ps aux | grep condor shows that no Condor daemons are running. Checking the master log, I find this:


06/12/14 13:06:29 ******************************************************
06/12/14 13:06:29 ** condor_master (CONDOR_MASTER) STARTING UP
06/12/14 13:06:29 ** /condor/sbin/condor_master
06/12/14 13:06:29 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/12/14 13:06:29 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/12/14 13:06:29 ** $CondorVersion: 7.8.6 Oct 24 2012 BuildID: 73238 $
06/12/14 13:06:29 ** $CondorPlatform: x86_64_macos_10.7 $
06/12/14 13:06:29 ** PID = 34380
06/12/14 13:06:29 ** Log last touched 6/12 13:02:23
06/12/14 13:06:29 ******************************************************
06/12/14 13:06:29 Using config source: /etc/condor/condor_config
06/12/14 13:06:29 Using local config sources:
06/12/14 13:06:29    /condor/var/condor_config.local
06/12/14 13:06:29 Sock::bind failed: errno = 49 Can't assign requested address
06/12/14 13:06:29 Failed to bind to command ReliSock
06/12/14 13:06:29 (Make sure your IP address is correct in /etc/hosts.)
06/12/14 13:06:29 ERROR "BindAnyCommandPort() failed" at line 9247 in file /usr/local/condor/local/execute/slot2/dir_26216/userdir/src/condor_daemon_core.V6/daemon_core.cpp
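The failing step can be approximated outside HTCondor with the sketch below, assuming Python 3 on the central manager. It resolves the machine's own hostname and tries to bind a TCP socket to that address. On macOS, errno 49 is EADDRNOTAVAIL, meaning the address is not assigned to any local interface. The helper names try_bind and check_own_address are hypothetical, and this only approximates what condor_master does, since HTCondor chooses its address from its own configuration and name lookups.

```python
import errno
import socket
from typing import Optional


def try_bind(addr: str, port: int = 0) -> Optional[int]:
    """Try to bind a TCP socket to (addr, port).

    Returns None on success, or the errno value on failure. Port 0 lets the
    kernel pick any free port, so only the address itself is being tested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


def check_own_address() -> None:
    """Resolve this host's name and report whether that address is bindable."""
    name = socket.gethostname()
    try:
        addr = socket.gethostbyname(name)
    except socket.gaierror as exc:
        print(f"{name}: hostname does not resolve ({exc})")
        return
    err = try_bind(addr)
    if err is None:
        print(f"{name} -> {addr}: bind OK")
    else:
        # EADDRNOTAVAIL (49 on macOS) means no local interface has this address.
        print(f"{name} -> {addr}: bind failed, errno {err} "
              f"({errno.errorcode.get(err, 'unknown')})")
```

Running check_own_address() on the central manager would show whether the address its hostname resolves to is actually configured on one of its interfaces after the OS update.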


I checked /etc/hosts, and it is correct. Am I missing something, or is it possible that my Condor version (7.8.6, built for x86_64_macos_10.7) is incompatible with the newer version of OS X?

Thanks,

Cody

--

---------------------------------------------------------------------------
Cody Belcher                                    email: codytrey@xxxxxxxx
Computer Support Group                          phone: (979) 845-1379
Department of Physics & Astronomy               office: MPHY 155
---------------------------------------------------------------------------