
[Condor-users] Problems getting Condor Daemons to run on OS X



I am having difficulty getting the Condor central manager to run on my
Mac OS X machine (Praetorian). I would like this machine to act as the
master. However, upon running:
/condor_master

The only processes that start up are:
condor   692   0.0  0.3    39376   1344  ??  Ss   10:17AM   0:00.03 /condor_master
condor   693   0.0  0.4    31064   2240  ??  Ss   10:17AM   0:04.60 condor_startd -f
condor   694   0.0  0.4    30960   2108  ??  Ss   10:17AM   0:00.09 condor_schedd -f

I noticed that the central manager should be running all of these:
condor_master
condor_collector
condor_negotiator
condor_startd
condor_schedd

After looking through the log files, I am wondering whether it is
having problems with loopback, since the condor_master is located on
the local machine. I checked my network name and it is listed simply as
Praetorian, but Condor seems to add a .local extension to this name.
Below are my logs and my config file. If someone could help me work
through this I would be most grateful. Thanks.
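
The name question could be checked outside Condor. The following is a minimal Python sketch, not part of Condor: the helper name resolve_ipv4 is hypothetical, and it only asks the operating system's resolver which IPv4 addresses a given name maps to.

```python
import socket


def resolve_ipv4(name):
    """Return the IPv4 addresses the OS resolver gives for `name`.

    Returns an empty list if the name cannot be resolved.
    """
    try:
        # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return []
    return addresses
```

Calling resolve_ipv4("Praetorian") and resolve_ipv4("praetorian.local") on the machine would show whether both names resolve, and whether they resolve to 192.168.0.110 (the address in the logs) or to a loopback address.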

In StartLog...

3/9 10:32:21 Error sending update to the collector praetorian.local
<192.168.0.110:9618>: Failed to send UDP update command to collector
3/9 10:32:21 All resources are free, exiting.
3/9 10:32:21 **** condor_startd (condor_STARTD) EXITING WITH STATUS 0
3/9 10:32:42 ******************************************************
3/9 10:32:42 ** condor_startd (CONDOR_STARTD) STARTING UP
3/9 10:32:42 ** /Users/condor/condor/sbin/condor_startd
3/9 10:32:42 ** $CondorVersion: 6.6.8 Jan 27 2005 $
3/9 10:32:42 ** $CondorPlatform: PPC-OSX_10_2 $
3/9 10:32:42 ** PID = 827
3/9 10:32:42 ******************************************************
3/9 10:32:42 Using config file: /Users/condor/condor_config
3/9 10:32:42 DaemonCore: Command Socket at <192.168.0.110:52009>
3/9 10:32:42 "/Users/condor/condor/sbin/condor_starter.pvm -classad"
did not produce any output, ignoring
3/9 10:32:43 New machine resource allocated
3/9 10:32:43 About to run initial benchmarks.
3/9 10:32:49 Completed initial benchmarks.




In MasterLog....

3/9 10:32:11 DaemonCore: Command received via TCP from host
<192.168.0.110:51982>
3/9 10:32:11 DaemonCore: received command 454 (DAEMONS_OFF), calling
handler (admin_command_handler)
3/9 10:32:11 Sent SIGTERM to STARTD (pid 805)
3/9 10:32:11 Sent SIGTERM to SCHEDD (pid 806)
3/9 10:32:21 The SCHEDD (pid 806) exited with status 0
3/9 10:32:21 The STARTD (pid 805) exited with status 0
3/9 10:32:21 All daemons are gone.
3/9 10:32:22 DaemonCore: Command received via TCP from host
<192.168.0.110:52007>
3/9 10:32:22 DaemonCore: received command 454 (DAEMONS_OFF), calling
handler (admin_command_handler)
3/9 10:32:22 All daemons are gone.
3/9 10:32:32 Got SIGTERM. Performing graceful shutdown.
3/9 10:32:32 All daemons are gone.  Exiting.
3/9 10:32:32 **** condor_master (condor_MASTER) EXITING WITH STATUS 0
3/9 10:32:42 ******************************************************
3/9 10:32:42 ** condor_master (CONDOR_MASTER) STARTING UP
3/9 10:32:42 ** /Users/condor/condor/sbin/condor_master
3/9 10:32:42 ** $CondorVersion: 6.6.8 Jan 27 2005 $
3/9 10:32:42 ** $CondorPlatform: PPC-OSX_10_2 $
3/9 10:32:42 ** PID = 826
3/9 10:32:42 ******************************************************
3/9 10:32:42 Using config file: /Users/condor/condor_config
3/9 10:32:42 DaemonCore: Command Socket at <192.168.0.110:52008>
3/9 10:32:42 Started DaemonCore process
"/Users/condor/condor/sbin/condor_startd", pid and pgroup = 827
3/9 10:32:42 Started DaemonCore process
"/Users/condor/condor/sbin/condor_schedd", pid and pgroup = 828
3/9 10:32:47 Can't connect to <192.168.0.110:9618>:0, errno = 61
3/9 10:32:47 Will keep trying for 10 seconds...
3/9 10:32:57 Connect failed for 10 seconds; returning FALSE
3/9 10:32:57 ERROR:
SECMAN:2003:TCP connection to <192.168.0.110:9618> failed

3/9 10:32:57 Can't send UPDATE_MASTER_AD to collector praetorian.local
<192.168.0.110:9618>: Failed to send UDP update command to collector
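
On OS X, errno 61 is ECONNREFUSED, which would mean nothing was accepting connections on port 9618 at that address when the master tried to connect. The following is a minimal Python sketch, not part of Condor, for checking that independently; the helper name port_accepts_tcp and the 2-second timeout are assumptions made for illustration.

```python
import socket


def port_accepts_tcp(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Any failure (connection refused, timeout, unresolvable host)
    is reported as False.
    """
    try:
        # create_connection resolves the host and opens a TCP connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running port_accepts_tcp("192.168.0.110", 9618) on the machine while condor_master is up would show whether anything is listening on the port named in the log lines above.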





Here's my condor config file:

##  What machine is your central manager?
CONDOR_HOST		= praetorian.local

##--------------------------------------------------------------------
##  Pathnames:
##--------------------------------------------------------------------
##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR		= /Users/condor/condor

##  Where is the local condor directory for each host?
LOCAL_DIR		= $(TILDE)
#LOCAL_DIR		= $(RELEASE_DIR)/hosts/$(HOSTNAME)

##  Where is the machine-specific local config file for each host?
#LOCAL_CONFIG_FILE	= $(LOCAL_DIR)/condor_config.local
#LOCAL_CONFIG_FILE	= $(RELEASE_DIR)/etc/$(HOSTNAME).local

## If the local config file is not present, is it an error?
## WARNING: This is a potential security issue.
## If not specified, the default is True
REQUIRE_LOCAL_CONFIG_FILE = FALSE