
Re: [HTCondor-users] I can not run condor_master on the 2nd node



Ben,

Thanks for the comment. I do not know how I could have forgotten to look at the MasterLog. Here is the error output, but I am still stuck and cannot resolve it on my own.

Regards,
Rene

labounek@magellan:~$ tail -n46 /var/log/condor/MasterLog
02/26/16 14:37:16 ******************************************************
02/26/16 14:37:16 ** condor_master (CONDOR_MASTER) STARTING UP
02/26/16 14:37:16 ** /usr/sbin/condor_master
02/26/16 14:37:16 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
02/26/16 14:37:16 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
02/26/16 14:37:16 ** $CondorVersion: 8.4.0 Sep 23 2015 BuildID: Debian-8.4.0~dfsg.1-1~nd80+1 Debian-8.4.0~dfsg.1-1~nd80+1 $
02/26/16 14:37:16 ** $CondorPlatform: X86_64-Debian_8 $
02/26/16 14:37:16 ** PID = 3353
02/26/16 14:37:16 ** Log last touched 2/26 14:28:19
02/26/16 14:37:16 ******************************************************
02/26/16 14:37:16 Using config source: /etc/condor/condor_config
02/26/16 14:37:16 Using local config sources:
02/26/16 14:37:16    /etc/condor/config.d/00debconf
02/26/16 14:37:16    /etc/condor/condor_config.local
02/26/16 14:37:16 config Macros = 62, Sorted = 62, StringBytes = 1664, TablesBytes = 2288
02/26/16 14:37:16 CLASSAD_CACHING is OFF
02/26/16 14:37:16 Daemon Log is logging: D_ALWAYS D_ERROR
02/26/16 14:37:17 Daemoncore: Listening at <172.19.37.21:6174> on TCP (ReliSock) and UDP (SafeSock).
02/26/16 14:37:17 DaemonCore: command socket at <172.19.37.21:6174?addrs=172.19.37.21-6174>
02/26/16 14:37:17 DaemonCore: private command socket at <172.19.37.21:6174?addrs=172.19.37.21-6174>
02/26/16 14:37:17 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1443039692)
02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
02/26/16 14:37:17 ProcFamilyClient: error initializing LocalClient
02/26/16 14:37:17 ProcFamilyProxy: error initializing ProcFamilyClient
02/26/16 14:37:17 attempting to restart the Procd
02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
02/26/16 14:37:17 ProcFamilyClient: error initializing LocalClient
02/26/16 14:37:17 recover_from_procd_error: error initializing ProcFamilyClient
02/26/16 14:37:17 attempting to restart the Procd
02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
02/26/16 14:37:17 ProcFamilyClient: error initializing LocalClient
02/26/16 14:37:17 recover_from_procd_error: error initializing ProcFamilyClient
02/26/16 14:37:17 attempting to restart the Procd
02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
02/26/16 14:37:17 ProcFamilyClient: error initializing LocalClient
02/26/16 14:37:17 recover_from_procd_error: error initializing ProcFamilyClient
02/26/16 14:37:17 attempting to restart the Procd
02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
02/26/16 14:37:17 ProcFamilyClient: error initializing LocalClient
02/26/16 14:37:17 recover_from_procd_error: error initializing ProcFamilyClient
02/26/16 14:37:17 attempting to restart the Procd
02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
02/26/16 14:37:17 ProcFamilyClient: error initializing LocalClient
02/26/16 14:37:17 recover_from_procd_error: error initializing ProcFamilyClient
02/26/16 14:37:17 ERROR "unable to restart the ProcD after several tries" at line 678 in file /tmp/buildd/condor-8.4.0~dfsg.1/src/condor_utils/proc_family_proxy.cpp
02/26/16 14:37:17 All daemons are gone.
labounek@magellan:~$
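
Because the same three messages repeat on every ProcD restart attempt, it can help to collapse the log into counts per distinct message. The following is only a minimal Python sketch under stated assumptions: the function names `summarize` and `error_messages` are hypothetical helpers made up for illustration, and the timestamp pattern is inferred from the excerpt above rather than from any HTCondor specification.

```python
import re
from collections import Counter

# Assumption: every log line starts with "MM/DD/YY HH:MM:SS ", as in the excerpt above.
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def summarize(lines):
    """Count how often each message occurs, ignoring the timestamp prefix."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.rstrip("\n"))
        if message:
            counts[message] += 1
    return counts


def error_messages(counts):
    """Keep only the messages that mention 'error' (case-insensitive)."""
    return {msg: n for msg, n in counts.items() if "error" in msg.lower()}
```

It could be used as `error_messages(summarize(open("/var/log/condor/MasterLog")))`, which returns a dictionary mapping each distinct error message to the number of times it was logged.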



On 26.2.2016 at 15:43, Ben Cotton wrote:
Rene,

For magellan, you won't see anything listening on 9618 anyway. That's
just used by the condor_collector. One thing you didn't share is your
MasterLog. That will probably have useful information as to why the
condor_master process didn't start.


Thanks,
BC