
Re: [HTCondor-users] I can not run condor_master on the 2nd node



Looking further into the MasterLog, it seems condor_master was started twice, and the earlier error message is different from the one I sent before. So I am sending a longer excerpt of the log.

If the error refers to a file /var/run/condor/procd_pipe.4744.0, no such file exists; there is only /var/run/condor/procd_pipe.

Rene

labounek@magellan:/var/lock/condor$ tail -n150 /var/log/condor/MasterLog
02/26/16 14:37:17 All daemons are gone.
02/26/16 16:35:38 I am: hostname: magellan, fully qualified doman name: magellan.fnol.loc, IP: 172.19.37.21, IPv4: 172.19.37.21, IPv6:
02/26/16 16:35:38 I am: hostname: magellan, fully qualified doman name: magellan.fnol.loc, IP: 172.19.37.21, IPv4: 172.19.37.21, IPv6:
02/26/16 16:35:38 ******************************************************
02/26/16 16:35:38 ** condor_master (CONDOR_MASTER) STARTING UP
02/26/16 16:35:38 ** /usr/sbin/condor_master
02/26/16 16:35:38 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
02/26/16 16:35:38 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
02/26/16 16:35:38 ** $CondorVersion: 8.4.0 Sep 23 2015 BuildID: Debian-8.4.0~dfsg.1-1~nd80+1 Debian-8.4.0~dfsg.1-1~nd80+1 $
02/26/16 16:35:38 ** $CondorPlatform: X86_64-Debian_8 $
02/26/16 16:35:38 ** PID = 4744
02/26/16 16:35:38 ** Log last touched 2/26 14:37:17
02/26/16 16:35:38 ******************************************************
02/26/16 16:35:38 Using config source: /etc/condor/condor_config
02/26/16 16:35:38 Using local config sources:
02/26/16 16:35:38    /etc/condor/config.d/00debconf
02/26/16 16:35:38    /etc/condor/condor_config.local
02/26/16 16:35:38 config Macros = 62, Sorted = 62, StringBytes = 1664, TablesBytes = 2288
02/26/16 16:35:38 CLASSAD_CACHING is OFF
02/26/16 16:35:38 Daemon Log is logging: D_ALWAYS D_ERROR
02/26/16 16:35:39 Daemoncore: Listening at <172.19.37.21:13774> on TCP (ReliSock) and UDP (SafeSock).
02/26/16 16:35:39 DaemonCore: command socket at <172.19.37.21:13774?addrs=172.19.37.21-13774>
02/26/16 16:35:39 DaemonCore: private command socket at <172.19.37.21:13774?addrs=172.19.37.21-13774>
02/26/16 16:35:39 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1443039692)
02/26/16 16:35:39 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:35:39 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:35:39 LocalClient: error initializing NamedPipeReader
02/26/16 16:35:39 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:35:39 register_subfamily: ProcD communication error
02/26/16 16:35:39 Create_Process: error registering family for pid 4752
02/26/16 16:35:39 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:35:39 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:35:39 restarting /usr/sbin/condor_startd in 10 seconds
02/26/16 16:35:39 Timer -1 not found
02/26/16 16:35:49 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:35:49 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:35:49 LocalClient: error initializing NamedPipeReader
02/26/16 16:35:49 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:35:49 register_subfamily: ProcD communication error
02/26/16 16:35:49 Create_Process: error registering family for pid 4754
02/26/16 16:35:49 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:35:49 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:35:49 restarting /usr/sbin/condor_startd in 11 seconds
02/26/16 16:36:00 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:36:00 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:36:00 LocalClient: error initializing NamedPipeReader
02/26/16 16:36:00 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:36:00 register_subfamily: ProcD communication error
02/26/16 16:36:00 Create_Process: error registering family for pid 4757
02/26/16 16:36:00 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:36:00 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:36:00 restarting /usr/sbin/condor_startd in 13 seconds
02/26/16 16:36:13 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:36:13 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:36:13 LocalClient: error initializing NamedPipeReader
02/26/16 16:36:13 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:36:13 register_subfamily: ProcD communication error
02/26/16 16:36:13 Create_Process: error registering family for pid 4760
02/26/16 16:36:13 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:36:13 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:36:13 restarting /usr/sbin/condor_startd in 17 seconds
02/26/16 16:36:30 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:36:30 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:36:30 LocalClient: error initializing NamedPipeReader
02/26/16 16:36:30 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:36:30 register_subfamily: ProcD communication error
02/26/16 16:36:30 Create_Process: error registering family for pid 4761
02/26/16 16:36:30 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:36:30 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:36:30 restarting /usr/sbin/condor_startd in 25 seconds
02/26/16 16:36:55 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:36:55 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:36:55 LocalClient: error initializing NamedPipeReader
02/26/16 16:36:55 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:36:55 register_subfamily: ProcD communication error
02/26/16 16:36:55 Create_Process: error registering family for pid 4763
02/26/16 16:36:55 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:36:55 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:36:55 restarting /usr/sbin/condor_startd in 41 seconds
02/26/16 16:37:36 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:37:36 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:37:36 LocalClient: error initializing NamedPipeReader
02/26/16 16:37:36 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:37:36 register_subfamily: ProcD communication error
02/26/16 16:37:36 Create_Process: error registering family for pid 4764
02/26/16 16:37:36 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:37:36 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:37:36 restarting /usr/sbin/condor_startd in 73 seconds
02/26/16 16:38:49 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:38:49 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:38:49 LocalClient: error initializing NamedPipeReader
02/26/16 16:38:49 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:38:49 register_subfamily: ProcD communication error
02/26/16 16:38:49 Create_Process: error registering family for pid 4769
02/26/16 16:38:49 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:38:49 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:38:49 restarting /usr/sbin/condor_startd in 137 seconds
02/26/16 16:41:06 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:41:06 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:41:06 LocalClient: error initializing NamedPipeReader
02/26/16 16:41:06 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:41:06 register_subfamily: ProcD communication error
02/26/16 16:41:06 Create_Process: error registering family for pid 4779
02/26/16 16:41:06 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:41:06 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:41:06 restarting /usr/sbin/condor_startd in 265 seconds
02/26/16 16:43:19 WARNING: forward resolution of magellan doesn't match 172.19.37.21!
02/26/16 16:43:19 WARNING: forward resolution of magellan doesn't match 172.19.37.21!
02/26/16 16:43:19 I am: hostname: magellan, fully qualified doman name: magellan.fnol.loc, IP: 172.19.37.21, IPv4: 172.19.37.21, IPv6:
02/26/16 16:43:19 Reconfiguring all managed daemons.
02/26/16 16:43:31 I am: hostname: magellan, fully qualified doman name: magellan.fnol.loc, IP: 172.19.37.21, IPv4: 172.19.37.21, IPv6:
02/26/16 16:43:31 I am: hostname: magellan, fully qualified doman name: magellan.fnol.loc, IP: 172.19.37.21, IPv4: 172.19.37.21, IPv6:
02/26/16 16:43:31 ******************************************************
02/26/16 16:43:31 ** condor_master (CONDOR_MASTER) STARTING UP
02/26/16 16:43:31 ** /usr/sbin/condor_master
02/26/16 16:43:31 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
02/26/16 16:43:31 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
02/26/16 16:43:31 ** $CondorVersion: 8.4.0 Sep 23 2015 BuildID: Debian-8.4.0~dfsg.1-1~nd80+1 Debian-8.4.0~dfsg.1-1~nd80+1 $
02/26/16 16:43:31 ** $CondorPlatform: X86_64-Debian_8 $
02/26/16 16:43:31 ** PID = 4791
02/26/16 16:43:31 ** Log last touched 2/26 16:43:19
02/26/16 16:43:31 ******************************************************
02/26/16 16:43:31 Using config source: /etc/condor/condor_config
02/26/16 16:43:31 Using local config sources:
02/26/16 16:43:31    /etc/condor/config.d/00debconf
02/26/16 16:43:31    /etc/condor/condor_config.local
02/26/16 16:43:31 config Macros = 62, Sorted = 62, StringBytes = 1664, TablesBytes = 2288
02/26/16 16:43:31 CLASSAD_CACHING is OFF
02/26/16 16:43:31 Daemon Log is logging: D_ALWAYS D_ERROR
02/26/16 16:43:31 lock_file returning ERROR, errno=11 (Resource temporarily unavailable)
02/26/16 16:43:31 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
02/26/16 16:43:31 ERROR "Can't get lock on "/var/lock/condor/InstanceLock"" at line 1106 in file /tmp/buildd/condor-8.4.0~dfsg.1/src/condor_master.V6/master.cpp
02/26/16 16:45:31 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:45:31 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:45:31 LocalClient: error initializing NamedPipeReader
02/26/16 16:45:31 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:45:31 register_subfamily: ProcD communication error
02/26/16 16:45:31 Create_Process: error registering family for pid 4808
02/26/16 16:45:31 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:45:31 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:45:31 restarting /usr/sbin/condor_startd in 521 seconds
02/26/16 16:54:12 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:54:12 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:54:12 LocalClient: error initializing NamedPipeReader
02/26/16 16:54:12 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:54:12 register_subfamily: ProcD communication error
02/26/16 16:54:12 Create_Process: error registering family for pid 4841
02/26/16 16:54:12 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:54:12 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:54:12 restarting /usr/sbin/condor_startd in 1033 seconds
labounek@magellan:/var/lock/condor$
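
The restart delays in the log above (10, 11, 13, 17, 25, 41, 73, 137, 265, 521, 1033 seconds) look like an exponential backoff of the form 9 + 2^n. A minimal Python sketch for pulling the delays out of a MasterLog and checking that pattern follows. The function names and the constant/factor defaults are assumptions read off this log, not taken from the HTCondor source or configuration.

```python
import re

# Matches lines such as:
#   02/26/16 16:35:39 restarting /usr/sbin/condor_startd in 10 seconds
RESTART_RE = re.compile(r"restarting (\S+) in (\d+) seconds")


def restart_delays(log_lines):
    """Return the restart delays (in seconds) in the order they appear."""
    delays = []
    for line in log_lines:
        match = RESTART_RE.search(line)
        if match:
            delays.append(int(match.group(2)))
    return delays


def fits_backoff(delays, constant=9, factor=2):
    """Check whether the n-th delay equals constant + factor**n.

    The defaults are an assumption inferred from the log in this thread.
    """
    return all(d == constant + factor ** n for n, d in enumerate(delays))
```

Reading the file with `open("/var/log/condor/MasterLog")` and passing the file object to `restart_delays` should work, since it only iterates over lines. Note that a full log may contain several master start-ups, and the sequence presumably starts over for each one.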


On 26.2.2016 at 16:49, René Labounek wrote:
Ben,
Indeed, the directory did not exist, so I created it. CONDOR_ADMIN is root@localhost.

labounek@magellan:/var/run$ sudo mkdir condor

Root owns that folder. Output of ls -l:

drwxr-xr-x  2 root        root          80 úno 26 16:35 condor

Inside the condor folder, it looks like this after running sudo condor_master. Both condor_master (as user condor) and condor_procd (as user root) are running on magellan:

labounek@magellan:/var/run/condor$ ls -l
celkem 0
prw------- 1 condor root 0 úno 26 16:40 procd_pipe
prw------- 1 condor root 0 úno 26 16:40 procd_pipe.watchdog
labounek@magellan:/var/run/condor$
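
The listings above show /var/run/condor as root:root with mode drwxr-xr-x, while condor_master runs as user condor. That combination would be consistent with the "mkfifo ... Permission denied (13)" lines in the MasterLog, since creating a new entry needs write and search permission on the directory. A minimal Python sketch of the classic Unix mode-bit check follows. The function name and the numeric uid/gid values in the usage note are hypothetical, and the sketch ignores ACLs, capabilities and security modules.

```python
import stat


def may_create_entries(dir_mode, dir_uid, dir_gid, uid, gids):
    """Approximate the Unix rule for creating a file or FIFO in a directory.

    dir_mode, dir_uid, dir_gid describe the directory (as from os.stat);
    uid and gids describe the process that wants to create the entry.
    """
    if uid == 0:
        # root bypasses the ordinary mode-bit check
        return True
    if uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    # both the write bit and the search (execute) bit must be set
    return (dir_mode & need) == need
```

For example, with a hypothetical condor uid of 105 and gid of 110, `may_create_entries(0o755, 0, 0, 105, [110])` is False for a root-owned directory, and `may_create_entries(0o755, 105, 110, 105, [110])` is True once the directory is owned by that user. On the real machine, the values can be read with `os.stat("/var/run/condor")`.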


Still, condor_status sees only emperor's 12 cores, I suppose because condor_startd is still not running.

Here is the new MasterLog. The file /var/lock/condor/InstanceLock exists on magellan.

labounek@magellan:/var/lock/condor$ ls -l InstanceLock
-rw------- 1 condor condor 0 úno 26 16:43 InstanceLock
labounek@magellan:/var/lock/condor$
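
The "errno 11 (Resource temporarily unavailable)" on InstanceLock in the MasterLog is what a non-blocking lock attempt typically reports when another process already holds the lock, which would fit a second condor_master (PID 4791) starting while the first (PID 4744) was still running. A minimal Python sketch of that behaviour on Linux follows. It uses flock() purely for illustration; which locking call HTCondor itself uses is not confirmed here, and the function name is hypothetical.

```python
import errno
import fcntl
import os


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive lock on path.

    Returns (fd, None) on success; the lock is held until fd is closed.
    Returns (None, errno_value) if the lock could not be taken.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        return None, exc.errno
    return fd, None
```

Calling `try_exclusive_lock` twice on the same path without closing the first descriptor should make the second call return `errno.EAGAIN`, which is 11 on Linux. If that is what happened here, stopping the already running master before starting a new one should make the lock error go away.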


Regards,
Rene


02/26/16 16:43:31 ******************************************************
02/26/16 16:43:31 ** condor_master (CONDOR_MASTER) STARTING UP
02/26/16 16:43:31 ** /usr/sbin/condor_master
02/26/16 16:43:31 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
02/26/16 16:43:31 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
02/26/16 16:43:31 ** $CondorVersion: 8.4.0 Sep 23 2015 BuildID: Debian-8.4.0~dfsg.1-1~nd80+1 Debian-8.4.0~dfsg.1-1~nd80+1 $
02/26/16 16:43:31 ** $CondorPlatform: X86_64-Debian_8 $
02/26/16 16:43:31 ** PID = 4791
02/26/16 16:43:31 ** Log last touched 2/26 16:43:19
02/26/16 16:43:31 ******************************************************
02/26/16 16:43:31 Using config source: /etc/condor/condor_config
02/26/16 16:43:31 Using local config sources:
02/26/16 16:43:31    /etc/condor/config.d/00debconf
02/26/16 16:43:31    /etc/condor/condor_config.local
02/26/16 16:43:31 config Macros = 62, Sorted = 62, StringBytes = 1664, TablesBytes = 2288
02/26/16 16:43:31 CLASSAD_CACHING is OFF
02/26/16 16:43:31 Daemon Log is logging: D_ALWAYS D_ERROR
02/26/16 16:43:31 lock_file returning ERROR, errno=11 (Resource temporarily unavailable)
02/26/16 16:43:31 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
02/26/16 16:43:31 ERROR "Can't get lock on "/var/lock/condor/InstanceLock"" at line 1106 in file /tmp/buildd/condor-8.4.0~dfsg.1/src/condor_master.V6/master.cpp
02/26/16 16:45:31 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:45:31 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:45:31 LocalClient: error initializing NamedPipeReader
02/26/16 16:45:31 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:45:31 register_subfamily: ProcD communication error
02/26/16 16:45:31 Create_Process: error registering family for pid 4808
02/26/16 16:45:31 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:45:31 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:45:31 restarting /usr/sbin/condor_startd in 521 seconds




On 26.2.2016 at 16:20, Ben Cotton wrote:
Rene,

I think this is the important line:

02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
Does the /var/run/condor/ directory exist, and is it writable by the condor user?
If not, try creating that directory and see if HTCondor will start.
I'm not as familiar with Debian systems, but I know that on RHEL7 that
directory is created by systemd at boot time.


Thanks,
BC




_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/