
Re: [HTCondor-users] I can not run condor_master on the 2nd node



Ben,
Indeed, the directory did not exist, so I have created it. CONDOR_ADMIN is set to root@localhost.

labounek@magellan:/var/run$ sudo mkdir condor

Root owns that folder; ls -l shows:

drwxr-xr-x  2 root        root          80 Feb 26 16:35 condor
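The listing shows /var/run/condor as root:root with mode drwxr-xr-x. Under the usual POSIX rules only one permission class applies to a process (owner, else group, else other), and creating an entry such as a named pipe needs both write and search permission on the directory, so a process running as condor gets only r-x here and mkfifo would be expected to fail with EACCES (13). The Python sketch below encodes that rule. `can_create_entries` and `user_can_create_in` are hypothetical helper names, the check ignores ACLs, and it assumes root bypasses the mode bits as it normally does.

```python
from __future__ import annotations

import os
import pwd
import stat


def can_create_entries(mode: int, owner_uid: int, owner_gid: int,
                       uid: int, gids: set[int]) -> bool:
    """Return True if a process with `uid` and group set `gids` may create
    entries (files, named pipes) in a directory with this mode and owner.

    Only one permission class applies: owner, else group, else other.
    Creating an entry needs write (w) and search (x) on the directory.
    ACLs are ignored; root is assumed to bypass the mode bits.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (mode & needed) == needed


def user_can_create_in(path: str, username: str) -> bool:
    """Apply can_create_entries to an existing directory and a local user."""
    st = os.stat(path)
    pw = pwd.getpwnam(username)
    # Primary group plus supplementary groups from the group database.
    gids = set(os.getgrouplist(username, pw.pw_gid))
    return can_create_entries(st.st_mode, st.st_uid, st.st_gid,
                              pw.pw_uid, gids)
```

Run on magellan, `user_can_create_in("/var/run/condor", "condor")` should return False for the root-owned 0755 directory and True once the directory is owned by condor.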

After running sudo condor_master, the condor folder contains the following, and condor_master (as user condor) and condor_procd (as user root) are running on magellan:

labounek@magellan:/var/run/condor$ ls -l
total 0
prw------- 1 condor root 0 Feb 26 16:40 procd_pipe
prw------- 1 condor root 0 Feb 26 16:40 procd_pipe.watchdog
labounek@magellan:/var/run/condor$


But condor_status still sees only emperor's 12 cores. I suppose this is because condor_startd is still not running.

The new MasterLog is below. The file /var/lock/condor/InstanceLock does exist on magellan:

labounek@magellan:/var/lock/condor$ ls -l InstanceLock
-rw------- 1 condor condor 0 Feb 26 16:43 InstanceLock
labounek@magellan:/var/lock/condor$


Regards,
Rene


02/26/16 16:43:31 ******************************************************
02/26/16 16:43:31 ** condor_master (CONDOR_MASTER) STARTING UP
02/26/16 16:43:31 ** /usr/sbin/condor_master
02/26/16 16:43:31 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
02/26/16 16:43:31 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
02/26/16 16:43:31 ** $CondorVersion: 8.4.0 Sep 23 2015 BuildID: Debian-8.4.0~dfsg.1-1~nd80+1 Debian-8.4.0~dfsg.1-1~nd80+1 $
02/26/16 16:43:31 ** $CondorPlatform: X86_64-Debian_8 $
02/26/16 16:43:31 ** PID = 4791
02/26/16 16:43:31 ** Log last touched 2/26 16:43:19
02/26/16 16:43:31 ******************************************************
02/26/16 16:43:31 Using config source: /etc/condor/condor_config
02/26/16 16:43:31 Using local config sources:
02/26/16 16:43:31    /etc/condor/config.d/00debconf
02/26/16 16:43:31    /etc/condor/condor_config.local
02/26/16 16:43:31 config Macros = 62, Sorted = 62, StringBytes = 1664, TablesBytes = 2288
02/26/16 16:43:31 CLASSAD_CACHING is OFF
02/26/16 16:43:31 Daemon Log is logging: D_ALWAYS D_ERROR
02/26/16 16:43:31 lock_file returning ERROR, errno=11 (Resource temporarily unavailable)
02/26/16 16:43:31 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
02/26/16 16:43:31 ERROR "Can't get lock on "/var/lock/condor/InstanceLock"" at line 1106 in file /tmp/buildd/condor-8.4.0~dfsg.1/src/condor_master.V6/master.cpp
02/26/16 16:45:31 mkfifo of /var/run/condor/procd_pipe.4744.0 error: Permission denied (13)
02/26/16 16:45:31 failed to initialize named pipe at /var/run/condor/procd_pipe.4744.0
02/26/16 16:45:31 LocalClient: error initializing NamedPipeReader
02/26/16 16:45:31 ProcFamilyClient: failed to start connection with ProcD
02/26/16 16:45:31 register_subfamily: ProcD communication error
02/26/16 16:45:31 Create_Process: error registering family for pid 4808
02/26/16 16:45:31 Create_Process(/usr/sbin/condor_startd): child failed because it failed to register itself with the ProcD
02/26/16 16:45:31 ERROR: Create_Process failed trying to start /usr/sbin/condor_startd
02/26/16 16:45:31 restarting /usr/sbin/condor_startd in 521 seconds
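Two things are interleaved in this log. The condor_master started at 16:43:31 (PID 4791) stops immediately because it cannot lock /var/lock/condor/InstanceLock; errno 11 (EAGAIN) from a non-blocking lock request means some other process already holds the lock, which fits the report above that a condor_master was already running. The later mkfifo errors (Permission denied in /var/run/condor) appear to come from that already-running master when it tries to start condor_startd. Before launching condor_master by hand, it may help to check whether one is already running. A minimal Linux-only sketch, assuming the daemon shows up as condor_master in /proc/<pid>/comm (`pids_named` is a hypothetical helper):

```python
from __future__ import annotations

import os


def pids_named(name: str) -> list[int]:
    """Return the PIDs whose /proc/<pid>/comm equals `name` (Linux only).

    The kernel truncates comm to 15 characters; "condor_master" fits.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            with open(f"/proc/{entry}/comm") as fh:
                comm = fh.read().rstrip("\n")
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

If `pids_named("condor_master")` returns a non-empty list, the existing master is the one to fix or restart, rather than starting a second copy with sudo condor_master.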




On 26.2.2016 at 16:20, Ben Cotton wrote:
Rene,

I think this is the important line:

02/26/16 14:37:17 error opening watchdog pipe /var/run/condor/procd_pipe.watchdog: No such file or directory (2)
Does /var/run/condor/ exist, and is it writable by the condor user?
If not, try creating that directory and see if HTCondor will start.
I'm not as familiar with Debian systems, but I know that on RHEL7
that directory is created by systemd at boot time.


Thanks,
BC
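Ben's point is that /var/run/condor must be writable by the condor user, whereas the directory created with sudo mkdir above is owned by root:root. A minimal sketch of a one-off fix, assuming the daemons run as the local user condor and that mode 0755 with condor as owner is sufficient (`ensure_runtime_dir` is a hypothetical helper, and it must run as root to change ownership to another user):

```python
import os
import pwd


def ensure_runtime_dir(path: str, username: str, mode: int = 0o755) -> None:
    """Create `path` if missing, owned by `username` and its primary group.

    Changing ownership to another user requires root.
    """
    pw = pwd.getpwnam(username)
    os.makedirs(path, exist_ok=True)
    st = os.stat(path)
    if (st.st_uid, st.st_gid) != (pw.pw_uid, pw.pw_gid):
        os.chown(path, pw.pw_uid, pw.pw_gid)
    # makedirs is subject to the umask, so set the final mode explicitly.
    os.chmod(path, mode)
```

Calling `ensure_runtime_dir("/var/run/condor", "condor")` as root does the same as `sudo chown condor:condor /var/run/condor` on the existing directory. On Debian 8, /var/run is normally a symlink to the tmpfs /run, so a directory created this way is lost at reboot; a systemd-tmpfiles entry (see tmpfiles.d(5)) is the usual way to have it recreated at boot, which matches what Ben describes for RHEL7.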