
Re: [HTCondor-users] Condor central manager can't find another machines



Hello,
I checked ALLOW_WRITE, and it is correct. To be sure, I changed it to ALLOW_WRITE = *, but the problem persists.
All of the machines can ping each other. I have also disabled the firewall on all systems, but the central manager still can't find the other machines.
My machines' roles are:
Machine 1: submit, execute, central manager
Machine 2: submit, execute
Machine 3: submit, execute

And this is the MasterLog for the central manager machine:

12/05/12 07:38:34 Can't open directory "/home/condor/config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
12/05/12 07:38:34 Can't open directory "/home/condor/Desktop/condor/config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
12/05/12 07:38:34 Setting maximum accepts per cycle 8.
12/05/12 07:38:34 ******************************************************
12/05/12 07:38:34 ** condor_master (CONDOR_MASTER) STARTING UP
12/05/12 07:38:34 ** /home/condor/Desktop/condor/sbin/condor_master
12/05/12 07:38:34 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
12/05/12 07:38:34 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
12/05/12 07:38:34 ** $CondorVersion: 7.8.6 Oct 24 2012 BuildID: 73238 $
12/05/12 07:38:34 ** $CondorPlatform: x86_64_ubuntu_10.04.4 $
12/05/12 07:38:34 ** PID = 2217
12/05/12 07:38:34 ** Log last touched 12/5 07:37:18
12/05/12 07:38:34 ******************************************************
12/05/12 07:38:34 Using config source: /home/condor/Desktop/condor/etc/condor_config
12/05/12 07:38:34 Using local config sources:
12/05/12 07:38:34    /home/condor/Desktop/condor/condor_config.local
12/05/12 07:38:34 DaemonCore: command socket at <172.16.160.45:58638>
12/05/12 07:38:34 DaemonCore: private command socket at <172.16.160.45:58638>
12/05/12 07:38:34 Setting maximum accepts per cycle 8.
12/05/12 07:38:34 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_collector", pid and pgroup = 2218
12/05/12 07:38:34 Waiting for /home/condor/Desktop/condor/log/.collector_address to appear.
12/05/12 07:38:35 Found /home/condor/Desktop/condor/log/.collector_address.
12/05/12 07:38:35 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_negotiator", pid and pgroup = 2221
12/05/12 07:38:35 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2222
12/05/12 07:38:35 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_startd", pid and pgroup = 2223
12/05/12 07:38:35 The SCHEDD (pid 2222) exited with status 4
12/05/12 07:38:35 Sending obituary for "/home/condor/Desktop/condor/sbin/condor_schedd"
12/05/12 07:38:35 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 10 seconds
12/05/12 07:38:45 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2252
12/05/12 07:38:45 The SCHEDD (pid 2252) exited with status 4
12/05/12 07:38:45 Sending obituary for "/home/condor/Desktop/condor/sbin/condor_schedd"
12/05/12 07:38:45 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 11 seconds
12/05/12 07:38:56 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2255
12/05/12 07:38:56 The SCHEDD (pid 2255) exited with status 4
12/05/12 07:38:56 Sending obituary for "/home/condor/Desktop/condor/sbin/condor_schedd"
12/05/12 07:38:56 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 13 seconds
12/05/12 07:39:09 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2258
12/05/12 07:39:09 The SCHEDD (pid 2258) exited with status 4
12/05/12 07:39:09 Sending obituary for "/home/condor/Desktop/condor/sbin/condor_schedd"
12/05/12 07:39:09 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 17 seconds
12/05/12 07:39:26 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2261
12/05/12 07:39:26 The SCHEDD (pid 2261) exited with status 4
12/05/12 07:39:26 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 25 seconds
12/05/12 07:39:51 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2265
12/05/12 07:39:51 The SCHEDD (pid 2265) exited with status 4
12/05/12 07:39:51 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 41 seconds
12/05/12 07:40:32 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2267
12/05/12 07:40:32 The SCHEDD (pid 2267) exited with status 4
12/05/12 07:40:32 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 73 seconds
12/05/12 07:41:45 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2293
12/05/12 07:41:45 The SCHEDD (pid 2293) exited with status 4
12/05/12 07:41:45 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 137 seconds
12/05/12 07:44:02 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2301
12/05/12 07:44:02 The SCHEDD (pid 2301) exited with status 4
12/05/12 07:44:02 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 265 seconds
12/05/12 07:48:27 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2318
12/05/12 07:48:27 The SCHEDD (pid 2318) exited with status 4
12/05/12 07:48:27 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 521 seconds
12/05/12 07:57:08 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 2348
12/05/12 07:57:08 The SCHEDD (pid 2348) exited with status 4
12/05/12 07:57:08 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 1033 seconds
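The schedd restart delays in this log follow a regular pattern: each one is 9 seconds plus a power of two (10 = 9 + 2^0, 11 = 9 + 2^1, ..., 1033 = 9 + 2^10). That is the master backing off on a daemon that exits right after every start. The constant 9 and factor 2 are read off this log; I believe they correspond to the condor_master settings MASTER_BACKOFF_CONSTANT and MASTER_BACKOFF_FACTOR, but treat that as an assumption. A minimal Python sketch that extracts the delays and checks the pattern (the helper names are hypothetical):

```python
import re

# Restart lines copied verbatim from the central manager MasterLog above.
MASTER_LOG = """\
12/05/12 07:38:35 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 10 seconds
12/05/12 07:38:45 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 11 seconds
12/05/12 07:38:56 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 13 seconds
12/05/12 07:39:09 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 17 seconds
12/05/12 07:39:26 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 25 seconds
12/05/12 07:39:51 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 41 seconds
12/05/12 07:40:32 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 73 seconds
12/05/12 07:41:45 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 137 seconds
12/05/12 07:44:02 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 265 seconds
12/05/12 07:48:27 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 521 seconds
12/05/12 07:57:08 restarting /home/condor/Desktop/condor/sbin/condor_schedd in 1033 seconds
"""

# Matches "restarting <path> in <N> seconds" and captures N.
RESTART_RE = re.compile(r"restarting (\S+) in (\d+) seconds")


def restart_delays(log_text):
    """Return the restart delays (in seconds) reported by condor_master, in order."""
    return [int(m.group(2)) for m in RESTART_RE.finditer(log_text)]


def fits_backoff(delays, constant=9, factor=2):
    """Check that the n-th delay (n starting at 0) equals constant + factor**n.

    constant=9 and factor=2 are inferred from this log, not read from a config.
    """
    return all(d == constant + factor ** n for n, d in enumerate(delays))


if __name__ == "__main__":
    delays = restart_delays(MASTER_LOG)
    print(delays)
    print(fits_backoff(delays))
```

Since the delay keeps doubling, the schedd will only be retried rarely from here on; the exit status 4 on every start is the part worth looking into.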


And this is the MasterLog for one of the submit/execute machines:

12/05/12 07:40:17 Can't open directory "/home/condor/config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
12/05/12 07:40:17 Can't open directory "/home/condor/Desktop/condor/local.mn3/config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
12/05/12 07:40:17 Setting maximum accepts per cycle 8.
12/05/12 07:40:17 ******************************************************
12/05/12 07:40:17 ** condor_master (CONDOR_MASTER) STARTING UP
12/05/12 07:40:17 ** /home/condor/Desktop/condor/sbin/condor_master
12/05/12 07:40:17 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
12/05/12 07:40:17 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
12/05/12 07:40:17 ** $CondorVersion: 7.8.6 Oct 24 2012 BuildID: 73238 $
12/05/12 07:40:17 ** $CondorPlatform: x86_64_ubuntu_10.04.4 $
12/05/12 07:40:17 ** PID = 1919
12/05/12 07:40:17 ** Log last touched 12/5 07:40:01
12/05/12 07:40:17 ******************************************************
12/05/12 07:40:17 Using config source: /home/condor/Desktop/condor/etc/condor_config
12/05/12 07:40:17 Using local config sources:
12/05/12 07:40:17    /home/condor/Desktop/condor/local.mn3/condor_config.local
12/05/12 07:40:17 DaemonCore: command socket at <172.16.160.48:48750>
12/05/12 07:40:17 DaemonCore: private command socket at <172.16.160.48:48750>
12/05/12 07:40:17 Setting maximum accepts per cycle 8.
12/05/12 07:40:17 Warning: Collector information was not found in the configuration file. ClassAds will not be sent to the collector and this daemon will not join a larger Condor pool.
12/05/12 07:40:17 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_schedd", pid and pgroup = 1920
12/05/12 07:40:17 Started DaemonCore process "/home/condor/Desktop/condor/sbin/condor_startd", pid and pgroup = 1921
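The two logs show different symptoms. On the central manager, the SCHEDD exits with status 4 on every start. On the submit/execute machine, the master warns that no collector is configured, and by the warning's own text its daemons will not send ClassAds to the collector or join the pool. A minimal Python sketch that pulls both symptoms out of a MasterLog; the function name is hypothetical, and the matched strings are copied from the logs above:

```python
import re
from collections import Counter

# Substring of the warning seen in the submit/execute machine's MasterLog above.
NO_COLLECTOR = "Collector information was not found in the configuration file"

# Matches e.g. "The SCHEDD (pid 2222) exited with status 4".
EXIT_RE = re.compile(r"The (\w+) \(pid \d+\) exited with status (\d+)")


def scan_master_log(log_text):
    """Report whether the collector warning appears and count daemon exits.

    daemon_exits maps (daemon name, exit status) to how many times it occurred.
    """
    exits = Counter()
    for m in EXIT_RE.finditer(log_text):
        exits[(m.group(1), int(m.group(2)))] += 1
    return {
        "missing_collector": NO_COLLECTOR in log_text,
        "daemon_exits": dict(exits),
    }


if __name__ == "__main__":
    # Short excerpts from the two MasterLogs in this message.
    central_manager = (
        "12/05/12 07:38:35 The SCHEDD (pid 2222) exited with status 4\n"
        "12/05/12 07:38:45 The SCHEDD (pid 2252) exited with status 4\n"
    )
    execute_node = (
        "12/05/12 07:40:17 Warning: Collector information was not found in the "
        "configuration file. ClassAds will not be sent to the collector and this "
        "daemon will not join a larger Condor pool.\n"
    )
    print(scan_master_log(central_manager))
    print(scan_master_log(execute_node))
```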

Thanks in advance.
Naseri.

From: muluken sholaye <mulesho2490@xxxxxxxxx>
To: Mohammad Naseri <md.naseri@xxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, December 5, 2012 3:17 PM
Subject: Re: [HTCondor-users] Condor central manager can't find another machines



On Wed, Dec 5, 2012 at 11:26 AM, Mohammad Naseri <md.naseri@xxxxxxxxx> wrote:
Hello,
    First, check on the execute nodes that ALLOW_WRITE permits the specific machine,
     with something like
            ALLOW_WRITE = $(FULL_HOSTNAME)
      Then make sure that you can ping from each machine to the central manager and vice versa.
    If you still have the problem, post the content of the MasterLog to this forum.
                      Cheers
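A successful ping only shows that ICMP gets through; the HTCondor daemons talk over TCP/UDP ports, so a port-level check is a useful complement. A minimal Python sketch, assuming the collector listens on the conventional port 9618 (if COLLECTOR_HOST names a different port, use that one instead). The helper name can_connect is hypothetical, and the example address is the central manager's IP as it appears in the MasterLog above:

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the with-block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: check_port.py HOST PORT   (e.g. 172.16.160.45 9618)")
        sys.exit(2)
    ok = can_connect(sys.argv[1], int(sys.argv[2]))
    print("reachable" if ok else "NOT reachable")
    sys.exit(0 if ok else 1)
```

Run it from each execute machine against the central manager; a "NOT reachable" result while ping still works points at a port-level block rather than basic connectivity.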


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/