
Re: [Condor-users] problems with startup and executing a job



For Problem 2, the NegotiatorLog said:

10/20 10:26:01 ---------- Started Negotiation Cycle ----------
10/20 10:26:01 Phase 1:  Obtaining ads from collector ...
10/20 10:26:01   Getting all public ads ...
10/20 10:26:01   Sorting 6 ads ...
10/20 10:26:01   Getting startd private ads ...
10/20 10:26:01 Got ads: 6 public and 2 private
10/20 10:26:01 Public ads include 1 submitter, 2 startd
10/20 10:26:01 Phase 2:  Performing accounting ...
10/20 10:26:01 Phase 3:  Sorting submitter ads by priority ...
10/20 10:26:01 Phase 4.1:  Negotiating with schedds ...
10/20 10:26:01   Negotiating with condor@nini at <129.254.175.78:46913>
10/20 10:26:01 0 seconds so far
10/20 10:26:01     Request 00001.00000:
10/20 10:26:01       Rejected 1.0 condor@nini <129.254.175.78:46913>: no match found
10/20 10:26:01     Request 00006.00000:
10/20 10:26:01       Rejected 6.0 condor@nini <129.254.175.78:46913>: no match found
10/20 10:26:01     Got NO_MORE_JOBS;  done negotiating
10/20 10:26:01 ---------- Finished Negotiation Cycle ----------
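To pull just the rejected requests out of a longer NegotiatorLog, something like the following might help. It is only a rough sketch: the line pattern is inferred from the excerpt above rather than from any documented log format, and the names unwrap and rejected_jobs are made up for illustration. It also re-joins lines that a mail client has wrapped.

```python
import re

# Pattern inferred from the "Rejected ..." lines in the excerpt above
# (an assumption, not a documented NegotiatorLog format).
REJECT_RE = re.compile(
    r"Rejected\s+(?P<job>\d+\.\d+)\s+(?P<submitter>\S+)\s+"
    r"<(?P<addr>[^>]+)>:\s*(?P<reason>.*)"
)
# Log lines in the excerpt start with "MM/DD HH:MM:SS ".
TIMESTAMP_RE = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def unwrap(lines):
    """Re-join wrapped lines: a line without a timestamp continues the previous one."""
    joined = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if TIMESTAMP_RE.match(line) or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line.strip()
    return joined


def rejected_jobs(log_text):
    """Return (job_id, submitter, reason) for every rejected request."""
    results = []
    for line in unwrap(log_text.splitlines()):
        match = REJECT_RE.search(line)
        if match:
            results.append(
                (match.group("job"), match.group("submitter"),
                 match.group("reason").strip())
            )
    return results
```

Calling rejected_jobs() on the text of the log returns one tuple per rejected job, so it is easy to see whether every request fails with the same reason.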



On Fri, 2006-10-20 at 10:59 +0900, nini wrote:
> Dear all, I have two problems:
> 
> ~~~~~~~~~~~~~~~~~~~~~~~Problem 1~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> After starting condor_master as root on all machines in the pool, the
> MasterLog on the central manager looks fine, but the one on the other
> machine shows a problem:
> 
> 10/20 09:48:17 ******************************************************
> 10/20 09:48:17 ** condor_master (CONDOR_MASTER) STARTING UP
> 10/20 09:48:17 ** /home/condor/condor/sbin/condor_master
> 10/20 09:48:17 ** $CondorVersion: 6.8.1 Sep 17 2006  $
> 10/20 09:48:17 ** $CondorPlatform: I386-LINUX_RHEL3 $
> 10/20 09:48:17 ** PID = 2768
> 10/20 09:48:17 ** Log last touched 10/18 17:52:01
> 10/20 09:48:17 ******************************************************
> 10/20 09:48:17 Using config source: /home/condor/condor/etc/condor_config
> 10/20 09:48:17 Using local config sources:
> 10/20 09:48:17    /home/condor/condor_config.local
> 10/20 09:48:17 DaemonCore: Command Socket at <129.254.187.125:42587>
> 10/20 09:48:17 Started DaemonCore process "/home/condor/condor/sbin/condor_startd", pid and pgroup = 2769
> 10/20 09:48:18 Started DaemonCore process "/home/condor/condor/sbin/condor_schedd", pid and pgroup = 2770
> 10/20 09:48:23 attempt to connect to <129.254.187.125:9618> failed: Connection refused (connect errno = 111).
> 10/20 09:48:23 ERROR: SECMAN:2003:TCP connection to <129.254.187.125:9618> failed
> 
> 10/20 09:48:23 Failed to start non-blocking update to <129.254.187.125:9618>.
> 
> 
> The IP address above is the local machine's own IP. Is that expected? Can
> anybody give hints about the failed connection?
> 
> 
> I just restarted Condor with condor_master, and the MasterLog now shows:
> 
> 10/20 10:54:48 ******************************************************
> 10/20 10:54:48 ** condor_master (CONDOR_MASTER) STARTING UP
> 10/20 10:54:48 ** /home/condor/condor/sbin/condor_master
> 10/20 10:54:48 ** $CondorVersion: 6.8.1 Sep 17 2006  $
> 10/20 10:54:48 ** $CondorPlatform: I386-LINUX_RHEL3 $
> 10/20 10:54:48 ** PID = 3527
> 10/20 10:54:48 ** Log last touched 10/20 10:54:18
> 10/20 10:54:48 ******************************************************
> 10/20 10:54:48 Using config source: /home/condor/condor/etc/condor_config
> 10/20 10:54:48 Using local config sources:
> 10/20 10:54:48    /home/condor/condor_config.local
> 10/20 10:54:48 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
> 10/20 10:54:48 ERROR "Can't get lock on "/home/condor/log/InstanceLock"" at line 976 in file master.C
> 
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~Problem 2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> With Problem 1 still unsolved, I submitted jobs on the central manager. All
> the jobs stay idle and none of them executes. The job logs contain only:
> 
> 000 (007.000.000) 10/20 10:26:01 Job submitted from host: <129.254.175.78:46913>
> ...
> 
> Condor is installed with all manager/submit/execute roles on the
> central manager, and I cannot work out what is causing this.
> 
> 
> Thanks,
> 
> 
> 