
Re: [Condor-users] problems with startup and executing a job



Also for Problem 2:

~>condor_q -analyze

007.000:  Run analysis summary.  Of 4 machines,
      2 are rejected by your job's requirements
      2 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Fri Oct 20 12:01:27 2006
        Reason for last match failure: no match found

The Requirements expression for your job is:

( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                               Machines Matched    Suggestion
    ---------                               ----------------    ----------
1   ( TARGET.FileSystemDomain == "nini" )   2
2   ( target.Arch == "INTEL" )              4
3   ( target.OpSys == "LINUX" )             4
4   ( target.Disk >= 10000 )                4
5   ( ( 1024 * target.Memory ) >= 10000 )   4
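
The per-condition table can be read as follows: each clause of the job's Requirements is tested on its own against every machine ad, and the count says how many ads satisfy that clause alone. The Python sketch below mimics that bookkeeping for job 7.0, together with the three summary counters. It is an illustration under stated assumptions, not Condor's implementation: the four machine ads in `HYPOTHETICAL_POOL` are invented (two with FileSystemDomain "nini", two with a different domain), the `start` flag is a made-up stand-in for each machine's own START/Requirements expression, and the sketch checks the job's clauses before the machine's side.

```python
from dataclasses import dataclass


@dataclass
class MachineAd:
    """A few attributes of a HYPOTHETICAL startd (machine) ad."""
    name: str
    arch: str
    opsys: str
    disk_kb: int      # Disk is reported in KB
    memory_mb: int    # Memory is reported in MB
    fs_domain: str
    start: bool       # hypothetical stand-in for the machine's own START/Requirements


# Job attributes as printed by condor_q -analyze for job 7.0.
JOB_DISK_USAGE_KB = 10000
JOB_IMAGE_SIZE_KB = 10000
JOB_FS_DOMAIN = "nini"

# Each Requirements clause, paired with a Python predicate on a machine ad.
CLAUSES = [
    ('( TARGET.FileSystemDomain == "nini" )',
     lambda m: m.fs_domain == JOB_FS_DOMAIN),
    ('( target.Arch == "INTEL" )', lambda m: m.arch == "INTEL"),
    ('( target.OpSys == "LINUX" )', lambda m: m.opsys == "LINUX"),
    ('( target.Disk >= 10000 )', lambda m: m.disk_kb >= JOB_DISK_USAGE_KB),
    ('( ( 1024 * target.Memory ) >= 10000 )',
     lambda m: 1024 * m.memory_mb >= JOB_IMAGE_SIZE_KB),
]

# HYPOTHETICAL pool (not read from the real pool): two ads in domain "nini",
# two in another domain.
HYPOTHETICAL_POOL = [
    MachineAd("vm1@nini", "INTEL", "LINUX", 2_000_000, 512, "nini", False),
    MachineAd("vm2@nini", "INTEL", "LINUX", 2_000_000, 512, "nini", False),
    MachineAd("vm1@other", "INTEL", "LINUX", 2_000_000, 512, "other.example", True),
    MachineAd("vm2@other", "INTEL", "LINUX", 2_000_000, 512, "other.example", True),
]


def analyze(machines):
    """Return per-clause match counts plus the three summary counters."""
    # Count, for every clause separately, how many ads satisfy it.
    per_clause = {
        label: sum(1 for m in machines if test(m)) for label, test in CLAUSES
    }
    # Ads that satisfy every clause of the job's Requirements.
    job_ok = [m for m in machines if all(test(m) for _, test in CLAUSES)]
    rejected_by_job = len(machines) - len(job_ok)
    # Of those, ads whose own requirements refuse the job.
    rejected_by_machine = sum(1 for m in job_ok if not m.start)
    available = len(job_ok) - rejected_by_machine
    return per_clause, rejected_by_job, rejected_by_machine, available


if __name__ == "__main__":
    counts, by_job, by_machine, available = analyze(HYPOTHETICAL_POOL)
    for label, n in counts.items():
        print(f"{label:<40}{n}")
    print(f"{by_job} rejected by job, {by_machine} reject job, {available} available")
```

With these assumed ads, only the FileSystemDomain clause filters anything, and the two ads that pass every job clause are then refused by their own requirements, which reproduces the 2 / 2 / 0 split in the summary. The machines' own START expressions are the part that the per-clause table does not show.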



On Fri, 2006-10-20 at 11:03 +0900, nini wrote:
> For Problem 2, the NegotiatorLog said:
> 
> 10/20 10:26:01 ---------- Started Negotiation Cycle ----------
> 10/20 10:26:01 Phase 1:  Obtaining ads from collector ...
> 10/20 10:26:01   Getting all public ads ...
> 10/20 10:26:01   Sorting 6 ads ...
> 10/20 10:26:01   Getting startd private ads ...
> 10/20 10:26:01 Got ads: 6 public and 2 private
> 10/20 10:26:01 Public ads include 1 submitter, 2 startd
> 10/20 10:26:01 Phase 2:  Performing accounting ...
> 10/20 10:26:01 Phase 3:  Sorting submitter ads by priority ...
> 10/20 10:26:01 Phase 4.1:  Negotiating with schedds ...
> 10/20 10:26:01   Negotiating with condor@nini at <129.254.175.78:46913>
> 10/20 10:26:01 0 seconds so far
> 10/20 10:26:01     Request 00001.00000:
> 10/20 10:26:01       Rejected 1.0 condor@nini <129.254.175.78:46913>: no match found
> 10/20 10:26:01     Request 00006.00000:
> 10/20 10:26:01       Rejected 6.0 condor@nini <129.254.175.78:46913>: no match found
> 10/20 10:26:01     Got NO_MORE_JOBS;  done negotiating
> 10/20 10:26:01 ---------- Finished Negotiation Cycle ----------
> 
> 
> 
> On Fri, 2006-10-20 at 10:59 +0900, nini wrote:
> > Dear all, I have two problems:
> > 
> > ~~~~~~~~~~~~~~~~~~~~~~~Problem 1~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > After starting condor_master as root on all machines in the pool, the
> > MasterLog on the central manager looks OK, but the one on the other
> > machine shows a problem:
> > 
> > 10/20 09:48:17 ******************************************************
> > 10/20 09:48:17 ** condor_master (CONDOR_MASTER) STARTING UP
> > 10/20 09:48:17 ** /home/condor/condor/sbin/condor_master
> > 10/20 09:48:17 ** $CondorVersion: 6.8.1 Sep 17 2006  $
> > 10/20 09:48:17 ** $CondorPlatform: I386-LINUX_RHEL3 $
> > 10/20 09:48:17 ** PID = 2768
> > 10/20 09:48:17 ** Log last touched 10/18 17:52:01
> > 10/20 09:48:17 ******************************************************
> > 10/20 09:48:17 Using config source: /home/condor/condor/etc/condor_config
> > 10/20 09:48:17 Using local config sources:
> > 10/20 09:48:17    /home/condor/condor_config.local
> > 10/20 09:48:17 DaemonCore: Command Socket at <129.254.187.125:42587>
> > 10/20 09:48:17 Started DaemonCore process "/home/condor/condor/sbin/condor_startd", pid and pgroup = 2769
> > 10/20 09:48:18 Started DaemonCore process "/home/condor/condor/sbin/condor_schedd", pid and pgroup = 2770
> > 10/20 09:48:23 attempt to connect to <129.254.187.125:9618> failed: Connection refused (connect errno = 111).
> > 10/20 09:48:23 ERROR: SECMAN:2003:TCP connection to <129.254.187.125:9618> failed
> > 
> > 10/20 09:48:23 Failed to start non-blocking update to <129.254.187.125:9618>.
> > 
> > 
> > The IP address above is the local machine's own IP. Is that expected?
> > Can anybody give me a hint about why the connection fails?
> > 
> > 
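
errno 111 is ECONNREFUSED on Linux: the host 129.254.187.125 answered, but nothing was listening on port 9618, the default condor_collector port. A plain TCP probe reproduces the same check outside Condor. The following is a minimal Python sketch; `probe_tcp` and `describe` are hypothetical helper names, not part of Condor.

```python
import errno
import os
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> int:
    """Attempt a TCP connection to host:port.

    Returns 0 if the connection was accepted, otherwise the error code
    reported for the attempt (e.g. errno.ECONNREFUSED, 111 on Linux).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns an error code instead of raising for
        # connect()-level failures such as "Connection refused".
        return sock.connect_ex((host, port))


def describe(host: str, port: int) -> str:
    """Human-readable result of probe_tcp, including the errno name."""
    code = probe_tcp(host, port)
    if code == 0:
        return f"{host}:{port} is accepting TCP connections"
    name = errno.errorcode.get(code, "unknown")
    return f"{host}:{port} failed with errno {code} ({name}): {os.strerror(code)}"
```

For instance, running `describe("129.254.187.125", 9618)` and `describe("129.254.175.78", 9618)` on the execute machine (129.254.175.78 is the submit address recorded in the Problem 2 job log) would show which of the two addresses has a collector listening.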
> > I just restarted Condor with condor_master, and the MasterLog changed to:
> > 
> > 10/20 10:54:48 ******************************************************
> > 10/20 10:54:48 ** condor_master (CONDOR_MASTER) STARTING UP
> > 10/20 10:54:48 ** /home/condor/condor/sbin/condor_master
> > 10/20 10:54:48 ** $CondorVersion: 6.8.1 Sep 17 2006  $
> > 10/20 10:54:48 ** $CondorPlatform: I386-LINUX_RHEL3 $
> > 10/20 10:54:48 ** PID = 3527
> > 10/20 10:54:48 ** Log last touched 10/20 10:54:18
> > 10/20 10:54:48 ******************************************************
> > 10/20 10:54:48 Using config source: /home/condor/condor/etc/condor_config
> > 10/20 10:54:48 Using local config sources:
> > 10/20 10:54:48    /home/condor/condor_config.local
> > 10/20 10:54:48 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
> > 10/20 10:54:48 ERROR "Can't get lock on "/home/condor/log/InstanceLock"" at line 976 in file master.C
> > 
> > 
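
errno 11 (EAGAIN, "Resource temporarily unavailable" on Linux) is what a non-blocking lock request returns when another process already holds the lock. condor_master locks its InstanceLock file so that only one master runs per installation, so this error most likely means the condor_master started earlier (PID 2768 in the first log) was still running. The sketch below reproduces the same errno with Python's `fcntl.flock`; it is an illustration only, Condor's own FileLock class may use a different locking call, and `try_exclusive_lock` is a hypothetical helper.

```python
import errno
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Open path and try a non-blocking exclusive flock on it.

    Returns (fd, 0) if the lock was obtained, or (None, err) if another
    open file description already holds it (err is EAGAIN/EWOULDBLOCK).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        return None, exc.errno
    return fd, 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "InstanceLock")  # hypothetical path
        first, _ = try_exclusive_lock(lock_path)       # plays the running master
        second, code = try_exclusive_lock(lock_path)   # plays the new master
        print(first is not None, second, code, os.strerror(code))
        os.close(first)
```

If this is the cause, checking for an already-running condor_master (for example with `ps`) before starting another one avoids the error.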
> > ~~~~~~~~~~~~~~~~~~~~~~~~~Problem 2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > With Problem 1 still unsolved, I submitted jobs on the central manager,
> > but all of them stay idle and never execute. The job logs contain only:
> > 
> > 000 (007.000.000) 10/20 10:26:01 Job submitted from host: <129.254.175.78:46913>
> > ...
> > 
> > Condor is installed on the central manager with all three roles
> > (manager, submit, and execute), and I cannot figure out what causes this.
> > 
> > 
> > Thanks,
> > 
> > 
> > 
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > 
> > The archives can be found at either
> > https://lists.cs.wisc.edu/archive/condor-users/
> > http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
> > 