
[Condor-users] Problem : submitted jobs stay in I state even with available execute nodes.



I've set up a new Condor master on one machine. I then installed Condor in AFS for a number of workstations so they can submit jobs:

./condor_install --install-dir=/usr/local/condor-7.0.4 --type=submit,execute --central-manager=condor.cc.lehigh.edu

When I submit jobs they just sit in the queue, even though there are available machines:

[lusol@xs106d condor]$ condor_q -better-analyze


-- Submitter: xs106d.cc.lehigh.edu : <128.180.52.73:55084> : xs106d.cc.lehigh.edu
---
001.000:  Run analysis summary.  Of 258 machines,
   178 are rejected by your job's requirements
     7 reject your job because of their own requirements
     0 match but are serving users with a better priority in the pool
     0 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
    73 are available to run your job
       Last successful match: Fri Oct 17 11:25:50 2008

The Requirements expression for your job is:

( ( target.Arch == "x86_64" ) ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( target.HasFileTransfer )

   Condition                         Machines Matched    Suggestion
   ---------                         ----------------    ----------
1   ( ( target.Arch == "x86_64" ) )   80
2   ( target.OpSys == "LINUX" )       216
3   ( target.Disk >= 1 )              258
4   ( ( 1024 * target.Memory ) >= 1 ) 258
5   ( target.HasFileTransfer )        258

The following attributes are missing from the job ClassAd:

CheckpointPlatform
Scheduler
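For readers following the analysis above, here is a minimal Python sketch of how the job's Requirements expression is evaluated against a machine ad. The machine ads, their names (node-a, node-b, node-c), and their attribute values are made up for illustration; only the expression itself and the job-side values (DiskUsage = 1, ImageSize = 1, as implied by conditions 3 and 4 in the table) come from the output. This is not Condor's matchmaking code, just the same boolean test written out.

```python
# Sketch only: mimics the job's Requirements expression from the condor_q output.
# Machine ads below are hypothetical examples, not real pool data.

def job_requirements(target, job):
    """Return True if the machine ad `target` satisfies the job's Requirements."""
    return (
        # ClassAd '==' compares strings case-insensitively, so lower() both sides.
        target["Arch"].lower() == "x86_64"
        and target["OpSys"].lower() == "linux"
        # Disk and DiskUsage are both in KiB.
        and target["Disk"] >= job["DiskUsage"]
        # Memory is in MiB, ImageSize in KiB, hence the factor of 1024.
        and target["Memory"] * 1024 >= job["ImageSize"]
        and target["HasFileTransfer"]
    )

# Job-side values as they appear substituted in conditions 3 and 4 of the table.
job = {"DiskUsage": 1, "ImageSize": 1}

# Hypothetical machine ads, one per platform family seen in the pool.
machines = [
    {"Name": "node-a", "Arch": "X86_64", "OpSys": "LINUX",
     "Disk": 1000000, "Memory": 2048, "HasFileTransfer": True},
    {"Name": "node-b", "Arch": "INTEL", "OpSys": "LINUX",
     "Disk": 1000000, "Memory": 1024, "HasFileTransfer": True},
    {"Name": "node-c", "Arch": "INTEL", "OpSys": "OSX",
     "Disk": 1000000, "Memory": 1024, "HasFileTransfer": True},
]

matching = [m["Name"] for m in machines if job_requirements(m, job)]
print(matching)
```

Only the machine whose Arch and OpSys both match passes, which is consistent with the Arch condition being the most restrictive one in the table.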

Here's a condor_status summary:


                 Total Owner Claimed Unclaimed Matched Preempting Backfill

     INTEL/LINUX   140     0       0       140       0          0        0
       INTEL/OSX   102    18       0        84       0          0        0
   INTEL/WINNT51    54    46       0         8       0          0        0
    X86_64/LINUX    84     5       0        76       3          0        0

           Total   380    69       0       308       3          0        0
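As a quick cross-check of the summary above, the per-platform rows can be summed column by column and compared with the reported Total line. This is a small Python sketch with the numbers copied by hand from the condor_status output; the variable names are illustrative only.

```python
# Columns: Total, Owner, Claimed, Unclaimed, Matched, Preempting, Backfill
rows = {
    "INTEL/LINUX":   (140,  0, 0, 140, 0, 0, 0),
    "INTEL/OSX":     (102, 18, 0,  84, 0, 0, 0),
    "INTEL/WINNT51": ( 54, 46, 0,   8, 0, 0, 0),
    "X86_64/LINUX":  ( 84,  5, 0,  76, 3, 0, 0),
}
reported_total = (380, 69, 0, 308, 3, 0, 0)

# Sum each column across all platform rows.
computed_total = tuple(sum(col) for col in zip(*rows.values()))

# Within each row, the state columns should add up to that row's Total.
rows_consistent = all(r[0] == sum(r[1:]) for r in rows.values())

print(computed_total == reported_total, rows_consistent)
```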


Any insight would be most appreciated ... thanks,


Steve