
[Condor-users] condor_q and condor_status don't agree



Hi,

I've run into a problem where the outputs of condor_q and condor_status
don't agree: in the listings below, condor_q reports 478 jobs running, but
condor_status shows only 4 machines Claimed and 368 Matched.
The worst part is that jobs get 'Matched' but never actually start. This
seems to happen when I load a lot of jobs (1000+) into the queue.
The central manager runs Solaris; the execute hosts run Windows XP.

I'm looking at the config files and wondering whether any timeouts need to be
lengthened or shortened. For instance, with a JOB_START_INTERVAL of 2
seconds it takes over 15 minutes to start 500 jobs (500 x 2 s = 1000 s, about
16.7 minutes), by which time the ClassAd will be stale, as its lifetime is
only 15 minutes. But I'm not sure whether that matters.
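
To make the arithmetic concrete, here is a rough Python sketch. It assumes
that exactly one job is started per interval and that nothing else adds delay,
which is a simplification of whatever the schedd really does. The function
names are made up for illustration; the 2-second interval and 15-minute ad
lifetime are the figures quoted above.

```python
# Back-of-the-envelope model: one job start per interval, no other overhead.
# This is an assumption for illustration, not a description of Condor internals.

def seconds_to_start(num_jobs: int, start_interval_s: float) -> float:
    """Total time needed to start num_jobs at one job per interval."""
    return num_jobs * start_interval_s


def jobs_started_within(lifetime_s: float, start_interval_s: float) -> int:
    """How many jobs can be started before an ad of the given lifetime expires."""
    return int(lifetime_s // start_interval_s)


JOB_START_INTERVAL_S = 2      # interval quoted in this post
AD_LIFETIME_S = 15 * 60       # ClassAd lifetime quoted in this post

total_s = seconds_to_start(500, JOB_START_INTERVAL_S)
print(f"500 jobs need {total_s:.0f} s ({total_s / 60:.1f} min)")
print("jobs startable within one ad lifetime:",
      jobs_started_within(AD_LIFETIME_S, JOB_START_INTERVAL_S))
```

Under these assumptions only the first 450 of the 500 jobs would start
before a 15-minute ad expires. Whether an expired ad actually prevents a
matched job from starting is exactly what I am unsure about.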

Any thoughts?

Andrew
......

depot  mel% condor_version
$CondorVersion: 6.6.10 Jun 13 2005 $
$CondorPlatform: SUN4X-SOLARIS29 $

depot  mel% condor_status -total

                     Machines Owner Claimed Unclaimed Matched Preempting

       INTEL/WINNT51      586   117       4        97     368          0
     SUN4u/SOLARIS29        2     2       0         0       0          0

               Total      588   119       4        97     368          0


depot  mel% condor_q   

(lots of output)

 324.996 mel            12/13 15:47   0+00:00:00 I  0   0.0  java Pauser1 x1k-9
 324.997 mel            12/13 15:47   0+00:00:00 I  0   0.0  java Pauser1 x1k-9
 324.998 mel            12/13 15:47   0+00:00:00 I  0   0.0  java Pauser1 x1k-9
 324.999 mel            12/13 15:47   0+00:00:00 I  0   0.0  java Pauser1 x1k-9

1548 jobs; 1070 idle, 478 running, 0 held

.........................