
[Condor-users] Our pool appears to work inefficiently



Hi,
  I may be exposing my lack of knowledge of the mechanics of Condor
pools, but I am quite surprised that the performance of our pool is,
on the whole, quite poor. The composition of the pool is
complicated -- there are machines from different departments and/or
subnets, and so this may be a very difficult issue to analyse or for
anyone to advise us on...

According to condor_status most of the machines are unclaimed. However,
when I submit a batch of 100 simple jobs I find that only about 50% of
them will run simultaneously in the pool -- the rest are rejected, and
condor_q tells me that machines do match but reject the jobs for
some unknown reason. The vast majority of the machines are running
Windows XP with SP2.

Can anyone please advise us in this respect? For example, what might be
wrong in the pool, or what analysis might we consider doing?

Thank you -- David Baker.

Condor job

Universe = vanilla
TRANSFER_FILES = ALWAYS
Requirements = (OpSys == "WINNT51") && (Arch == "INTEL")
Executable = test_condor.bat
Output = hello.out
Error = hello.err
Log = hello.log
queue 100

Output of condor_q
4018.097:  Run analysis summary.  Of 1366 machines,
     42 are rejected by your job's requirements
      0 reject your job because of their own requirements
     46 match, but are serving users with a better priority in the pool
   1216 match, but reject the job for unknown reasons
     62 match, but will not currently preempt their existing job
      0 are available to run your job
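
To put the summary above in proportion, here is a small Python sketch
that tallies the counts per reason. It is not part of Condor; the
function name tally and the SUMMARY string are only illustrative, and
the text is pasted by hand from the condor_q output rather than read
from the tool. On these figures the "unknown reasons" line accounts
for roughly 89% of the machines considered.

```python
import re

# Summary lines pasted by hand from the condor_q analysis above.
SUMMARY = """\
     42 are rejected by your job's requirements
      0 reject your job because of their own requirements
     46 match, but are serving users with a better priority in the pool
   1216 match, but reject the job for unknown reasons
     62 match, but will not currently preempt their existing job
      0 are available to run your job
"""

def tally(summary):
    """Return a list of (count, reason) pairs, one per summary line."""
    rows = []
    for line in summary.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*\S)", line)
        if m:
            rows.append((int(m.group(1)), m.group(2)))
    return rows

rows = tally(SUMMARY)
total = sum(count for count, _ in rows)

# Print each reason with its share of the machines considered.
for count, reason in rows:
    print(f"{count:5d}  {100 * count / total:5.1f}%  {reason}")
print(f"{total:5d}  total")
```

The total printed should agree with the "Of 1366 machines" figure in
the first line of the analysis, which is a quick check that no
category was dropped when pasting.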

Output of condor_status
                     Machines Owner Claimed Unclaimed Matched Preempting

       INTEL/WINNT40        2     0       0         2       0          0
       INTEL/WINNT51     1365     7     108      1250       0          0

               Total     1367     7     108      1252       0          0