
[Condor-users] Jobs not finding matches.



I'm pretty new to Condor administration, and I have a fairly
high-level troubleshooting question.

The last few weeks we've had a few deadlines approaching, so a lot of
our users are submitting a lot of jobs. Some folks are making good
progress, but some people have a LOT of idle jobs, and I'm trying to
sort out exactly why they're idle. Obviously you can't tell me why, but
maybe you can help me work out which steps to take to find out.

Okay, so here's poor todd, who has 25 running jobs and 747 idle jobs.

whateverasaurus 10:40:57$ condor_q -g | grep " I " | grep todd | wc -l
747
whateverasaurus 10:41:01$ condor_q -g | grep " R " | grep todd | wc -l
25
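
The two grep pipelines above can also match " I " or " R " anywhere on a line, for example in a command name. A minimal sketch of the same tally in one pass, assuming the default condor_q column layout (ID, OWNER, SUBMITTED date and time, RUN_TIME, ST, ...) so that the state is the sixth whitespace-separated field; count_states is a made-up helper name, not a Condor tool:

```shell
# Tally idle and running jobs for one owner from condor_q-style output on stdin.
# Assumes the default column layout, where the SUBMITTED column is two fields
# (date and time), so the job state (ST) is field 6.
count_states() {
    awk -v owner="$1" '
        # Only count lines that start with a cluster.proc job ID for this owner.
        $1 ~ /^[0-9]+\.[0-9]+$/ && $2 == owner { n[$6]++ }
        END { printf "idle=%d running=%d\n", n["I"], n["R"] }
    '
}

# Intended use (not run here):
#   condor_q -g | count_states todd
```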

Yes, he has horrible userprio right now:

whateverasaurus 10:41:37$ condor_userprio
Last Priority Update:  7/3  10:40
                             Effective
User Name                    Priority 
-----------------------      ---------
smirarab@cs                   0.50
amy@cs                        0.50
laustin@cs                    0.64
mgebhart@cs                   0.90
kscherer@cs                   0.91
ckcuong@cs                    1.02
bayzid@cs                     1.73
joeraii@cs                    1.99
akanksha@cs                   2.23
elie@cs                       3.00
naga86@cs                     3.01
schrum2@cs                    3.98
dongli@cs                     4.37
julian@cs                    84.29
todd@cs                     395.11
namphuon@cs                 400.82
bsunil@cs                   437.25
<none>                      1404.07
-----------------------      ---------
Number of users shown: 17                          

And so some of his jobs are getting repeatedly preempted and not making
any progress for that reason. But why are they getting preempted?

Picking an idle todd-job at random (ha, the one I had been looking at
earlier is now running, typical; picking ANOTHER one at random):

whateverasaurus 10:43:29$ condor_q -g -better-analyze 572871.4  
<snip>

572871.004:  Run analysis summary.  Of 3179 machines,
   1395 are rejected by your job's requirements 
    997 reject your job because of their own requirements 
     31 match but are serving users with a better priority in the pool 
      0 match but reject the job for unknown reasons 
     17 match but will not currently preempt their existing job 
      0 match but are currently offline 
    739 are available to run your job
        Last successful match: Mon Jul  2 13:16:27 2012

The Requirements expression for your job is:

( target.Memory >= 4000 && target.Lucid ) &&
( TARGET.Arch == "X86_64" ) &&
( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= DiskUsage ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   target.Memory >= 4000             2096                 
2   ( TARGET.Arch == "X86_64" )       2714                 
3   target.Lucid                      3131                 
4   ( TARGET.OpSys == "LINUX" )       3179                 
5   ( TARGET.Disk >= 25000 )          3179                 
6   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,2.197265625000000E+03)) ) >= 2250000 )
                                      3179                 
7   ( TARGET.FileSystemDomain == "cs" )
                                      3179                 

So there would appear to be a lot of machines (739) that might run his
job. There are maybe 300 of these that he won't be able to use because
they're restricted to a group he's not in, but that still leaves ~400.
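
Spelling that estimate out as a quick sanity check, using the counts from the analysis summary above. The 300 group-restricted machines is my rough guess, not something Condor reported, and the variable names are just for illustration:

```shell
# Counts copied from the -better-analyze summary for job 572871.4.
rejected_by_job=1395
rejected_by_machine=997
better_priority=31
no_preempt=17
available=739
group_restricted=300   # rough guess, not a measured number

# The categories should account for every machine in the pool (3179).
echo "accounted for: $((rejected_by_job + rejected_by_machine + better_priority + no_preempt + available))"

# Machines left once the group-restricted ones are set aside.
echo "usable estimate: $((available - group_restricted))"
```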

And actually,

whateverasaurus 10:46:34$ condor_status -const 'Memory > 4000' | grep X86_64 | grep Unclaimed | wc -l
658

He should be able to use any of those machines, and they're Unclaimed.
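
That count only filters on memory (with > where the job asks for >=) and greps for the architecture and state, so it ignores the Lucid clause. A stricter sketch that puts the machine-side clauses of the job's Requirements into a single constraint; it assumes Lucid is advertised as a boolean machine attribute at our site, and count_unclaimed_matches is a made-up helper name:

```shell
# Count unclaimed slots that satisfy the machine-side clauses of the job's
# Requirements expression. "Lucid" is a site-specific attribute; the
# =?= True test is an assumption about how it is advertised.
count_unclaimed_matches() {
    condor_status \
        -constraint 'State == "Unclaimed" && Memory >= 4000 && Lucid =?= True && Arch == "X86_64" && OpSys == "LINUX"' \
        -format '%s\n' Name | awk 'END { print NR }'
}

# Intended use (not run here):
#   count_unclaimed_matches
```

This still leaves out the Disk and FileSystemDomain clauses, which matched all 3179 machines in the analysis above, and it knows nothing about group restrictions.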

Any suggestions on how to start troubleshooting this? That's a ton of
unclaimed machines, and currently I have 1424 jobs sitting idle that
would love to have a machine.

Thanks, guys!

--
amy