
RE: [Condor-users] Why could this match not be made?



> Very strange


Indeed. I'm seeing too many idle machines in my system. I currently have
the following machines idle:

ichesal@TTC-ICHESAL-LNX /ttcbatch/experiments/ichesal/condor/sleeper [0]
> condor_status -avail -const 'alteramachineclass==3066'
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm2@TTC-BS306 WINNT50     INTEL  Unclaimed  Idle       0.020  2015  [?????]
vm1@TTC-BS306 WINNT50     INTEL  Unclaimed  Idle       0.660  2015  0+00:17:31
vm1@TTC-BS306 WINNT50     INTEL  Unclaimed  Idle       0.140  2015  [?????]
vm2@TTC-BS306 WINNT50     INTEL  Unclaimed  Idle       0.010  2015  0+00:00:41
vm2@TTC-BS306 WINNT50     INTEL  Unclaimed  Idle       0.110  2015  [?????]
vm1@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.000  2015  [?????]
vm1@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.010  2015  0+00:10:11
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.010  2015  0+00:02:31
vm1@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.090  2015  [?????]
vm1@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.070  2015  [?????]
vm1@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.010  2015  0+00:05:12
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.000  2015  [?????]
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.130  2015  0+00:01:52
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.000  2015  [?????]
vm1@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.000  2015  [?????]
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.010  2015  0+00:02:26
vm1@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.000  2015  [?????]
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.200  2015  [?????]
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.010  2015  0+00:04:35
vm2@TTC-BS306 WINNT51     INTEL  Unclaimed  Idle       0.010  2015  0+00:00:37
vm1@TTC-EAHME WINNT51     INTEL  Unclaimed  Idle       0.180  1023  [?????]
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
       INTEL/WINNT50        5     0       0         5       0          0
       INTEL/WINNT51       16     0       0        16       0          0
 
               Total       21     0       0        21       0          0

This is despite several hundred queued jobs that could be using these
machines. Users seem to get no more than 9 or 10 machines simultaneously,
even though more machines than that are available. The schedds all have
MAX_JOBS_RUNNING set to 70, but I'm seeing no more than 10 or so jobs
running at a time. What would be preventing the schedds from claiming
all the unclaimed machines in the system? I have
NEGOTIATOR_IGNORE_USER_PRIORITIES = True set on the negotiator, so it
shouldn't be that any one user is considered to be using too many
resources. Is it because I have NEGOTIATE_ALL_JOBS_IN_CLUSTER = False on
the schedd?
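
For reference, here are the three settings mentioned above, grouped by
the daemon they apply to. This is only a sketch of the relevant lines,
not a copy of the actual config files, which contain much more:

```
# On each schedd (submit machine)
MAX_JOBS_RUNNING = 70
NEGOTIATE_ALL_JOBS_IN_CLUSTER = False

# On the negotiator (central manager)
NEGOTIATOR_IGNORE_USER_PRIORITIES = True
```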

- Ian