
RE: [Condor-users] Why could this match not be made?



> Despite the fact that there are several hundred jobs queued 
> that could be using these machines. Users seem to be getting 
> no more than 9 or 10 machines simultaneously even though 
> there are more machines available than that. The schedds all 
> have MAX_JOBS_RUNNING set to 70, but I'm seeing no more than 10 
> or so running at a time. What would be preventing the schedds 
> from claiming all the unclaimed machines that are in the 
> system? I have NEGOTIATOR_IGNORE_USER_PRIORITIES = True set 
> on the negotiator so it's not like any user can be using too 
> many resources. Is it because I have 
> NEGOTIATE_ALL_JOBS_IN_CLUSTER = False on the schedd?

I tried setting NEGOTIATE_ALL_JOBS_IN_CLUSTER = True on one of the
schedds that holds jobs that can run on these machines and it hasn't
made any difference. The negotiator is just saying there are no matches
found for the jobs:

	
3/2 17:23:31     Request 00191.00092:
3/2 17:23:31       Preempting bchan@xxxxxxxxxx (prio=26.07) on vm2@xxxxxxxxxxxxxxxxxxxxxxxxx for kbrunham@xxxxxxxxxx (prio=43.73)
3/2 17:23:31       Connecting to startd vm2@xxxxxxxxxxxxxxxxxxxxxxxxx at <137.57.176.79:4452>
3/2 17:23:31 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
3/2 17:23:31 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
3/2 17:23:31       Sending MATCH_INFO/capability
3/2 17:23:31       (Capability is "<137.57.176.79:4452>#1107310199#1428" )
3/2 17:23:31       Sending PERMISSION, capability, startdAd to schedd
3/2 17:23:31       Notifying the accountant
3/2 17:23:31       Successfully matched with vm2@xxxxxxxxxxxxxxxxxxxxxxxxx
3/2 17:23:31     Sending SEND_JOB_INFO/eom
3/2 17:23:31     Getting reply from schedd ...
3/2 17:23:31     Got JOB_INFO command; getting classad/eom
3/2 17:23:32     Request 00191.00093:
3/2 17:23:32       Rejected 191.93 kbrunham@xxxxxxxxxx <137.57.142.7:3194>: no match found
3/2 17:23:32     Sending SEND_JOB_INFO/eom
3/2 17:23:32     Getting reply from schedd ...
3/2 17:23:32     Got JOB_INFO command; getting classad/eom
3/2 17:23:32     Request 00177.00000:
3/2 17:23:32       Rejected 177.0 kbrunham@xxxxxxxxxx <137.57.142.7:3194>: no match found
3/2 17:23:32     Sending SEND_JOB_INFO/eom
3/2 17:23:32     Getting reply from schedd ...
3/2 17:23:32     Got JOB_INFO command; getting classad/eom
3/2 17:23:32     Request 00178.00000:
3/2 17:23:32       Rejected 178.0 kbrunham@xxxxxxxxxx <137.57.142.7:3194>: no match found

And so on. But there are machines that meet the requirements! I'm
looking at them, sitting Unclaimed+Idle. Very frustrating.
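
To get a sense of how widespread the rejections are across a whole
negotiation cycle, something like the following could tally the outcomes
in the negotiator log. This is only a rough sketch: the patterns are
inferred from the excerpt in this message, and the names unwrap and
tally_outcomes are made up for illustration, not part of Condor.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt in this message; other Condor
# versions may format these lines differently.
TIMESTAMP = re.compile(r"^\d+/\d+ \d+:\d+:\d+")
REJECTED = re.compile(
    r"Rejected\s+(?P<job>\d+\.\d+)\s+(?P<user>\S+)\s+<[^>]+>:\s*(?P<reason>.+)$"
)
MATCHED = "Successfully matched with"


def unwrap(lines):
    """Rejoin log entries that were split across lines (e.g. by mail wrapping)."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)      # a new timestamped entry
        else:
            entries[-1] += " " + line  # continuation of the previous entry
    return entries


def tally_outcomes(lines):
    """Count successful matches and each distinct rejection reason."""
    counts = Counter()
    for entry in unwrap(lines):
        rejected = REJECTED.search(entry)
        if rejected:
            counts[rejected.group("reason").strip()] += 1
        elif MATCHED in entry:
            counts["matched"] += 1
    return counts
```

Passing it an open log file, for example tally_outcomes(open(path)) where
path points at the negotiator's log, returns a Counter keyed by "matched"
or by the rejection reason text.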

- Ian