[Condor-users] Is this a negotiator bug or what???



We use group accounting. In the negotiator D_FULLDEBUG output below I have marked two lines with the word "HERE". At the first HERE, I expect the negotiator to say that group_MCprod is over quota and skip it; instead it reports a usage of 0. It then goes ahead and negotiates with group_MCprod, even though at the second HERE you can see it knows that the group is using 3591 slots against a quota of 520. The condor_userprio output at the bottom also shows the slots being used. Near the bottom of the debug output, the line containing "matchmakingAlgorithm:" again reports the usage as 0.

I've been fighting with this for a long time. Occasionally one of our groups manages to take all of our slots even though it is over quota. Most of the time the quotas appear to work.

Has anyone seen this before?

Thanks,

joe


03/22 11:56:06 group group_italy dynamic quota for 11106 slots = 188.000
03/22 11:56:06 Group Table : group group_italy quota 188.000 usage 115.000 prio 61.17
03/22 11:56:06 group group_japan dynamic quota for 11106 slots = 233.000
03/22 11:56:06 Group Table : group group_japan quota 233.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_karlsruhe dynamic quota for 11106 slots = 55.000
03/22 11:56:06 Group Table : group group_karlsruhe quota 55.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_mit dynamic quota for 11106 slots = 33.000
03/22 11:56:06 Group Table : group group_mit quota 33.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_physmon dynamic quota for 11106 slots = 11.000
03/22 11:56:06 Group Table : group group_physmon quota 11.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_prd dynamic quota for 11106 slots = 815.000
03/22 11:56:06 Group Table : group group_prd quota 815.000 usage 299.000 prio 36.69
03/22 11:56:06 group group_sam dynamic quota for 11106 slots = 277.000
03/22 11:56:06 Group Table : group group_sam quota 277.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_fixedwntest dynamic quota for 11106 slots = 55.000
03/22 11:56:06 Group Table : group group_fixedwntest quota 55.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_fnal dynamic quota for 11106 slots = 233.000
03/22 11:56:06 Group Table : group group_fnal quota 233.000 usage 173.000 prio 74.25
03/22 11:56:06 group group_highprio dynamic quota for 11106 slots = 888.000
03/22 11:56:06 Group Table : group group_highprio quota 888.000 usage 147.000 prio 16.55
03/22 11:56:06 group group_ntp dynamic quota for 11106 slots = 916.000
03/22 11:56:06 Group Table : group group_ntp quota 916.000 usage 567.000 prio 61.90
03/22 11:56:06 group group_mcprod dynamic quota for 11106 slots = 520.000
HERE --------> 03/22 11:56:06 Group Table : group group_mcprod quota 520.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_btagging dynamic quota for 11106 slots = 222.000
03/22 11:56:06 Group Table : group group_btagging quota 222.000 usage 0.000 prio 0.00
03/22 11:56:06 group group_dbg dynamic quota for 11106 slots = 55.000
03/22 11:56:06 Group Table : group group_dbg quota 55.000 usage 0.000 prio 0.00
03/22 11:56:06 Group group_alignment - skipping, no submitters
03/22 11:56:06 Group group_calib - skipping, no submitters
03/22 11:56:06 Group group_dqm - skipping, no submitters
03/22 11:56:06 Group group_florida - skipping, no submitters
03/22 11:56:06 Group group_japan - skipping, no submitters
03/22 11:56:06 Group group_karlsruhe - skipping, no submitters
03/22 11:56:06 Group group_mit - skipping, no submitters
03/22 11:56:06 Group group_physmon - skipping, no submitters
03/22 11:56:06 Group group_sam - skipping, no submitters
03/22 11:56:06 Group group_fixedwntest - skipping, no submitters
03/22 11:56:06 Group group_mcprod - negotiating
03/22 11:56:06 Phase 3:  Sorting submitter ads by priority ...
03/22 11:56:06 Phase 4.1:  Negotiating with schedds ...
03/22 11:56:06     numSlots = 520
03/22 11:56:06     slotWeightTotal = 520.000000
03/22 11:56:06     pieLeft = 520.000
03/22 11:56:06     NormalFactor = 1.000000
03/22 11:56:06     MaxPrioValue = 25528.660156
03/22 11:56:06     NumSubmitterAds = 1
03/22 11:56:06 Negotiating with group_MCprod.vellidis@xxxxxxxx at <131.225.240.215:38554>
03/22 11:56:06 0 seconds so far
03/22 11:56:06   Calculating submitter limit with the following parameters
03/22 11:56:06     SubmitterPrio       = 25528.660156
03/22 11:56:06     SubmitterPrioFactor = 20.000000
03/22 11:56:06     submitterShare      = 1.000000
03/22 11:56:06     submitterAbsShare   = 1.000000
03/22 11:56:06     submitterLimit    = 520.000000
HERE ---------> 03/22 11:56:06     submitterUsage    = 3591.000000
03/22 11:56:06 Socket to group_MCprod.vellidis@xxxxxxxx (<131.225.240.215:38554>) already in cache, reusing
03/22 11:56:06     Sending SEND_JOB_INFO/eom
03/22 11:56:06     Getting reply from schedd ...
03/22 11:56:06     Got JOB_INFO command; getting classad/eom
03/22 11:56:06     Request 17947890.00000:
03/22 11:56:06 matchmakingAlgorithm: limit 520.000000 used 0.000000 pieLeft 520.000000
03/22 11:56:06 Start of sorting MatchList (len=44)
03/22 11:56:06 Finished sorting MatchList
03/22 11:56:06 Connecting to startd glidein_5068@xxxxxxxxxxxxxxxxxxxx at <131.225.238.42:43337>
03/22 11:56:06       Sending PERMISSION, claim id, startdAd to schedd
03/22 11:56:06 Matched 17947890.0 group_MCprod.vellidis@xxxxxxxx <131.225.240.215:38554> preempting none <131.225.238.42:43337> glidein_5068@xxxxxxxxxxxxxxxxxxxx


[cdfcaf@fcdfhead10 /export/condor_local/spool] condor_userprio -getreslist group_MCprod.vellidis@xxxxxxxx | tail -1
Number of Resources Used: 3579
[cdfcaf@fcdfhead10 /export/condor_local/spool] condor_userprio -getreslist group_mcprod.vellidis@xxxxxxxx | tail -1
Number of Resources Used: 0
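
For anyone who wants to look for the same symptom in their own NegotiatorLog, here is a minimal Python sketch of the check I am doing by eye above. It is not part of Condor. It assumes the D_FULLDEBUG line formats shown in this message, and the function name find_usage_mismatches is made up for illustration. It compares each "Group Table" usage with the submitterUsage reported later for a submitter in the same group, matching group names case-insensitively, since the output above shows the group as group_mcprod in the Group Table and group_MCprod in the submitter name. Whether that difference in case has anything to do with the problem is only a guess on my part.

```python
import re

# Patterns assume the D_FULLDEBUG line formats shown in the log above.
GROUP_TABLE = re.compile(
    r"Group Table : group (\S+) quota ([\d.]+) usage ([\d.]+)")
# Group prefix of a submitter name such as group_MCprod.user@domain
NEGOTIATING = re.compile(r"Negotiating with ([^.@\s]+)\.\S+@")
SUBMITTER_USAGE = re.compile(r"submitterUsage\s*=\s*([\d.]+)")


def find_usage_mismatches(log_text):
    """Return (table_name, submitter_group, quota, table_usage, submitter_usage)
    for every submitter whose usage exceeds its group's Group Table usage."""
    table = {}            # group name as logged -> (quota, usage)
    current_group = None  # group prefix of the submitter being negotiated
    mismatches = []
    for line in log_text.splitlines():
        m = GROUP_TABLE.search(line)
        if m:
            table[m.group(1)] = (float(m.group(2)), float(m.group(3)))
            continue
        m = NEGOTIATING.search(line)
        if m:
            current_group = m.group(1)
            continue
        m = SUBMITTER_USAGE.search(line)
        if m and current_group is not None:
            usage = float(m.group(1))
            for name, (quota, table_usage) in table.items():
                # Compare names ignoring case; a single submitter should
                # never be using more than its whole group is charged for.
                if name.lower() == current_group.lower() and usage > table_usage:
                    mismatches.append(
                        (name, current_group, quota, table_usage, usage))
    return mismatches


# Four lines copied from the log above.
sample = """\
03/22 11:56:06 group group_mcprod dynamic quota for 11106 slots = 520.000
03/22 11:56:06 Group Table : group group_mcprod quota 520.000 usage 0.000 prio 0.00
03/22 11:56:06 Negotiating with group_MCprod.vellidis@xxxxxxxx at <131.225.240.215:38554>
03/22 11:56:06     submitterUsage    = 3591.000000
"""

result = find_usage_mismatches(sample)
for row in result:
    print(row)
```

Run against a whole log file, any row it prints is a group whose Group Table usage is lower than what a single submitter in that group is reported to be using, which is the inconsistency marked with HERE above.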