
[Condor-users] Jobs with multiple CPUs and Group Quotas



Hi all,

I'm seeing strange behavior after creating a new group for 8-core jobs.
The new group is:

group_atlas.multicore

with a static quota of 400 and accept_surplus off.  There are 50 slots
with a special flag set and cpus=8.  When I first submit these jobs I
get the following in the logs:

04/25/12 16:38:04 group quotas: group group_atlas.multicore assigned quota= 400
04/25/12 16:38:04 group quotas: group= group_atlas.multicore  cquota= 400  static= 1  accept= 0  quota= 400  req= 1  usage= 0

and for the very first group being negotiated I see:

04/25/12 16:38:04 Group group_atlas.multicore - sortkey= 0
04/25/12 16:38:04 Group group_atlas.multicore - BEGIN NEGOTIATION
04/25/12 16:38:04 Phase 3:  Sorting submitter ads by priority ...
04/25/12 16:38:04    maxAllowed  = 1.000000 groupQuota  = 1.000000 groupusage  = 0.000000
04/25/12 16:38:04 Phase 4.1:  Negotiating with schedds ...
04/25/12 16:38:04     numSlots = 12051
04/25/12 16:38:04     slotWeightTotal = 1.000000
04/25/12 16:38:04     pieLeft = 1.000
04/25/12 16:38:04     NormalFactor = 1.000000
04/25/12 16:38:04     MaxPrioValue = 1.250000
04/25/12 16:38:04     NumSubmitterAds = 1
04/25/12 16:38:04   Negotiating with group_atlas.multicore.usatlas1@xxxxxxx at <130.199.185.164:47859>
04/25/12 16:38:04 0 seconds so far
04/25/12 16:38:04    maxAllowed  = 1.000000 groupQuota  = 1.000000 groupusage  = 0.000000
04/25/12 16:38:04   Calculating submitter limit with the following parameters
04/25/12 16:38:04     SubmitterPrio       = 1.250000
04/25/12 16:38:04     SubmitterPrioFactor = 2.500000
04/25/12 16:38:04     submitterShare      = 1.000000
04/25/12 16:38:04     submitterAbsShare   = 1.000000
04/25/12 16:38:04     submitterLimit    = 1.000000
04/25/12 16:38:04     submitterUsage    = 0.000000
04/25/12 16:38:04 Socket to group_atlas.multicore.usatlas1@xxxxxxx (<130.199.185.164:47859>) already in cache, reusing
04/25/12 16:38:04     Sending SEND_JOB_INFO/eom
04/25/12 16:38:04     Getting reply from schedd ...
04/25/12 16:38:04     Got JOB_INFO command; getting classad/eom
04/25/12 16:38:04     Request 160101.00000:
04/25/12 16:38:04 matchmakingAlgorithm: limit 1.000000 used 0.000000 pieLeft 1.000000
04/25/12 16:38:05       Rejected 160101.0 group_atlas.multicore.usatlas1@xxxxxxx <130.199.185.164:47859>: group quota exceeded
04/25/12 16:38:05     Sending SEND_JOB_INFO/eom
04/25/12 16:38:05     Getting reply from schedd ...
04/25/12 16:38:05     Got NO_MORE_JOBS;  done negotiating

I do not understand this behavior. Why is the group quota exceeded?
There are sufficient free slots on our farm for these jobs, so why are
they not matching?
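
For reference, a group with the settings described above would typically
be defined in the negotiator configuration along the following lines.
This is only a minimal sketch, not the actual file from this pool: it
assumes the standard GROUP_NAMES, GROUP_QUOTA_<group> and
GROUP_ACCEPT_SURPLUS_<group> knobs, and assumes the parent group
group_atlas is declared alongside the new subgroup. The slot-side
settings (the special flag and cpus=8) are not shown.

```
# Hypothetical negotiator configuration; the real file is not part of this post.
# Declaring the parent group next to the subgroup is an assumption.
GROUP_NAMES = group_atlas, group_atlas.multicore

# Static quota of 400 for the multicore subgroup.
GROUP_QUOTA_group_atlas.multicore = 400

# accept_surplus off: the group may not take unused quota from other groups.
GROUP_ACCEPT_SURPLUS_group_atlas.multicore = False
```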

Thanks for any help,
-Will