Re: [HTCondor-users] Fair share



Hi Collin,

Thanks for your reply!
Actually, my config is based on that of another ARC-CE with HTCondor
(dedicated to ATLAS), which works:

GROUP_ACCEPT_SURPLUS=True
GROUP_AUTOREGROUP=False
GROUP_NAMES=group_ATLAS, group_ATLAS.pilot, group_ATLAS.pilot_multicore, group_ATLAS.prodatlas, group_ATLAS.prodatlas_multicore, group_ATLAS.sum_test, group_OPS, group_OPS.ops
GROUP_PRIO_FACTOR_group_ATLAS=100000
GROUP_PRIO_FACTOR_group_OPS=1
GROUP_QUOTA_DYNAMIC_group_ATLAS=0.999
GROUP_QUOTA_DYNAMIC_group_ATLAS.pilot=0.2
GROUP_QUOTA_DYNAMIC_group_ATLAS.pilot_multicore=0.001
GROUP_QUOTA_DYNAMIC_group_ATLAS.prodatlas=0.195
GROUP_QUOTA_DYNAMIC_group_ATLAS.prodatlas_multicore=0.599
GROUP_QUOTA_DYNAMIC_group_ATLAS.sum_test=0.005
GROUP_QUOTA_DYNAMIC_group_OPS=0.001
GROUP_QUOTA_DYNAMIC_group_OPS.ops=1
NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION=False
NEGOTIATOR_USE_SLOT_WEIGHTS=True
NEGOTIATOR_USE_WEIGHTED_DEMAND=True
PRIORITY_HALFLIFE=259200
#prodatlas_multicore_LIMIT=70
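
As a rough cross-check of the dynamic quotas above, here is a minimal Python sketch. It assumes each GROUP_QUOTA_DYNAMIC value is a fraction of the parent group's quota, and that sibling fractions summing to more than 1 are scaled down proportionally. The pool size of 1000 weighted slots and the function name are hypothetical, and the sketch ignores surplus sharing and job demand, so it only approximates what the negotiator reports as "receiving quota=".

```python
# Nominal hierarchical quotas from GROUP_QUOTA_DYNAMIC-style fractions.
# Simplified sketch: no surplus sharing, no demand, no static quotas.

FRACTIONS = {
    "group_ATLAS": 0.999,
    "group_ATLAS.pilot": 0.2,
    "group_ATLAS.pilot_multicore": 0.001,
    "group_ATLAS.prodatlas": 0.195,
    "group_ATLAS.prodatlas_multicore": 0.599,
    "group_ATLAS.sum_test": 0.005,
    "group_OPS": 0.001,
    "group_OPS.ops": 1.0,
}


def subtree_quotas(total_slots, fractions):
    """Return {group: nominal quota}; every parent must appear in fractions."""
    quotas = {"": float(total_slots)}  # "" stands for the root of the tree
    # Visit shallow names first so a parent's quota exists before its children.
    for name in sorted(fractions, key=lambda n: n.count(".")):
        parent = name.rpartition(".")[0]
        siblings = [n for n in fractions if n.rpartition(".")[0] == parent]
        total = sum(fractions[n] for n in siblings)
        # Assumption: fractions that sum above 1 are normalized.
        scale = 1.0 / total if total > 1.0 else 1.0
        quotas[name] = quotas[parent] * fractions[name] * scale
    del quotas[""]
    return quotas


if __name__ == "__main__":
    # 1000 weighted slots is a made-up pool size, used only for illustration.
    for group, quota in sorted(subtree_quotas(1000, FRACTIONS).items()):
        print(f"{group:35s} {quota:10.3f}")
```

Comparing these nominal values with the quotas the negotiator logs is a quick way to spot a group whose name or fraction was not picked up as intended.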

condor_status -submitters
Name                                                 Machine           RunningJobs  IdleJobs  HeldJobs

group_ATLAS.pilot.griduser04@xxxxxxxx                arc-htc.nipne.ro          160         3
group_ATLAS.prodatlas.griduser01@xxxxxxxx            arc-htc.nipne.ro            2
group_ATLAS.prodatlas_multicore.griduser01@xxxxxxxx  arc-htc.nipne.ro           10
group_ATLAS.sum_test.griduser02@xxxxxxxx             arc-htc.nipne.ro            0
ops01@xxxxxxxx                                       arc-htc.nipne.ro            0         0

Anyway, I will increase the negotiator's log verbosity as you suggested.
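
Once the log is more verbose, the "skipping" lines can be pulled out of it with a short script. The following is a minimal sketch, assuming one log entry per line and the message format seen in the excerpts quoted in this thread; the function name and the sample text are illustrative only.

```python
import re

# Matches entries such as:
#   06/21/19 16:16:51 Group group_OPS.ops - skipping, zero slots allocated
SKIP_RE = re.compile(r"Group (?P<group>\S+) - skipping, (?P<reason>.+)")


def skipped_groups(log_text):
    """Return (group, reason) pairs for every 'skipping' entry in log_text."""
    found = []
    for line in log_text.splitlines():
        match = SKIP_RE.search(line)
        if match:
            found.append((match.group("group"), match.group("reason").strip()))
    return found


# Sample lines copied from the log excerpts in this thread (unwrapped).
SAMPLE = """\
06/21/19 16:16:51 Group group_ALICE.alice - skipping, zero slots allocated
06/21/19 16:16:51 Group <none> - skipping, at or over quota (quota=2.27374e-13) (usage=1446) (allocation=1425)
06/21/19 05:15:30 Group prod.boss2.boss2_anim - sortkey= 0
"""

if __name__ == "__main__":
    for group, reason in skipped_groups(SAMPLE):
        print(f"{group}: {reason}")
```

To use it on a real log, read the file contents into a string and pass it to skipped_groups(); the location of the negotiator log depends on the site configuration.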

Best regards,
Mihai


> Hi Mihai,
>
> Are jobs submitted to those groups able to match?
>
> The negotiator won't allocate any resources to a group if it's not
> requesting any, which results in the group getting skipped with that
> message if it doesn't have any idle jobs. You can see this happen in the
> negotiator's log if you increase the logging level by setting
> NEGOTIATOR_DEBUG. I think D_ACCOUNTANT will work, but we use D_FULLDEBUG,
> so I'm not 100% sure which level these lines are coming from.
>
> Here's an example of this happening in our negotiator's log, trimmed to
> only the relevant lines:
>
> (From the config for reference)
> GROUP_ACCEPT_SURPLUS = True
> GROUP_AUTOREGROUP = False
> NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = False
> NEGOTIATOR_USE_SLOT_WEIGHTS = true
> NEGOTIATOR_USE_WEIGHTED_DEMAND = true
> GROUP_QUOTA_DYNAMIC_prod.boss2.boss2_anim = 0.04
>
> 06/21/19 05:15:29 Phase 2:  Performing accounting ...
> 06/21/19 05:15:30 group quotas: assigning 14 submitters to accounting groups
> 06/21/19 05:15:30 group quotas: assigning group quotas from 76836 available weighted slots
> 06/21/19 05:15:30 group quotas: subtree <none> receiving quota= 76836
> 06/21/19 05:15:30 group quotas: subtree prod.boss2 receiving quota= 560.903
> 06/21/19 05:15:30 group quotas: group prod.boss2, allocated 0 for static children, 560.903 for dynamic children
> 06/21/19 05:15:30 group quotas: subtree prod.boss2.boss2_anim receiving quota= 20.3965
> 06/21/19 05:15:30 group quotas: group prod.boss2.boss2_anim, allocated 0 for static children, 20.3965 for dynamic children
> 06/21/19 05:15:30 group quotas: group prod.boss2.boss2_anim assigned quota= 20.3965
> 06/21/19 05:15:30 group quotas: group= prod.boss2.boss2_anim  cquota= 0.04  static= 0  accept= 1  quota= 20.3965  req= 0  usage= 0
> 06/21/19 05:15:30 group quotas: fairshare (1): group= prod.boss2.boss2_anim  quota= 20.3965  requested= 0
> 06/21/19 05:15:30 group quotas: fairshare (2): group= prod.boss2.boss2_anim  quota= 20.3965  allocated= 0  requested= 0
> 06/21/19 05:15:30 group quotas: group= prod.boss2.boss2_anim  quota= 20.3965  requested= 0  allocated= 0  unallocated= 0
> 06/21/19 05:15:30 Group prod.boss2.boss2_anim - sortkey= 0
> 06/21/19 05:15:30 Group prod.boss2.boss2_anim - skipping, zero slots allocated
>
> Note that the group still received its quota even though it ended up with
> no allocation (because there were no idle jobs in the group at that time).
>
> Best,
> Collin
>
> On Fri, Jun 21, 2019 at 6:21 AM Mihai Ciubancan <ciubancan@xxxxxxxx> wrote:
>
>> Hi everyone,
>>
>> I'm trying to implement a fair-share policy (for an ARC-CE dedicated to
>> some of the LHC experiments), but it is not working. The config file
>> looks like this:
>>
>> GROUP_ACCEPT_SURPLUS=True
>> GROUP_AUTOREGROUP=False
>> GROUP_NAMES=group_ATLAS, group_ALICE, group group_OPS, group_ATLAS.atlas, group_ATLAS.prodatlas_multicore, group_ATLAS.prodatlas, group_ATLAS.pilotatlas_multicore, group_ATLAS.pilotatlas, group_ATLAS.sum_test, group_ALICE.alice, group_ALICE.pilotalice, group_OPS.ops
>> GROUP_PRIO_FACTOR_group_ATLAS=100000
>> GROUP_PRIO_FACTOR_group_ALICE=100000
>> GROUP_PRIO_FACTOR_group_OPS=1
>> GROUP_QUOTA_DYNAMIC_group_ATLAS=0.497
>> GROUP_QUOTA_DYNAMIC_group_ATLAS.atlas=0.001
>> GROUP_QUOTA_DYNAMIC_group_ATLAS.prodatlas_multicore=0.599
>> GROUP_QUOTA_DYNAMIC_group_ATLAS.prodatlas=0.194
>> GROUP_QUOTA_DYNAMIC_group_ATLAS.pilotatlas_multicore=0.001
>> GROUP_QUOTA_DYNAMIC_group_ATLAS.pilotatlas=0.2
>> GROUP_QUOTA_DYNAMIC_group_ATLAS.sum_test=0.005
>> GROUP_QUOTA_DYNAMIC_group_ALICE=0.502
>> GROUP_QUOTA_DYNAMIC_group_ALICE.alice=0.999
>> GROUP_QUOTA_DYNAMIC_group_ALICE.pilotalice=0.001
>> GROUP_QUOTA_DYNAMIC_group_OPS=0.001
>> GROUP_QUOTA_DYNAMIC_group_OPS.ops=1
>> NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION=False
>> NEGOTIATOR_USE_SLOT_WEIGHTS=True
>> NEGOTIATOR_USE_WEIGHTED_DEMAND=True
>> PRIORITY_HALFLIFE=259200
>>
>> And in the negotiator's log I have:
>>
>> 06/21/19 16:16:51 Phase 2:  Performing accounting ...
>> 06/21/19 16:16:51 group quotas: assigning 3 submitters to accounting groups
>> 06/21/19 16:16:51 group quotas: assigning group quotas from 1668 available weighted slots
>> 06/21/19 16:16:51 group quotas: allocation round 1
>> 06/21/19 16:16:51 group quotas: groups= 14  requesting= 1  served= 1  unserved= 0  slots= 541  requested= 1425  allocated= 1425  surplus= 243  maxdelta= 0
>> 06/21/19 16:16:51 group quotas: entering RR iteration n= 0
>> 06/21/19 16:16:51 Group group_ALICE.alice - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ALICE.pilotalice - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ATLAS.atlas - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ATLAS.pilotatlas - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ATLAS.pilotatlas_multicore - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ATLAS.prodatlas - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ATLAS.prodatlas_multicore - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ATLAS.sum_test - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_OPS.ops - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ALICE - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_ATLAS - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group group_OPS - skipping, zero slots allocated
>> 06/21/19 16:16:51 Group <none> - skipping, at or over quota (quota=2.27374e-13) (usage=1446) (allocation=1425)
>> 06/21/19 16:16:51 Round 1 totals: allocated= 1425  usage= 1425
>> 06/21/19 16:16:51 ---------- Finished Negotiation Cycle ----------
>>
>> Can you help me to fix this?
>>
>> Thank you,
>> Mihai
>>
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>> with a subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>
>
>
> --
> *Collin Mehring *| PE-JoSE - Software Engineer


Dr. Mihai Ciubancan
IT Department
National Institute of Physics and Nuclear Engineering "Horia Hulubei"
Str. Reactorului no. 30, P.O. BOX MG-6
077125, Magurele - Bucharest, Romania
http://www.ifin.ro
Work:   +40214042360
Mobile: +40761345687
Fax:    +40214042395