
[HTCondor-users] Stumped by group_quota_dynamic



Greetings all,
We have a QA HTCondor (8.4.6) system here with 2 slots and more coming soon (yay!). Both are running CentOS 6.7. All users are in the same uid_domain and all servers are in the same filesystem_domain.

We would like to set up groups where group a gets more resources than group b, which gets more than group c, and so on. I would also like to set up a set of group_accept_surplus rules so that unused slots can be reallocated, but I would like to get the basic quotas sorted out first.

Here is how I have defined the groups.

GROUP_NAMES = group_a, group_b, group_c, group_d, group_e, group_f, group_g

# The following is based on 7 groups; the quotas must add up to 1.
# group_a gets 7x, group_b gets 6x, ..., group_g gets 1x, for a total of 28x,
# so x = 1/28 = 0.0357142857142857

GROUP_QUOTA_DYNAMIC_group_a = 0.250
GROUP_QUOTA_DYNAMIC_group_b = 0.214
GROUP_QUOTA_DYNAMIC_group_c = 0.179
GROUP_QUOTA_DYNAMIC_group_d = 0.143
GROUP_QUOTA_DYNAMIC_group_e = 0.107
GROUP_QUOTA_DYNAMIC_group_f = 0.071
GROUP_QUOTA_DYNAMIC_group_g = 0.036
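
As a sanity check on the numbers above, here is a minimal Python sketch of the arithmetic. The helper names dynamic_quotas and slot_share are hypothetical and exist only for this illustration; they are not part of HTCondor. The sketch assumes the linear 7x..1x weighting described in the comments and the 2-slot pool mentioned earlier. It only multiplies fractions by a slot count and does not model how the negotiator actually handles fractional quotas.

```python
from fractions import Fraction

# Group names as listed in GROUP_NAMES above.
GROUPS = ["group_a", "group_b", "group_c", "group_d",
          "group_e", "group_f", "group_g"]


def dynamic_quotas(groups):
    """Give the first group weight n, the next n-1, ..., the last 1,
    then normalise so the fractions sum to exactly 1."""
    n = len(groups)
    total = n * (n + 1) // 2  # 7 groups -> 28
    return {name: Fraction(n - i, total) for i, name in enumerate(groups)}


def slot_share(quotas, slots):
    """Plain multiplication of each fraction by the pool size (assumption:
    this is only arithmetic, not the negotiator's actual algorithm)."""
    return {name: float(share * slots) for name, share in quotas.items()}


quotas = dynamic_quotas(GROUPS)
for name, share in quotas.items():
    print(f"GROUP_QUOTA_DYNAMIC_{name} = {float(share):.3f}")

# With the 2 slots currently in the pool:
shares = slot_share(quotas, slots=2)
for name, value in shares.items():
    print(f"{name}: {value:.3f} slots")
```

The first loop reproduces the seven values in the config to three decimal places. The second loop shows that on a 2-slot pool every group's nominal share is below one whole slot (group_a comes out at 0.5).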

When I add the accounting_group to the submit file, the job just hangs out in the Idle state. Here is the submit file:

# Unix submit description file
# sleep.sub -- simple sleep job

executable              = sleep.sh
log                     = sleep.log
output                  = outfile.txt
error                   = errors.txt
accounting_group        = group_a
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue

condor_q -better-analyze shows this:

[cyang@centos sleep]$ condor_q -better-analyze 148.0


-- Schedd: centos.example.com : <10.2.7.151:9618?...
User priority for cyang@xxxxxxxxxxx is not available, attempting to analyze without it.
---
148.000:  Run analysis summary. Of 2 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      0 are available to run your job
    No successful match recorded.
    Last failed match: Wed May  4 09:48:46 2016

    Reason for last match failure: no match found

The Requirements expression for your job is:

  ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
  ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
  ( TARGET.HasFileTransfer )

Your job defines the following attributes:

  DiskUsage = 1
  ImageSize = 1
  RequestDisk = 1
  RequestMemory = 1

The Requirements expression for your job reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           2  TARGET.Arch == "X86_64"
[1]           2  TARGET.OpSys == "LINUX"
[3]           2  TARGET.Disk >= RequestDisk
[5]           2  TARGET.Memory >= RequestMemory
[7]           2  TARGET.HasFileTransfer

Suggestions:

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( TARGET.Arch == "X86_64" )       2
2   ( TARGET.OpSys == "LINUX" )       2
3   ( TARGET.Disk >= 1 )              2
4   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )
                                      2
5   ( TARGET.HasFileTransfer )        2

So it looks like the machines match, yet the job won't run. When I use the +AccountingGroup = group_a directive instead, it runs without any problem.

Additionally, condor_userprio shows just cyang@xxxxxxxxxxx with no associated groups.

[cyang@rhw1160 sleepjob]$ condor_userprio
Last Priority Update:  5/4  09:51
                               Effective  Priority    Res  Total Usage Time Since
User Name                       Priority    Factor In Use (wghted-hrs) Last Usage
--------------------------- ------------ --------- ------ ------------ ----------
cyang@xxxxxxxxxxx                 507.02   1000.00      0         0.30    0+00:05
--------------------------- ------------ --------- ------ ------------ ----------
Number of users: 1                                      0         0.30    0+23:59


Any thoughts as to why the jobs stay idle? Or am I doing something obviously wrong?

Thanks.


Charles Yang
Senior Research Engineer
NOAA NESDIS/STAR