
[HTCondor-users] Groups, priorities, system accounts?



Hi,

I was trying to reproduce a simple setup in a VM cluster, to test some
settings. All went fine until I got to configuring users, groups and
their respective priorities. My reference is:

https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigPrioritiesForUsers

The release is:

# rpm -qa | grep condor
condor-8.2.3-1.1.osgup.el6.x86_64

The environment is pretty disposable, so I'm trying to keep things
minimalistic. I didn't set up NIS or a "shared user domain". I added
users to the system on the master/submitter node and submit jobs from
there. The jobs appear to run as nobody, which is fine for me: they are
just sleep jobs, since all I want to test is some queue priority settings.
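For reference, the submit files are essentially just this (a minimal
sketch; the exact sleep time and job count are arbitrary):

# minimal sleep-job submit description (sketch)
universe   = vanilla
executable = /bin/sleep
arguments  = 600
queue 50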

My users for now are:

[root@wn-2 condor]# condor_userprio
Last Priority Update: 10/3  19:38
                               Effective   Priority   Res    Total Usage  Time Since
User Name                       Priority    Factor   In Use (wghted-hrs)  Last Usage
----------------------------- ------------ --------- ------ ------------ ----------
gandalf@xxxxxxxxxxxxxxxxxxxxx       500.00   1000.00      0         0.44    0+01:01
dumbo@xxxxxxxxxxxxxxxxxxxxx        1210.35   1000.00      0        26.35    0+00:21

And the group settings are:

GROUP_NAMES = tolkien, dysnep
GROUP_PRIO_FACTOR_tolkien = 90
GROUP_PRIO_FACTOR_dysnep  = 9000
GROUP_AUTOREGROUP = True

GROUP_QUOTA_DYNAMIC_tolkien = .5
GROUP_QUOTA_DYNAMIC_dysnep  = .5

DEFAULT_PRIO_FACTOR = 10000
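(These can be sanity-checked with condor_config_val, e.g.:

condor_config_val GROUP_NAMES
condor_config_val GROUP_QUOTA_DYNAMIC_dysnep

followed by a condor_reconfig after any change.)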

Nothing complex, it seems. When submitting as dumbo, I include this in the JDL:

+AccountingGroup = "dysnep"

This propagates fine to the job:

$ condor_q -l 16.1990 | grep -i accounting
AccountingGroup = "dysnep"

But what I get in the Negotiator log is:

10/03/14 19:42:10 Group dysnep - skipping, zero slots allocated
10/03/14 19:42:10 Group tolkien - skipping, zero slots allocated
10/03/14 19:42:10 Group <none> - BEGIN NEGOTIATION
10/03/14 19:42:10 subtree_usage at dysnep is 0
10/03/14 19:42:10 subtree_usage at tolkien is 0
10/03/14 19:42:10 subtree_usage at <none> is 128

So, as you can see, the job carries the right AccountingGroup ClassAd
attribute, but the Negotiator classifies it under the <none> group.
What looks even stranger is that the accounting groups seem to be
treated as users: the real accounting groups are listed, but they were
never used:


[root@master condor]# condor_userprio -allusers -hierarchical
Last Priority Update: 10/3  20:45
                               Effective   Priority   Res    Total Usage  Time Since
User Name                       Priority    Factor   In Use (wghted-hrs)  Last Usage
----------------------------- ------------ --------- ------ ------------ ----------
tolkien                                        90.00      0         0.00  16346+20:4
gandalf@xxxxxxxxxxxxxxxxxxxxx       500.00   1000.00      0         0.44    0+02:09
dumbo@xxxxxxxxxxxxxxxxxxxxx        1171.65   1000.00      0        26.35    0+01:28
dysnep                                       9000.00      0         0.00  16346+20:4
dysnep@xxxxxxxxxxxxxxxxxxxxx       4568.58   1000.00     70       146.65       <now>
<none>                                       1000.00    128       175.74       <now>
tolkien@xxxxxxxxxxxxxxxxxxxxx      5587.88  10000.00     58         2.29       <now>
----------------------------- ------------ --------- ------ ------------ ----------

As you can see, the balance between them is driven by the priority
factors of these "fake users": the dysnep@ entry gets factor 1000 and
the tolkien@ entry gets 10000, rather than the 9000 and 90 I configured
for the groups.
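(I suppose the bogus user records could be wiped with condor_userprio
-delete, e.g.:

condor_userprio -delete tolkien
condor_userprio -delete dysnep

but that wouldn't explain where they come from.)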

I tried this with TRUST_UID_DOMAIN set to both False and True. I'm not
sure how related that is, but I have a feeling it is somewhat.
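That is, something like this in the config, with a condor_reconfig
after each change:

TRUST_UID_DOMAIN = True   # also tried False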

So my last thought was: maybe I need to properly set up the Unix
accounts on all the VMs and have Condor use those? I thought Condor has
its own way of accounting for this and generally "trusts" what the user
tells it (unless strict security is enforced). For example, it trusts
the +AccountingGroup that you pass to it, and properly recognizes the
Schedd's users even if they only exist there?
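Or could it be that the attribute needs the fully qualified group.user
form? Something like (just a guess on my side):

+AccountingGroup = "dysnep.dumbo"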

Is it obvious to anyone what I missed? I'm used to Condor 7.8, and I've
heard that a lot has changed since then.

Thanks,
Samir