
Re: [HTCondor-users] Groups and SubGroups issue



Hi Todd,

Thanks for the answer!

You are right, the OPS jobs are submitted (by a script) from a different
server than our schedd server. I will look into the documentation that you
sent me.

Regards,
Mihai


> On 10/12/2020 9:52 AM, Mihai Ciubancan wrote:
>> ...also, do you have any suggestion for OPS ?
>>
>> Mihai
>
> Hi Mihai,
>
> Just a random guess re your troubles with OPS :
>
>  From the below it looks like you are injecting the group attributes into
> the job ad using SUBMIT_ATTRS; this config
> knob is used by clients like condor_submit and the Python job submission
> APIs. SUBMIT_ATTRS is not used by the
> condor_schedd.
>
> Perhaps whatever is submitting your OPS jobs (script? human?) is doing so
> from a server that is different from the
> server where your schedd is running, and thus may have a condor_config
> that does not include your SUBMIT_ATTRS
> customization?
>
> If you want an attribute to appear in every job entering a
> condor_schedd, regardless of the client's configuration, I
> suggest you use a schedd job transform to instruct the schedd itself to
> insert the attributes - see:
> https://htcondor.readthedocs.io/en/latest/admin-manual/policy-configuration.html#job-transforms
>
> regards
> Todd
>
>
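A minimal sketch of the schedd-side approach suggested above, for the condor_config on the machine that runs the condor_schedd. It assumes the ClassAd-style job transform syntax (set_<Attr> entries inside a bracketed ad) described in the linked manual section; newer HTCondor releases also document a different native transform syntax, so check it against the version you run. The transform name RO07Groups is hypothetical, and the explicit "undefined" fallback is an addition that does not appear in the thread. Only one attribute is shown; the others from SUBMIT_ATTRS would be added in the same way.

```
# Hypothetical transform name, appended to any transforms already defined.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) RO07Groups

# set_<Attr> makes the schedd insert the expression into each job ad it
# accepts, independent of the submitting client's SUBMIT_ATTRS.
JOB_TRANSFORM_RO07Groups @=end
[
  set_RO07AcctGroup = ifThenElse(NordugridQueue =?= "atlas", "group_ATLAS",
                      ifThenElse(NordugridQueue =?= "lhcb",  "group_LHCB",
                      ifThenElse(NordugridQueue =?= "alice", "group_ALICE",
                      ifThenElse(NordugridQueue =?= "ops",   "group_OPS", undefined))));
]
@end
```

After a condor_reconfig on the schedd host, "condor_q -af RO07AcctGroup" on a newly submitted job is one way to check whether the attribute was inserted.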
>>> The problem is that
>>>
>>>                     ifThenElse(regexp("lhcb01",Owner), "lhcb",
>>>
>>> Matches the last 6 characters of pillhcb01.
>>>
>>> You need to put the longer test first, or change your regex to use a
>>> match that is anchored at the start of the string, like "^lhcb01"
>>>
>>> alice01 and pilalice01 have the same problem.
>>>
>>> -tj
>>>
>>>
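A sketch of the anchored variant of the RO07AcctSubGroup expression from the original post, following the "^lhcb01" suggestion above. Each pattern is anchored at the start of Owner, so "lhcb01" no longer matches inside "pillhcb01" and "alice01" no longer matches inside "pilalice01". The explicit "undefined" fallback in the last call is an assumption added here, not something from the thread: the ClassAd reference describes ifThenElse as taking exactly three arguments.

```
# Anchored patterns: the order of the tests no longer matters for these names.
RO07AcctSubGroup = ifThenElse(regexp("^atlas01", Owner) && RequestCpus > 1, "atlas_multicore", \
                   ifThenElse(regexp("^atlas01", Owner),    "atlas", \
                   ifThenElse(regexp("^lhcb01", Owner),     "lhcb", \
                   ifThenElse(regexp("^pillhcb01", Owner),  "pilotlhcb", \
                   ifThenElse(regexp("^prdlhcb01", Owner),  "prodlhcb", \
                   ifThenElse(regexp("^alice01", Owner),    "alice", \
                   ifThenElse(regexp("^pilalice01", Owner), "pilotalice", \
                   ifThenElse(regexp("^ops01", Owner),      "ops", undefined))))))))
```

The alternative mentioned above, keeping the unanchored patterns but testing "pillhcb01" and "prdlhcb01" before "lhcb01" (and "pilalice01" before "alice01"), should give the same mapping for these account names.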
>>>
>>> -----Original Message-----
>>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
>>> Mihai Ciubancan
>>> Sent: Monday, October 12, 2020 6:11 AM
>>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>>> Subject: [HTCondor-users] Groups and SubGroups issue
>>>
>>> Hello,
>>>
>>> I have configured a bunch of groups for my cluster, depending on users
>>> and
>>> the number of cores used by their jobs:
>>>
>>> RO07AcctGroup = ifThenElse(NordugridQueue =?= "atlas", "group_ATLAS", \
>>>                  ifThenElse(NordugridQueue =?= "lhcb",  "group_LHCB", \
>>>                  ifThenElse(NordugridQueue =?= "alice", "group_ALICE", \
>>>                  ifThenElse(NordugridQueue =?= "ops", "group_OPS" ))))
>>>
>>> RO07AcctSubGroup = ifThenElse(regexp("atlas01",Owner) && RequestCpus >1, "atlas_multicore", \
>>>                     ifThenElse(regexp("atlas01", Owner), "atlas", \
>>>                     ifThenElse(regexp("lhcb01",Owner), "lhcb", \
>>>                     ifThenElse(regexp("pillhcb01",Owner), "pilotlhcb", \
>>>                     ifThenElse(regexp("prdlhcb01",Owner), "prodlhcb", \
>>>                     ifThenElse(regexp("alice01",Owner), "alice", \
>>>                     ifThenElse(regexp("pilalice01",Owner), "pilotalice", \
>>>                     ifThenElse(regexp("ops01",Owner), "ops" ))))))))
>>>
>>> AccountingGroup = strcat(RO07AcctGroup, ".", RO07AcctSubGroup, ".", Owner)
>>> ConcurrencyLimits = strcat(RO07AcctGroup, ",", RO07AcctSubGroup, ",", Owner)
>>> SUBMIT_ATTRS = $(SUBMIT_ATTRS), RO07AcctGroup, RO07AcctSubGroup, AccountingGroup, ConcurrencyLimits
>>>
>>> For ATLAS and ALICE the mapping works properly (as far as I can see),
>>> but for LHCb and OPS it is wrong, as you can see from the output of the
>>> "condor_status -submitter" command:
>>>
>>> Name                         Machine            RunningJobs IdleJobs HeldJobs
>>>
>>> alice01@xxxxxxxx             arc6atlas1.nipne.r          36        0        0
>>> atlas01@xxxxxxxx             arc6atlas1.nipne.r           0        0        0
>>> group_ALICE.alice.alice01@ni arc6atlas1.nipne.r         306        1        0
>>> group_ATLAS.atlas.atlas01@ni arc6atlas1.nipne.r          32        7        0
>>> group_ATLAS.atlas_multicore. arc6atlas1.nipne.r         305       79        1
>>> group_LHCB.lhcb.lhcb01@nipne arc6atlas1.nipne.r           0        0        0
>>> group_LHCB.lhcb.pillhcb01@ni arc6atlas1.nipne.r         712      129        0
>>> lhcb01@xxxxxxxx              arc6atlas1.nipne.r           0        0        0
>>> ops01@xxxxxxxx               arc6atlas1.nipne.r           1        0        0
>>> pillhcb01@xxxxxxxx           arc6atlas1.nipne.r           0        0        0
>>>
>>>                             RunningJobs           IdleJobs           HeldJobs
>>>
>>>      alice01@xxxxxxxx                36                  0                  0
>>>      atlas01@xxxxxxxx                 0                  0                  0
>>> group_ALICE.alice.al               306                  1                  0
>>> group_ATLAS.atlas.at                32                  7                  0
>>> group_ATLAS.atlas_mu               305                 79                  1
>>> group_LHCB.lhcb.lhcb                 0                  0                  0
>>> group_LHCB.lhcb.pill               712                129                  0
>>>       lhcb01@xxxxxxxx                 0                  0                  0
>>>        ops01@xxxxxxxx                 1                  0                  0
>>>    pillhcb01@xxxxxxxx                 0                  0                  0
>>>
>>>                 Total              1392                216                  1
>>>
>>> So the jobs run by user pillhcb01 should be mapped under
>>> group_LHCB.pilotlhcb and not group_LHCB.lhcb, while the OPS jobs are
>>> running under the <none> group instead of group_OPS.
>>>
>>> Do you know what I'm doing wrong?
>>>
>>> Regards,
>>> Mihai
>>>
>>>
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>>> with
>>> a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>>
>>
>> Dr. Mihai Ciubancan
>> IT Department
>> National Institute of Physics and Nuclear Engineering "Horia Hulubei"
>> Str. Reactorului no. 30, P.O. BOX MG-6
>> 077125, Magurele - Bucharest, Romania
>> http://www.ifin.ro
>> Work:   +40214042360
>> Mobile: +40761345687
>> Fax:    +40214042395
>
>
> --
> Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
> Center for High Throughput Computing   Department of Computer Sciences
> HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
> Phone: (608) 263-7132                  Madison, WI 53706-1685
>
>


Dr. Mihai Ciubancan
IT Department
National Institute of Physics and Nuclear Engineering "Horia Hulubei"
Str. Reactorului no. 30, P.O. BOX MG-6
077125, Magurele - Bucharest, Romania
http://www.ifin.ro
Work:   +40214042360
Mobile: +40761345687
Fax:    +40214042395