[HTCondor-users] how to submit a job to specific WNs



Hello,

I have been trying to get jobs owned by a certain VO, submitted to our batch farm (ARC-CE + HTCondor), scheduled on specific WNs. Following the recipe at https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor I have achieved only partial results, and I wonder whether I could get some help from the list.

On the ARC-CE arc-ce03:

#> cat /etc/arc.conf

...
[queue/ska]
name="ska"
homogeneity="True"
comment="SKA queue"
defaultmemory="3000"
nodememory="16384"
MainMemorySize=16384
OSFamily="linux"
OSName="ScientificSL"
OSVersion="7.3"
opsys="ScientificSL"
opsys="7.3"
opsys="Carbon"
nodecpu="Intel Xeon E5440 @ 2.83GHz"
condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)"
authorizedvo="skatelescope.eu"
...


I have also configured 4 WNs as follows:

[root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")


[root@lcg2195 config.d]# cat 49-catalin
RANK=1.0
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")


[root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")


[root@lcg1716 config.d]# cat 99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
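
The four fragments above differ in the comparison operator ('==' versus '=?='), in the file prefix (49- versus 99-), and in whether they extend $(START) or replace it. For comparison, here is a minimal sketch of a single fragment that could be deployed unchanged on all four WNs. The file name is hypothetical, and the sketch assumes that the ARC-CE inserts a NordugridQueue attribute into the job ad, as the recipe linked above relies on.

```
# /etc/condor/config.d/99-ska   (hypothetical file name)
# Files in config.d are read in lexical order, so a 99- prefix lets this
# fragment extend START as set by earlier files rather than be overridden.

# Advertise the node as reserved for the 'ska' queue.
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes

# Only start jobs that carry NordugridQueue = "ska" (assumed to be set by ARC).
# '=?=' evaluates to False when the job ad lacks the attribute,
# whereas '==' would evaluate to Undefined.
START = $(START) && (NordugridQueue =?= "ska")
```

After changing the fragment, running condor_reconfig on the WN makes the startd re-read its configuration.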


My test job (which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk ./list_of_rpms.xrsl') is:

-bash-4.1$ cat ./list_of_rpms.xrsl

&(executable="query_rpm.sh")
(stdout="test.out")
(stderr="test.err")
(jobname="ARC-HTCondor test")
(count=2)
(memory=1500)
(queue="ska")



The results are not as expected: the jobs are submitted, but they are scheduled on random nodes.
However, a few things are as expected, e.g.

[root@lcg1716 config.d]# condor_who
[root@lcg1716 config.d]#

[root@lcg2170 config.d]# condor_who
[root@lcg2170 config.d]#

[root@lcg2195 config.d]# condor_who
[root@lcg2195 config.d]#

[root@lcg2197 ~]# condor_who

OWNER                     CLIENT                   SLOT JOB          RUNTIME
tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0   8+02:32:36
alicesgm@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0   8+06:46:55



Also

[root@arc-ce03 config.d]# condor_status -constraint '(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'
Name                             OpSys      Arch   State     Activity LoadAv Me

slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.060 17
slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.540 21
slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000

                     Machines Owner Claimed Unclaimed Matched Preempting  Drain

        X86_64/LINUX        6     2       2         2       0          0      0

               Total        6     2       2         2       0          0      0



From the output above, I believe the 4 WNs are correctly advertising themselves as available for 'ska' jobs (SkaRes == True).

What I apparently cannot control yet is the matchmaking: the Negotiator does not match the job requirements to the advertised resources.
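
One way to narrow this down is to compare what the job ad actually requests with what the nodes advertise. The following is a rough sketch, meant to be run on the ARC-CE; the function names are made up for illustration, and the job ID argument is a placeholder for a real <cluster>.<proc>. Whether the job ad carries the condor_requirements expression and a NordugridQueue attribute is exactly what it is meant to reveal, not something it assumes.

```shell
#!/bin/sh
# Sketch: helpers for inspecting a job/machine mismatch on the schedd host.
# Function names are hypothetical; the HTCondor commands are standard ones.

# Same expression as condor_requirements in the [queue/ska] block of arc.conf.
SKA_CONSTRAINT='(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'

# Print the Requirements expression and queue attribute stored in the job ad.
# $1 is the job ID (<cluster>.<proc>).
show_job_ad() {
    condor_q "$1" -af:l Requirements NordugridQueue
}

# Ask HTCondor to explain how the job's requirements match the pool.
analyze_job() {
    condor_q -better-analyze "$1"
}

# Show the START expression advertised by each node matching the constraint.
show_start_exprs() {
    condor_status -constraint "$SKA_CONSTRAINT" -af:l Machine Start
}
```

After sourcing the file, calling show_job_ad and analyze_job with the ID of an idle or running 'ska' job shows whether the SkaRes clause ever reached the job's Requirements.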

So my question is: what am I missing, and where? (It could be on the ARC-CE in /etc/condor/config.d/, but I do not know what to add there.)

Also, as a detail, our batch farm (ARC-CE + HTCondor) runs each job slot in a Docker container; I am not sure whether this is relevant to the problem.

Many thanks for any help,
Catalin Condurache
RAL Tier-1