
Re: [HTCondor-users] how to submit a job to specific WNs



Can you run

	condor_q <jobid> -af:jr Requirements

where <jobid> is the ID of one of your jobs, and send me the output?
I would like to see what the Requirements expression for the job is once it gets to the Schedd.

It would be safer if your job requirements were specified using the TARGET prefix, like this:

condor_requirements="(TARGET.Opsys == "linux") && (TARGET.OpSysMajorVer == 7) && (TARGET.SkaRes == True)"

If you don't use TARGET and your job ad has an Opsys, OpSysMajorVer or SkaRes attribute, then an unqualified attribute reference will resolve against the attribute in the job ad instead of the attribute in the Startd ad.
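To make the scoping rule concrete, here is a small Python model of how an unqualified attribute reference is looked up when a job's Requirements expression is evaluated against a machine ad: MY (the job ad) first, then TARGET (the Startd ad). It is a simplified sketch, not the ClassAd library; the function name `lookup`, the `UNDEFINED` sentinel and the example ads (including a job ad that defines its own SkaRes) are hypothetical.

```python
# Simplified model of ClassAd attribute scoping during matchmaking.
# Hypothetical sketch: not the real ClassAd library or its API.

UNDEFINED = object()  # stand-in for the ClassAd UNDEFINED value


def lookup(name, my_ad, target_ad, scope=None):
    """Resolve an attribute reference made from inside my_ad.

    scope=None     -> unqualified: try MY (the job ad) first, then TARGET
    scope="MY"     -> only the ad being evaluated (here, the job ad)
    scope="TARGET" -> only the other ad (here, the Startd/machine ad)
    ClassAd attribute names are case-insensitive, so keys are lower-cased.
    """
    key = name.lower()
    my = {k.lower(): v for k, v in my_ad.items()}
    target = {k.lower(): v for k, v in target_ad.items()}
    if scope == "MY":
        return my.get(key, UNDEFINED)
    if scope == "TARGET":
        return target.get(key, UNDEFINED)
    # Unqualified reference: the job ad wins if it defines the attribute.
    if key in my:
        return my[key]
    return target.get(key, UNDEFINED)


# Hypothetical ads: a WN advertising SkaRes, and a job ad that
# happens to carry its own SkaRes attribute.
machine_ad = {"OpSys": "LINUX", "OpSysMajorVer": 7, "SkaRes": True}
job_ad = {"SkaRes": False}

print(lookup("SkaRes", job_ad, machine_ad))            # False: taken from the job ad
print(lookup("SkaRes", job_ad, machine_ad, "TARGET"))  # True: taken from the machine ad
print(lookup("Opsys", job_ad, machine_ad))             # LINUX: job ad lacks it, falls through
```

Qualifying the reference with TARGET bypasses the job ad entirely, which is why it is the safer form.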

Also, this statement:

	START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")

should use == rather than =?=, because we want START to be undefined when there is no job to compare it against. When you use =?=, START becomes false in the absence of a job, which makes the Startd go into the Owner state.
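The difference can be sketched with a small Python model of ClassAd three-valued logic, applied to the START expression above. It is a simplified sketch under stated assumptions: Python's None stands in for UNDEFINED, there is no ERROR value, Owner, X509UserProxyVOName and NordugridQueue are treated as job attributes, and the helper names (`eq`, `meta_eq`, `and_`, `or_`, `start`) are hypothetical. With no job, every job attribute is UNDEFINED; == propagates that, while =?= turns each comparison into False.

```python
# Simplified model of ClassAd three-valued logic (no ERROR value).
# Hypothetical sketch: helper names are illustrative, not a real API.

UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value


def eq(a, b):
    """'==': UNDEFINED if either side is UNDEFINED; strings compare case-insensitively."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()
    return a == b


def meta_eq(a, b):
    """'=?=': never UNDEFINED; requires same type and value, strings case-sensitive."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def and_(*vals):
    """'&&': any False gives False, else any UNDEFINED gives UNDEFINED, else True."""
    if any(v is False for v in vals):
        return False
    if any(v is UNDEFINED for v in vals):
        return UNDEFINED
    return True


def or_(*vals):
    """'||': any True gives True, else any UNDEFINED gives UNDEFINED, else False."""
    if any(v is True for v in vals):
        return True
    if any(v is UNDEFINED for v in vals):
        return UNDEFINED
    return False


def start(machine, job, op):
    """lcg1716's START expression, with `op` used for every comparison."""
    def attr(ad, name):
        return UNDEFINED if ad is None else ad.get(name, UNDEFINED)

    return and_(
        op(attr(machine, "NODE_IS_HEALTHY"), True),
        or_(op(attr(job, "Owner"), "catalin"),
            op(attr(job, "Owner"), "jpk"),
            op(attr(job, "X509UserProxyVOName"), "skatelescope.eu")),
        op(attr(job, "NordugridQueue"), "ska"),
    )


machine = {"NODE_IS_HEALTHY": True}
print(start(machine, None, meta_eq))  # False: no job, so the Startd goes to Owner state
print(start(machine, None, eq))       # None (UNDEFINED): START is undefined without a job
```

With a matching job (e.g. X509UserProxyVOName = "skatelescope.eu" and NordugridQueue = "ska") both forms evaluate to True, so the operator only changes the result when the Startd evaluates START with no job.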

-tj


-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Catalin Condurache - UKRI STFC
Sent: Thursday, May 10, 2018 11:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] how to submit a job to specific WNs

Hello,

I have been trying to schedule some jobs (owned by a certain VO) submitted to the batch farm (ARC-CE + HTCondor) onto specific WNs; however, I have achieved only partial results (following the recipe at https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor), and I wonder whether I could get some help from the list.

So, on the ARC-CE arc-ce03:

#> cat /etc/arc.conf

...
[queue/ska]
name="ska"
homogeneity="True"
comment="SKA queue"
defaultmemory="3000"
nodememory="16384"
MainMemorySize=16384
OSFamily="linux"
OSName="ScientificSL"
OSVersion="7.3"
opsys="ScientificSL"
opsys="7.3"
opsys="Carbon"
nodecpu="Intel Xeon E5440 @ 2.83GHz"
condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)"
authorizedvo="skatelescope.eu"
...


I have also configured 4 WNs as follows:

[root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")


[root@lcg2195 config.d]# cat 49-catalin
RANK=1.0
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")


[root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")


[root@lcg1716 config.d]# cat 99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")


My test job (which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk ./list_of_rpms.xrsl') is:

-bash-4.1$ cat ./list_of_rpms.xrsl

&(executable="query_rpm.sh")
(stdout="test.out")
(stderr="test.err")
(jobname="ARC-HTCondor test")
(count=2)
(memory=1500)
(queue="ska")



The results are not as expected: the jobs are submitted, but they are scheduled on random nodes.
However, a few things are as expected, i.e.:

[root@lcg1716 config.d]# condor_who
[root@lcg1716 config.d]#

[root@lcg2170 config.d]# condor_who
[root@lcg2170 config.d]#

[root@lcg2195 config.d]# condor_who
[root@lcg2195 config.d]#

[root@lcg2197 ~]# condor_who

OWNER                     CLIENT                   SLOT JOB          RUNTIME
tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0   8+02:32:36
alicesgm@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0   8+06:46:55



Also

[root@arc-ce03 config.d]# condor_status -constraint '(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'
Name                             OpSys      Arch   State     Activity LoadAv Me

slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.060 17
slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.540 21
slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000

                     Machines Owner Claimed Unclaimed Matched Preempting  Drain

        X86_64/LINUX        6     2       2         2       0          0      0

               Total        6     2       2         2       0          0      0



From the above output, I believe the 4 WNs are correctly advertising themselves as available for 'ska' jobs (SkaRes == True).

What I apparently cannot control yet is that the Negotiator does not match the job requirements to the advertised resources.

So my question is: what am I missing, and where? (It could be on the ARC-CE in /etc/condor/config.d/, but I do not know what to add there.)

One more detail: our batch farm (ARC-CE + HTCondor) runs a Docker container for each job slot; I am not sure whether this is relevant to the problem.

Many thanks for any help,
Catalin Condurache
RAL Tier-1



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/