
Re: [HTCondor-users] how to submit a job to specific WNs



Hi John,

I ran your command before making any other changes, and got:

24379845.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )


Then I added 'TARGET.' to the condor_requirements line in /etc/arc.conf, but I still get very similar output:

24379911.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
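
(For reference, the relevant line in the [queue/ska] section now reads like this, copying the form you suggest in your mail:

condor_requirements="(TARGET.Opsys == "linux") && (TARGET.OpSysMajorVer == 7) && (TARGET.SkaRes == True)"
)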


All of the above comes from /etc/condor/config.d/67job-transform-docker.config on the ARC node:

# Convert job to Docker universe
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES), DefaultDocker
JOB_TRANSFORM_DefaultDocker @=end
[
   Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios";
...
   set_Requirements = ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas");
...
]
@end


AFAICT the condor_requirements from arc.conf are not being passed through to Condor on the ARC node.
Also, I do not want to hard-code more specific requirements into that 'set_Requirements = ...' line.
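
Ideally the transform would keep whatever Requirements the job arrived with (including the condor_requirements from arc.conf) and just AND the Docker clauses on top. Would something like this work? Untested sketch: I am assuming copy_Requirements is honoured in job transforms the same way as in JobRouter routes, and 'OriginalRequirements' is just a name I made up (the Disk/Memory/Cpus clauses are dropped here since the job's original Requirements should already carry them):

JOB_TRANSFORM_DefaultDocker @=end
[
   Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios";
...
   # preserve the job's incoming Requirements under a new attribute name
   copy_Requirements = "OriginalRequirements";
   # then AND the Docker-specific clauses onto the preserved expression
   set_Requirements = OriginalRequirements && ( TARGET.HasDocker ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" );
...
]
@end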

Regards,
Catalin



> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of John M Knoeller
> Sent: 10 May 2018 20:57
> To: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] how to submit a job to specific WNs
> 
> can you run
> 
> 	condor_q <jobid> -af:jr Requirements
> 
> where <jobid> is the job id of one of your jobs, and then send me the output?
> I would like to see what the Requirements expression for the job is once it gets
> to the Schedd.
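> 
> e.g. (with a made-up job id):
> 
> 	condor_q 1234.0 -af:jr Requirements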
> 
> It would be safer if your job requirements were specified using the TARGET
> prefix, like this:
> 
> condor_requirements="(TARGET.Opsys == "linux") && (TARGET.OpSysMajorVer == 7) && (TARGET.SkaRes == True)"
> 
> If you don't use TARGET, and your job has the Opsys, OpSysMajorVer or SkaRes
> attributes, then an attribute reference without TARGET will resolve against the
> attribute in the job ad instead of the attribute in the Startd ad.
> 
> Also, this statement:
> 
> 	START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
> 
> should be using == rather than =?=, because we want START to be undefined
> when there is no job to compare it to.  When you use =?=, START becomes false
> in the absence of a job, which makes the Startd go into OWNER state.
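> 
> A corrected version might look like this (a sketch; =?= is kept only for the
> machine-side NODE_IS_HEALTHY check, which never references the job ad):
> 
> 	START = (NODE_IS_HEALTHY =?= True) && (Owner == "catalin" || Owner == "jpk" || X509UserProxyVOName == "skatelescope.eu") && (NordugridQueue == "ska")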
> 
> -tj
> 
> 
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Catalin Condurache - UKRI STFC
> Sent: Thursday, May 10, 2018 11:07 AM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] how to submit a job to specific WNs
> 
> Hello,
> 
> I have been trying to schedule some jobs (owned by a certain VO) submitted to
> the batch farm (ARC-CE + HTCondor) onto specific WNs; however, I have achieved
> only partial results (following the recipe at
> https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor), and I
> wonder whether I could get some help from the list.
> 
> So, on the ARC-CE arc-ce03:
> 
> #> cat /etc/arc.conf
> 
> ...
> [queue/ska]
> name="ska"
> homogeneity="True"
> comment="SKA queue"
> defaultmemory="3000"
> nodememory="16384"
> MainMemorySize=16384
> OSFamily="linux"
> OSName="ScientificSL"
> OSVersion="7.3"
> opsys="ScientificSL"
> opsys="7.3"
> opsys="Carbon"
> nodecpu="Intel Xeon E5440 @ 2.83GHz"
> condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)"
> authorizedvo="skatelescope.eu"
> ...
> 
> 
> Also, I have configured 4 WNs as follows:
> 
> [root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")
> 
> 
> [root@lcg2195 config.d]# cat 49-catalin
> RANK=1.0
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = $(START) && (NordugridQueue == "ska")
> 
> 
> [root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = $(START) && (NordugridQueue == "ska")
> 
> 
> [root@lcg1716 config.d]# cat 99-catalin
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
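> 
> (To double-check that SkaRes is actually advertised, one can dump a machine ad,
> e.g. with an illustrative hostname:
> 
> 	condor_status -l lcg2195.gridpp.rl.ac.uk | grep SkaRes
> 
> and the constraint query further down does match all 4 WNs.)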
> 
> 
> My test job (which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk
> ./list_of_rpms.xrsl') is:
> 
> -bash-4.1$ cat ./list_of_rpms.xrsl
> 
> &(executable="query_rpm.sh")
> (stdout="test.out")
> (stderr="test.err")
> (jobname="ARC-HTCondor test")
> (count=2)
> (memory=1500)
> (queue="ska")
> 
> 
> 
> The results are not as expected: the jobs are getting submitted, but they are
> scheduled on random nodes.
> However, a few things are as expected, i.e.
> 
> [root@lcg1716 config.d]# condor_who
> [root@lcg1716 config.d]#
> 
> [root@lcg2170 config.d]# condor_who
> [root@lcg2170 config.d]#
> 
> [root@lcg2195 config.d]# condor_who
> [root@lcg2195 config.d]#
> 
> [root@lcg2197 ~]# condor_who
> 
> OWNER                     CLIENT                   SLOT JOB        RUNTIME
> tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0 8+02:32:36
> alicesgm@xxxxxxxxxxxxxxx  arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0 8+06:46:55
> 
> 
> 
> Also
> 
> [root@arc-ce03 config.d]# condor_status -constraint '(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'
> Name                             OpSys      Arch   State     Activity LoadAv Me
> 
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.060 17
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.000 21
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.000 21
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.540 21
> slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
> slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
> 
>                      Machines Owner Claimed Unclaimed Matched Preempting  Drain
> 
>         X86_64/LINUX        6     2       2         2       0          0      0
> 
>                Total        6     2       2         2       0          0      0
> 
> 
> 
> In the above output I believe the 4 WNs are correctly advertising themselves as
> available for 'ska' jobs (SkaRes == True).
> 
> What I apparently cannot control yet is the matchmaking: the Negotiator does not
> match the job requirements to the advertised resources.
> 
> So my question is: what am I missing, and where? (It could be on the ARC-CE in
> /etc/condor/config.d/, but I do not know what to add there.)
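> 
> (If it helps, I can also ask why a given job does not match and send that
> output, e.g. with an illustrative job id:
> 
> 	condor_q -better-analyze 1234.0
> )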
> 
> Also, as a detail, our batch farm (ARC-CE + HTCondor) runs Docker
> containers for each job slot; I am not sure whether this is part of the problem or not.
> 
> Many thanks for any help,
> Catalin Condurache
> RAL Tier-1
> 
> 
>