
Re: [HTCondor-users] how to submit a job to specific WNs



Depending on what the requirements look like before the transform, this will probably work:

JOB_TRANSFORM_DefaultDocker @=end
 [
    Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios";
    ...
    copy_Requirements = "PreDockerRequirements";
    set_Requirements = PreDockerRequirements && ( TARGET.HasDocker ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
 ]
@end
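
Since JOB_TRANSFORM_NAMES already lists DefaultDocker in 67job-transform-docker.config, replacing just the transform body there (and running condor_reconfig on the schedd) should be enough; no new transform name is needed.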

We can assume that jobs already have these clauses in their requirements:

&& ( TARGET.Disk >= RequestDisk ) && (TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer )

So if we preserve the original requirements, then we don't need them in the transform explicitly.
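
To make the intended effect concrete, here is a sketch of what the transform should leave on such a job (copy_Requirements saves the incoming Requirements into the new attribute PreDockerRequirements, and set_Requirements then rebuilds Requirements on top of it):

   PreDockerRequirements = <the job's original Requirements, i.e. the
                            condor_requirements clauses plus the clauses above>
   Requirements = PreDockerRequirements && ( TARGET.HasDocker ) &&
                  ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 ||
                    x509UserProxyVOName =!= "atlas" )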

If the original job requirements conflict with the new transform requirements in some way, we can look into doing something a little more complicated.

-tj

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Catalin Condurache - UKRI STFC
Sent: Friday, May 11, 2018 10:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] how to submit a job to specific WNs

Yes, I can modify that transform, carefully of course, and on one ARC-CE at a time to minimise any negative impact.

Also, I have been trying to add something similar in condor/config.d/XX-..., but couldn't find anything that worked. Maybe it cannot be done anywhere other than in the JOB_TRANSFORM...

Waiting for your proposal...

Many thanks,
Catalin



> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of John M Knoeller
> Sent: 11 May 2018 15:46
> To: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] how to submit a job to specific WNs
> 
> So it looks like none of the statements in your condor_requirements are in the
> job's Requirements expression.
> 
> I think you are correct that the DefaultDocker transform has replaced the
> statements from your condor_requirements with its own.
> 
> Do you control the job transform configuration?  I think we could modify that
> transform so that it preserves your original requirements while adding the
> statements needed to make it a Docker job - but unless we can modify that
> transform, your condor_requirements isn't going to have any effect.
> 
> -tj
> 
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Catalin Condurache - UKRI STFC
> Sent: Friday, May 11, 2018 8:25 AM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] how to submit a job to specific WNs
> 
> Hi John,
> 
> I ran your command before making any other changes, and got:
> 
> 24379845.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
> 
> 
> Then I added 'TARGET.' to condor_requirements in /etc/arc.conf, but the output
> is still very similar:
> 
> 24379911.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
> 
> 
> All the above are set in /etc/condor/config.d/67job-transform-docker.config on the
> ARC node:
> 
> # Convert job to Docker universe
> JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES), DefaultDocker
> JOB_TRANSFORM_DefaultDocker @=end [
>    Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios"; ...
>    set_Requirements = ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" ); ...
> ]
> @end
> 
> 
> AFAICT those condor_requirements from arc.conf are not passed to condor on the
> ARC node.
> Also, I do not want to add more specific requirements to that
> 'set_Requirements = ...' line.
> 
> Regards,
> Catalin
> 
> 
> 
> > -----Original Message-----
> > From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
> > Behalf Of John M Knoeller
> > Sent: 10 May 2018 20:57
> > To: HTCondor-Users Mail List
> > Subject: Re: [HTCondor-users] how to submit a job to specific WNs
> >
> > Can you run
> >
> > 	condor_q <jobid> -af:jr Requirements
> >
> > where <jobid> is the job id of one of your jobs, and then send me the output?
> > I would like to see what the Requirements expression for the job is
> > once it gets to the Schedd.
> >
> > It would be safer if your job requirements were specified using the
> > TARGET prefix, like this:
> >
> > condor_requirements="(TARGET.Opsys == "linux") &&
> > (TARGET.OpSysMajorVer == 7) && (TARGET.SkaRes == True)"
> >
> > If you don't use TARGET, and your job has the Opsys, OpSysMajorVer or
> > SkaRes attributes, then an attribute reference without TARGET will
> > resolve against the attribute in the job ad instead of the attribute in
> > the Startd ad.
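> >
> > For illustration, a minimal sketch (assuming a job ad that happened to carry a
> > SkaRes attribute of its own):
> >
> > 	(SkaRes == True)          resolves against the job's own SkaRes first
> > 	(TARGET.SkaRes == True)   always resolves against the machine (Startd) ad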
> >
> > Also, this statement:
> >
> > 	START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
> >
> > should be using == rather than =?=, because we want START to be
> > undefined when there is no job to compare it to. When you use =?=,
> > START becomes false in the absence of a job, which makes the Startd go
> > into OWNER state.
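> >
> > For example, a sketch of the corrected expression (keeping =?= only for the
> > machine-local NODE_IS_HEALTHY test, which does not reference the job ad):
> >
> > 	START = (NODE_IS_HEALTHY =?= True) && (Owner == "catalin" || Owner == "jpk" || X509UserProxyVOName == "skatelescope.eu") && (NordugridQueue == "ska")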
> >
> > -tj
> >
> >
> > -----Original Message-----
> > From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
> > Behalf Of Catalin Condurache - UKRI STFC
> > Sent: Thursday, May 10, 2018 11:07 AM
> > To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> > Subject: [HTCondor-users] how to submit a job to specific WNs
> >
> > Hello,
> >
> > I have been trying to schedule some jobs (owned by a certain VO)
> > submitted to the batch farm (ARC-CE + HTCondor) onto specific WNs;
> > however, I have achieved only partial results (following a recipe at
> > https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor), and I
> > wonder whether I could get some help from the list.
> >
> > So on the ARC-CE arc-ce03:
> >
> > #> cat /etc/arc.conf
> >
> > ...
> > [queue/ska]
> > name="ska"
> > homogeneity="True"
> > comment="SKA queue"
> > defaultmemory="3000"
> > nodememory="16384"
> > MainMemorySize=16384
> > OSFamily="linux"
> > OSName="ScientificSL"
> > OSVersion="7.3"
> > opsys="ScientificSL"
> > opsys="7.3"
> > opsys="Carbon"
> > nodecpu="Intel Xeon E5440 @ 2.83GHz"
> > condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) &&
> > (SkaRes == True)"
> > authorizedvo="skatelescope.eu"
> > ...
> >
> >
> > Also I have configured 4 WNs as:
> >
> > [root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")
> >
> >
> > [root@lcg2195 config.d]# cat 49-catalin
> > RANK=1.0
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = $(START) && (NordugridQueue == "ska")
> >
> >
> > [root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = $(START) && (NordugridQueue == "ska")
> >
> >
> > [root@lcg1716 config.d]# cat 99-catalin
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
> >
> >
> > My test job (which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk
> > ./list_of_rpms.xrsl') is:
> >
> > -bash-4.1$ cat ./list_of_rpms.xrsl
> >
> > &(executable="query_rpm.sh")
> > (stdout="test.out")
> > (stderr="test.err")
> > (jobname="ARC-HTCondor test")
> > (count=2)
> > (memory=1500)
> > (queue="ska")
> >
> >
> >
> > The results are not as expected: the jobs are getting submitted,
> > but they are scheduled on random nodes.
> > However, a few things are as expected, i.e.
> >
> > [root@lcg1716 config.d]# condor_who
> > [root@lcg1716 config.d]#
> >
> > [root@lcg2170 config.d]# condor_who
> > [root@lcg2170 config.d]#
> >
> > [root@lcg2195 config.d]# condor_who
> > [root@lcg2195 config.d]#
> >
> > [root@lcg2197 ~]# condor_who
> >
> > OWNER                     CLIENT                   SLOT JOB        RUNTIME
> > tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0 8+02:32:36
> > alicesgm@xxxxxxxxxxxxxxx  arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0 8+06:46:55
> >
> >
> >
> > Also
> >
> > [root@arc-ce03 config.d]# condor_status -constraint '(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'
> > Name                             OpSys      Arch   State     Activity LoadAv Me
> >
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.060 17
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.000 21
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.000 21
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.540 21
> > slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
> > slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
> >
> >                      Machines Owner Claimed Unclaimed Matched Preempting Drain
> >
> >         X86_64/LINUX        6     2       2         2       0          0     0
> >
> >                Total        6     2       2         2       0          0     0
> >
> >
> >
> > From the above output I believe the 4 WNs are correctly advertising
> > themselves as available for 'ska' jobs (SkaRes == True).
> >
> > What I apparently cannot control yet is that the Negotiator does not match
> > the job requirements to the advertised resources.
> >
> > So my question is: what am I missing, and where? (It could be on the ARC-CE
> > in /etc/condor/config.d/, but I do not know what to add there.)
> >
> > Also, as a detail, our batch farm (ARC-CE + HTCondor) is running
> > Docker containers for each job_slot; not sure whether this is the problem here
> > or not.
> >
> > Many thanks for any help,
> > Catalin Condurache
> > RAL Tier-1
> >
> >
> >
> 

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/