
Re: [HTCondor-users] how to submit a job to specific WNs



Yes, I can modify that transform, carefully of course, and on one ARC-CE at a time to minimise any negative impact.

Also, I have been trying to add something similar in condor/config.d/XX-..., but could not find anything that works. Maybe it cannot be done anywhere other than in the JOB_TRANSFORM...
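
In case it helps, this is how I have been checking what the transform currently expands to on each ARC-CE before touching anything (assuming I am reading the right knobs):

    condor_config_val JOB_TRANSFORM_NAMES
    condor_config_val JOB_TRANSFORM_DefaultDocker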

Waiting for your proposal...

Many thanks,
Catalin



> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of John M Knoeller
> Sent: 11 May 2018 15:46
> To: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] how to submit a job to specific WNs
> 
> So it looks like none of the statements in your condor_requirements are in the
> job's Requirements expression.
> 
> I think you are correct that the DefaultDocker transform has replaced the
> statements from your condor_requirements with its own.
> 
> Do you control the job transform configuration?  I think we could modify that
> transform so that it preserves your original requirements while adding the
> statements needed to make it a Docker job - but unless we can modify that
> transform, your condor_requirements isn't going to have any effect.
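> 
> Something along these lines might do it (an untested sketch on my part, using the
> transform's copy_* ability to stash the incoming Requirements under a new name;
> "PreDockerRequirements" is just an illustrative name):
> 
>    JOB_TRANSFORM_DefaultDocker @=end [
>       Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios"; ...
>       copy_Requirements = "PreDockerRequirements";
>       set_Requirements = MY.PreDockerRequirements && ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" ); ...
>    ]
>    @end
> 
> That way whatever Requirements the job arrives with would be carried along instead
> of being overwritten.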
> 
> -tj
> 
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Catalin Condurache - UKRI STFC
> Sent: Friday, May 11, 2018 8:25 AM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] how to submit a job to specific WNs
> 
> Hi John,
> 
> I ran your command before making any other changes, and the output was:
> 
> 24379845.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
> 
> 
> Then I added 'TARGET.' to condor_requirements in /etc/arc.conf, but the output
> is still very similar:
> 
> 24379911.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
> 
> 
> All of the above is set in /etc/condor/config.d/67job-transform-docker.config
> on the ARC node:
> 
> # Convert job to Docker universe
> JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES), DefaultDocker
> JOB_TRANSFORM_DefaultDocker @=end [
>    Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios"; ...
>    set_Requirements = ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas"); ...
> ]
> @end
> 
> 
> AFAICT those condor_requirements from arc.conf are not passed to condor on the
> ARC node.
> Also, I would prefer not to hard-code more specific requirements into that
> 'set_Requirements = ...' line.
> 
> Regards,
> Catalin
> 
> 
> 
> > -----Original Message-----
> > From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
> > Behalf Of John M Knoeller
> > Sent: 10 May 2018 20:57
> > To: HTCondor-Users Mail List
> > Subject: Re: [HTCondor-users] how to submit a job to specific WNs
> >
> > can you run
> >
> > 	condor_q <jobid> -af:jr Requirements
> >
> > where <jobid> is the job id of one of your jobs, and then send me the output?
> > I would like to see what the Requirements expression for the job is
> > once it gets to the Schedd.
> >
> > It would be safer if your job requirements were specified using the
> > TARGET prefix, like this:
> >
> > condor_requirements="(TARGET.Opsys == "linux") &&
> > (TARGET.OpSysMajorVer == 7) && (TARGET.SkaRes == True)"
> >
> > If you don't use TARGET, and your job happens to have Opsys, OpSysMajorVer or
> > SkaRes attributes of its own, then an attribute reference without TARGET will
> > resolve against the attribute in the job ad instead of the attribute in the
> > Startd ad.
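> >
> > For example, these two are not equivalent if the job ad happens to carry its
> > own Opsys attribute (just an illustration):
> >
> >    (Opsys == "linux")           <- may resolve against Opsys in the job ad
> >    (TARGET.Opsys == "linux")    <- always refers to the machine (Startd) ad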
> >
> > Also, this statement:
> >
> > 	START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
> >
> > should be using == rather than =?=, because we want START to be undefined
> > when there is no job to compare it to.  When you use =?=, START becomes false
> > in the absence of a job, which makes the Startd go into the OWNER state.
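> >
> > For example, the same statement with the job-attribute comparisons switched to
> > == might look like this (just a sketch, keeping your attribute names; the
> > NODE_IS_HEALTHY term refers to a machine-side attribute, so I left it alone):
> >
> > 	START = (NODE_IS_HEALTHY =?= True) && (Owner == "catalin" || Owner == "jpk" || X509UserProxyVOName == "skatelescope.eu") && (NordugridQueue == "ska")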
> >
> > -tj
> >
> >
> > -----Original Message-----
> > From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
> > Behalf Of Catalin Condurache - UKRI STFC
> > Sent: Thursday, May 10, 2018 11:07 AM
> > To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> > Subject: [HTCondor-users] how to submit a job to specific WNs
> >
> > Hello,
> >
> > I have been trying to schedule some jobs (owned by a certain VO) that are
> > submitted to the batch farm (ARC-CE + HTCondor) onto specific WNs; however, I
> > have achieved only partial results (following the recipe at
> > https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor), and I wonder
> > whether I could get some help from the list.
> >
> > So on ARC-CE arc-ce03
> >
> > #> cat /etc/arc.conf
> >
> > ...
> > [queue/ska]
> > name="ska"
> > homogeneity="True"
> > comment="SKA queue"
> > defaultmemory="3000"
> > nodememory="16384"
> > MainMemorySize=16384
> > OSFamily="linux"
> > OSName="ScientificSL"
> > OSVersion="7.3"
> > opsys="ScientificSL"
> > opsys="7.3"
> > opsys="Carbon"
> > nodecpu="Intel Xeon E5440 @ 2.83GHz"
> > condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) &&
> > (SkaRes == True)"
> > authorizedvo="skatelescope.eu"
> > ...
> >
> >
> > Also I have configured 4 WNs as:
> >
> > [root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")
> >
> >
> > [root@lcg2195 config.d]# cat 49-catalin
> > RANK=1.0
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = $(START) && (NordugridQueue == "ska")
> >
> >
> > [root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = $(START) && (NordugridQueue == "ska")
> >
> >
> > [root@lcg1716 config.d]# cat 99-catalin
> > SkaRes = True
> > STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> > START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
> >
> >
> > My test job (which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk
> > ./list_of_rpms.xrsl') is:
> >
> > -bash-4.1$ cat ./list_of_rpms.xrsl
> >
> > &(executable="query_rpm.sh")
> > (stdout="test.out")
> > (stderr="test.err")
> > (jobname="ARC-HTCondor test")
> > (count=2)
> > (memory=1500)
> > (queue="ska")
> >
> >
> >
> > The results are not as expected: the jobs do get submitted, but they are
> > scheduled on random nodes.
> > However, a few things are as expected, i.e.
> >
> > [root@lcg1716 config.d]# condor_who
> > [root@lcg1716 config.d]#
> >
> > [root@lcg2170 config.d]# condor_who
> > [root@lcg2170 config.d]#
> >
> > [root@lcg2195 config.d]# condor_who
> > [root@lcg2195 config.d]#
> >
> > [root@lcg2197 ~]# condor_who
> >
> > OWNER                     CLIENT                   SLOT JOB          RUNTIME
> > tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0   8+02:32:36
> > alicesgm@xxxxxxxxxxxxxxx  arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0   8+06:46:55
> >
> >
> >
> > Also
> >
> > [root@arc-ce03 config.d]# condor_status -constraint '(Opsys ==
> > "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'
> > Name                             OpSys      Arch   State     Activity LoadAv Me
> >
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.060 17
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Owner     Idle      0.000 21
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.000 21
> > slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle      0.540 21
> > slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
> > slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy      0.000
> >
> >                      Machines Owner Claimed Unclaimed Matched Preempting  Drain
> >
> >         X86_64/LINUX        6     2       2         2       0          0      0
> >
> >                Total        6     2       2         2       0          0      0
> >
> >
> >
> > From the above output I believe the 4 WNs are correctly advertising
> > themselves as available for 'ska' jobs (SkaRes == True).
> >
> > What I cannot seem to control yet is getting the Negotiator to match the job
> > requirements to the advertised resources.
> >
> > So my question is: what am I missing, and where? (It could be on the ARC-CE
> > in /etc/condor/config.d/, but I do not know what to add there.)
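> >
> > I am guessing that something like this, run on the ARC-CE, would show why the
> > Negotiator rejects the match for a given job, but please correct me if that is
> > not the right tool:
> >
> >     condor_q -better-analyze <jobid>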
> >
> > Also, as a detail, our batch farm (ARC-CE + HTCondor) runs a Docker container
> > for each job slot; I am not sure whether this is part of the problem here or
> > not.
> >
> > Many thanks for any help,
> > Catalin Condurache
> > RAL Tier-1
> >
> >
> >
> 