
Re: [HTCondor-users] Re: Running a mixed pool from a jobs perspective (WNs with Docker and pilots without Docker)



For completeness: I did some more testing in a controlled environment today, and the tests were successful. We now plan to fetch the machine attribute from the pilots/workers and populate 'WantDocker' accordingly.

Thanks!

Best regards,
Farrukh

On Fri, Aug 4, 2017 at 5:50 PM, Farrukh Aftab Khan <farrukh.aftab.khan@xxxxxxxxx> wrote:
Hi Brian,

Do you mean the START expression? I set it to 'True' on a worker (without docker) and then submitted a job with 'WantDocker=True'. The job failed (kind of expected after reading the docker section in the manual).

After reading your email, I gave JobMachineAttrs another try and this time things worked. On the job side, I specify [1] and I see that the worker is correctly running my job in docker. Maybe I made a typo before or one of the workers wasn't configured properly to run docker (we are still testing docker in our testbed cluster). I am going to redo the tests on Monday and hopefully this solves our issue.

Thanks!

Best regards,
Farrukh

[1]
JobMachineAttrs = "HasDocker"
WantDocker = isUndefined(MachineAttrHasDocker0) ? FALSE : MachineAttrHasDocker0
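
For readers who want to see [1] in context, here is a minimal sketch of a submit description file built around it. It assumes the attributes are injected with the '+' prefix and that the job is submitted in the vanilla universe; neither detail is stated in this thread. The executable name is a placeholder, and "xyz" is the image name from the original question.

```
# Sketch only: universe and executable are assumptions, not taken from the thread.
universe   = vanilla
# Placeholder job script.
executable = my_job.sh

# Ask for the matched machine's HasDocker attribute to be recorded in the
# job ad as MachineAttrHasDocker0.
+JobMachineAttrs = "HasDocker"

# Run in Docker only where the matched machine advertises it. If
# MachineAttrHasDocker0 is undefined (e.g. a pilot without Docker),
# WantDocker evaluates to FALSE.
+WantDocker  = isUndefined(MachineAttrHasDocker0) ? FALSE : MachineAttrHasDocker0
+DockerImage = "xyz"

queue
```

Comments are kept on their own lines because a trailing '#' on a submit command line would be read as part of the value.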


Date: Fri, 04 Aug 2017 16:31:09 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: fkhan@xxxxxxxx, Krista Larson <klarson1@xxxxxxxx>
Subject: Re: [HTCondor-users] Running a mixed pool from a jobs perspective (WNs with Docker and pilots without Docker)
Message-ID: <E819E9C9-33D7-4242-9AC7-585D33AE110B@xxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

Hi Farrukh,

Have you tried using match expressions? Those are evaluated by the schedd before the job ad is sent to the startd.

That said, looking at a random StartdLog, I see MachineAttr* attributes in the startd activate claim request -- which occurs prior to launching the job. So, I'm confused why that isn't working for you.

Worst case scenario - the shadow has a copy of both the machine and job ad at the point when the job is launched. Hence, it should be a very modest patch to do the attribute lookup with both ads in the context.

Brian

> On Aug 4, 2017, at 2:46 PM, Farrukh Aftab Khan <farrukh.aftab.khan@xxxxxxxxx> wrote:
>
> Hi guys,
>
> We are trying to run jobs in a pool with both local worker nodes and pilots coming in from off site. Our local workers have docker and we want to run all jobs running on the local cluster inside containers. To do this, we add the following classAds to the job JDL:
>
> WantDocker = TRUE
> DockerImage = "xyz"
>
> Our general idea with this setup is for the jobs to be able to run wherever they can find resources i.e. be it off site or on site. These jobs run fine when they run on the local workers with docker but are kicked off when they try to run on pilots without docker support.
>
> Is there a way for us to force all local running jobs to run on docker from the startd side alone? This way the jobs won't have to specify that they want docker and can run on pilots too. At the same time we will be able to force all jobs trying to run on the local cluster to always use docker.
>
> So far I have tried using 'JobMachineAttrs' to fetch 'HasDocker' in the job classAd, but by the time the corresponding 'MachineAttrHasDocker0' gets populated the job is already running. I also tried referencing HasDocker directly from WantDocker in the job classAds, but the classAd isn't evaluated with reference to the startd classAds. I know this can probably be done by putting the local on site workers behind a CE but this is something we are trying to get away from.
>
> Any other ideas or suggestions are welcome.
>
> Best regards,
> Farrukh
