
[HTCondor-users] Condor in a Container works! (was Re: Trouble trying to make HTCondor work in a Docker container)



Hi guys! Thanks for the interest. I did get it working and will write up a little README and a blog post later today. My current post at http://jimwhite.github.io/ is about running the BLLIP parser and its Python GUI in Docker.

The current config runs without privileged mode (because I want this to be a vanilla Kubernetes thing), but I also want to be able to use Docker images as executables for jobs, which will require Docker-in-Docker.
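
For reference, a minimal sketch of what the two changes reported earlier in this thread (quoted below) might look like in condor_config.local. It assumes the pool's private address space is 10.*, as the FLOCK_FROM and HOSTALLOW settings in the quoted config suggest. The exact ALLOW_* lines that were used have not been posted, so treat the ones here as an illustration only:

```
## Have daemons send their collector updates over TCP rather than UDP.
UPDATE_COLLECTOR_WITH_TCP = True

## Assumption: every host in the private 10.* range is trusted.
ALLOW_READ       = 10.*
ALLOW_WRITE      = 10.*
ALLOW_NEGOTIATOR = 10.*
ALLOW_DAEMON     = 10.*
```

Opening up a whole address range like this is only reasonable on a private network that nothing untrusted can reach.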

My new unexpected challenge: the reason I hadn't seen how to change the number of Kube minions (cloud instances, as opposed to pod replicas) dynamically is that it is not currently supported (at least on GCE). I may do something based on the newly released GCE auto-scaler, or perhaps the new Google Kubernetes PaaS will do what I need.

I've been looking around at Condor auto-scaling solutions to see whether there is any existing code I could use for adjusting the replication controller, but I haven't seen anything that seems better than writing it from scratch. Any suggestions?

Jim

On Mon, Nov 10, 2014 at 10:33 AM, Tim St Clair <tstclair@xxxxxxxxxx> wrote:
Hey folks -

For this use case, you're likely going to need to disable all cgroup isolation on condor when running, lest you try to re-parent. I don't even recommend trying to do that, because you may enter into a hurt-locker. Also, your containers may need to be privileged to run; see https://github.com/GoogleCloudPlatform/kubernetes/issues/391 for more details.
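
As a rough sketch of what turning off cgroup handling might look like in condor_config.local: both knobs below are real HTCondor configuration macros, but their defaults and exact behavior vary between versions, so the comments are assumptions to check against the manual for the release in use.

```
## Assumption: with BASE_CGROUP left empty, HTCondor does not use
## cgroups to track job processes.
BASE_CGROUP =

## Assumption: "none" keeps the starter from setting cgroup memory limits.
CGROUP_MEMORY_LIMIT_POLICY = none
```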

Best of luck,
Tim

----- Original Message -----
> From: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Monday, November 10, 2014 12:15:14 PM
> Subject: Re: [HTCondor-users] Trouble trying to make HTCondor work in a Docker container
>
> On 11/9/2014 7:05 AM, Jim White wrote:
> > Adding UPDATE_COLLECTOR_WITH_TCP=TRUE gets the workers to show up in the
> > pool and allowing everything for hosts in the private IP space lets jobs
> > run. I'll have a blog post within a few days to show this running in
> > Google Compute Engine.
> >
> > Jim
> >
>
> Jim, this sounds great, please share the blog post once written!
>
> Thanks
> Todd
>
>
>
>
> > On Sun, Nov 9, 2014 at 2:14 AM, Jim White <jimwhite@xxxxxx
> > <mailto:jimwhite@xxxxxx>> wrote:
> >
> >     Well, I've gotten past that issue by enabling DNS and (after dealing
> >     with Docker not permitting limit change operations) I can almost see
> >     daylight. Submit/execute on the central manager host works, and the
> >     workers can see the queue on the central manager, but they never join
> >     the pool, nor do jobs submitted at a worker ever get matched.
> >     The logs on the central manager never show any connections or
> >     requests from the worker's IP address (although there must be some,
> >     since reads for condor_q and condor_status work). So I figure this
> >     must be some ordinary Condor config mistake on my part or a
> >     complication due to the IP address mapping between containers and
> >     hosts, but I'm surprised I can't find any error messages anywhere in
> >     the logs on either side.
> >
> >     My current condor_config.local looks something like this
> >     (CONDOR_HOST and DAEMON_LIST are set by calling condor_configure
> >     when the container runs):
> >
> >         ## Inside Docker we don't want to rely on DNS for user
> >         ## authentication.
> >
> >         TRUST_UID_DOMAIN = TRUE
> >         UID_DOMAIN = my-condor-pool
> >
> >         ## Use CCB so we don't need to deal with multiple ephemeral ports
> >         ## which are not yet supported by Docker.
> >
> >         USE_SHARED_PORT = True
> >         SHARED_PORT_ARGS = -p 9886
> >
> >         SEC_DEFAULT_NEGOTIATION = NEVER
> >         SEC_DEFAULT_AUTHENTICATION = NEVER
> >
> >         ## We're not gonna try and reconfigure for each host involved.
> >         ## Just rely on our private network.
> >         ALLOW_READ          = *,*@*
> >         ALLOW_WRITE         = *,*@*
> >         ALLOW_ADMINISTRATOR = *,*@*
> >         ALLOW_CONFIG        = *,*@*
> >         ALLOW_NEGOTIATOR    = *,*@*
> >         ALLOW_DAEMON        = *,*@*
> >
> >         # This didn't seem to change the setting for the collector:
> >         # MAX_FILE_DESCRIPTORS=1024
> >         # Maybe DEFAULT_MAX_FILE_DESCRIPTORS?
> >         # The collector wants to allow at least 10240 open descriptors,
> >         # but Docker doesn't permit changing limits.
> >         COLLECTOR_MAX_FILE_DESCRIPTORS=1024
> >
> >         # Fiddling with these has had no effect so far...
> >         FLOCK_FROM=10.*
> >         FLOCK_TO=$(COLLECTOR_HOST)
> >         HOSTALLOW_READ=10.*
> >         HOSTALLOW_WRITE=10.*
> >
> >
> >     Jim
> >
> >
> >
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
>

--
Cheers,
Timothy St. Clair
Red Hat Inc.