Re: [HTCondor-users] Trouble trying to make HTCondor work in a Docker container
- Date: Mon, 10 Nov 2014 13:33:29 -0500 (EST)
- From: Tim St Clair <tstclair@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Trouble trying to make HTCondor work in a Docker container
Hey folks -
For this use case, you will likely need to disable all cgroup isolation in HTCondor when running inside a container. Otherwise it will try to re-parent processes into its own cgroups, and I don't recommend even attempting that, because you may end up in a world of hurt. Your containers may also need to run privileged; see https://github.com/GoogleCloudPlatform/kubernetes/issues/391 for more details.
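As a rough sketch of the cgroup part (not a tested recipe), a condor_config.local fragment could look like the one below. It assumes your packaged configuration sets BASE_CGROUP, and that these two knobs are the relevant ones for your HTCondor version; check the manual before relying on it.

```
## Assumption: leaving BASE_CGROUP empty turns off cgroup-based
## process tracking in the starter/procd.
BASE_CGROUP =
## Assumption: "none" asks HTCondor not to apply cgroup memory limits to jobs.
CGROUP_MEMORY_LIMIT_POLICY = none
```

After restarting the daemons, condor_config_val BASE_CGROUP shows what value they actually picked up.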
Best of luck,
----- Original Message -----
> From: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Monday, November 10, 2014 12:15:14 PM
> Subject: Re: [HTCondor-users] Trouble trying to make HTCondor work in a Docker container
> On 11/9/2014 7:05 AM, Jim White wrote:
> > Adding UPDATE_COLLECTOR_WITH_TCP=TRUE gets the workers to show up in the
> > pool and allowing everything for hosts in the private IP space lets jobs
> > run. I'll have a blog post within a few days to show this running in
> > Google Compute Engine.
> > Jim
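For anyone following along, the fix described above might look roughly like the fragment below. UPDATE_COLLECTOR_WITH_TCP is the setting named in the message. The 10.* pattern is an assumption borrowed from the HOSTALLOW lines in the quoted config further down, since the exact ALLOW_* values that were used were not posted.

```
## From the message: send collector updates over TCP instead of UDP.
UPDATE_COLLECTOR_WITH_TCP = TRUE
## Assumption: "allowing everything for hosts in the private IP space"
## means host-based authorization for the 10.* range; other access
## levels (ALLOW_DAEMON, ALLOW_NEGOTIATOR, ...) may need the same value.
ALLOW_READ = 10.*
ALLOW_WRITE = 10.*
```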
> Jim, this sounds great, please share the blog post once written!
> > On Sun, Nov 9, 2014 at 2:14 AM, Jim White <jimwhite@xxxxxx> wrote:
> > Well, I've gotten past that issue by enabling DNS and (after dealing
> > with Docker not permitting limit change operations) I can almost see
> > daylight. Submit/execute on the central manager host works, and the
> > workers can see the queue on the central manager but they never join
> > the pool, nor do jobs submitted at a worker ever get matched.
> > The logs on the central manager never show any connections or
> > requests from the worker's IP address (although there must be some
> > since reads for condor_q and condor_status work). So I figure this
> > must be some ordinary Condor config mistake on my part or a
> > complication due to the IP address mapping between containers and
> > hosts but I'm surprised I can't find any error messages anywhere in
> > the logs on either side.
> > My current condor_config.local looks something like this
> > (CONDOR_HOST and DAEMON_LIST are set by calling condor_configure
> > when the container runs):
> > ## Inside Docker we don't want to rely on DNS for user authentication.
> > TRUST_UID_DOMAIN = TRUE
> > UID_DOMAIN = my-condor-pool
> > ## Use CCB so we don't need to deal with multiple ephemeral ports
> > ## which are not yet supported by Docker.
> > USE_SHARED_PORT = True
> > SHARED_PORT_ARGS = -p 9886
> > SEC_DEFAULT_NEGOTIATION = NEVER
> > SEC_DEFAULT_AUTHENTICATION = NEVER
> > ## We're not gonna try and reconfigure for each host involved.
> > ## Just rely on our private network.
> > ALLOW_READ = *,*@*
> > ALLOW_WRITE = *,*@*
> > ALLOW_ADMINISTRATOR = *,*@*
> > ALLOW_CONFIG = *,*@*
> > ALLOW_NEGOTIATOR = *,*@*
> > ALLOW_DAEMON = *,*@*
> > # This didn't seem to change the setting for the collector:
> > # MAX_FILE_DESCRIPTORS=1024
> > # Maybe DEFAULT_MAX_FILE_DESCRIPTORS?
> > # The collector wants to allow at least 10240 open descriptors,
> > # but Docker doesn't permit changing limits.
> > COLLECTOR_MAX_FILE_DESCRIPTORS=1024
> > # Fiddling with these has had no effect so far...
> > FLOCK_FROM=10.*
> > FLOCK_TO=$(COLLECTOR_HOST)
> > HOSTALLOW_READ=10.*
> > HOSTALLOW_WRITE=10.*
> > Jim
Timothy St. Clair
Red Hat Inc.