
Re: [HTCondor-users] Trouble trying to make HTCondor work in a Docker container



Hey folks - 

For this use case, you will likely need to disable all cgroup isolation in condor when running. Otherwise you would have to re-parent the cgroups, and I don't recommend even trying that, because you may end up in a world of hurt. Your containers may also need to be privileged to run; see https://github.com/GoogleCloudPlatform/kubernetes/issues/391 for more details.
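
As a rough sketch of what that could look like in condor_config.local on the execute side (not a tested recipe): I am assuming here that leaving BASE_CGROUP empty is enough to turn off cgroup tracking in your HTCondor version, and that CGROUP_MEMORY_LIMIT_POLICY accepts "none". Check both against the manual for the release you run.

```
## Sketch only, under the assumptions stated above.
## Leave BASE_CGROUP empty so condor does not try to place jobs in cgroups.
BASE_CGROUP =

## Do not apply cgroup-based memory limits to jobs.
CGROUP_MEMORY_LIMIT_POLICY = none

## The privileged part is a Docker setting rather than a condor one:
## start the container with "docker run --privileged ..." if it is needed.
```

If the startd still tries to touch cgroups after this, the StartLog and StarterLog inside the container are where I would look first.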

Best of luck,
Tim

----- Original Message -----
> From: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Monday, November 10, 2014 12:15:14 PM
> Subject: Re: [HTCondor-users] Trouble trying to make HTCondor work in a Docker container
> 
> On 11/9/2014 7:05 AM, Jim White wrote:
> > Adding UPDATE_COLLECTOR_WITH_TCP=TRUE gets the workers to show up in the
> > pool and allowing everything for hosts in the private IP space lets jobs
> > run.  I'll have a blog post within a few days to show this running in
> > Google Compute Engine.
> >
> > Jim
> >
> 
> Jim, this sounds great, please share the blog post once written!
> 
> Thanks
> Todd
> 
> 
> 
> 
> > On Sun, Nov 9, 2014 at 2:14 AM, Jim White <jimwhite@xxxxxx> wrote:
> >
> >     Well, I've gotten past that issue by enabling DNS and (after dealing
> >     with Docker not permitting limit change operations) I can almost see
> >     daylight.  Submit/execute on the central manager host works, and the
> >     workers can see the queue on the central manager but they never join
> >     the pool, nor do jobs submitted at a worker ever get matched.
> >     The logs on the central manager never show any connections or
> >     requests from the worker's IP address (although there must be some
> >     since reads for condor_q and condor_status work).  So I figure this
> >     must be some ordinary Condor config mistake on my part or a
> >     complication due to the IP address mapping between containers and
> >     hosts but I'm surprised I can't find any error messages anywhere in
> >     the logs on either side.
> >
> >     My current condor_config.local looks something like this
> >     (CONDOR_HOST and DAEMON_LIST are set by calling condor_configure
> >     when the container runs):
> >
> >         ## Inside Docker we don't want to rely on DNS for user
> >         ## authentication.
> >
> >         TRUST_UID_DOMAIN = TRUE
> >         UID_DOMAIN = my-condor-pool
> >
> >         ## Use the shared port daemon so we don't need to deal with multiple ephemeral ports
> >         ## which are not yet supported by Docker.
> >
> >         USE_SHARED_PORT = True
> >         SHARED_PORT_ARGS = -p 9886
> >
> >         SEC_DEFAULT_NEGOTIATION = NEVER
> >         SEC_DEFAULT_AUTHENTICATION = NEVER
> >
> >         ## We're not gonna try and reconfigure for each host involved.
> >         ## Just rely on our private network.
> >         ALLOW_READ            = *,*@*
> >         ALLOW_WRITE           = *,*@*
> >         ALLOW_ADMINISTRATOR   = *,*@*
> >         ALLOW_CONFIG          = *,*@*
> >         ALLOW_NEGOTIATOR      = *,*@*
> >         ALLOW_DAEMON          = *,*@*
> >
> >         # This didn't seem to change the setting for the collector:
> >         # MAX_FILE_DESCRIPTORS=1024
> >         # Maybe DEFAULT_MAX_FILE_DESCRIPTORS?
> >         # The collector wants to allow at least 10240 open descriptors,
> >         # but Docker doesn't permit changing limits.
> >         COLLECTOR_MAX_FILE_DESCRIPTORS=1024
> >
> >         # Fiddling with these has had no effect so far...
> >         FLOCK_FROM=10.*
> >         FLOCK_TO=$(COLLECTOR_HOST)
> >         HOSTALLOW_READ=10.*
> >         HOSTALLOW_WRITE=10.*
> >
> >
> >     Jim
> >
> >
> >

-- 
Cheers,
Timothy St. Clair
Red Hat Inc.