
Re: [HTCondor-users] HTCondor Schedd and docker



Hi Iain,

I think condor_q tries to contact the schedd using the container's IP address, which of course isn't accessible from outside:

Fetching job ads...
-- Failed to fetch ads from: <172.17.0.42:9618?addrs=172.17.0.42-9618&noUDP&sock=78_147a_4> : <hostname>
CEDAR:6001:Failed to connect to <172.17.0.42:9618?addrs=172.17.0.42-9618&noUDP&sock=78_147a_4>
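
As a rough way to confirm this, the address in the error can be pulled apart and checked against the private ranges. This is only a sketch: the helper names are made up for illustration, the parsing assumes the IPv4 `<host:port?params>` form shown in the error above, and it relies only on Python's standard ipaddress and re modules rather than anything in HTCondor itself.

```python
import ipaddress
import re


def sinful_host_port(sinful):
    """Extract (host, port) from an address of the form <host:port?params>.

    Hypothetical helper: handles only the IPv4 form seen in the error above.
    """
    match = re.match(r"<([^:>?]+):(\d+)", sinful)
    if match is None:
        raise ValueError(f"unrecognised address: {sinful!r}")
    return match.group(1), int(match.group(2))


def is_private_address(sinful):
    """Return True if the advertised host is in a private (non-routable) range."""
    host, _port = sinful_host_port(sinful)
    # ip_address() raises ValueError if the host is a name rather than an IP.
    return ipaddress.ip_address(host).is_private


addr = "<172.17.0.42:9618?addrs=172.17.0.42-9618&noUDP&sock=78_147a_4>"
print(sinful_host_port(addr))
print(is_private_address(addr))
```

If the second call reports a private address, a client outside the Docker host cannot be expected to reach the schedd at that address directly.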

Regards,
Andrew.

________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Iain Bradford Steers [iain.steers@xxxxxxx]
Sent: Friday, December 18, 2015 12:15 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] HTCondor Schedd and docker

Hi Andrew,

Excellent, that means it’s not just that I’ve screwed up my container networking!

Yeah it’s a bit of a weird one.

Running

condor_q -name 11e3ad4f744a

and then looking at the CollectorLog, it looks like the Name attribute of the schedd didn’t get passed through in the query:

12/18/15 12:08:12 (Sending 0 ads in response to query)
12/18/15 12:08:12 Query info: matched=0; skipped=1; query_time=0.000041; send_time=0.000049; type=Scheduler; requirements={( ( Name == "" ) )}; peer=<##ip##:27811>; projection={}
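
To make the empty Name easier to spot across many such lines, the "Query info" entry can be split into fields. This is a minimal sketch, assuming the `key=value; key=value` layout seen in the line above and that no value itself contains "; "; the function name is hypothetical and the sample line is copied from the log excerpt.

```python
import re


def parse_query_info(line):
    """Split a CollectorLog 'Query info:' line into a dict of string fields."""
    _, _, payload = line.partition("Query info: ")
    fields = {}
    for item in payload.split("; "):
        # Split on the first '=' only, so 'Name == ""' stays inside the value.
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


line = (
    '12/18/15 12:08:12 Query info: matched=0; skipped=1; '
    'query_time=0.000041; send_time=0.000049; type=Scheduler; '
    'requirements={( ( Name == "" ) )}; peer=<##ip##:27811>; projection={}'
)

fields = parse_query_info(line)
name_match = re.search(r'Name == "([^"]*)"', fields["requirements"])
requested_name = name_match.group(1) if name_match else None

print(fields["type"], fields["matched"], fields["skipped"])
print(repr(requested_name))
```

An empty string for the requested name, together with matched=0 and skipped=1, is consistent with the collector having one schedd ad but being asked for a schedd with no name.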

I’m wondering if it’s something to do with how condor is internally forming the requirements to send to the collector, however I haven’t spotted it in the debug output yet.

Cheers, Iain

> On Dec 18, 2015, at 12:02, andrew.lahiff@xxxxxxxxxx wrote:
>
> Hi Iain,
>
> I can reproduce the same problem, but I'm not sure how to fix it - having a collector, negotiator and startds in containers seems to be fairly straightforward (and works), but a schedd is a bit harder.
>
> Regards,
> Andrew.
>
> ________________________________
> From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Iain Bradford Steers [iain.steers@xxxxxxx]
> Sent: Wednesday, December 16, 2015 11:48 AM
> To: htcondor-users@xxxxxxxxxxx
> Subject: [HTCondor-users] HTCondor Schedd and docker
>
> Hi All,
>
> Seeing as it's almost the holidays I decided to have some fun with docker and a single schedd.
>
> I've built and run a container that runs condor with the following custom config.
>
> DAEMON_LIST = MASTER, COLLECTOR, SCHEDD
> USE_SHARED_PORT = True
>
> The container works and condor appears to work. I've exposed the usual port 9618 as well. Using bash inside the container I can query both the collector and schedd.
>
> However, from outside the container I only appear to be able to query the collector. (1)
>
> condor_q, despite multiple attempts at getting -name right, just tells me that the collector has no record of the schedd. I've removed the -pool IP from the command below as it's publicly routable, i.e.
>
> outside]# condor_q -name 11e3ad4f744a
> Error: Collector has no record of schedd/submitter.
>
> Has anyone else tried this before?
>
> Nothing urgent here, just decided to have some fun and a poke around.
>
> Cheers, Iain
>
> (1)
> outside]# condor_status -schedds
> Name                 Machine    TotalRunningJobs TotalIdleJobs TotalHeldJobs
>
> 11e3ad4f744a         11e3ad4f74                0             0              0
>                      TotalRunningJobs      TotalIdleJobs      TotalHeldJobs
>
>
>               Total                 0                  0                  0
>
>