
Re: [HTCondor-users] HTCondor Schedd and docker



Hi,

Based on the off-list information from Iain, I think this is a bug with TCP_FORWARDING_HOST (ticket #5339).

The relevant addresses are of the form:

MyAddress / ScheddIpAddr: <$EXTERNAL:9618?addrs=$INTERNAL-9618&noUDP&sock=8_6140_3>

AddressV1 "{[ p=\"primary\"; a=\"$EXTERNAL\"; port=9618; n=\"Internet\"; spid=\"8_6140_3\"; noUDP=true; ], [ p=\"IPv4\"; a=\"$INTERNAL\"; port=9618; n=\"Internet\"; spid=\"8_6140_3\"; noUDP=true; ]}"

To illustrate the problem, I fed the addresses directly to a Python program:

>>> import htcondor
>>> htcondor.version()
'$CondorVersion: 8.5.0 Oct 11 2015 BuildID: 344547 $'
>>> schedd = htcondor.Schedd({"ScheddIpAddr": "<$EXTERNAL:9618?noUDP&sock=8_6140_3>"})
>>> schedd.query()
[]
>>> schedd = htcondor.Schedd({"ScheddIpAddr": "<$EXTERNAL:9618?addrs=$INTERNAL-9618&noUDP&sock=8_6140_3>"})
>>> schedd.query()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: Failed to fetch ads from schedd.

The second query fails after about 60 seconds.  Using strace, I verified that it is trying to connect to $INTERNAL rather than $EXTERNAL.
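
For anyone who wants to reproduce the comparison, here is a rough sketch in plain Python of what differs between the two address strings. It does not use the HTCondor bindings; parse_sinful and strip_addrs are hypothetical helper names made up for this illustration, and the parsing only assumes the "<host:port?key=value&flag>" shape seen in the addresses above. Dropping the addrs parameter yields the form that worked in the first query; treat it as a diagnostic aid, not a recommended fix.

```python
def parse_sinful(sinful):
    """Split '<host:port?k=v&flag>' into (host, port, params).

    Hypothetical helper: assumes only the address shape shown in this thread.
    """
    body = sinful.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("expected an address wrapped in <>")
    body = body[1:-1]
    hostport, _, query = body.partition("?")
    host, _, port = hostport.rpartition(":")
    params = []
    for item in (query.split("&") if query else []):
        key, sep, value = item.partition("=")
        # Bare flags such as noUDP carry no value; record them as None.
        params.append((key, value if sep else None))
    return host, port, params


def strip_addrs(sinful):
    """Rebuild the address without its 'addrs' parameter (hypothetical helper)."""
    host, port, params = parse_sinful(sinful)
    kept = [(k, v) for k, v in params if k != "addrs"]
    query = "&".join(k if v is None else f"{k}={v}" for k, v in kept)
    return f"<{host}:{port}?{query}>" if query else f"<{host}:{port}>"


if __name__ == "__main__":
    failing = "<$EXTERNAL:9618?addrs=$INTERNAL-9618&noUDP&sock=8_6140_3>"
    print(parse_sinful(failing))
    print(strip_addrs(failing))
```

The output of strip_addrs on the failing address can then be passed as ScheddIpAddr in the same way as in the session above.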

Both server and client are 8.5.0.  Looking over the 8.4 release notes, it appears this is a known bug and should be fixed in 8.5.1.

Brian

> On Dec 19, 2015, at 5:32 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> 
> Hi Iain, Andrew,
> 
> Can you send me the output of “condor_status -schedd -l | sort” offlist (should contain public IPs, hence offlist)?
> 
> Thanks,
> 
> Brian
> 
>> On Dec 19, 2015, at 5:24 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
>> 
>> Hi Iain,
>> 
>> Perhaps the Docker-generated host names are so long that the condor_status display is truncating them. Try running
>>  condor_status -wide -schedd
>> to display the full name of the schedd and hand that full name to 
>>  condor_q -name 
>> ?
>> 
>> Best
>> Todd
>> 
>> 
>> 
>>> On Dec 19, 2015, at 7:38 AM, Iain Bradford Steers <iain.steers@xxxxxxx> wrote:
>>> 
>>> Hi,
>>> 
>>> Just confirming that my test results in the same outcome as Andrew’s. Can query from outside the container as long as the TCP_FORWARDING_HOST is set to the host IP and the container’s hostname replicates the actual host.
>>> 
>>> Greg: yes, NAT’d containers. The idea is that individual users or really small VOs who aren’t familiar with deploying a condor schedd can instead just run the container, which provides them with a schedd for submission against our HTCondorCEs, giving them the niceties of the tracking, spool handling, etc.
>>> 
>>> Cheers, Iain
>>> 
>>> 
>>>> On Dec 18, 2015, at 17:52, andrew.lahiff@xxxxxxxxxx wrote:
>>>> 
>>>> 
>>>> No :-(
>>>> 
>>>> [root@vm91 ~]# condor_q -pool <hostname> -name myschedd@60d2cbbd1b3
>>>> Error: Collector has no record of schedd/submitter
>>>> 
>>>> Regards,
>>>> Andrew.
>>>> 
>>>> ________________________________________
>>>> From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Greg Thain [gthain@xxxxxxxxxxx]
>>>> Sent: Friday, December 18, 2015 5:20 PM
>>>> To: HTCondor-Users Mail List
>>>> Subject: Re: [HTCondor-users] HTCondor Schedd and docker
>>>> 
>>>>> On 12/18/2015 10:39 AM, andrew.lahiff@xxxxxxxxxx wrote:
>>>>> Interestingly, condor_status -schedd does report it:
>>>>> 
>>>>> [root@vm91 ~]# condor_status -pool <hostname> -schedd
>>>>> Name                 Machine    TotalRunningJobs TotalIdleJobs TotalHeldJobs
>>>>> 
>>>>> myschedd@60d2cbbd1b3 60d2cbbd1b                0             0              0
>>>>>                    TotalRunningJobs      TotalIdleJobs      TotalHeldJobs
>>>> 
>>>> Ah, does
>>>> 
>>>> condor_q -name myschedd@60d2cbbd1b3
>>>> 
>>>> then work?
>>>> 
>>>> -greg
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>> 
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>> 
>> 
> 
> 