
[HTCondor-users] Problem with Internal DNS on Amazon AWS VPC



This may be more of a question for an Amazon AWS forum, but I’d like to see if there’s a way of reconfiguring Condor to work around the problem, or if anyone has experienced anything similar.

 

I have had a few Amazon EC2 instances running Condor on Scientific Linux 6.3 quite successfully, but we now want to run them in an Amazon Virtual Private Cloud (VPC). The problem seems to be that instances running in a VPC do not have access to internal DNS in the way that regular EC2 instances do; i.e., nothing is contactable by hostname, not even the local machine:

 

[root@ip-10-0-14-137 ~]# hostname

ip-10-0-14-137

 

[root@ip-10-0-14-137 ~]# ping ip-10-0-14-137

ping: unknown host ip-10-0-14-137

 

[root@ip-10-0-14-137 ~]# nslookup ip-10-0-14-137

Server:         10.0.0.2

Address:        10.0.0.2#53

 

** server can't find ip-10-0-14-137: NXDOMAIN

 

[root@ip-10-0-14-137 ~]# ifconfig

eth0      Link encap:Ethernet  HWaddr 0E:6D:51:79:E6:48

          inet addr:10.0.14.137  Bcast:10.0.15.255  Mask:255.255.240.0

          ……

 

[root@ip-10-0-14-137 ~]# ping 10.0.14.137

PING 10.0.14.137 (10.0.14.137) 56(84) bytes of data.

64 bytes from 10.0.14.137: icmp_seq=1 ttl=64 time=0.024 ms

^C

 

The effect seems to be that Condor doesn’t pick up a machine’s hostname when it can’t resolve the IP address back to that hostname:

 

[root@ip-10-0-5-109 run2]# condor_status

 

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

 

slot1@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:04

slot2@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:05

slot3@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:06

slot4@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:07

slot5@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:01:26

slot6@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:01:27

slot7@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:02:28

slot8@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:02:21

slot1@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     1.000  3735  0+00:01:18

slot2@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     0.990  3735  0+00:01:28

slot3@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     1.060  3735  0+00:00:55

slot4@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     0.960  3735  0+00:00:54

 

The last 4 slots, which run on the submit node, do show a hostname because I have added an entry to /etc/hosts on that node pointing the hostname at the correct IP.

 

We could obviously do the same for each of the worker nodes, but this isn’t going to be practical when running many spot instances.
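For what it's worth, the /etc/hosts workaround could probably be automated at boot rather than done by hand. Below is a minimal sketch only, assuming every node's hostname follows the ip-A-B-C-D pattern seen in the output above, so the private IP can be derived from the name itself. The function names and the default hosts path are hypothetical, and I have not tried this on spot instances.

```python
import ipaddress
import re


def ip_from_ec2_hostname(hostname):
    """Derive the private IPv4 address from a name such as ip-10-0-14-137."""
    match = re.fullmatch(r"ip-(\d+)-(\d+)-(\d+)-(\d+)", hostname)
    if match is None:
        raise ValueError(f"not an ip-A-B-C-D hostname: {hostname!r}")
    # ipaddress rejects out-of-range octets such as 999.
    return str(ipaddress.IPv4Address(".".join(match.groups())))


def hosts_line(hostname):
    """Build the /etc/hosts line for one node: '<ip> <hostname>'."""
    return f"{ip_from_ec2_hostname(hostname)} {hostname}"


def ensure_hosts_entry(hostname, hosts_path="/etc/hosts"):
    """Append a line for hostname unless one exists; return True if appended."""
    with open(hosts_path) as hosts:
        content = hosts.read()
    for line in content.splitlines():
        # Ignore comments; names start at the second field of each entry.
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return False
    with open(hosts_path, "a") as hosts:
        if content and not content.endswith("\n"):
            hosts.write("\n")
        hosts.write(hosts_line(hostname) + "\n")
    return True
```

Run as root from a boot script, it would be called as ensure_hosts_entry(socket.gethostname()). This only makes each node's own name resolvable locally; it does nothing for resolving other nodes' names.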

 

I’m assuming this is also what’s causing problems with running these jobs on the worker nodes currently.

 

Any thoughts?

 

Thanks,

Giles


