
[HTCondor-users] Problem with Internal DNS on Amazon AWS VPC

This may be more of a question for an Amazon AWS forum, but I’d like to see if there’s a way of reconfiguring Condor to work around the problem, or if anyone has experienced anything similar.


I have had a few Amazon EC2 instances running Condor on Scientific Linux 6.3 quite successfully, but we now want to run them in an Amazon Virtual Private Cloud (VPC). The problem seems to be that instances running in a VPC do not have access to internal DNS in the same way that regular EC2 instances do, i.e. nothing is contactable via hostname, not even the local machine:


[root@ip-10-0-14-137 ~]# hostname



[root@ip-10-0-14-137 ~]# ping ip-10-0-14-137

ping: unknown host ip-10-0-14-137


[root@ip-10-0-14-137 ~]# nslookup ip-10-0-14-137




** server can't find ip-10-0-14-137: NXDOMAIN


[root@ip-10-0-14-137 ~]# ifconfig

eth0      Link encap:Ethernet  HWaddr 0E:6D:51:79:E6:48

          inet addr:  Bcast:  Mask:



[root@ip-10-0-14-137 ~]# ping

PING ( 56(84) bytes of data.

64 bytes from icmp_seq=1 ttl=64 time=0.024 ms



The effect seems to be that Condor doesn’t pick up a hostname for a machine when it can’t resolve that machine’s IP address back to the hostname:


[root@ip-10-0-5-109 run2]# condor_status


Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime


slot1@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:04

slot2@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:05

slot3@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:06

slot4@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:00:07

slot5@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:01:26

slot6@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:01:27

slot7@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:02:28

slot8@             LINUX      X86_64 Unclaimed Idle     0.000  3761  0+00:02:21

slot1@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     1.000  3735  0+00:01:18

slot2@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     0.990  3735  0+00:01:28

slot3@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     1.060  3735  0+00:00:55

slot4@ip-10-0-5-10 LINUX      X86_64 Claimed   Busy     0.960  3735  0+00:00:54


You can see from the last 4 slots, which are running on the submit node, that the hostname does show up there: I have added an entry to /etc/hosts on that node mapping its hostname to the correct IP.


We could obviously do the same for each of the worker nodes, but this isn’t going to be practical when running many spot instances.
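
For illustration, here is a rough sketch of how the /etc/hosts entry could be added automatically on each node at boot instead of by hand. It assumes the hostname follows the ip-A-B-C-D pattern seen in the prompts above, so the private IP can be derived from the name itself. The function names are hypothetical, and running it from something like /etc/rc.local is an assumption, not something tested on these instances.

```shell
#!/bin/sh
# Sketch only: make a node's own hostname resolvable via /etc/hosts.
# Function names are hypothetical; nothing here is Condor-specific.

# ip_from_ec2_hostname NAME
# Prints A.B.C.D for a hostname of the form ip-A-B-C-D, or nothing otherwise.
ip_from_ec2_hostname() {
    printf '%s\n' "$1" |
        sed -n 's/^ip-\([0-9]*\)-\([0-9]*\)-\([0-9]*\)-\([0-9]*\)$/\1.\2.\3.\4/p'
}

# ensure_hosts_entry IP NAME [HOSTS_FILE]
# Appends "IP NAME" to HOSTS_FILE unless NAME is already listed there.
ensure_hosts_entry() {
    ip="$1"
    name="$2"
    hosts_file="${3:-/etc/hosts}"

    # Skip if a non-comment line already lists NAME as a hostname or alias.
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }' "$hosts_file" 2>/dev/null; then
        return 0
    fi

    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Possible use at boot, as root (assumption: called from /etc/rc.local):
#   name=$(hostname)
#   ip=$(ip_from_ec2_hostname "$name")
#   [ -n "$ip" ] && ensure_hosts_entry "$ip" "$name"
```

This only makes each node able to resolve its own name. Whether that is enough for Condor, or whether the other nodes also need to resolve it, is exactly the open question here.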


I’m assuming this is also what’s causing problems with running these jobs on the worker nodes currently.


Any thoughts?




EastQuayIT Ltd is a limited company, registered in England and Wales with Registration no. 07595813. VAT No: GB 116 6924 08.
