
Re: [Condor-users] flocking: trying to connect directly to the private network



Hi Henrique,

Those viznodes are going to have to talk to your submit node, which will
not be possible with their private IP addresses, because I'm betting that
the head node is not performing IP masquerading on their behalf, right?
One alternative is to run a GCB (Generic Connection Brokering) instance
on the head node, though I haven't actually done this myself. See the
following link for details:

http://www.cs.wisc.edu/~sschang/firewall/gcb/index.htm
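
To confirm that every failed claim is against a private address, a rough
Python sketch along these lines could pull the startd addresses out of the
SchedLog. The function name and the regular expression are my own
illustration, not anything shipped with Condor, and the sketch assumes the
log format shown in the quoted excerpt below.

```python
import ipaddress
import re

# RFC 1918 private ranges, listed explicitly rather than relying on
# ipaddress.is_private, which also covers loopback and other reserved blocks.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Matches lines such as:
#   Couldn't send REQUEST_CLAIM to startd at <10.1.1.111:38543>
FAILED_CLAIM = re.compile(
    r"Couldn't send REQUEST_CLAIM to startd at <([\d.]+):(\d+)>"
)


def unreachable_private_startds(log_lines):
    """Return the sorted, unique private startd IPs the schedd failed to claim."""
    found = set()
    for line in log_lines:
        match = FAILED_CLAIM.search(line)
        if not match:
            continue
        addr = ipaddress.ip_address(match.group(1))
        if any(addr in net for net in RFC1918):
            found.add(str(addr))
    # Sort numerically by address rather than as plain strings.
    return sorted(found, key=ipaddress.ip_address)
```

Feeding it the lines of the SchedLog (for example via open(path) on
whatever path your LOG directory uses) should list the execute nodes that
are unreachable from the submit node.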

The way we do it is to give all nodes in our Condor "world" "private"
addresses that are routable within our domain (so the local routers need
to be configured accordingly). The head node has IP forwarding enabled and
also acts as an ARP proxy, so it serves as the gateway for the cluster
nodes, and it all works well.
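
If the head node runs Linux (an assumption; other systems expose these
settings differently), the two kernel flags involved can be checked with
something like the sketch below. The helper name and the interface name
passed to it are placeholders, and the script only reads the settings; it
does not change them.

```python
from pathlib import Path


def gateway_settings(iface, proc_root="/proc/sys/net/ipv4"):
    """Report whether IP forwarding and proxy ARP are on for `iface` (Linux)."""
    root = Path(proc_root)

    def enabled(path):
        # The kernel exposes each flag as a small text file holding 0 or 1.
        # A missing or unreadable file is reported as "off".
        try:
            return path.read_text().strip() == "1"
        except OSError:
            return False

    return {
        "ip_forward": enabled(root / "ip_forward"),
        # Proxy ARP is active on an interface if either the "all" entry
        # or the per-interface entry is set.
        "proxy_arp": enabled(root / "conf" / "all" / "proxy_arp")
        or enabled(root / "conf" / iface / "proxy_arp"),
    }
```

For example, gateway_settings("eth0") on the head node should report both
flags as True for a setup like ours, where "eth0" stands in for whichever
interface faces the routable network.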

Cheers,
Mark

> Hi all
> 
> I've just finished installing Condor in a cluster (viz), whose head node is 
> vizhead and whose nodes are viznode1, viznode2... in a 10.1.1.* private 
> network.
> 
> From my submit node (127.39.27.70) I submitted a job that matches only the 
> OS on the viz cluster, so it cannot execute anywhere but there.
> 
> Problem is, the submit node seems to talk to vizhead and then a match is 
> made. Up to this point, everything is okay.
> 
> However, it seems that the submit node then tries to connect directly to the 
> cluster nodes, which are in the private network (10.1.1.*) and, of course, 
> fails.
> 
> How can I work around this?
> 
> The SchedLog in the vizhead follows:
> 
> 9/24 16:40:27 DaemonCore: Command received via TCP from host 
> <127.39.27.70:44768>
> 9/24 16:40:27 DaemonCore: received command 416 (NEGOTIATE), calling handler 
> (negotiate)
> 9/24 16:40:27 Negotiating for owner: hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
> 9/24 16:40:27 Checking consistency running and runnable jobs
> 9/24 16:40:27 Tables are consistent
> 9/24 16:40:27 Out of servers - 0 jobs matched, 10 jobs idle, 1 jobs rejected
> 9/24 16:40:27 Increasing flock level for hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx 
> to 1.
> 9/24 16:40:31 Sent ad to central manager for 
> hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
> 9/24 16:41:02 DaemonCore: Command received via TCP from host 
> <127.39.27.155:51294>
> 9/24 16:41:02 DaemonCore: received command 416 (NEGOTIATE), calling handler 
> (negotiate)
> 9/24 16:41:02 Negotiating for owner: hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
> 9/24 16:41:02 Checking consistency running and runnable jobs
> 9/24 16:41:02 Tables are consistent
> 9/24 16:41:02 Out of servers - 7 jobs matched, 3 jobs idle, 1 jobs rejected
> 9/24 16:41:47 select returns 0, connect failed
> 9/24 16:41:47 Will keep trying for 45 seconds...
> 9/24 16:41:48 Connect failed for 45 seconds; returning FALSE
> 9/24 16:41:48 Couldn't send REQUEST_CLAIM to startd at <10.1.1.111:38543>
> 9/24 16:43:03 Can't connect to <10.1.1.111:38543>:0, errno = 145
> 9/24 16:43:03 Will keep trying for 10 seconds...
> 9/24 16:43:04 Connect failed for 10 seconds; returning FALSE
> 9/24 16:43:04 ERROR:
> SECMAN:2003:TCP connection to <10.1.1.111:38543> failed
> 
> 9/24 16:43:04 Sent RELEASE_CLAIM to startd on <10.1.1.111:38543>
> 9/24 16:43:04 Match record (<10.1.1.111:38543>, 22, 0) deleted
> 9/24 16:43:49 select returns 0, connect failed
> 9/24 16:43:49 Will keep trying for 45 seconds...
> 9/24 16:43:50 Connect failed for 45 seconds; returning FALSE
> 9/24 16:43:50 Couldn't send REQUEST_CLAIM to startd at <10.1.1.112:33865>
> 9/24 16:45:05 Can't connect to <10.1.1.112:33865>:0, errno = 145
> 9/24 16:45:05 Will keep trying for 10 seconds...
> 9/24 16:45:06 Connect failed for 10 seconds; returning FALSE
> 9/24 16:45:06 ERROR:
> SECMAN:2003:TCP connection to <10.1.1.112:33865> failed
> 
> Thanks!
> 
> Henrique


---------------------------------------------
Department of Earth Sciences
University of Cambridge
Downing Street
Cambridge CB2 3EQ
Phone: ( +44 ) 1223 333400
Fax: ( +44 ) 1223 333450