
Re: [Condor-users] private networks, submit nodes and flocking



I appreciate this response ... it looks very promising.

One suggestion I came across in the mailing list archives was that setting

NETWORK_INTERFACE=0.0.0.0

allows the daemons on that machine to bind to all interfaces. Can anyone on the Condor team comment on whether this is a viable solution?
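At the socket level, 0.0.0.0 is the IPv4 wildcard address (INADDR_ANY). A listening socket bound to it accepts connections that arrive on any of the host's addresses, while a socket bound to one specific address accepts connections only on that address. The minimal Python sketch below illustrates only that general socket behavior. It is not Condor code and says nothing about whether Condor's daemons work correctly with this setting. The function name `accepted_on` is hypothetical.

```python
import socket


def accepted_on(bind_host: str, connect_host: str) -> str:
    """Listen on bind_host, connect to it via connect_host, and return the
    local address on which the accepted connection arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_host, 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection((connect_host, port), timeout=5):
            conn, _peer = server.accept()
            with conn:
                return conn.getsockname()[0]


if __name__ == "__main__":
    # Wildcard listener: the accepted connection reports the concrete local
    # address it arrived on, not 0.0.0.0.
    print(accepted_on("0.0.0.0", "127.0.0.1"))
    # Listener pinned to a single address.
    print(accepted_on("127.0.0.1", "127.0.0.1"))
```

On a dual-homed submit node, a wildcard listener would likewise accept connections on both the 192.168.x.x address and the external address. A pinned listener accepts them only on its own address.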

Also, I'm still not sure why Condor needs to bind to a specific interface. Is this a design feature or a limitation?


thanks again,

rob


On Feb 21, 2005, at 12:14 PM, De-Wei Yin wrote:

Rob,

Go to the Condor user mailing list archive (reachable through
https://lists.cs.wisc.edu/mailman/listinfo/condor-users) and look at the
archive for June 2004, around the middle of the month, I think. I posted
fairly detailed notes there that describe how to flock from an internal
pool to an external one.


It sounds like you have the schedds bound to the internal NIC. They
need to be bound to the external NIC and also configured to accept
connections from the 192.168.x.x network through HOSTALLOW_READ and
HOSTALLOW_WRITE (see my notes for details). The gateway machines on the
internal pool will have to be configured to allow IP masquerading so that
the internal machines can talk to the other pool, transfer checkpoint
data, etc.

In our case, we have a private network with three head nodes that have
dual NICs: one is the Condor central manager (CM), and the other two are
submit nodes. All three are configured with Condor bound to the external
NIC, but with HOSTALLOW_* set to 192.168.1.*. The kernel IP masquerading
bit is set on the hosts that serve as gateways for the internal nodes,
which happen to be the submit nodes as well. The external pool that we
flock to is completely public.
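On a dual-homed head node like these, the setup amounts to picking the external (routable) address for Condor to bind to, while the 192.168.1.x network is admitted through the HOSTALLOW_* settings. The hypothetical Python helper below sketches that choice with the standard `ipaddress` module, treating any non-private address as external. The addresses are made-up examples, and the output is simply a line in the `NETWORK_INTERFACE = <address>` form discussed in this thread.

```python
import ipaddress
from typing import Iterable


def network_interface_line(addrs: Iterable[str]) -> str:
    """Return a NETWORK_INTERFACE setting for the first non-private address."""
    for addr in addrs:
        # 192.168.0.0/16, 10.0.0.0/8, 127.0.0.0/8, etc. count as private.
        if not ipaddress.ip_address(addr).is_private:
            return f"NETWORK_INTERFACE = {addr}"
    raise ValueError("no external (non-private) address found")


if __name__ == "__main__":
    # Hypothetical addresses of a dual-NIC head node: internal, then external.
    print(network_interface_line(["192.168.1.10", "128.104.1.10"]))
```

On a host with several public addresses, which one to use is a site decision; the helper just takes the first.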


Sorry I don't have time to repeat the details here, but if you have
more questions, let me know and I'll try to answer them.

Dewey

On Mon, 21 Feb 2005, Robert E. Parrott wrote:

Hi Folks,

This is a restatement of an earlier question, but one I've seen before
without an adequate solution.

We have a pool on an internal network and a workstation pool on an
external network, and we would like jobs on the internal pool to be able
to flock to the external pool (Linux/UNIX machines for now).

The submit nodes on the internal pool all have both internal and external
interfaces, as do the head nodes of both pools, so negotiation cycles
complete successfully, but jobs never start on the external compute nodes.


As I understand it now, Condor daemons, particularly the schedd, bind to
specific network interfaces. This causes the schedd to try to reach an IP
on the external network via an internal interface, which causes hangs
when contacting the schedd via condor_q or during negotiation cycles.
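One way to see the difference between following the routing table and binding to a fixed address is the common UDP `connect()` trick. Connecting an unbound UDP socket sends no packets, but it makes the kernel choose a route and hence a source address, which `getsockname()` then reports. If the socket is bound to an address first, that address is kept as the source. This Python sketch only illustrates ordinary socket behavior, not Condor's internals; the helper name `source_address` and the use of port 9 are arbitrary choices.

```python
import socket
from typing import Optional


def source_address(dest: str, bind_addr: Optional[str] = None) -> str:
    """Return the local address used as the source when talking to dest.

    With bind_addr=None the kernel picks the source from its routing table;
    otherwise the socket is pinned to bind_addr, much as a daemon bound to
    one NIC would be.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if bind_addr is not None:
            sock.bind((bind_addr, 0))
        # For UDP, connect() only records the peer and selects a route;
        # no datagram is sent.
        sock.connect((dest, 9))
        return sock.getsockname()[0]


if __name__ == "__main__":
    print(source_address("127.0.0.1"))
```

On a dual-homed submit node, calling `source_address()` with an external host's IP would show which local address the routing table selects, which can differ from the internal address that a pinned daemon keeps using.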

Questions:

Is this assessment of the problem correct?

Why do the Condor daemons bind to specific network interfaces and
ignore the routing table?

Is there a workaround to this problem for the specific case where
every submit node can see both networks but its daemons are presently
bound to a specific network interface (the internal one in this case)?


Thanks for the input. This seems to be a fairly common case, so a general solution would probably benefit a good number of users.


rob


_______________________________________________ Condor-users mailing list Condor-users@xxxxxxxxxxx

--
Mr. De-Wei Yin, MASc, PEng
Dept of Chemical & Biological Engineering tel: +1 608 262-3370
University of Wisconsin-Madison fax: +1 608 262-5434
1415 Engineering Drive dyin at cae dot wisc dot edu
Madison WI 53706-1691 USA www.engr.wisc.edu/groups/mtsm/