Re: [Condor-users] flocking: trying to connect directly to the private network
- Date: Sun, 26 Sep 2004 06:39:29 GMT
- From: mcal00@xxxxxxxxxxxxx
- Subject: Re: [Condor-users] flocking: trying to connect directly to the private network
I expect the Wisconsin chaps to speak up if I utter any falsehoods in this
message, but in current "out-of-the-box" Condor the central manager (which
is running on your cluster head node, right?) is just a resource broker.
Once a submit node and an execute machine have been matched, it retires
gracefully from the scene, and the shadow process on the submit node and the
starter on the execute node communicate directly. Hence there has to be a
viable route between these machines, which you currently don't have, for a
couple of reasons: no port forwarding on the head node, and un-routable
private IP addresses in use on the cluster nodes.
Flocking allows central managers of *different* pools to share resources
between their pools, so a job submitted from a machine on one pool can end
up running on an execute node in the other. Again, those central managers
are just brokering; they're not doing any fancy routing of Condor traffic.
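For concreteness, flocking between two pools is driven by a couple of knobs
in condor_config; a minimal sketch might look like the following (all
hostnames here are hypothetical, and your security settings may differ):

```
# On the submit machines of pool A: try pool B's central manager
# when no local resources match (hostname is hypothetical).
FLOCK_TO = cm.poolB.example.edu

# On pool B's central manager: accept flocked jobs from pool A's
# submit machines (pattern is hypothetical).
FLOCK_FROM = *.poolA.example.edu
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), *.poolA.example.edu
```

Note that this only arranges the matchmaking; as described above, the shadow
and starter still need a direct network route between them.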
I don't know how stable GCB is (Sechang?), and as far as I'm aware DPF is
not even available yet. If you don't want to use these, then as far as I
can tell you have two more options open to you: 1) enable port forwarding
on your head node and give those cluster nodes addresses that would be
routable outside of the cluster (this is what we use), or 2) set up the
head node to IP masquerade on their behalf, assuming you've built that
support into the kernel.
Does this sound sensible, Wisconsin?
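As a rough illustration of option 2, something like the following on a Linux
head node would masquerade the cluster's outbound traffic (assuming eth0 is
the external interface, eth1 faces the cluster, and 192.168.0.0/24 is the
private range - all hypothetical values, and your kernel needs netfilter/NAT
support built in):

```shell
# Enable IP forwarding in the kernel.
echo 1 > /proc/sys/net/ipv4/ip_forward

# Masquerade traffic from the private cluster network going out eth0.
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o eth0 -j MASQUERADE

# Allow forwarded traffic between the cluster interface and the outside.
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
```

Bear in mind that masquerading alone only lets the cluster nodes initiate
connections outward; for inbound connections to the nodes you would still
need explicit port forwarding (DNAT rules) or something like GCB/DPF.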
> Hi Mark,
> Thanks for your reply.
> I was under the impression that Condor would tackle this problem natively -
> by flocking. But now I'm confused about what flocking means - whether it's
> marshalling jobs between whole pools or only between two machines.
> One interesting thing I did the other day was to set
> NETWORK_INTERFACE = 0.0.0.0 (which is INADDR_ANY in netinet/in.h anyway).
> It's been working very well so far and makes my head node listen to all
> network interfaces, both the external and the internal.
> I've seen GCB and DPF from Sechang - but do you think they are stable enough
> to deploy in production mode?
> > Hi Henrique,
> > Those viznodes are going to have to talk to your submit node, which will
> > not be possible with their private IP addresses because I'm betting that
> > the head node is not performing IP masquerading on their behalf, right?
> > Another alternative is to run a GCB instance on the head node, though I
> > haven't actually done this myself. See the following link for details:
> > http://www.cs.wisc.edu/~sschang/firewall/gcb/index.htm
> > The way we do it is to give all nodes in our Condor "world" certain
> > "private" addresses that are routable within our domain (hence local
> > routers need to be suitably configured). Then the head node has IP
> > forwarding activated as well as being an ARP proxy. This then acts as the
> > gateway for the cluster nodes, and it all works well.
> > Cheers,
> > Mark
Department of Earth Sciences
University of Cambridge
Cambridge CB2 3EQ
Phone: ( +44 ) 1223 333400
Fax: ( +44 ) 1223 333450