[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Condor-users] Flocking



> On Tue, Jun 12, 2007 at 01:21:49PM +0100, Kewley, J (John) wrote:
> > Re: Flocking.
> > * Can all your submit nodes in your first pool "see" (i.e. no firewalls
> >   in the way, and not behind a NAT) all execute nodes in your other pool?
> Yes, I get the full answer when I do a
> ---------------------------------------------
> condor_status -pool <manager of second pool>
> ---------------------------------------------
> on a submitter of pool A.

That only talks to the collector on pool B's central manager, not to the
execute nodes themselves.
Maybe try
condor_status -direct -pool <pool B> -name <execute node in pool B>

Even then that may not be enough.
Remember, some fixed ports and an ephemeral (high) port range need
to be open in each direction, for TCP AND UDP.
See (quick plug)
http://epubs.cclrc.ac.uk/work-details?w=34452

for more details if you do need to open firewalls.
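
If you do have to open things up, the usual trick is to pin Condor to a
known port range so the firewall rules stay manageable. A rough sketch
(the exact range is your choice; 9618 and 9614 are the usual collector
and negotiator ports):
-----------------------------------------------------------
# Restrict the ephemeral ports Condor uses (TCP and UDP)
# so only a known range needs opening at the firewall.
LOWPORT  = 9600
HIGHPORT = 9700
-----------------------------------------------------------
That range, plus the fixed collector (9618) and negotiator (9614) ports,
then needs to be open in both directions between the pools.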
 
> > * -remote is for direct submission to another pool, not for flocking.
> Hmm, I see, but does it make sense to
> ----------------------------------------
> condor_submit -pool <manager of pool B>
> ----------------------------------------
> or should a blank 'condor_submit <submit-file>' lead to flocking if
> pool A is completely booked out?

Not really. The idea is that with flocking enabled you submit jobs to your
own pool as normal; if your own pool cannot take them (e.g. it is too busy),
the schedd then tries the pools it is configured to flock to and lets them
share some of the load. By using -remote you bypass this stage and force the
jobs onto the other pool (where, if 2-way flocking were enabled, they may
even flock back again!).
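
For reference, the flocking side of the config is only a couple of knobs.
On the submit machines in pool A, something like this (the hostname is just
a placeholder):
---------------------------------------------------------
# Pools (central managers) this schedd may flock to,
# tried in order when the local pool cannot run the jobs.
FLOCK_TO = condor.poolb.example.org
---------------------------------------------------------
Pool B then has to list A's submitters in FLOCK_FROM and in its HOSTALLOW
lists (see further down).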
 
> > * Check your HOSTALLOW values in pool B
> >
> Ahh! Do you mean flocking could work if I include the
> submitters of pool A into
> ------------------------
> HOSTALLOW_WRITE = ...
> ------------------------
> At least I already have
> --------------------------------------------------------------
> HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
> HOSTALLOW_WRITE_STARTD    = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
> HOSTALLOW_READ_COLLECTOR  = $(HOSTALLOW_READ), $(FLOCK_FROM)
> HOSTALLOW_READ_STARTD     = $(HOSTALLOW_READ), $(FLOCK_FROM)
> --------------------------------------------------------------
> as is the default and as mentioned in the manual.

That may be the case, but I am not certain.
Which machines is that config on: A, B, all machines in A, or all machines in B?
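
Those flocking knobs normally belong on pool B's machines (central manager
and execute nodes), naming A's submit hosts. A sketch with placeholder names:
--------------------------------------------------------------
# On pool B (central manager and execute nodes):
# schedds from pool A that are allowed to flock in.
FLOCK_FROM = submit1.poola.example.org, submit2.poola.example.org

# The HOSTALLOW_*_STARTD / _COLLECTOR lists you quoted then
# grant those hosts the access they need via $(FLOCK_FROM).
--------------------------------------------------------------
If that is only set on some of B's machines (or only on A's), the startds
in B would refuse the flocked jobs.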
  
> > One test you could do is to name, say, the head node of the 2nd pool
> > (assuming it can run jobs) in the REQUIREMENTS statement of a job on
> > pool A. It then CANNOT run on pool A and, assuming all else is set up
> > correctly, will run on pool B via flocking.
> > If that works, name one of the workers in Pool B and try again.
> > Don't use -remote for this.
> > 
> > Cheers
> > 
> > JK
> 
> How do I define such a requirement? Something like
> ------------------------------------------------- 
> Requirements = TARGET.HOST == <manager of pool B>
> ------------------------------------------------- ?

Requirements = (Machine == "<manager of pool B>")
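
So in a full (placeholder) submit file the test would look roughly like:
------------------------------------------------
# Minimal test job forced onto pool B's manager;
# replace the hostname with the real machine name.
universe     = vanilla
executable   = /bin/hostname
output       = test.out
error        = test.err
log          = test.log
requirements = (Machine == "condor.poolb.example.org")
queue
------------------------------------------------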

cheers

JK