
Re: [Condor-users] Flocking



On Tue, Jun 12, 2007 at 02:04:48PM +0100, Kewley, J (John) wrote:
> > On Tue, Jun 12, 2007 at 01:21:49PM +0100, Kewley, J (John) wrote:
> > > Re: Flocking.
> > > * Can all your submit nodes in your first pool "see" (i.e. no firewalls
> > >   in the way, and not behind a NAT) all execute nodes in your other pool?
> > Yes, I get the full answer when I do a
> > ---------------------------------------------
> > condor_status -pool <manager of second pool>
> > ---------------------------------------------
> > on a submitter of pool A.
> 
> That queries the head node only, not all the execute nodes.
> Maybe try 
> condor_status -direct -pool <pool B> -name <execute node in pool B>
> 
> Even then, that may not be enough.
> Remember, some fixed ports and an ephemeral (high) port range need
> to be open in each direction for both TCP and UDP.
> See (quick plug)
> http://epubs.cclrc.ac.uk/work-details?w=34452
> 
> for more details if you do need to open firewalls.
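> For instance, one common approach is to pin Condor's ephemeral range via
> the LOWPORT/HIGHPORT settings in condor_config, and then open only that
> range plus the collector port (9618 by default) on the firewall. The
> values below are purely illustrative:
> --------------------------------------------
> # Restrict Condor's ephemeral ports so the
> # firewall only needs this range open.
> LOWPORT  = 9600
> HIGHPORT = 9700
> --------------------------------------------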
>  
> > > * -remote is for direct submission to another pool, not for flocking.
> > Hmm, I see, but does it make sense to
> > ----------------------------------------
> > condor_submit -pool <manager of pool B>
> > ----------------------------------------
> > or should a plain 'condor_submit <submit-file>' lead to flocking if
> > pool A is completely full?
> 
> Not really. The idea is that, with flocking in place, you submit jobs to your
> own pool as normal; if the system decides the pool is too busy, it tries to
> find out whether a pool it can flock to can share some of the load. By using
> -remote you bypass this stage and force the jobs onto the other pool (where,
> if two-way flocking were enabled, they might even flock back again!)
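> (The flocking path itself is configured on the submit side: pool A's schedd
> has to know about pool B's central manager via FLOCK_TO, from which
> FLOCK_COLLECTOR_HOSTS and FLOCK_NEGOTIATOR_HOSTS are derived by default.
> Something like the following in pool A's config; the hostname is made up:)
> --------------------------------------------
> # Central manager(s) of the pool(s) to flock to.
> FLOCK_TO = manager.poolB.example.org
> --------------------------------------------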
>  
> > > * Check your HOSTALLOW values in pool B
> > >
> > Ahh! Do you mean flocking could work if I include the
> > submitters of pool A in
> > ------------------------
> > HOSTALLOW_WRITE = ...
> > ------------------------
> > At least I already have
> > --------------------------------------------------------------
> > HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
> > HOSTALLOW_WRITE_STARTD    = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
> > HOSTALLOW_READ_COLLECTOR  = $(HOSTALLOW_READ), $(FLOCK_FROM)
> > HOSTALLOW_READ_STARTD     = $(HOSTALLOW_READ), $(FLOCK_FROM)
> > --------------------------------------------------------------
> > as is the default, and as mentioned in the manual.
> 
> That may be the case, but I am not certain.
> Which machine is that config on: A, B, all of A, or all of B?

I added the submitter of pool A to HOSTALLOW_WRITE on all of pool B,
namely in pool B's global config file. ==> Et voila!
I submitted the job on the pool-A submitter w/o any '-pool' or '-remote',
and it really did flock to pool B, though in fact only to pool B's manager,
because that is the only execute node in pool B with its firewall open
towards pool A. Some other free nodes of pool B first 'matched' but then
did not execute; I guess missing/blocked network communication prevented
the execution on them.
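For reference, the change on pool B's global config amounted to something
like this (the hostname is illustrative):
--------------------------------------------------------------
FLOCK_FROM      = submit.poolA.example.org
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
--------------------------------------------------------------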

But so far it works, and much more easily (i.e. no credentials etc.) than I feared!

Thanks a lot for your help!

Urs
>   
> > > One test you could do is to name, say, the head node of the 2nd pool
> > > (assuming it can run jobs) in the requirements statement of a job on
> > > pool A. It then CANNOT run on pool A and, assuming all else is set up
> > > correctly, will run on pool B via flocking. If that works, name one of
> > > the workers in pool B and try again. Don't use -remote for this.
> > > 
> > > Cheers
> > > 
> > > JK
> > 
> > How do I define such a requirement? Something like
> > ------------------------------------------------- 
> > Requirements = TARGET.HOST == <manager of pool B>
> > ------------------------------------------------- ?
> 
> Requirements = (Machine == "<manager of pool B>")
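> In a minimal test submit file that would look something like this (the
> hostname is made up):
> -------------------------------------------------
> universe     = vanilla
> executable   = /bin/hostname
> requirements = (Machine == "manager.poolB.example.org")
> output       = flock-test.out
> error        = flock-test.err
> log          = flock-test.log
> queue
> -------------------------------------------------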
> 
> cheers
> 
> JK
> 
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/