
Re: [Condor-users] How To TroubleShoot Flocking



On Wed, Jul 05, 2006 at 02:40:38PM -0500, John Alberts wrote:
> Hi.  I am trying to setup flocking between 2 condor pools.  1 pool I
> have complete control/access to, the other pool I can log in using ssh
> and submit jobs.  The administrator of the other pool is currently on
> vacation and said he has configured flocking to/from our pool.  I'm
> trying to test it, and it seems like flocking isn't working.
>
> I was wondering how I can troubleshoot flocking to see what the culprit
> is.  I already tried to submit a job whose requirements can only be
> fulfilled on the other pool.  Condor_status -analyze <jobid> shows that
> all machines can't meet the requirements. 

1. I think you mean 'condor_q -analyze'

2. I'm not sure that condor_q -analyze works with remote pools.

> I have also run condor_status
> -pool <otherpoolname> and it properly displays all available machines on
> the other pool.  I'm not sure what to check next.
> 

The next thing to check is to make sure that you're actually flocking
to the remote pool. When a schedd "flocks" to a remote pool, all it does
is send a ClassAd announcing that it has idle jobs to the remote pool.
You can check whether the remote pool knows that you have idle jobs
with:

condor_status -pool remote.pool.central.manager -submitters

The schedd will not flock to the remote pool right away; it will wait until
it has had a few negotiation cycles with the local pool before it
decides to "increase the flock level". This usually happens within
15 or 20 minutes of submitting a job that can't be satisfied in the local
pool.
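
Because of that delay, a single condor_status check right after submitting
can be misleading. Below is a rough sketch of a polling loop around the
command above. The values of REMOTE_CM and SUBMITTER are placeholders, and
the helper names has_submitter and wait_for_flock are made up for
illustration. It only assumes that your submitter name (typically
user@domain) appears somewhere in the -submitters output.

```shell
#!/bin/sh
# Sketch only: both values below are placeholders, not taken from a real pool.
REMOTE_CM="remote.pool.central.manager"
SUBMITTER="user@your.domain"

# Hypothetical helper: succeeds if the name given as $1 appears on stdin.
has_submitter() {
    grep -F -q -- "$1"
}

# Hypothetical helper: ask the remote collector once a minute, up to $1 times.
wait_for_flock() {
    tries="$1"
    while [ "$tries" -gt 0 ]; do
        if condor_status -pool "$REMOTE_CM" -submitters | has_submitter "$SUBMITTER"; then
            echo "remote pool has a submitter ad for $SUBMITTER"
            return 0
        fi
        tries=$((tries - 1))
        sleep 60
    done
    echo "no submitter ad for $SUBMITTER seen in the remote pool" >&2
    return 1
}
```

With an idle job in the queue, running "wait_for_flock 30" covers the
15 to 20 minute window with some margin. If it never succeeds, it is worth
confirming on the submit machine that "condor_config_val FLOCK_TO" actually
lists the remote central manager.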

-Erik