Re: [Condor-users] How To TroubleShoot Flocking
- Date: Wed, 5 Jul 2006 14:55:02 -0500
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [Condor-users] How To TroubleShoot Flocking
On Wed, Jul 05, 2006 at 02:40:38PM -0500, John Alberts wrote:
> Hi. I am trying to setup flocking between 2 condor pools. 1 pool I
> have complete control/access to, the other pool I can log in using ssh
> and submit jobs. The administrator of the other pool is currently on
> vacation and said he has configured flocking to/from our pool. I'm
> trying to test it, and it seems like flocking isn't working.
> I was wondering how I can troubleshoot flocking to see what the culprit
> is. I already tried to submit a job whose requirements can only be
> fulfilled on the other pool. Condor_status -analyze <jobid> shows that
> all machines can't meet the requirements.
1. I think you mean 'condor_q -analyze'
2. I'm not sure that condor_q -analyze works with remote pools.
> I have also run condor_status
> -pool <otherpoolname> and it properly displays all available machines on
> the other pool. I'm not sure what to check next.
The next thing to check is to make sure that you're actually flocking
to the remote pool. When a schedd "flocks" to a remote pool, all it does
is send the remote pool a ClassAd announcing that it has idle jobs.
You can check whether the remote pool knows that you have idle jobs with:
condor_status -pool remote.pool.central.manager -submitters
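
If you want to script that check, here is a rough sketch. It is not part of Condor; the helper name has_submitter is made up for illustration, and it assumes your submitter name shows up as literal text somewhere in the -submitters output.

```shell
#!/bin/sh
# Hypothetical helper (not a Condor tool): reads `condor_status -submitters`
# output on stdin and succeeds if the given submitter name appears in it.
# Assumption: the name is matched as a fixed string anywhere in the output.
has_submitter() {
    grep -q -F -- "$1"
}
```

You would use it along the lines of
condor_status -pool remote.pool.central.manager -submitters | has_submitter "you@your.domain" && echo "remote pool has your submitter ad"
where you@your.domain is a placeholder for whatever name your jobs are submitted under. If nothing is printed, the remote collector has not (yet) received an ad from your schedd.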
The schedd will not flock to the remote pool right away - it will wait until
it has had a few negotiation cycles with the local pool before it
decides to "increase the flock level". This usually happens within
15 or 20 minutes of submitting a job that can't be satisfied in the local