
Re: [HTCondor-users] load-balanced central manager?




Hi,

Is it possible to use Condor in a way like there's multiple running
instances of every component (including negotiator) in a pool, and in
this way to provide a load-balanced fail-tolerant environment? Or is
it possible to use only one single negotiator in a pool at once (I
know it's possible to do fail-over with had)?

It is possible to do fail-over with HAD, but it picks one negotiator to be running at any one time. Should the currently active negotiator go down, HAD will pick another to start. Note that if the negotiator or the collector crashes, all existing jobs stay running, and the schedds will even start new jobs if they can reuse the claims they already have. Separately, it is also possible to tell a negotiator that it is responsible for only some subset of the machines in the pool, so that it provides matches only to those machines.
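
To make the "one active negotiator at a time" behaviour concrete, here is a toy Python model. It is only a sketch: the Candidate class, the active_negotiator function, and the "first live host in list order wins" rule are assumptions made for illustration, not HAD's actual election protocol or any HTCondor API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A host that could run the negotiator (hypothetical model, not an HTCondor type)."""
    name: str
    alive: bool = True


def active_negotiator(candidates):
    """Return the name of the single candidate that should run the negotiator.

    Assumed toy rule: the first live candidate in list order wins.
    Returns None when no candidate is alive.
    """
    for candidate in candidates:
        if candidate.alive:
            return candidate.name
    return None


# Three hypothetical central manager hosts.
cms = [Candidate("cm1"), Candidate("cm2"), Candidate("cm3")]

before = active_negotiator(cms)  # only cm1 is active; cm2 and cm3 stand by
cms[0].alive = False             # the active host goes down
after = active_negotiator(cms)   # exactly one replacement is chosen
```

The point of the model is that the standby hosts add availability, not capacity: at any moment only one of them is doing matchmaking.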

I've read about flocking also. So in that way there'd be a number of
pools available with their own central managers. What happens before a
job get flocked?
Before a job can be flocked, it has to fail to match in the local pool (either due to load or a conflict between job and machine requirements).
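
The "local pool first, then flock" ordering can be sketched in Python as below. This is a simplified model under stated assumptions: the dictionaries, the memory-only requirement check, and the names pool_can_match and choose_pool are all hypothetical and stand in for real matchmaking. In a real pool the list of remote central managers would come from the schedd's FLOCK_TO setting.

```python
def pool_can_match(job, machines):
    """True if any idle machine in the pool satisfies the job's (toy) requirement."""
    return any(
        m["idle"] and m["memory_mb"] >= job["request_memory_mb"]
        for m in machines
    )


def choose_pool(job, local_pool, flock_targets):
    """Try the local pool first, then each flock target in order.

    Each pool is a (name, machines) pair. Returns the name of the first
    pool that can match the job, or None if the job stays idle everywhere.
    """
    for name, machines in [local_pool, *flock_targets]:
        if pool_can_match(job, machines):
            return name
    return None


# Hypothetical pools: the local one is busy or too small for the job.
local = ("local", [
    {"idle": False, "memory_mb": 8192},  # large enough, but busy (load)
    {"idle": True, "memory_mb": 1024},   # idle, but too small (requirements conflict)
])
remote_a = ("remote-a", [{"idle": False, "memory_mb": 8192}])
remote_b = ("remote-b", [{"idle": True, "memory_mb": 8192}])

big_job = {"request_memory_mb": 4096}
small_job = {"request_memory_mb": 512}

big_job_pool = choose_pool(big_job, local, [remote_a, remote_b])
small_job_pool = choose_pool(small_job, local, [remote_a, remote_b])
```

Here the small job matches locally and never touches the remote pools, while the big job fails locally for both reasons mentioned above and only then moves down the flock list.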

  Does flocking help to provide some kind of load
balancing between several central managers? Or it makes the situation
even worse because it requires extra work from central managers?

Generally speaking, there isn't a huge load on the central manager, except in the largest of pools, and even then, claim reuse helps tremendously. What can be a problem with the central managers is when they need to communicate with schedds over high-latency WAN links, especially when strong security is enabled.

-greg