
Re: [Condor-users] Condor server requirements



On Thu, 17 Feb 2005 17:29:16 +0100, Jean-Christophe BACCON
<Jean-Christophe.Baccon@xxxxxxxxxxxxxx> wrote:
> Hi,
> 
> I am planning the setup of Condor on about 200 to 300 computers in a
> Linux environment. There will be only one machine to submit jobs, and
> this machine will be also the central manager of the pool. I have to buy
> this machine and I want to know the requirements of condor in this case.

For a pool of the size you are intending, this is a spectacularly bad idea...

Each running job has an associated shadow process on the submit
machine for as long as the job is running, and each one consumes a
small but non-trivial amount of resources.

The schedd is required to interact at the following points:
job submission
negotiation
job starting
job ending

All of these interactions are currently single-threaded, so they
will block, causing significant latencies (which you might be
able to handle).

Condor is designed to remove some of the common bottlenecks in batch
systems, as well as central points of failure. What you intend
conflicts with this, and from personal experience, it is best with
Condor to try to play its way, not your way :¬)

Unless you are using the 6.7 series and making use of its job leasing
feature, when this machine crashes or reboots every single running
job will stop and have to restart from its last checkpoint...

I strongly recommend against this design unless you will have very few
inactive jobs in the queue and the machine itself is some
multi-thousand-pound beast...

I would suggest rethinking your design at this point to see
if you can do something else.

Perhaps, if you need a centralized submission point for security
reasons, that main machine need not be the schedd you use. Instead it
could farm out the submissions to a set of schedds running on a
selection of machines. This takes more effort on the submission side
(to the extent that you would have to automate it in some way) but is
far less likely to cause you problems, apart from FIFO issues due to
the interleaving...
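
A minimal sketch of the kind of automation I mean, in Python. The
schedd names and submit file names are made up, and it assumes your
condor_submit supports "-remote <schedd name>" for sending a job to
another machine's queue; check the manual for your version before
relying on it. It only builds the command lines, round-robin across
the schedds, and prints them.

```python
import itertools

# Hypothetical schedd names; replace with the machines that actually
# run a schedd in your pool.
SCHEDDS = ["submit1.example.org", "submit2.example.org", "submit3.example.org"]


def plan_submissions(submit_files, schedds):
    """Pair each submit file with a schedd in round-robin order and
    return one condor_submit argument list per submit file."""
    rotation = itertools.cycle(schedds)
    return [
        ["condor_submit", "-remote", next(rotation), path]
        for path in submit_files
    ]


# Dry run: show what would be executed without submitting anything.
for cmd in plan_submissions(["a.sub", "b.sub", "c.sub", "d.sub"], SCHEDDS):
    print(" ".join(cmd))
```

Each list could then be handed to subprocess.run() on the central
machine. Because consecutive submissions land on different schedds,
jobs will not necessarily start in the order they were submitted,
which is the FIFO issue mentioned above.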

Matt