
RE: [condor-users] Scaling to hundreds, then thousands of nodes



Paul,

I'm very interested in your setup, because it looks very similar to mine.  Could you characterize the amount of data transfer your jobs need in order to run?  Are the transfers small or large?  About how long do the jobs take to run?  How much bandwidth do you have between the central manager (CM) and your grid nodes?  And lastly, what are the approximate specs of the computer you're using as the CM?

Thanks,
David

-----Original Message-----
From: Paul Wilson [mailto:p.b.wilson@xxxxxxxxx]
Sent: Thursday, March 18, 2004 8:01 AM
To: condor-users@xxxxxxxxxxx
Subject: RE: [condor-users] Scaling to hundreds, then thousands of nodes


Hi

A few days late, but I've been out of the country:

We run 1000 machines from a single submit node (which happens to also be
the central manager).

Our queue has regularly held up to 5000 jobs at a time, and it has
reliably crunched its way through them with no problems. The pool has
been running this way for 6 months now, and runs like clockwork.

However, we are about to switch to remote Condor-G submission: no user
will ever physically log in to the scheduler again, because Condor-G and
Globus will do it all for them from their own machines. They will
authenticate with UK eScience X.509 user certificates, which improves our
security, lowers our admin time, and returns the data straight back to
each user's home machine.
Users can also use DAGMan's scripting abilities to redirect their output
directly to our project's SRB storage.

Another reason is that with a single submitter and over 30,000 jobs per
month, we get skip-loads of data filling up the submit node's disk, which
we have to keep reminding users to remove. With Condor-G submission local
to each user, the output will fill up the users' own disks, not the
Condor submit node's!

One advantage of a single submit node is that it allows very easy
control of the pool from an administrative point of view, and it was the
only way we could sell Condor to our system administrators!

Paul

University College, London
