
Re: [Condor-users] Condor overload???



At 09:29 AM 7/14/2006 +0000, John Coulthard wrote:
> I submitted 744,400 jobs to our Condor cluster. Was I a little overambitious?
> Is there a recommended limit?

Yes, that's probably too ambitious if you submit them all at once.

It's hard to give a precise recommended limit. People who have tuned their systems reasonably well can submit a few thousand jobs at a time. Here are a few thoughts on what you can do:

1) If your jobs are short-running, is it possible for you to combine them? Condor excels at running longer jobs, and if you end up running fewer, longer jobs, it will all work better.
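For example (just a sketch, not something tested against your setup): instead of one Condor job per task, queue a smaller number of jobs and have each one work through a block of tasks. The wrapper script name and the block size below are made up for illustration; the submit-file pieces ($(Process), queue N) are standard Condor.

    # process_chunk.sh is a hypothetical wrapper that runs one block of
    # a few hundred of the original tasks, choosing the block from the
    # index Condor passes in as $(Process).
    universe    = vanilla
    executable  = process_chunk.sh
    arguments   = $(Process)
    output      = chunk_$(Process).out
    error       = chunk_$(Process).err
    log         = chunks.log
    # 1,000 jobs instead of 744,400: each job amortizes Condor's
    # scheduling overhead over many tasks.
    queue 1000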

2) You can use DAGMan to throttle your submissions. DAGMan lets you manage sets of dependent jobs (job A runs, then B and C can run simultaneously, then D runs--that sort of thing), but you don't have to use it for that purpose. You can make a single DAG with 800,000 independent jobs in it, then tell DAGMan to submit the jobs bit by bit to Condor.

DAGMan is in Section 2.12 of the Condor 6.7 manual. Note the -maxidle option, which limits how many idle jobs DAGMan will allow in the queue at any one time: this effectively throttles how much you submit to Condor at once.
http://www.cs.wisc.edu/condor/manual/v6.7/2_12DAGMan_Applications.html
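As a rough sketch (the file names here are invented, and I'm assuming one submit file shared by all the nodes), the DAG file is just a flat list of JOB entries with no PARENT/CHILD lines, so every node is independent and DAGMan submits them as fast as its throttles allow. VARS lines are one way to hand each node a different argument, which task.sub would pick up as $(index):

    # jobs.dag -- all nodes independent (no PARENT/CHILD lines)
    JOB  task000001  task.sub
    VARS task000001  index="1"
    JOB  task000002  task.sub
    VARS task000002  index="2"
    # ... one JOB/VARS pair per task, generated by a script ...

Then submit the whole thing with a cap on idle jobs (500 here is an arbitrary example value):

    condor_submit_dag -maxidle 500 jobs.dag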


> Anyway, what I need is a method of clearing the queued jobs so I can get
> back to work on smaller batches, but condor_q seems to hang so I can't
> actually determine what's in the queue, and 'condor_rm job#' also seems to
> hang.  I've tried restarting Condor but obviously the queue remains.  Is
> there a backdoor method of clearing this?

If you want to totally clear the queue, remove the job_queue* files in your spool directory.
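Roughly, that looks like the following (the init-script path is just an example; stop and start Condor however you normally do on your machines, and be aware this wipes every job in the queue):

    # Stop Condor first so the schedd isn't holding the queue files open.
    /etc/init.d/condor stop

    # Ask your configuration where the spool directory is, then remove
    # the job_queue* files in it.
    condor_config_val SPOOL
    rm <spool-directory>/job_queue*

    # Start Condor again; the schedd comes back with an empty queue.
    /etc/init.d/condor start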

-alain