Re: [Condor-users] Condor overload???
- Date: Fri, 14 Jul 2006 14:27:06 +0200
- From: Alain Roy <roy@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor overload???
At 09:29 AM 7/14/2006 +0000, John Coulthard wrote:
> I submitted 744,400 jobs to our condor cluster, was I a little
> overambitious? Is there a recommended limit?
Yes, that's probably too ambitious if you just do it all at once.
It's hard to give a precise recommended limit. People who have tuned
their systems reasonably can submit a few thousand jobs at a time.
Here are a few thoughts on what you can do:
1) If your jobs are short-running, is it possible for you to combine
them? Condor excels at running longer jobs, and if you end up running
fewer jobs, everything will work better.
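One way to combine jobs is a wrapper script that runs a batch of short
tasks inside a single Condor job. This is only a sketch: `process_one`
is a stand-in for your real per-task command, and the file layout is
an assumption, not something from the original mail.

```shell
#!/bin/sh
# Hypothetical wrapper: one Condor job runs many short tasks in
# sequence, so hundreds of thousands of tasks become a few thousand
# jobs. process_one is a placeholder for your real per-task command.
process_one() {
    cp "$1" "$1.out"   # placeholder work: copy input to output
}

# Each Condor job is handed one batch of input files as its arguments.
for input in "$@"; do
    process_one "$input"
done
```

In the submit file, each queued job would then list its own batch of
input files in `arguments`, so one job does the work of many.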
2) You can use DAGMan to throttle your submissions. DAGMan lets you
manage sets of dependent jobs (job A runs, then B and C can run
simultaneously, then D runs, that sort of thing), but you don't have
to use it for that purpose. You can make a single DAG with 800,000
independent jobs in it, then tell DAGMan to submit the jobs bit by
bit to Condor.
DAGMan is described in Section 2.12 of the Condor 6.7 manual. Note
the -maxidle option, which limits how many idle jobs DAGMan will
allow in the queue: this effectively throttles how much you submit
to Condor at once.
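As a sketch of that approach: a DAG of independent jobs is just a
list of JOB lines, one per job. The file names `big.dag` and
`work.sub` are made up here, and N is kept small for illustration.

```shell
#!/bin/sh
# Sketch: build one DAG containing N independent jobs, then let
# DAGMan meter them into the queue. work.sub is a hypothetical
# submit file; in the real case N would be 744,400.
N=${N:-10}
: > big.dag
i=1
while [ "$i" -le "$N" ]; do
    echo "JOB job$i work.sub" >> big.dag
    i=$((i + 1))
done

# Keep at most 1000 jobs idle in the queue at any one time:
#   condor_submit_dag -maxidle 1000 big.dag
```

Because the jobs have no PARENT/CHILD lines, DAGMan treats them all
as independent and simply drip-feeds them, never letting more than
the -maxidle count sit idle at once.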
> Anyway, what I need is a method of clearing the jobs queued so I can
> get back to work on smaller batches, but condor_q seems to hang so I
> can't actually determine what's in the queue, and 'condor_rm job#'
> also seems to hang. I've tried restarting condor but obviously the
> queue remains. Is there a backdoor method of clearing this?
If you want to totally clear the queue, remove the job_queue* files
in your spool directory.
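A sketch of that backdoor follows. The spool path shown is an
assumption (ask `condor_config_val SPOOL` for yours), and you should
stop Condor on the submit machine first so nothing is writing to the
queue while you remove it.

```shell
#!/bin/sh
# Sketch: wipe the job queue by deleting the job_queue* files.
# SPOOL is an assumed default; check yours with: condor_config_val SPOOL
SPOOL="${SPOOL:-/var/condor/spool}"

# Before removing, stop Condor on the submit machine (admin rights):
#   condor_off
rm -f "$SPOOL"/job_queue*
# Then bring it back up:
#   condor_on
```

When Condor restarts, the schedd finds no job_queue files and starts
with an empty queue, so you can resubmit in smaller batches.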