
Re: [Condor-users] Problems with jobs



Oops! I typed that wrong. The setting you actually want to change is:

 

            JOB_START_COUNT = 2

 

I said ‘JOB_START_INTERVAL’ -- that’s wrong. I think faster than I can type sometimes...

 

- Ian

 


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: December 7, 2005 3:53 PM
To: Chris Miles; Condor-Users Mail List
Subject: Re: [Condor-users] Problems with jobs

 

That’s it! That’s the key: the jobs run very quickly (I’m guessing in the range of a few minutes, right?).

 

In that case Condor can’t spawn shadows fast enough. The shadow spawn rate on the schedds is throttled to prevent overloading the machine by starting many, many processes at the same time. Two variables control the spawn rate; you’ll only want to change JOB_START_COUNT.
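
As a rough back-of-the-envelope sketch of why short jobs leave machines Claimed+Idle: if a schedd starts at most some number of jobs per throttle interval, the number of jobs it can keep running at once is capped at roughly (start rate) x (job runtime). The function name, the idea that the second variable is a delay in seconds between start batches, and the example numbers are all illustrative assumptions, not Condor defaults or measurements from this pool.

```python
def max_concurrent_jobs(start_count: int, start_delay_s: float, job_runtime_s: float) -> float:
    """Approximate upper bound on jobs one schedd can keep running at once.

    Assumes the schedd starts at most `start_count` jobs every
    `start_delay_s` seconds (an assumed model of the throttle) and that
    every job runs for `job_runtime_s` seconds.
    """
    if start_count <= 0 or start_delay_s <= 0 or job_runtime_s < 0:
        raise ValueError("count and delay must be positive, runtime non-negative")
    starts_per_second = start_count / start_delay_s
    # Steady state: jobs in flight = arrival rate * time each job stays running.
    return starts_per_second * job_runtime_s


if __name__ == "__main__":
    # Hypothetical numbers: 1 start every 2 s, jobs that last 60 s.
    print(max_concurrent_jobs(1, 2.0, 60.0))  # 30.0
    # Doubling the start count doubles the ceiling.
    print(max_concurrent_jobs(2, 2.0, 60.0))  # 60.0
```

Under those assumed numbers, a pool with more than about 30 machines would always have some sitting claimed but idle, however many jobs are queued.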

 

Put this in the condor_config file used by all your schedds:

 

            ##  Start more than one job at a time

            JOB_START_INTERVAL = 2

 

Once that’s deployed in all your condor_config files, issue:

 

            condor_reconfig -all

 

Run it from your central negotiator to reconfigure all of them.

 

You can up that number until the Claimed+Idle machines disappear, but keep a careful eye on CPU usage on your schedd machines. It can spike when too many shadow processes are spawned at once.
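
If you would rather start from an estimate than from trial and error, here is a minimal sketch of one way to guess a first value. It assumes the throttle starts a fixed number of jobs per interval of a known length in seconds and that all jobs run about equally long; the function name, parameters, and example numbers are hypothetical. Treat the result only as a starting point, then tune it while watching schedd CPU as described above.

```python
import math


def estimate_start_count(num_slots: int, start_delay_s: float, job_runtime_s: float) -> int:
    """Smallest per-interval start count that could keep `num_slots` busy.

    Assumes (hypothetically) that the schedd starts a batch of jobs every
    `start_delay_s` seconds and each job runs for `job_runtime_s` seconds.
    """
    if num_slots <= 0 or start_delay_s <= 0 or job_runtime_s <= 0:
        raise ValueError("all arguments must be positive")
    # To keep every slot busy, starts per second must be at least slots / runtime.
    needed_per_interval = num_slots * start_delay_s / job_runtime_s
    return max(1, math.ceil(needed_per_interval))


if __name__ == "__main__":
    # Hypothetical pool: 100 slots, a 2 s interval, 60 s jobs.
    print(estimate_start_count(100, 2.0, 60.0))  # 4
```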

 

- Ian

 

0 jobs; 0 idle, 0 running, 0 held

on any machine that I try. I think by the time I SSH to a node that’s running a job it’s already

finished, hence the empty queue. The jobs run very quickly