
Re: [Condor-users] having multiple schedulers and collectors



Mag,
As you discovered, having one scheduler service this many cores can put a lot of load on that single daemon and machine. Using multiple schedds can help with this, and for convenience you can even run multiple schedulers on one server. Running condor_q puts further load on the scheduler, but thankfully an overloaded scheduler can frequently be detected through the collector alone.

Symptoms include the scheduler ads in the collector being older than they normally are (by default they are updated every 5 minutes), but the most likely sign is that slots matched to this scheduler show a State/Activity of "Claimed Idle" for longer periods of time (tens of seconds to minutes). This "Claimed Idle problem" is normally a good indicator of a scheduler that's falling behind.
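As a rough sketch of that check, the Python below asks the collector (via condor_status) for slots that are Claimed/Idle and counts, per submit host, how many have been in that state longer than a threshold. It assumes the slot ads carry the ClientMachine and EnteredCurrentActivity attributes; the function names and the 60-second threshold are made up for illustration, so verify the attribute names with `condor_status -long` on your pool before relying on it.

```python
import subprocess
from collections import Counter

# Assumed slot-ad attributes: State, Activity, ClientMachine,
# EnteredCurrentActivity. Confirm them with `condor_status -long`.
CONSTRAINT = 'State == "Claimed" && Activity == "Idle"'


def fetch_claimed_idle_lines():
    """Query the collector only; returns one 'host timestamp' line per slot."""
    cmd = [
        "condor_status",
        "-constraint", CONSTRAINT,
        "-format", "%s ", "ClientMachine",
        "-format", "%d\n", "EnteredCurrentActivity",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def stuck_slots_by_schedd(lines, now, threshold_s=60):
    """Count slots per submit host that sat Claimed/Idle longer than threshold_s."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        host, entered = parts[0], int(parts[1])
        if now - entered > threshold_s:
            counts[host] += 1
    return counts
```

Calling `stuck_slots_by_schedd(fetch_claimed_idle_lines(), time.time())` every minute or so and watching for one submit host whose count keeps growing would be one way to spot the "Claimed Idle problem" without ever touching the schedd with condor_q.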

Submitting lots of jobs can cause this problem. Let's say you want to bulk submit 200k jobs without slowing the schedd. An easy way to make sure this doesn't cause a problem is to use a DAG that contains these jobs in batches of, say, 250, and tell DAGMan to keep submitting only while the number of idle jobs stays below a limit you choose. That way the schedd's job queue holds only jobs that will actually run soon. In addition, the Condor 7.5.2 series has a number of scheduler optimizations that may improve some aspects of scheduling performance for you.
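To make the batching idea concrete, here is a minimal sketch in Python that writes the DAG text: one independent node per batch of 250, with no parent/child edges, so DAGMan is free to submit nodes in any order. The submit file name batch.sub, the node names, and the VARS macros first and count are all hypothetical; the sketch assumes your submit file uses $(first) to pick its slice of the work and ends with `queue $(count)`.

```python
def build_dag(total_jobs, batch_size=250, submit_file="batch.sub"):
    """Return DAG file text with one independent node per batch of jobs."""
    lines = []
    for index, start in enumerate(range(0, total_jobs, batch_size)):
        # The last batch may be smaller than batch_size.
        count = min(batch_size, total_jobs - start)
        node = f"batch{index:05d}"
        lines.append(f"JOB {node} {submit_file}")
        # Hypothetical macros that the submit file is assumed to reference.
        lines.append(f'VARS {node} first="{start}" count="{count}"')
    return "\n".join(lines) + "\n"
```

For 200k jobs this yields 800 nodes. After writing the text to a file such as bulk.dag, something like `condor_submit_dag -maxidle 1000 bulk.dag` asks DAGMan to hold back further nodes while that many jobs are idle; the 1000 is only a placeholder to tune for your pool, and since a node's jobs are submitted together the idle count can overshoot the limit by up to one batch.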

I hope this helps,
Jason


--

==================================
Jason A. Stowe
cell: 607.227.9686
main: 888.292.5320

http://twitter.com/jasonastowe/
http://twitter.com/cyclecomputing/

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com


On Mon, May 31, 2010 at 9:12 PM, Mag Gam <magawake@xxxxxxxxx> wrote:
is there a way to see if the schedd is backed up? How can I see the
real status of it?

It seems when I submit many jobs (even not running), I get this problem.


