Any idea why this would be the case? I've used other queue managers in the
past that have no trouble with jobs in the tens of thousands. I will try
reducing the debugging. Any ideas on distributing the schedd load across
multiple machines? This will be a HUGE setback for us adopting Condor if I
can't figure out a way to stably handle 10,000+ jobs.

Thanks for the heads up. 


I can't give an official answer, but I can tell you that we had the same
problem with 5136 jobs.  In our case, there were a couple of other things
that contributed, so you could check these, too: a high debug level on the
schedd and a supervising process that used condor_q and condor_history to
monitor jobs.  condor_q talks to the schedd, so every call adds to its load;
if you're doing anything like that you may want to parse log files instead.
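
In case it helps, here is a rough sketch of what I mean by parsing the log
instead of polling condor_q.  It assumes the plain-text user log format,
where each event starts with a header line such as
"005 (12.000.000) 03/10 12:05:00 Job terminated." and ends with a "..." line.
The function names and the mapping from event numbers to states are my own
simplification, not anything Condor ships.

```python
import re
from collections import Counter

# Header line of a user-log event: "NNN (cluster.proc.subproc) date time text"
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) ")

# Simplified mapping from user-log event number to a coarse job state.
# Events not listed here (image size updates, etc.) are ignored.
STATE_BY_EVENT = {
    "000": "idle",     # Job submitted
    "001": "running",  # Job executing
    "004": "idle",     # Job evicted
    "005": "done",     # Job terminated
    "009": "removed",  # Job aborted
    "012": "held",     # Job held
    "013": "idle",     # Job released
}

def job_states(lines):
    """Return {(cluster, proc): state} using the last known event per job."""
    states = {}
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # detail lines and "..." separators
        state = STATE_BY_EVENT.get(m.group(1))
        if state is not None:
            states[(int(m.group(2)), int(m.group(3)))] = state
    return states

def summarize(lines):
    """Count jobs per state, roughly what a condor_q summary would give."""
    return Counter(job_states(lines).values())
```

You would call summarize() on the open log file named by the "log = ..."
line in your submit file.  Reading that file puts no load on the schedd at
all, which is the point.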

However, even after lowering the debug level and switching to log parsing,
our schedd still struggled with 5000 jobs in the queue.


I have a remote schedd with 9000+ jobs.  The schedd is continually running
at 100% CPU.  I am hoping for some suggestions on how to improve the
efficiency of the schedd.

Do I need to split the jobs between schedds on 2 or 3 more machines?  

Would it help significantly to move the negotiator and collector to another
machine?

Are there ways to speed up the schedd so that it does not take as long to
run through the job queue?

I am using Condor 6.7.7 with a nearly out-of-the-box config.
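
To make the "split the jobs between schedds" idea concrete, this is roughly
what I have in mind: a small wrapper that hands submit files out round-robin.
It is only a sketch.  The schedd names and submit file names are
placeholders, and it assumes that condor_submit's -name option is enough to
reach each schedd in this setup; whether it is (rather than needing
something like -remote) depends on the file system and security
configuration.

```python
from itertools import cycle

def plan_submissions(submit_files, schedd_names):
    """Assign submit files to schedds round-robin.

    Returns one condor_submit command (as an argument list) per submit file.
    Nothing is executed here; the caller decides how to run the commands.
    """
    if not schedd_names:
        raise ValueError("need at least one schedd name")
    plan = []
    for submit_file, schedd in zip(submit_files, cycle(schedd_names)):
        plan.append(["condor_submit", "-name", schedd, submit_file])
    return plan
```

Each returned list could then be passed to subprocess.call().  With three
schedds and 9000 jobs in equally sized submit files, each queue would hold
about 3000 jobs.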



