Hi All,
A quick summary: I've hit a seemingly arbitrary limit in my condor grid. At most 76 jobs run at any given time, even though suitable idle servers are available. I think it might be because my master is underpowered and regularly hitting 100% CPU, but so far this is nothing more than a hunch. More details below.
I've just added around 150 cores to our condor grid, now at a total of 190 cores.
In testing (throwing roughly 4000 jobs of about 30 minutes each at it), I'm noticing that it seems to cap at exactly 76 running jobs.
I'm pretty sure this is not a requirements issue because (as recently as yesterday) I've run exactly the same set of jobs and they have run fine on servers which are now Idle. (This was when there were only 40 cores).
My best guess so far is a resource issue on the master (which is also the scheduler and everything else, really), which I'm now regularly seeing at 100% CPU. That said, I don't really understand why this would cause the problem, or why it's always exactly 76 jobs running even though they are all (slightly) different sizes.
Does this hunch sound believable? I intend to investigate further, but thought it might be good to run it by the experts to see if it sounds like a good starting point.

I know my master is underpowered (1 virtual core with 1.5 GB RAM), so I fully intend to give it a boost anyway. I'm just wondering whether this is likely to cure the issue (in which case I'll expedite the upgrade) or whether there is probably a separate issue as well.

Thanks for any ideas!

Rob Stevenson
