Re: [Condor-users] PASSWD_CACHE_REFRESH in 6.9.4?
- Date: Tue, 23 Oct 2007 09:04:29 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
Ian Chesal wrote:
>> This seems counter intuitive to me. Why would _not_ nice'ing
>> processes on a busy submit machine be a good thing?
>
> Ditto. Is this a Windows scheduler only thing? I'm almost certain Alan
> De Smet's talk every year at Condor Week talks about using higher nice
> levels on the shadows to help out a starved-for-CPU schedd process.

If you want to increase the priority of the schedd, that is possibly a
good idea. However, using SHADOW_RENICE_INCREMENT=10 to drop the
priority of the shadows below all other normal processes on the system
has degraded throughput in every case we have observed or tested in the
6.9 branch. Part of the problem is that the schedd and the shadow need
to communicate. During this communication, the schedd itself can be
slowed down because it is stuck waiting for a response from a
low-priority shadow. More commonly, we see connection failures in the
shadow logs because the shadow is so CPU-starved that it cannot form a
connection to the schedd, even with very generous timeouts.

Another thing that has changed is that the 6.9.4 schedd is much less
CPU-hungry than 6.8. Having tens of thousands of jobs in the queue and a
few thousand jobs running should not severely tax the 6.9.4 schedd on
reasonable server-class hardware unless the jobs are so short that the
completion rate is greater than ~10-15 jobs per second.
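
To put that threshold in terms of job length, here is a rough
back-of-the-envelope sketch in Python. It assumes a steady state in
which every finished job is immediately replaced, so the completion rate
is simply running jobs divided by mean runtime. The function names and
the figure of 3000 running jobs are made up for illustration; only the
~10-15 jobs per second range comes from the paragraph above.

```python
def completion_rate(running_jobs, mean_runtime_s):
    """Steady-state job completions per second.

    Assumes each finished job is replaced right away, so the number of
    running jobs stays constant (Little's law).
    """
    if mean_runtime_s <= 0:
        raise ValueError("mean_runtime_s must be positive")
    return running_jobs / mean_runtime_s


def min_mean_runtime(running_jobs, max_rate_per_s):
    """Shortest mean runtime (seconds) that keeps the completion rate
    at or below max_rate_per_s for the given number of running jobs."""
    if max_rate_per_s <= 0:
        raise ValueError("max_rate_per_s must be positive")
    return running_jobs / max_rate_per_s


if __name__ == "__main__":
    running = 3000  # hypothetical stand-in for "a few thousand jobs"
    for rate in (10, 15):
        seconds = min_mean_runtime(running, rate)
        print(f"{running} running jobs stay under {rate} jobs/s "
              f"if mean runtime is at least {seconds:.0f} s")
```

Under those assumptions, a pool of this size only gets near the
threshold when jobs average a few minutes or less.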

I'll admit that our tests of this have all been under Linux and have
focused on the vanilla universe. We're certainly hoping for feedback
on all the other possible use cases.