Re: [Condor-users] PASSWD_CACHE_REFRESH in 6.9.4?
- Date: Tue, 23 Oct 2007 13:36:04 -0700
- From: Stuart Anderson <anderson@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] PASSWD_CACHE_REFRESH in 6.9.4?
Thanks for the explanation. What is the current default and
recommended value for SCHED_UNIV_RENICE_INCREMENT in the 6.9 series?
What about adding a LOCAL_UNIV_RENICE_INCREMENT option? For example,
I would like to make the distinction that DAGMan has a higher priority
in the scheduler universe than short-running user jobs in the local
universe.

On Tue, Oct 23, 2007 at 09:04:29AM -0500, Dan Bradley wrote:
> Ian Chesal wrote:
> >> This seems counter intuitive to me. Why would _not_ nice'ing the shadow
> >> processes on a busy submit machine be a good thing?
> > Ditto. Is this a Windows scheduler only thing? I'm almost certain Alan
> > De Smet's talk every year at Condor Week talks about using higher nice
> > levels on the shadows to help out a starved-for-CPU schedd process.
> If you want to increase the priority of the schedd, that is possibly a
> good idea. However, using SHADOW_RENICE_INCREMENT=10 to decrease the
> priority of the shadows below all other normal processes on the system
> degrades throughput in every case we have observed or tested in the 6.9
> branch. Part of the problem is that the schedd and the shadow need to
> communicate. During this communication, it is actually possible for the
> schedd to be slowed down because it is stuck waiting for a response from
> a low priority shadow. More common is to see connection failures in the
> shadow logs due to the shadow being so cpu starved that it cannot form a
> connection to the schedd, even with very generous timeouts.
> Another thing that has changed is that the 6.9.4 schedd is much less cpu
> hungry than 6.8. Having 10s of thousands of jobs in the queue and a few
> thousand jobs running should not severely tax the 6.9.4 schedd on
> reasonable server-class hardware unless the jobs are so fast that the
> completion rate is greater than ~10-15 jobs per second.
> I'll admit that our tests of this have all been under linux and have
> been focussed on vanilla universe. We're certainly hoping for feedback
> on all the other possible usage cases.
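
The "~10-15 jobs per second" figure quoted above can be related to queue size with a rough back-of-the-envelope estimate. The sketch below is only an illustration, not anything from Condor itself: it assumes a steady state in which the completion rate is roughly the number of running jobs divided by their mean runtime (Little's law). The function names, the 3000-job example, and the 10 jobs/s default threshold are all hypothetical choices made for the example.

```python
def completion_rate(running_jobs: int, mean_runtime_s: float) -> float:
    """Estimate steady-state job completions per second.

    Assumes Little's law: rate = jobs in flight / mean time per job.
    """
    if mean_runtime_s <= 0:
        raise ValueError("mean_runtime_s must be positive")
    return running_jobs / mean_runtime_s


def exceeds_threshold(running_jobs: int, mean_runtime_s: float,
                      threshold: float = 10.0) -> bool:
    """Return True if the estimated rate is above the given jobs/s threshold.

    The default of 10.0 is the lower end of the ~10-15 jobs/s range
    mentioned in the quoted message; it is a rough guide, not a hard limit.
    """
    return completion_rate(running_jobs, mean_runtime_s) > threshold


if __name__ == "__main__":
    # Hypothetical workload: 3000 running jobs at several mean runtimes.
    for runtime_s in (60, 300, 3600):
        rate = completion_rate(3000, runtime_s)
        flag = "above" if exceeds_threshold(3000, runtime_s) else "at or below"
        print(f"mean runtime {runtime_s:>4d}s: {rate:6.2f} jobs/s ({flag} 10 jobs/s)")
```

Under these assumptions, a few thousand running jobs only approach the quoted range when the mean runtime drops to a few minutes or less.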
Stuart Anderson anderson@xxxxxxxxxxxxxxxx