
Re: [Condor-users] renice increments (was: PASSWD_CACHE_REFRESH in 6.9.4)



Stuart,

The default SCHED_UNIV_RENICE_INCREMENT is 0 in 6.9. Local universe jobs are reniced according to JOB_RENICE_INCREMENT, which still defaults to 10.
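
For concreteness, those defaults expressed as condor_config settings (purely a
restatement of the above, not something you need to set):

    # scheduler universe jobs (e.g. DAGMan) are no longer reniced by default
    SCHED_UNIV_RENICE_INCREMENT = 0

    # local universe jobs are reniced by the starter according to this knob,
    # which still defaults to 10
    JOB_RENICE_INCREMENT = 10

You can check what a given machine is actually using with condor_config_val,
e.g. "condor_config_val SCHED_UNIV_RENICE_INCREMENT".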

--Dan

Stuart Anderson wrote:
Dan,
	Thanks for the explanation. What is the current default and
recommended value for SCHED_UNIV_RENICE_INCREMENT in the 6.9 series?
What about adding a LOCAL_UNIV_RENICE_INCREMENT option? For example,
I would like DAGMan, which runs in the scheduler universe, to have a
higher priority than short-running user jobs in the local universe.

Thanks.
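
For illustration only: a per-universe knob like the one proposed above does
not exist in 6.9 (the name below is hypothetical), and the closest existing
control, JOB_RENICE_INCREMENT, is not specific to the local universe, so on a
machine that also executes vanilla jobs it would renice those as well:

    # HYPOTHETICAL: not a real 6.9 parameter, shown only to illustrate
    # the requested scheduler-vs-local distinction
    LOCAL_UNIV_RENICE_INCREMENT = 10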


On Tue, Oct 23, 2007 at 09:04:29AM -0500, Dan Bradley wrote:
Ian Chesal wrote:
This seems counter intuitive to me. Why would _not_ nice'ing the shadow
processes on a busy submit machine be a good thing?
Ditto. Is this a Windows-scheduler-only thing? I'm almost certain Alan
De Smet's talk at Condor Week every year recommends using higher nice
levels on the shadows to help out a starved-for-CPU schedd process.
If you want to increase the priority of the schedd, that is possibly a good idea. However, using SHADOW_RENICE_INCREMENT=10 to decrease the priority of the shadows below all other normal processes on the system degrades throughput in every case we have observed or tested in the 6.9 branch.

Part of the problem is that the schedd and the shadow need to communicate. During this communication, it is actually possible for the schedd to be slowed down because it is stuck waiting for a response from a low-priority shadow. More common is to see connection failures in the shadow logs because the shadow is so CPU-starved that it cannot form a connection to the schedd, even with very generous timeouts.
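
If you do want to bump the schedd up rather than push the shadows down, I do
not know of a config knob for renicing the schedd itself; a rough sketch,
assuming a Linux submit node and root privileges (and note the setting does
not survive a schedd restart), would be something like:

    # leave SHADOW_RENICE_INCREMENT alone (i.e. do not set it to 10), and
    # instead raise the schedd's priority by hand:
    renice -n -5 -p $(pgrep -x condor_schedd)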

Another thing that has changed is that the 6.9.4 schedd is much less CPU-hungry than 6.8. Having tens of thousands of jobs in the queue and a few thousand jobs running should not severely tax the 6.9.4 schedd on reasonable server-class hardware, unless the jobs are so fast that the completion rate exceeds roughly 10-15 jobs per second.

I'll admit that our tests of this have all been under Linux and have focused on the vanilla universe. We're certainly hoping for feedback on all the other possible use cases.

Cheers,
--Dan
