
[HTCondor-users] schedd getting more than MAX_JOBS_RUNNING jobs running



Hi condor-users,

We have MAX_JOBS_RUNNING set to:

[root@fifebatch1 condor]# condor_config_val -v MAX_JOBS_RUNNING
MAX_JOBS_RUNNING = 10000
 # at: /etc/condor/config.d/02_gwms_schedds.config, line 9
 # raw: MAX_JOBS_RUNNING = 10000

[root@fifebatch1 condor]# ls -al /etc/condor/config.d/02_gwms_schedds.config
-rw-r--r-- 1 root root 3528 Feb 26 14:43 /etc/condor/config.d/02_gwms_schedds.config
[root@fifebatch1 condor]# grep MAX_JOBS_RUNNING /etc/condor/config.d/02_gwms_schedds.config
MAX_JOBS_RUNNING        = 10000

Grepping our schedd log turns up lines like:

SchedLog.20150324T161949:03/24/15 08:17:11 (pid:2002) Preempting 66 jobs due to MAX_JOBS_RUNNING change
SchedLog.20150324T161949:03/24/15 08:27:13 (pid:2002) Preempting 88 jobs due to MAX_JOBS_RUNNING change
SchedLog.20150324T161949:03/24/15 08:32:10 (pid:2002) Preempting 10 jobs due to MAX_JOBS_RUNNING change
SchedLog.20150324T161949:03/24/15 10:17:10 (pid:2002) Preempting 38 jobs due to MAX_JOBS_RUNNING change
SchedLog.20150324T161949:03/24/15 10:17:36 (pid:2002) Preempting 13 jobs due to MAX_JOBS_RUNNING change
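To put a number on how many jobs were killed, here is a minimal Python sketch that parses preemption lines of the form shown above and totals them. It assumes every line follows exactly this "MM/DD/YY HH:MM:SS (pid:N) Preempting N jobs due to MAX_JOBS_RUNNING change" layout; the regex was written against this excerpt only, and `parse_preemptions` is a hypothetical helper, not part of HTCondor.

```python
import re
from datetime import datetime

# Sample lines copied from the SchedLog excerpt above (grep output, so each
# line is prefixed with the log file name).
LOG_LINES = [
    "SchedLog.20150324T161949:03/24/15 08:17:11 (pid:2002) Preempting 66 jobs due to MAX_JOBS_RUNNING change",
    "SchedLog.20150324T161949:03/24/15 08:27:13 (pid:2002) Preempting 88 jobs due to MAX_JOBS_RUNNING change",
    "SchedLog.20150324T161949:03/24/15 08:32:10 (pid:2002) Preempting 10 jobs due to MAX_JOBS_RUNNING change",
    "SchedLog.20150324T161949:03/24/15 10:17:10 (pid:2002) Preempting 38 jobs due to MAX_JOBS_RUNNING change",
    "SchedLog.20150324T161949:03/24/15 10:17:36 (pid:2002) Preempting 13 jobs due to MAX_JOBS_RUNNING change",
]

# Captures the timestamp and the number of preempted jobs.
PATTERN = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:\d+\) "
    r"Preempting (\d+) jobs due to MAX_JOBS_RUNNING change"
)


def parse_preemptions(lines):
    """Return a list of (datetime, job_count) tuples for matching lines."""
    events = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            when = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
            events.append((when, int(m.group(2))))
    return events


events = parse_preemptions(LOG_LINES)
total = sum(count for _, count in events)

for when, count in events:
    print(when.isoformat(sep=" "), count)
print("total preempted:", total)
```

On a live system the same function could be fed the output of something like `grep -h Preempting SchedLog*` instead of the hard-coded sample.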

The manual at:

http://research.cs.wisc.edu/htcondor/manual/v8.3/3_3Configuration.html#21897

says:

Changing this setting to be below the current number of jobs that are running will cause running jobs to be aborted until the number running is within the limit.

My problem is that we are NOT changing the value (see the config file timestamp above). We manage the configuration with Puppet, but we are certainly not running Puppet every 26 seconds, which is how far apart the last two log lines above are, so it can't be some Puppet craziness either.
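The timeline argument can be checked mechanically. This small Python sketch compares the preemption timestamps from the log excerpt with the config file's modification time and computes the gaps between consecutive events. Two assumptions: the `ls` output omits the year, so 2015 is assumed for the Feb 26 mtime, and the file's mtime is taken to reflect its last edit.

```python
from datetime import datetime

# Assumption: `ls -al` printed no year, so 2015 is assumed here.
CONFIG_MTIME = datetime(2015, 2, 26, 14, 43)

# Preemption timestamps copied from the SchedLog excerpt.
EVENTS = [
    datetime(2015, 3, 24, 8, 17, 11),
    datetime(2015, 3, 24, 8, 27, 13),
    datetime(2015, 3, 24, 8, 32, 10),
    datetime(2015, 3, 24, 10, 17, 10),
    datetime(2015, 3, 24, 10, 17, 36),
]

# True if every preemption happened after the config file was last written.
all_after = all(t > CONFIG_MTIME for t in EVENTS)

# Seconds between each pair of consecutive preemption events.
gaps = [int((b - a).total_seconds()) for a, b in zip(EVENTS, EVENTS[1:])]

print("all events after config mtime:", all_after)
print("gaps (s):", gaps)
print("shortest gap (s):", min(gaps))
```

The shortest gap is far too small to be explained by a config-management run editing the file, and every event falls weeks after the file was last modified.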

I remember reading somewhere that the schedd can end up starting more than MAX_JOBS_RUNNING jobs because of the way it works. That's fine with me, but I thought it would then simply stop starting new jobs until the number dropped back below the limit. Instead, it seems to be running more than 10k jobs and then killing some of them.

Am I wrong?

joe