Re: [Condor-users] Problem with multiple schedds in 7.7+
- Date: Mon, 13 Aug 2012 10:15:01 -0500
- From: John Weigand <weigand@xxxxxxxx>
- Subject: Re: [Condor-users] Problem with multiple schedds in 7.7+
You were correct: the problem is the job_queue.log.
A JOB_QUEUE_LOG attribute was introduced in Condor 7.7.5 (ticket 2598):
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2598
Prior to the introduction of this feature, a job_queue.log was always
maintained in the spool directory of each schedd. With this change, it
appears (whether by bug or by design) that the job queue log of each
additional schedd must be defined explicitly:
SCHEDD.SCHEDDJOBS2.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDJOBS2.SPOOL)/job_queue.log
If it is not stated explicitly, all schedds share a single job_queue.log,
so every job appears in every schedd queue after a restart.
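For a setup like the one quoted below, with two additional schedds, the
explicit definitions might look as follows. This is a sketch, not tested
configuration: it assumes the third schedd uses the local name SCHEDDJOBS3
(as in the DC_DAEMON_LIST line) and that its SPOOL is defined the same way
as the one for SCHEDDJOBS2.

```
# Give each additional schedd its own job queue log (needed in 7.7.5 and 7.8).
SCHEDD.SCHEDDJOBS2.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDJOBS2.SPOOL)/job_queue.log
# Assumed: SCHEDDJOBS3 mirrors SCHEDDJOBS2, including its own SPOOL setting.
SCHEDD.SCHEDDJOBS3.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDJOBS3.SPOOL)/job_queue.log
```

Each path resolves inside that schedd's own spool directory, which matches
where the log was kept before 7.7.5.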
On 6/4/2012 7:57 PM, Matthew Farrellee wrote:
On 05/21/2012 09:37 AM, John Weigand wrote:
There appears to be a change in behavior in Condor when multiple schedds
are defined. I have tested this with 7.7.5 and 7.8. It does not occur
in 7.6.6 and prior.
1. 3 schedds are defined
2. I submit 1 job.
3. condor_q -g shows 1 schedd queue with the job
4. I restart condor
5. condor_q -g shows the same job in all 3 schedd queues and treats
them as independent jobs.
I use the same configuration for all 3 versions of Condor for the
SCHEDDJOBS2 = $(SCHEDD)
SCHEDDJOBS2_ARGS = -local-name scheddjobs2
SCHEDD.SCHEDDJOBS2.SCHEDD_NAME = schedd_jobs2
SCHEDD.SCHEDDJOBS2.EXECUTE = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/execute
SCHEDD.SCHEDDJOBS2.LOCK = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/lock
SCHEDD.SCHEDDJOBS2.SPOOL = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/spool
SCHEDDJOBS2_LOCAL_DIR_STRING = "$(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)"
SCHEDD.SCHEDDJOBS2.SCHEDD_EXPRS = LOCAL_DIR_STRING
DAEMON_LIST = $(DAEMON_LIST), SCHEDDJOBS2
(same for schedd3)
DC_DAEMON_LIST = + SCHEDDJOBS2 SCHEDDJOBS3
This works in 7.6.6 and prior, just not in 7.7.5 and 7.8.
My first thought is that somehow all the schedds are using the same spool. When you
restart them they should log something like "About to rotate ClassAd log
/var/lib/condor/spool/job_queue.log". Make sure they're all processing a
Do you happen to have a wallaby dump of your configuration to share?