
Re: [Condor-users] Problem with multiple schedds in 7.7+



Matt,

You were correct that the problem is the job_queue.log.

A JOB_QUEUE_LOG configuration variable was introduced in Condor 7.7.5
(ticket 2598):
    https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2598
    http://research.cs.wisc.edu/condor/manual/v7.7/3_3Configuration.html#16343

Prior to the introduction of this feature, a job_queue.log was always
maintained in the spool directory of each schedd.  With this change, it
appears (whether by bug or by design) that the job queue log of each
additional schedd must be defined explicitly:
  SCHEDD.SCHEDDJOBS2.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDJOBS2.SPOOL)/job_queue.log

If it is not explicitly set, only one job_queue.log is used by all the
schedds.  Hence, on a restart, every schedd reads that same log and all jobs
are assigned to all schedd queues.
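
For the additional schedds in my configuration (quoted below), that works
out to something like the following.  This is only a sketch: I am assuming
SCHEDDJOBS3 is set up the same way as SCHEDDJOBS2, with its own SPOOL
defined under its local name.

```
# Give each additional schedd its own job queue log (needed in 7.7.5 and later).
# Assumption: SCHEDD.SCHEDDJOBS3.SPOOL is defined like SCHEDD.SCHEDDJOBS2.SPOOL.
SCHEDD.SCHEDDJOBS2.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDJOBS2.SPOOL)/job_queue.log
SCHEDD.SCHEDDJOBS3.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDJOBS3.SPOOL)/job_queue.log
```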

John Weigand



On 6/4/2012 7:57 PM, Matthew Farrellee wrote:
On 05/21/2012 09:37 AM, John Weigand wrote:
There appears to be a change in behavior in Condor when multiple schedds
are defined. I have tested this with 7.7.5 and 7.8. It does not occur
in 7.6.6 and prior.

Test condition:
1. 3 schedds are defined
2. I submit 1 job.
3. condor_q -g shows 1 schedd queue with the job
4. I restart condor
5. condor_q -g shows the same job in all 3 schedd queues and treats
them as independent jobs.

I use the same configuration for all 3 versions of Condor for the
secondary schedds:

SCHEDDJOBS2 = $(SCHEDD)
SCHEDDJOBS2_ARGS = -local-name scheddjobs2
SCHEDD.SCHEDDJOBS2.SCHEDD_NAME = schedd_jobs2
SCHEDD.SCHEDDJOBS2.SCHEDD_LOG = $(LOG)/SchedLog.$(SCHEDD.SCHEDDJOBS2.SCHEDD_NAME)
SCHEDD.SCHEDDJOBS2.LOCAL_DIR = $(LOCAL_DIR)/$(SCHEDD.SCHEDDJOBS2.SCHEDD_NAME)
SCHEDD.SCHEDDJOBS2.EXECUTE = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/execute
SCHEDD.SCHEDDJOBS2.LOCK = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/lock
SCHEDD.SCHEDDJOBS2.PROCD_ADDRESS = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/procd_pipe
SCHEDD.SCHEDDJOBS2.SPOOL = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/spool
SCHEDD.SCHEDDJOBS2.SCHEDD_ADDRESS_FILE = $(SCHEDD.SCHEDDJOBS2.SPOOL)/.schedd_address
SCHEDD.SCHEDDJOBS2.SCHEDD_DAEMON_AD_FILE = $(SCHEDD.SCHEDDJOBS2.SPOOL)/.schedd_classad
SCHEDDJOBS2_LOCAL_DIR_STRING = "$(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)"
SCHEDD.SCHEDDJOBS2.SCHEDD_EXPRS = LOCAL_DIR_STRING
DAEMON_LIST = $(DAEMON_LIST), SCHEDDJOBS2
:
(same for schedd3)
:
DC_DAEMON_LIST = + SCHEDDJOBS2 SCHEDDJOBS3


This works in 7.6.6 and prior, just not in 7.7.5 and 7.8.

Any ideas?

John Weigand

My first thought is that somehow all the Schedds are using the same spool. When you
restart them they should log something like "About to rotate ClassAd log
/var/lib/condor/spool/job_queue.log". Make sure they're all processing a
different job_queue.log.

Do you happen to have a wallaby dump of your configuration to share?

Best,


matt