
Re: [Condor-users] Problem with multiple schedds in 7.7+



Matt,
Thanks for the response.  Unfortunately, the VMs I was testing with
had to be rebuilt a couple days ago.  I just got them back and it
will take a day or two to get the environment back up so I can continue testing.

I will get back to you then.

John Weigand

On 6/4/2012 7:57 PM, Matthew Farrellee wrote:
On 05/21/2012 09:37 AM, John Weigand wrote:
There appears to be a change in behavior in Condor when multiple schedds
are defined. I have tested this with 7.7.5 and 7.8. It does not occur
in 7.6.6 and prior.

Test condition:
1. 3 schedds are defined
2. I submit 1 job.
3. condor_q -g shows 1 schedd queue with the job
4. I restart condor
5. condor_q -g shows the same job in all 3 schedd queues and treats
them as independent jobs.

I use the same configuration for all 3 versions of Condor for the
secondary schedds:

SCHEDDJOBS2 = $(SCHEDD)
SCHEDDJOBS2_ARGS = -local-name scheddjobs2
SCHEDD.SCHEDDJOBS2.SCHEDD_NAME = schedd_jobs2
SCHEDD.SCHEDDJOBS2.SCHEDD_LOG = $(LOG)/SchedLog.$(SCHEDD.SCHEDDJOBS2.SCHEDD_NAME)
SCHEDD.SCHEDDJOBS2.LOCAL_DIR = $(LOCAL_DIR)/$(SCHEDD.SCHEDDJOBS2.SCHEDD_NAME)
SCHEDD.SCHEDDJOBS2.EXECUTE = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/execute
SCHEDD.SCHEDDJOBS2.LOCK = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/lock
SCHEDD.SCHEDDJOBS2.PROCD_ADDRESS = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/procd_pipe
SCHEDD.SCHEDDJOBS2.SPOOL = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/spool
SCHEDD.SCHEDDJOBS2.SCHEDD_ADDRESS_FILE = $(SCHEDD.SCHEDDJOBS2.SPOOL)/.schedd_address
SCHEDD.SCHEDDJOBS2.SCHEDD_DAEMON_AD_FILE = $(SCHEDD.SCHEDDJOBS2.SPOOL)/.schedd_classad
SCHEDDJOBS2_LOCAL_DIR_STRING = "$(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)"
SCHEDD.SCHEDDJOBS2.SCHEDD_EXPRS = LOCAL_DIR_STRING
DAEMON_LIST = $(DAEMON_LIST), SCHEDDJOBS2
:
(same for schedd3)
:
DC_DAEMON_LIST = + SCHEDDJOBS2 SCHEDDJOBS3


This works in 7.6.6 and prior, just not in 7.7.5 and 7.8.

Any ideas?

John Weigand

My first thought is that somehow all the Schedds are using the same spool. When
you restart them they should log something like "About to rotate ClassAd log
/var/lib/condor/spool/job_queue.log". Make sure each one is processing a
different job_queue.log.
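
One way to run that check across several schedd logs at once is sketched below in Python. It is only an illustration, not part of Condor: the function names are made up for this example, and it assumes each log contains a line of the form quoted above, with the job queue path immediately following the message. Feed it the text of each SchedLog file and it reports any job_queue.log path that more than one schedd claims.

```python
import re
import sys
from collections import defaultdict

# Assumption: the log line looks like the one quoted above, with the
# path following the message as the next whitespace-delimited token.
ROTATE_RE = re.compile(r"About to rotate ClassAd log (\S+)")


def job_queue_logs(schedd_logs):
    """Map each job_queue.log path to the set of log labels that mention it.

    schedd_logs is a dict of {label: log text}; the labels are arbitrary
    (file names are a natural choice).
    """
    users = defaultdict(set)
    for label, text in schedd_logs.items():
        for match in ROTATE_RE.finditer(text):
            # Drop a trailing quote or period in case the line is quoted.
            users[match.group(1).rstrip('".')].add(label)
    return dict(users)


def shared_job_queues(schedd_logs):
    """Return only the paths that appear in more than one schedd's log."""
    return {
        path: sorted(labels)
        for path, labels in job_queue_logs(schedd_logs).items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    # Usage: python check_spools.py <SchedLog file> [<SchedLog file> ...]
    logs = {}
    for name in sys.argv[1:]:
        with open(name) as handle:
            logs[name] = handle.read()
    for path, labels in shared_job_queues(logs).items():
        print(path + " is shared by: " + ", ".join(labels))
```

If the script prints nothing, every schedd log it was given points at its own job_queue.log, at least as far as the logged rotation messages show.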

Do you happen to have a wallaby dump of your configuration to share?

Best,


matt