
Re: [Condor-users] SPOOL file clash with multiple submitters



Thanks Ian, I will give this a go - I was thinking along these lines. As I see it, it

looks like the per-schedd environment doesn't get propagated properly to the shadows

so that they all pick up the same common $(SPOOL) directory. I should have pointed

out that I'm using Condor 7.6.1 on Scientific Linux and this has a different

spool area layout to 7.4.x (two extra layers, presumably designed to limit the

number of files in each directory). I wonder if a bug crept in with the change or

if the problem was always there. Unfortunately I don't really have time to test

this with users chomping at the bit for the new server (always the way !).

 

regards,

 

-ian.

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 30 January 2012 18:08
To: Condor-Users Mail List
Subject: Re: [Condor-users] SPOOL file clash with multiple submitters

 

On Friday, 27 January, 2012 at 9:57 AM, Smith, Ian wrote:

Thanks for this. As you say the nub of the matter seems to be how the different daemons

interpret the value of $(SPOOL) - particularly the scheduler, the negotiator and the shadows.

Can anyone from the Condor team shed any light on this - I can't find much info in the

manual.

You could try the "old" way of running multiple schedds. Before the named instance stuff existed you had to give each schedd a unique condor_config file to read its configuration from. It was convoluted, but it worked.

 

I'm pulling this from memory, so test this out before putting it on a production system!

 

Let's say you want to have two schedds on a machine and, for ease, that you're running Linux with the binaries in /opt/condor and the local directory in /opt/condor.local. You've got the following layout in /opt/condor.local:

 

/opt/condor.local/

          condor_config

          condor_config.local

          condor_config_schedd1

          condor_config_schedd2

          /1

                      /log

                      /spool

                      /execute

          /2

                      /log

                      /spool

                      /execute

          /log

          /spool
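The layout above could be created with a short script along these lines. This is only a sketch: the function name make_schedd_layout and its arguments are made up for illustration, and it does not deal with ownership or permissions, which you would still need to set for whatever account your daemons run as.

```python
from pathlib import Path


def make_schedd_layout(local_dir, schedd_numbers=(1, 2)):
    """Create the per-schedd log/spool/execute directories under local_dir.

    Hypothetical helper: mirrors the directory tree shown above and
    returns the list of directories it ensured exist.
    """
    root = Path(local_dir)
    created = []

    # Top-level log and spool directories referenced by the base condor_config.
    for name in ("log", "spool"):
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)

    # One numbered subdirectory per schedd, each with its own log/spool/execute.
    for number in schedd_numbers:
        for name in ("log", "spool", "execute"):
            path = root / str(number) / name
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)

    return created
```

Calling make_schedd_layout("/opt/condor.local") would give the tree shown above; the config files themselves are not written by this helper.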

 

The condor_config file is your usual condor_config file that you'd use for your pool. You'll keep LOG set to $(LOCAL_DIR)/log in this file. That's where the MasterLog should go. The condor_config.local file defines only that there are two schedds on the machine:

 

          SCHEDD1 = $(SBIN)/condor_schedd

          SCHEDD1_ENVIRONMENT = CONDOR_CONFIG=$(LOCAL_DIR)/condor_config_schedd1

          DAEMON_LIST = $(DAEMON_LIST), SCHEDD1

          DC_DAEMON_LIST = $(DC_DAEMON_LIST), SCHEDD1

 

          SCHEDD2 = $(SBIN)/condor_schedd

          SCHEDD2_ENVIRONMENT = CONDOR_CONFIG=$(LOCAL_DIR)/condor_config_schedd2

          DAEMON_LIST = $(DAEMON_LIST), SCHEDD2

          DC_DAEMON_LIST = $(DC_DAEMON_LIST), SCHEDD2

 

The condor_config_schedd(1|2) files are complete copies of your condor_config file *plus* information specific to each of the schedd instances. So in condor_config_schedd1:

 

          SCHEDD_NUMBER = 1

          SCHEDD_NAME = $(SCHEDD_NUMBER)@$(FULL_HOSTNAME)

          SCHEDD_PREFIX = Q$(SCHEDD_NUMBER)

          LOG = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/log

          SPOOL = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/spool

          EXECUTE = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/execute

 

And in condor_config_schedd2:

 

          SCHEDD_NUMBER = 2

          SCHEDD_NAME = $(SCHEDD_NUMBER)@$(FULL_HOSTNAME)

          SCHEDD_PREFIX = Q$(SCHEDD_NUMBER)

          LOG = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/log

          SPOOL = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/spool

          EXECUTE = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/execute
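Since the two per-schedd files differ only in SCHEDD_NUMBER, they could be generated rather than maintained by hand. The following is a rough sketch, not a tested deployment tool: write_schedd_configs is a made-up name, and it assumes the base condor_config lives in the local directory as described above. The overrides are appended after the copy of the base file so that the later definitions of LOG, SPOOL and EXECUTE are the ones that take effect.

```python
from pathlib import Path

# Settings appended to each copy of the base file; only the number varies.
SCHEDD_OVERRIDES = """\
SCHEDD_NUMBER = {number}
SCHEDD_NAME = $(SCHEDD_NUMBER)@$(FULL_HOSTNAME)
SCHEDD_PREFIX = Q$(SCHEDD_NUMBER)
LOG = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/log
SPOOL = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/spool
EXECUTE = $(LOCAL_DIR)/$(SCHEDD_NUMBER)/execute
"""


def write_schedd_configs(local_dir, schedd_numbers=(1, 2)):
    """Write condor_config_schedd<N> as base condor_config plus overrides.

    Hypothetical helper: returns the list of files written.
    """
    root = Path(local_dir)
    base = (root / "condor_config").read_text()
    written = []
    for number in schedd_numbers:
        target = root / f"condor_config_schedd{number}"
        # Overrides go last so they replace the base file's values.
        overrides = SCHEDD_OVERRIDES.format(number=number)
        target.write_text(base.rstrip("\n") + "\n\n" + overrides)
        written.append(target)
    return written
```

Regenerating the files after every change to the base condor_config keeps the copies from drifting apart.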

 

Obviously it's not nearly as elegant as the named config stuff, but that *should* work and might help with the propagation of the SPOOL setting down to the shadows for each named schedd on the system. 
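Before starting the daemons it may be worth checking that SPOOL really does come out different for each instance file. The authoritative check is condor_config_val SPOOL run with CONDOR_CONFIG pointing at each file in turn. The sketch below is only a rough offline approximation: parse_config and expand are made-up helpers that handle plain NAME = value lines and $(NAME) references, and nothing else that the real config parser supports.

```python
import re

# Matches a simple $(NAME) macro reference.
MACRO = re.compile(r"\$\((\w+)\)")


def parse_config(text):
    """Parse NAME = value lines; later definitions replace earlier ones."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Setting names are treated as case-insensitive here.
        settings[name.strip().upper()] = value.strip()
    return settings


def expand(name, settings, depth=0):
    """Recursively substitute $(NAME) references in the value of name."""
    if depth > 20:
        # Self-referencing entries such as DAEMON_LIST = $(DAEMON_LIST), X
        # are not supported by this simplified expander.
        raise ValueError(f"macro expansion too deep while expanding {name}")
    value = settings.get(name.upper(), "")
    return MACRO.sub(lambda m: expand(m.group(1), settings, depth + 1), value)
```

For example, parsing the text of condor_config_schedd1 and calling expand("SPOOL", settings) should give a path ending in /1/spool, and the same call on condor_config_schedd2 a path ending in /2/spool.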

 

You may not actually need complete copies of the condor_config file for the two schedd instance files (condor_config_schedd(1|2)) but I've never spent the time to whittle the schedd-specific files down to the bare minimum to see what works.

 

Regards,

- Ian

 

---

Ian Chesal

 

Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops

Enterprise Condor Support and Management Tools