
Re: [Condor-users] SPOOL file clash with multiple submitters

Thanks for this. As you say, the nub of the matter seems to be how the different daemons interpret the value of $(SPOOL) - particularly the scheduler, the negotiator and the shadows.

Can anyone from the Condor team shed any light on this? I can't find much info in the








From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 27 January 2012 13:19
To: Condor-Users Mail List
Subject: Re: [Condor-users] SPOOL file clash with multiple submitters


I don't ever use checkpointing so this was never tested with the suggested config in that post, sorry.


I thought the shadow was responsible for stashing the checkpoint files. It sounds like, with the suggested configuration, the shadows spawned by the schedd are not inheriting the schedd's settings and so are not getting a unique SPOOL directory.


One thing you could try is to use a SPOOL setting that's unique for every single shadow:


SPOOL = $(LOCAL_DIR)/checkpoints/$(CurrentTime)/$(PID)


That'd stop PID collisions.


Honestly, I'm not sure that'll work but that's probably moving in the right direction.
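
To make the idea concrete, here is a minimal Python sketch of the path logic only. It is a model, not Condor code: the function name, the /var/condor directory and the timestamps are made up for illustration, and it says nothing about whether Condor actually expands those macros separately for each shadow.

```python
import os
import time


def per_process_spool(local_dir, now=None, pid=None):
    """Build <local_dir>/checkpoints/<time>/<pid> (hypothetical layout)."""
    now = int(time.time()) if now is None else now
    pid = os.getpid() if pid is None else pid
    return os.path.join(local_dir, "checkpoints", str(now), str(pid))


# Two processes started in the same second still get different directories.
first = per_process_spool("/var/condor", now=1327670340, pid=4101)
second = per_process_spool("/var/condor", now=1327670340, pid=4102)
print(first != second)

# A process started later resolves to a different path again, so files
# written under the earlier path are not found by recomputing the path.
later = per_process_spool("/var/condor", now=1327673940, pid=4101)
print(later != first)
```

The second comparison is worth keeping in mind: a directory keyed on start time and PID is unique per process, which avoids collisions but also means a later process cannot locate an earlier one's files just by rebuilding the path.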


There's a more convoluted way of setting up multiple schedds that involves pointing each schedd at its own unique configuration file. It was what we did pre-7.6.x, for the 7.2 and 7.4 series. That may be a better way to propagate a unique SPOOL setting to shadows on a per-schedd basis.



- Ian



Ian Chesal


Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops

Enterprise Condor Support and Management Tools






On Friday, 27 January, 2012 at 7:52 AM, Smith, Ian wrote:

Hello All,


I am trying to set up multiple schedulers on our SMP central manager/submit host along the lines suggested by Cycle Computing


This seemed to be working well until I noticed there was a clash between the checkpoint files of jobs from one schedd and those of another. As far as I can see, the job IDs of jobs in separate queues are not unique, so if a user of one scheduler has a checkpointed job with, say, ID 3.1, its checkpoint files will be in




But then another user on another schedd has a job with the same ID 3.1, and it attempts to use the same directory, which fails because of file permissions.
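
The clash described above can be modelled in a few lines of Python. This is only a sketch under assumptions: the directory layout, the /path/to/spool root and the checkpoint_dir helper are hypothetical and are not Condor's real on-disk layout. The point is just that a path built from a shared spool plus a job ID cannot tell two schedds apart, while a path that also includes a per-schedd component can.

```python
import os


def checkpoint_dir(spool, job_id, schedd_name=None):
    """Return a hypothetical checkpoint directory for a job.

    With schedd_name=None the path depends only on the shared spool
    and the job ID; otherwise a per-schedd subdirectory is added.
    """
    parts = [spool]
    if schedd_name is not None:
        parts.append(schedd_name)
    parts.append(job_id)
    return os.path.join(*parts)


shared_spool = "/path/to/spool"  # placeholder, not a real setting

# Job 3.1 on one schedd and job 3.1 on another map to the same directory.
clash_a = checkpoint_dir(shared_spool, "3.1")
clash_b = checkpoint_dir(shared_spool, "3.1")
print(clash_a == clash_b)

# Adding a per-schedd component keeps the two jobs apart.
q1_dir = checkpoint_dir(shared_spool, "3.1", schedd_name="Q1")
q2_dir = checkpoint_dir(shared_spool, "3.1", schedd_name="Q2")
print(q1_dir != q2_dir)
```

In this model the first user to create the shared directory owns it, which is consistent with the second user's job failing on file permissions.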


I've configured Condor with


SPOOL_ROOT = /condor_scratch/spool


SCHEDD1 = $(SBIN)/condor_schedd1

SCHEDD1_ARGS = -f -local-name Q1

SCHEDD1_LOG = $(LOG)/ScheddLog.1





SCHEDD2 = $(SBIN)/condor_schedd2

SCHEDD2_ARGS = -f -local-name Q2

SCHEDD2_LOG = $(LOG)/ScheddLog.2







but the checkpointing files always seem to get written under the common $(SPOOL) directory rather than separate ones, causing the clash.


Interestingly, Condor does seem to put these files in individual directories (not the common spool area):


job_queue.log job_queue.log.1 local_univ_execute spool_version


so it seems to be aware of SCHEDD.Q1.SCHEDD_LOG if not SCHEDD.Q2.SPOOL


If I take out the default spool/ directory and remove the $(SPOOL) definition, the negotiator fails on start up. Since there's only one negotiator, I would expect it to use a common directory?


Any suggestions would be very useful.


thanks in advance,





Dr Ian C. Smith,

Advanced Research Computing,

University of Liverpool.

