Thanks for this. As you say, the nub of the matter seems to be how the different daemons
interpret the value of $(SPOOL), particularly the scheduler, the negotiator and the shadows.
Can anyone from the Condor team shed any light on this? I can't find much info in the
condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
I never use checkpointing, so the suggested config in that post was never tested with it, sorry.
I thought the shadow was responsible for stashing the checkpoint files. It sounds like, with the suggested configuration, the shadows spawned by each schedd are not inheriting that schedd's settings and so are not getting a unique SPOOL directory.
One thing you could try is to use a SPOOL setting that's unique for every single shadow:
SPOOL = $(LOCAL_DIR)/checkpoints/$(CurrentTime)/$(PID)
That'd stop PID collisions.
Honestly, I'm not sure that'll work, but it's probably a move in the right direction.
There's a more convoluted way of setting up multiple schedds that involves pointing each schedd at its own configuration file. It's what we did before 7.6.x, in the 7.2 and 7.4 series. That may be a better way to propagate a unique SPOOL setting to the shadows on a per-schedd basis.
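A rough sketch of that older set-up, for illustration only: the subsystem name SCHEDD2, the file /etc/condor/schedd2_config and the spool path are all hypothetical, and the knob names and environment syntax should be checked against the manual for your Condor version before relying on it.

```
# Hypothetical second schedd, started by condor_master with its own CONDOR_CONFIG.
# Assumption: shadows it spawns inherit that environment and read the same file.
SCHEDD2 = $(SBIN)/condor_schedd
SCHEDD2_ENVIRONMENT = CONDOR_CONFIG=/etc/condor/schedd2_config
DAEMON_LIST = $(DAEMON_LIST), SCHEDD2

# In /etc/condor/schedd2_config (hypothetical path), a copy of the main
# config plus per-schedd overrides such as:
SCHEDD_NAME = schedd2
SPOOL = $(LOCAL_DIR)/spool2
```

The idea is that the per-schedd settings travel in the environment rather than in the master's config, so a shadow forked by that schedd should see the same SPOOL.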
Cycle Computing, LLC
On Friday, 27 January, 2012 at 7:52 AM, Smith, Ian wrote: