
[Condor-users] Leftovers of checkpointed jobs accumulate in SPOOL



Hi,

I'm testing DMTCP-based checkpointing of vanilla jobs in our Condor pool
(all version 7.7.5). I noticed that the files of evicted jobs remain in
SPOOL even after the jobs have been restarted on an exec node. Checkpoint
files, executable, restart script and various other files remain -- I
assume that is simply everything.

Eventually condor_preen would remove most of it, e.g.

  /var/spool/condor/2030/0/cluster2030.proc0.subproc0 - Removed
  /var/spool/condor/2027/0/cluster2027.proc0.subproc0 - Removed

However, even after the preen run 
 
  /var/spool/condor/2030/0
  /var/spool/condor/2027/0

remain as empty directories.
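
For anyone who wants to check their own pool for the same leftovers, here
is a minimal sketch that lists empty directories below SPOOL. It assumes
the SPOOL location shown above (/var/spool/condor) unless a path is given
on the command line. The function name find_empty_dirs is made up for
this example and is not part of Condor. It only reports directories and
does not delete anything.

```python
import os
import sys


def find_empty_dirs(spool_root):
    """Return a sorted list of directories below spool_root that are empty.

    spool_root itself is never reported, only directories beneath it.
    """
    empty = []
    for dirpath, dirnames, filenames in os.walk(spool_root):
        # A directory counts as empty when it holds no files and no subdirectories.
        if dirpath != spool_root and not dirnames and not filenames:
            empty.append(dirpath)
    return sorted(empty)


if __name__ == "__main__":
    # Assumed default SPOOL path, taken from the preen output above.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/spool/condor"
    for path in find_empty_dirs(root):
        print(path)
```

Note that a parent such as /var/spool/condor/2030 is not listed as long
as it still contains the empty 0 subdirectory.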

Could this be a configuration issue? Does DMTCP-based checkpointing need
additional setup? I'm using the latest available Condor-DMTCP
integration.

Thanks in advance,

Michael


-- 
Michael Hanke
http://mih.voxindeserto.de