
Re: [Condor-users] Leftovers of checkpointed jobs accumulate in SPOOL



Michael Hanke <michael.hanke@xxxxxxxxx> wrote:
> I'm testing DMTCP-based checkpointing of vanilla jobs in our Condor pool
> (all running version 7.7.5). I noticed that jobs once evicted remain in SPOOL
> even after they got restarted on an exec node again. Checkpoint files,
> executable, restart script and various other files remain -- I assume
> that is just everything.

I'm not clear what you're reporting.  Files in SPOOL should
remain as long as the associated job is still in the queue.  Are
you saying that the job in question left the queue (is no longer
visible in condor_q), but still has a subdirectory in SPOOL?  If
so, that would likely be a bug.  That it's using DMTCP
checkpointing shouldn't have any impact on the behavior, although
it's possible that the DMTCP integration code is somehow triggering
a Condor bug that other code doesn't.
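
One way to check which case applies is to compare the job IDs that
condor_q reports against the per-job directories under SPOOL.  The
following is only a rough sketch, not an official tool.  It assumes
that per-job spool directories are named like
cluster<N>.proc<M>.subproc<K>, which you should verify against your own
SPOOL before trusting the result.  The function names are made up for
illustration, and scan_pool() must be run on the submit machine.

```python
import os
import re
import subprocess

# Assumed naming of per-job spool directories; check your own SPOOL first.
SPOOL_DIR_RE = re.compile(r"^cluster(\d+)\.proc(\d+)\.subproc\d+$")


def parse_queue_ids(condor_q_output):
    """Turn lines of the form 'cluster.proc' into a set of (cluster, proc)."""
    ids = set()
    for line in condor_q_output.splitlines():
        m = re.match(r"^(\d+)\.(\d+)$", line.strip())
        if m:
            ids.add((int(m.group(1)), int(m.group(2))))
    return ids


def find_orphans(spool_dir_names, queued_ids):
    """Return spool directories whose job is not in the queue anymore."""
    orphans = []
    for name in spool_dir_names:
        m = SPOOL_DIR_RE.match(os.path.basename(name))
        if m and (int(m.group(1)), int(m.group(2))) not in queued_ids:
            orphans.append(name)
    return sorted(orphans)


def scan_pool():
    """Query the local schedd and SPOOL; only reports, never deletes."""
    spool = subprocess.check_output(
        ["condor_config_val", "SPOOL"], text=True).strip()
    queue = subprocess.check_output(
        ["condor_q", "-format", "%d.", "ClusterId",
         "-format", "%d\n", "ProcId"], text=True)
    # Walk recursively in case job directories are nested below SPOOL.
    names = [os.path.join(root, d)
             for root, dirs, _ in os.walk(spool) for d in dirs]
    return find_orphans(names, parse_queue_ids(queue))
```

Any directory that scan_pool() returns belongs to a job that condor_q
no longer lists, which is the case that would likely be a bug.  An
empty list means the leftover files still belong to queued jobs, which
is the expected behavior.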

-- 
Alan De Smet                 Center for High Throughput Computing
adesmet@xxxxxxxxxxx                       http://chtc.cs.wisc.edu