
Re: [Condor-users] When does Condor clean up the SPOOL directory?

----- Original Message -----

From: Michael Hanke <mih@xxxxxxxxxx>
To: Rob <spamrefuse@xxxxxxxxx>; Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Sent: Friday, June 1, 2012 3:09 AM
Subject: Re: [Condor-users] When does Condor clean up the SPOOL directory?


On Thu, May 31, 2012 at 12:17:02AM -0700, Rob wrote:
> I'm running a Condor master (version 7.7.5) on a 64 bit Fedora/linux
> system, which manages a Windows-only pool.
> The spool directory of the master (/var/lib/condor/spool/) still has
> directories and files of suspended jobs, although these jobs already
> are completed and thus not in the queue anymore.
> I thought Condor would clean up the spool directory as soon as the job
> is successfully completed. Or is this done in a periodic clean up
> procedure once a day or so?

| Hmmm, this sounds similar to
| http://bugs.debian.org/663031
| also see
| https://lists.cs.wisc.edu/archive/condor-users/2012-March/msg00052.shtml
| this is still a mystery to me. Are you seeing this for vanilla jobs that
| get evicted, or are you using DMTCP for checkpointing of vanilla
| universe jobs?

Yes, indeed: these are vanilla universe jobs, submitted with "when_to_transfer_output = ON_EXIT_OR_EVICT".

The next day I received an email from the Condor system telling me that condor_preen had removed stale files (see below). However, the empty directories are still there, e.g.:





Apparently condor_preen cleans up the stale files, but not the stale directories that contained them.
I'm afraid that with a large job queue this can leave an enormous number of stale directories behind.
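
For anyone who wants to check their own spool directory, here is a minimal Python sketch that lists directories containing no files at any depth. It is not part of Condor; the function name find_empty_dirs is made up for illustration, and the only path assumed is the spool location mentioned earlier in this thread (/var/lib/condor/spool). It only reports and never deletes anything.

```python
import os


def find_empty_dirs(spool_root):
    """Return directories under spool_root that hold no files at any depth.

    Hypothetical helper for illustration only; not a Condor tool.
    Results are ordered deepest first, so they could be removed in order.
    """
    empty = set()
    # Walk bottom-up so every child is classified before its parent.
    for dirpath, dirnames, filenames in os.walk(spool_root, topdown=False):
        if dirpath == spool_root:
            continue  # never report the spool directory itself
        if filenames:
            continue  # a directory that holds files is not stale
        # Empty means: no files here, and every subdirectory is itself empty.
        # Symlinked directories are not followed by os.walk, so they count
        # as non-empty, which errs on the side of caution.
        if all(os.path.join(dirpath, d) in empty for d in dirnames):
            empty.add(dirpath)
    return sorted(empty, key=lambda p: (-p.count(os.sep), p))
```

Calling find_empty_dirs("/var/lib/condor/spool") returns the paths deepest first, so passing each one to os.rmdir in that order would remove them. Before doing that, it seems wise to confirm with condor_q that none of the directories belongs to a job that is still in the queue.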


The condor_preen process has found the following stale condor files on <condor>:

 /var/lib/condor/spool/23/0/cluster23.proc0.subproc0 - Removed
 /var/lib/condor/spool/21/0/cluster21.proc0.subproc0 - Removed
 /var/lib/condor/spool/22/0/cluster22.proc0.subproc0 - Removed
 /var/lib/condor/spool/25/0/cluster25.proc0.subproc0 - Removed
 /var/lib/condor/spool/24/0/cluster24.proc0.subproc0 - Removed

What is condor_preen?

The condor_preen tool examines the directories belonging to Condor, and
removes extraneous files and directories which may be left over from Condor
processes which terminated abnormally either due to internal errors or a
system crash.  The directories checked are the LOG, EXECUTE, and SPOOL
directories as defined in the Condor configuration files.  The condor_preen
tool is intended to be run as user root (or user condor) periodically as a
backup method to ensure reasonable file system cleanliness in the face of
errors. This is done automatically by default by the condor_master daemon.
It may also be explicitly invoked on an as needed basis.

See the Condor manual section on condor_preen for more details.