On 10/3/2017 5:53 PM, Zhuang, Di wrote:
> Looking into StarterLog.slot_1, I do see the following. Could the
> buildup have something to do with the line "Got SIGQUIT. Performing fast
> shutdown."? What could be causing this?
The above is normal, expected behavior; it is just the condor_startd
telling the condor_starter to shut down.
> If there is no immediate
> solution, as a temporary workaround, is there a way for me to safely
> identify which scratch directories are being worked on and remove the rest?
Are the leaked dir_xxx subdirectories empty? I.e., do they not even
contain a ".job.ad" file? The one time I was able to reproduce the problem
on my machine, the subdirectory was indeed empty; in other words,
HTCondor successfully removed all the files but got an error removing
the (now empty) subdirectory. If the leaked directories are also empty
for you, that gives you an easy way to identify which ones you can
remove: if a subdirectory is empty and older than a few seconds,
you could remove it.
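If it helps, here is a rough sketch of that cleanup rule in Python. It is not part of HTCondor, just an illustration under a few assumptions: the execute directory is C:\condor\execute (adjust to your EXECUTE setting), leaked directories match dir_*, the directory's modification time is a good enough proxy for "older than a few seconds", and the 60 second threshold is an arbitrary safety margin I picked. It defaults to a dry run, and it uses os.rmdir, which refuses to delete a directory that is not empty. So if a job starts writing into a directory between the check and the removal, that directory is left alone.

```python
import os
import time
from pathlib import Path

# Assumption: path taken from this thread; set it to your EXECUTE directory.
EXECUTE_DIR = Path(r"C:\condor\execute")
# Hypothetical threshold: "a few seconds" plus a generous safety margin.
MIN_AGE_SECONDS = 60


def find_leaked_dirs(execute_dir, min_age=MIN_AGE_SECONDS, now=None):
    """Return dir_* subdirectories that are empty and older than min_age seconds."""
    if now is None:
        now = time.time()
    leaked = []
    for entry in Path(execute_dir).glob("dir_*"):
        if not entry.is_dir():
            continue
        # Empty means no entries at all, not even a .job.ad file.
        if any(entry.iterdir()):
            continue
        # Directory mtime changes when entries are added or removed, so it
        # approximates "time since HTCondor last touched this directory".
        if now - entry.stat().st_mtime >= min_age:
            leaked.append(entry)
    return sorted(leaked)


def remove_leaked_dirs(execute_dir, min_age=MIN_AGE_SECONDS, dry_run=True):
    """Remove (or, in dry-run mode, just report) leaked empty directories."""
    removed = []
    for d in find_leaked_dirs(execute_dir, min_age):
        if dry_run:
            print(f"would remove {d}")
            continue
        try:
            # os.rmdir only succeeds on an empty directory, so a directory
            # that gained files since the check above is not deleted.
            os.rmdir(d)
            removed.append(d)
        except OSError as exc:
            print(f"could not remove {d}: {exc}")
    return removed


if __name__ == "__main__":
    if EXECUTE_DIR.is_dir():
        remove_leaked_dirs(EXECUTE_DIR, dry_run=True)
    else:
        print(f"{EXECUTE_DIR} not found; nothing to do")
```

Run it once in dry-run mode to check the list, then call remove_leaked_dirs(EXECUTE_DIR, dry_run=False), for example from a scheduled task, if the output looks right.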
Are you using a real-time virus scanner like Microsoft Security
Essentials, Windows Defender, etc.? You could try adding the
C:\condor\execute folder to the list of folders excluded from
scanning. In another thread TJ guessed that HTCondor was unable to
remove the (empty) subdirectory because a virus scanner had it