
Re: [HTCondor-users] clean up on execute node



Thanks, TJ! I’ll give this a shot.

 

steve

 

Stephen C. Upton

SEED (Simulation Experiments & Efficient Designs) Center

Mobile: 804-994-4257

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Monday, October 02, 2017 10:31 AM
To: HTCondor-Users Mail List; Condor-Users Mail List (condor-users@xxxxxxxxxxx)
Subject: Re: [HTCondor-users] clean up on execute node

 

There may be some clues in the HTCondor daemon logs on the execute node.  They will be in c:\condor\log by default.

 

The first pass at cleanup will be logged by the Starter daemon in StarterLog.slotN, where N is the slot id.  If that fails to clean up, then the Startd will also do a cleanup pass.  Its log is the StartLog.
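As a rough aid for reading those logs, here is a minimal Python sketch that prints lines from the StarterLog.slotN files and the StartLog that look like failures. The keyword list is a guess for illustration only, not actual HTCondor message text, and the log directory is simply the default mentioned above; adjust both for your machine.

```python
import re
from pathlib import Path

# Default log location named in this message; change it if your install differs.
LOG_DIR = Path(r"c:\condor\log")

# Hypothetical keywords chosen for illustration; they are not taken from
# HTCondor's real log messages, so expect both false hits and misses.
PATTERN = re.compile(r"error|fail|unable|denied", re.IGNORECASE)


def find_suspect_lines(log_dir=LOG_DIR, pattern=PATTERN):
    """Return (file name, line number, text) for every matching log line."""
    log_dir = Path(log_dir)
    hits = []
    if not log_dir.is_dir():
        return hits
    # Starter logs are per slot (StarterLog.slotN); the Startd log is StartLog.
    logs = sorted(log_dir.glob("StarterLog.slot*")) + sorted(log_dir.glob("StartLog"))
    for log in logs:
        with open(log, errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if pattern.search(line):
                    hits.append((log.name, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    for name, lineno, text in find_suspect_lines():
        print(f"{name}:{lineno}: {text}")
```

This only narrows down where to look; the surrounding lines in the log usually carry the context you need.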

 

You may get more useful messages by adding these lines to the configuration of that machine.  You can add them to the end of c:\condor\condor_config.local

 

STARTD_DEBUG = $(STARTD_DEBUG) D_FULLDEBUG D_CAT:1

STARTER_DEBUG = $(STARTER_DEBUG) D_FULLDEBUG D_CAT:1

 

The usual reason that the execute directory doesn’t get cleaned up on Windows is that some process is holding the files open when HTCondor tries to delete them.  It’s possible that part of your job is doing this, but it’s unlikely, since HTCondor tracks the processes that the job creates and will kill the child processes when the main process exits.  The most common culprit for leaked execute directories seems to be antivirus software that holds files open while it scans them and does not release them when HTCondor goes to delete them.

 

If you have antivirus software, you might tell it to ignore c:\condor\execute and all of the subdirectories below that.
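To see how many directories have been left behind, something like the following Python sketch could help. It assumes the per-job scratch directories sit directly under c:\condor\execute with names starting with "dir_", and it treats a directory as leaked if it has not been modified for a while. The 24-hour threshold and the function name are arbitrary choices for illustration. It only reports; it does not delete anything, since a long-running job may still own an old-looking directory.

```python
import time
from pathlib import Path

# Execute directory named in this message; change it if your install differs.
EXECUTE_DIR = Path(r"c:\condor\execute")


def stale_job_dirs(execute_dir=EXECUTE_DIR, min_age_hours=24.0, now=None):
    """List dir_* subdirectories whose modification time is older than the cutoff.

    Assumption: job scratch directories are named "dir_<something>".
    The age threshold is a hypothetical default, not an HTCondor setting.
    """
    execute_dir = Path(execute_dir)
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600
    stale = []
    if not execute_dir.is_dir():
        return stale
    for entry in execute_dir.iterdir():
        if entry.is_dir() and entry.name.startswith("dir_"):
            if entry.stat().st_mtime < cutoff:
                stale.append(entry)
    return sorted(stale)


if __name__ == "__main__":
    for path in stale_job_dirs():
        print(path)
```

Check that no job is running in a slot before removing anything this reports.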

 

-tj

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Upton, Stephen (Steve) (CIV)
Sent: Friday, September 29, 2017 3:32 PM
To: Condor-Users Mail List (condor-users@xxxxxxxxxxx) <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] clean up on execute node

 

Hi all,

 

I’m running a small HTCondor cluster with a Mac OS10 head node and Windows Server 2012 and Windows 7 execute nodes, all running HTCondor version 8.6.1.

 

Occasionally, I’m not getting the execute directory on the execute node (a Windows 7 machine) to “clean up”, i.e., delete all the files in that directory. The files do get transferred, so I’m wondering where I can look to find out why this is the case. I’m running a model that dumps a lot of output, so having these directories still hanging around after the model has finished will eat up my disk space fast.

 

Any pointers would be greatly appreciated! Also, please let me know if you need additional information.

 

Thanx

steve

 

Stephen C. Upton

Faculty Associate - Research

SEED (Simulation Experiments & Efficient Designs) Center

Operations Research Department

Naval Postgraduate School

Mobile: 804-994-4257

NIPR: scupton@xxxxxxx

SIPR: uptonsc@xxxxxxxxxxxxxxxxx

SEED Center web site: http://harvest.nps.edu

 
