
Re: [HTCondor-users] shared fs high latency slow down the schedd



On 06/20/2017 08:19 AM, Alessandro Italiano wrote:
Hi

We have an HTCondor cluster for local job submission that uses a shared filesystem.


Fundamentally, the schedd needs to write information about the work it is doing to the filesystem, and if the filesystem is slow, there isn't much it can do but wait.

However, I assume that not all of the data the schedd needs to store is on the shared filesystem. If you can move more of the data onto a local filesystem (or, better yet, a local SSD), this may help. The schedd writes to the SPOOL directory frequently, so please make sure that SPOOL (i.e. the path printed by condor_config_val SPOOL) is on a local filesystem. The schedd also periodically writes to the job log. If the job log is on a shared filesystem, the schedd will get very slow. If you can move the job logs to a local filesystem, things should improve.
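
One way to check where SPOOL or a job log actually lives is to look up the mount that contains the path. The following is only a rough sketch, assuming a Linux host with a /proc/mounts-style mount table. The function names and the list of "network" filesystem types are illustrative assumptions, not part of HTCondor.

```python
import posixpath

# Filesystem types treated as shared/network here. This set is an
# assumption for illustration; adjust it for your site.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "lustre", "gpfs", "ceph", "afs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, ...
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, entries):
    """Return the fs type of the longest mount point containing path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in entries:
        mp = mount_point.rstrip("/") or "/"
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        # ">=" lets a later mount on the same point shadow an earlier one.
        if inside and len(mp) >= len(best_mount):
            best_mount, best_type = mp, fs_type
    return best_type


def is_on_network_fs(path, entries):
    """True if path sits on one of the assumed network filesystem types."""
    return fs_type_for(path, entries) in NETWORK_FS_TYPES
```

To use it, read /proc/mounts into a string, pass it to parse_mounts, and then call is_on_network_fs with the path printed by condor_config_val SPOOL or with the path of a job log. A True result suggests that path is a candidate for moving to local disk.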

-greg