Re: [HTCondor-users] shared fs high latency slow down the schedd
How about putting the SPOOL directory on /dev/shm? Do the files need to be preserved when the system is rebooted and /dev/shm gets cleared?

Best - Don

> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
> Behalf Of Greg Thain
> Sent: Tuesday, June 20, 2017 11:27 AM
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] shared fs high latency slow down the schedd
> 
> On 06/20/2017 08:19 AM, Alessandro Italiano wrote:
> > Hi
> >
> > we have a HTCondor cluster for local jobs submission which exploits a
> > shared filesystem.
> >
> 
> Fundamentally, the schedd needs to write to the filesystem information
> about work it is doing, and if the filesystem is slow, there isn't much to do but
> wait.
> 
> However, I assume that not all of the data the schedd needs to store is on
> the shared filesystem.  If you can move more of the data onto a local, or
> better yet, SSD filesystem, this may help.  The schedd writes to the SPOOL
> directory frequently -- please make sure that SPOOL (i.e., the path reported
> by condor_config_val SPOOL) is on a local filesystem. Another type of file that
> the schedd periodically writes to is the job log.  If the job log is on a shared
> filesystem, the schedd will get very slow.  If you can move the job logs to a
> local filesystem, things should improve.
> 
> -greg
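To check the advice above on a Linux submit host, one option is to look up which filesystem a path lives on in /proc/mounts. The sketch below is a hypothetical helper, not an HTCondor tool: it assumes Linux, treats the listed filesystem types as "shared" or "volatile" (the type lists are assumptions and may need adjusting for your site), and reads SPOOL via condor_config_val. A "volatile" result (e.g. tmpfs under /dev/shm) means the contents would be lost on reboot, which is the concern raised about putting SPOOL there. The same check can be applied to a job log path.

```python
import os
import subprocess

# Assumed list of network/shared filesystem types; adjust for your site.
SHARED_FS = {"nfs", "nfs4", "cifs", "smb3", "lustre", "gpfs", "ceph",
             "glusterfs", "fuse.glusterfs", "beegfs", "afs"}
# Memory-backed filesystems whose contents disappear on reboot.
VOLATILE_FS = {"tmpfs", "ramfs"}


def parse_mounts(text):
    """Return (mount_point, fstype) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts encodes spaces in mount points as \040.
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def fstype_for(path, mounts):
    """Return the fstype of the longest mount point containing path."""
    best, best_type = "", None
    for mount_point, fstype in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # >= lets a later entry for the same mount point win (overmount).
            if len(mount_point) >= len(best):
                best, best_type = mount_point, fstype
    return best_type


def classify(path, mounts):
    """Label a path as 'shared', 'volatile', 'local' or 'unknown'."""
    fstype = fstype_for(path, mounts)
    if fstype is None:
        return "unknown"
    if fstype in SHARED_FS:
        return "shared"
    if fstype in VOLATILE_FS:
        return "volatile"
    return "local"


def check_spool():
    """Hypothetical helper: classify the configured SPOOL directory."""
    spool = subprocess.run(["condor_config_val", "SPOOL"],
                           capture_output=True, text=True,
                           check=True).stdout.strip()
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    # realpath follows symlinks, so a local-looking path that points
    # onto a shared mount is still reported as shared.
    return spool, classify(os.path.realpath(spool), mounts)
```

Calling check_spool() on the submit host returns the SPOOL path and its label; anything other than "local" is worth a second look. Longest-prefix matching is used because nested mounts (e.g. /home on NFS under a local /) are common.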