
Re: [HTCondor-users] shared fs high latency slow down the schedd



Another idea is to put
  CONDOR_FSYNC = False
in your condor_config, since fsync is often the performance bottleneck on shared filesystems. The trade-off is that after a system crash, recently submitted jobs could "disappear" from the queue, but maybe you can live with that. From the HTCondor Manual:

CONDOR_FSYNC
A boolean value that controls whether HTCondor calls fsync() when writing the user job and transaction logs. Setting this value to False will disable calls to fsync(), which can help performance for condor_schedd log writes at the cost of some durability of the log contents, should there be a power or hardware failure. The default value is True.


On 6/20/2017 12:58 PM, Dimitri Maziuk wrote:
> On 06/20/2017 10:43 AM, Krieger, Donald N. wrote:
> > How about putting the SPOOL directory on /dev/shm ?
>
> Aside from the disappearing job queue, filling up /dev/shm may cause all
> kinds of interesting effects, depending on how and where you mount it.
>
> This is something to keep in mind when moving spool to a local drive, too:
> we recently filled up / and had to move spool to a zfs pool -- the part
> where *spool can get large* should be in all-caps bold in the manual.
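A minimal condor_config sketch of moving the spool off the shared filesystem onto a dedicated local volume with room to grow. The path /srv/condor/spool is hypothetical; substitute whatever local filesystem (for example, a zfs dataset) you have sized for the job queue and spooled sandboxes. Stop the condor_schedd and copy the existing spool contents, including job_queue.log, to the new location before restarting it; otherwise the schedd will not find the existing queue.

  # Hypothetical path: a local (not shared) filesystem with plenty of space
  SPOOL = /srv/condor/spool

  # JOB_QUEUE_LOG defaults to $(SPOOL)/job_queue.log, so the job queue
  # follows SPOOL unless JOB_QUEUE_LOG is set explicitly elsewhere.
  # JOB_QUEUE_LOG = $(SPOOL)/job_queue.log

Keeping SPOOL on its own filesystem, rather than on / or /dev/shm, means that a spool that grows unexpectedly fills only that volume.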



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685