
Re: [HTCondor-users] shared fs high latency slow down the schedd



Hi,
thanks for the answers.

The problem here is mainly the user job log files. All other condor files and dirs are stored locally.

Users usually save the job log file in their HOME, which is of course located on the shared filesystem.
This is more or less a mandatory requirement: they submit jobs to the schedd remotely, so they don't have access to the schedd host itself.

Thanks for the suggestion. I will try it.

I also found these two knobs in the admin guide; maybe they can help us:
USERLOG_FILE_CACHE_MAX
USERLOG_FILE_CACHE_CLEAR_INTERVAL
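
For context, the manual describes USERLOG_FILE_CACHE_MAX as the number of job event log files the condor_schedd keeps open for writing, and USERLOG_FILE_CACHE_CLEAR_INTERVAL as the number of seconds after which those open files are all closed again. The Python sketch below only illustrates that idea and is not HTCondor's implementation; the class LogHandleCache and its parameter names are hypothetical. With max_open set to 0, every event costs an open, a write and a close, which is the expensive pattern on a high-latency shared filesystem.

```python
import time


class LogHandleCache:
    """Keep up to max_open log files open; close them all every clear_interval seconds."""

    def __init__(self, max_open, clear_interval, clock=time.monotonic):
        self.max_open = max_open              # hypothetical analogue of USERLOG_FILE_CACHE_MAX
        self.clear_interval = clear_interval  # hypothetical analogue of USERLOG_FILE_CACHE_CLEAR_INTERVAL
        self.clock = clock
        self.handles = {}
        self.last_clear = clock()
        self.opens = 0                        # number of open() calls performed so far

    def write_event(self, path, text):
        # Drop every cached handle once the interval has elapsed.
        if self.clock() - self.last_clear >= self.clear_interval:
            self.close_all()
        handle = self.handles.get(path)
        cached = handle is not None
        if not cached:
            handle = open(path, "a")
            self.opens += 1
            if len(self.handles) < self.max_open:
                self.handles[path] = handle
                cached = True
        handle.write(text + "\n")
        handle.flush()
        if not cached:
            handle.close()  # cache disabled or full: open, write, close

    def close_all(self):
        for handle in self.handles.values():
            handle.close()
        self.handles.clear()
        self.last_clear = self.clock()
```

Writing three events to the same file through LogHandleCache(0, 60) opens the file three times, while LogHandleCache(10, 60) opens it once and reuses the handle until the interval expires.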

Thanks again

Ale
On 20 Jun 2017, at 22:58, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

Another idea is to put
 CONDOR_FSYNC = False
in your condor_config, as fsync is often the performance bottleneck on shared filesystems.  Of course, in the event of a system crash, it is possible that jobs could "disappear" from the queue if they were recently submitted, but maybe you can live with that. From the HTCondor Manual:

CONDOR_FSYNC
   A boolean value that controls whether HTCondor calls fsync() when writing the user job and transaction logs. Setting this value to False will disable calls to fsync(), which can help performance for condor_schedd log writes at the cost of some durability of the log contents, should there be a power or hardware failure. The default value is True.
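
To get a rough feel for how much fsync() costs on a particular filesystem, a small timing script along these lines may help. It is only a sketch: the function append_events, the event text and the count of 200 are made up for illustration, and it does not reproduce how the condor_schedd actually writes its logs. Point target_dir at a directory on the shared filesystem to compare it with local disk.

```python
import os
import tempfile
import time


def append_events(path, n_events, use_fsync):
    """Append n_events short lines to path and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "a") as log:
        for i in range(n_events):
            log.write(f"event {i}\n")
            log.flush()                 # hand Python's buffer to the OS
            if use_fsync:
                os.fsync(log.fileno())  # block until the OS reports the data as written to storage
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the shared filesystem.
    target_dir = tempfile.mkdtemp()
    for use_fsync in (True, False):
        path = os.path.join(target_dir, f"events_fsync_{use_fsync}.log")
        elapsed = append_events(path, 200, use_fsync)
        print(f"fsync={use_fsync}: {elapsed:.3f} s for 200 events")
```

If the fsync=True run is much slower on the shared filesystem than on local disk, that supports the suspicion that fsync is the bottleneck there.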


On 6/20/2017 12:58 PM, Dimitri Maziuk wrote:
On 06/20/2017 10:43 AM, Krieger, Donald N. wrote:
How about putting the SPOOL directory on /dev/shm?
Aside from the disappearing job queue, filling up /dev/shm may cause all
kinds of interesting effects, depending on how and where you mount it.
This is something to keep in mind when moving spool to local drive, too:
we recently filled up / and had to move spool to a zfs pool -- the part
where *spool can get large* should be in all-caps bold in the manual.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685
