
Re: [HTCondor-users] Feature request: non-integer DAGMAN_SUBMIT_DELAY



What is actually causing the trouble for Lustre?

1. Submission of jobs, or
2. Running of Dagman Pre scripts on the submit nodes, or
3. Running of the actual jobs on the execute nodes ?

Actually, it's #1. It looks like Lustre has trouble handling the large bursts of system calls (stat, open, etc.) that occur when a dagman submits a large number (1k+) of jobs at once and their submit files and logs are stored on a Lustre filesystem whose metadata server is already under high load. When Lustre starts to stutter, the entire submit machine gets slow, condor_q times out, etc. Regular condor_submit can cause similar problems, but that happens less often, for reasons I am not totally sure about.

Admittedly, Lustre is not an ideal filesystem for small files, and we discourage users from storing Condor-related files on it. Still, it's convenient for people to keep log and submit files close to their data files, and most of the time Lustre can handle it. Every once in a while, though, somebody causes major slowdowns on our submitter machine.
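
To make the request concrete, here is a rough Python sketch of what a fractional delay between submissions would buy us. This is not how dagman is implemented; `submit_with_delay` and `minimum_duration_s` are hypothetical names made up for this illustration, and the only assumption is that the delay is applied between consecutive submissions.

```python
import time


def submit_with_delay(jobs, submit, delay_s, sleep=time.sleep):
    """Call submit(job) for each job, pausing delay_s seconds between calls.

    Hypothetical helper for illustration only; delay_s may be fractional.
    The sleep function is injectable so the pacing can be checked without waiting.
    """
    results = []
    for i, job in enumerate(jobs):
        # No pause before the first job, only between consecutive ones.
        if i > 0 and delay_s > 0:
            sleep(delay_s)
        results.append(submit(job))
    return results


def minimum_duration_s(n_jobs, delay_s):
    """Lower bound on total time spent sleeping for n_jobs submissions."""
    return max(n_jobs - 1, 0) * delay_s
```

Under that assumption, 1000 jobs with a 1 second delay spend at least 999 seconds just waiting, while a 0.1 second delay would bring that floor down to about 100 seconds and still spread out the burst of metadata operations.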


Vlad




On 04/20/16 09:03, Todd Tannenbaum wrote:
On 4/20/2016 8:47 AM, Vladimir Brik wrote:
Hello,

Would it be possible to allow non-integer DAGMAN_SUBMIT_DELAY values in
future versions of Condor?

We use DAGMAN_SUBMIT_DELAY to limit the load big dagmans generate on our
Lustre file system. It works, but 1 second is just too much of a delay.
It would be nice if it were possible to set the delay to a
floating-point number.


Vlad

Hi Vlad,

What is actually causing the trouble for Lustre?

   1. Submission of jobs, or
   2. Running of Dagman Pre scripts on the submit nodes, or
   3. Running of the actual jobs on the execute nodes ?

I am guessing that #1 isn't the real problem, but what you are really
trying to limit is #2 or #3.  I know there are config knobs that can
control the rate at a sub-second resolution for #3 above, and possibly
for #2 as well.

regards
Todd

