
Re: [HTCondor-users] DAGMan low submission rate



Whoops! Never mind me. I just read the original post in this thread...

On Tue, Jun 27, 2017 at 12:51 PM, Edward Labao <edward.labao@xxxxxxxxxxxxxx> wrote:
Hi there!

There's a DAGMan configuration parameter called DAGMAN_MAX_JOBS_IDLE. It limits the number of idle jobs allowed from any single DAGMan job on the farm at any one time and defaults to 1000. When you see your submission rate getting throttled, it could be DAGMan trying to keep the number of idle jobs at or below this value.

There's documentation for that and a few other parameters here:

http://research.cs.wisc.edu/htcondor/manual/v8.6/3_5Configuration_Macros.html#sec:Throttling
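
For reference, a minimal sketch of raising that throttle in the configuration. Note that the manual spells the parameter DAGMAN_MAX_JOBS_IDLE; the value 5000 is purely illustrative, not a recommendation:

```
# Allow more idle jobs per DAG before DAGMan pauses submission.
# 5000 is an arbitrary example value; the default mentioned above is 1000.
DAGMAN_MAX_JOBS_IDLE = 5000
```

The same limit can also be set for a single DAG at submit time with condor_submit_dag -maxidle <N>.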

Cheers!



On Tue, Jun 27, 2017 at 12:42 PM, Benedikt Riedel <briedel@xxxxxxxxxxxx> wrote:
Hi,

By job monitoring, I mean which jobs in my set have failed, succeeded, etc. With 1.2M nodes, it can quickly become a pain to do all this outside of DAGMan.

Benedikt

On 27 June 2017 at 14:33, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

Can you clarify what you mean by "job monitoring"?

From: HTCondor-users [mailto:htcondor-users-bounces@cs.wisc.edu] On Behalf Of Benedikt Riedel
Sent: Tuesday, June 27, 2017 10:31 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] DAGMan low submission rate

Hi,

This might be tangential. DAGMan and late materialization are not compatible at the moment. With late materialization, users seemingly have to go off and write their own job monitoring code. DAGMan provides job monitoring capability for free. Am I missing something? What is the best practice for monitoring large independent job sets when using late materialization?

As for Henning's user's issue, have you tried setting

DAGMAN_USER_LOG_SCAN_INTERVAL = 1


This appears to increase the submission rate.
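
A sketch of how this might look alongside the related per-cycle limit. As far as I can tell from the manual, DAGMAN_USER_LOG_SCAN_INTERVAL is the number of seconds DAGMan waits between scans of its job log, and DAGMAN_MAX_SUBMITS_PER_INTERVAL caps how many jobs it submits in a row per cycle. The value 50 is only an illustration and has not been tested against this workload:

```
# Seconds between DAGMan log scans (the setting suggested above).
DAGMAN_USER_LOG_SCAN_INTERVAL = 1
# Jobs DAGMan may submit in a row per cycle; 50 is an arbitrary example.
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 50
```

Settings like these can go in the submit host's configuration or in a per-DAG file passed with condor_submit_dag -config <file>.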

Thanks,

Benedikt


On 27 June 2017 at 09:53, Greg Thain <gthain@xxxxxxxxxxx> wrote:

On 06/27/2017 01:30 AM, Henning Fehrmann wrote:

Hello,

one of our users started a DAG with 1.2M nodes which do not depend on
other nodes. It seems that on average 120 jobs are submitted per
minute. This number fluctuates strongly.


If there are no dependencies between nodes, and you aren't using other DAGMan features like pre/post scripts, would you consider upgrading to 8.7 to use the late materialization feature instead of DAGMan? It was designed exactly for this use case.
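
A rough sketch of what that could look like, based on the max_materialize submit command documented in later HTCondor releases. The exact syntax in a given 8.7.x version may differ, and the schedd must have late materialization enabled (SCHEDD_ALLOW_LATE_MATERIALIZE). The executable, log name, and limit below are hypothetical:

```
# Hypothetical submit file for 1.2M independent jobs; file names are placeholders.
executable = run_task.sh
arguments  = $(ProcId)
log        = tasks.log

# Keep at most 1000 jobs materialized in the queue at a time (example value).
max_materialize = 1000

queue 1200000
```

With this approach all jobs share one cluster, so a single user log covers the whole set.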

-greg



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxx.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/




--

Benedikt Riedel
Scientific Programmer
University of Chicago
Computation Institute

