I have many users sending similar DAGs without any issues. This particular submitter sends far more jobs, so the problem may be load related.
This issue happens sporadically.
The DAG file:
SUBDAG EXTERNAL A xxx/xxx/a.dag
SUBDAG EXTERNAL B yyy/yyy/b.dag
PARENT A CHILD B
This is our configuration (DAG related):
DAGMAN_MAX_JOBS_IDLE = 25000
DAGMAN_MAX_JOB_HOLDS = 5000
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 100
DAGMAN_MAX_SUBMIT_ATTEMPTS = 1
DAGMAN_USE_CONDOR_SUBMIT = TRUE
DAGMAN_MAX_JOBS_SUBMITTED is not configured, so its value is 0.
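To double-check which DAGMan limits are set explicitly and which fall back to a default, the configuration lines listed above can be parsed with a few lines of Python. This is only a minimal sketch: `parse_knobs` and `effective` are hypothetical helper names, the parser handles plain `NAME = value` lines only (no macros, includes, or conditionals), and on a live system `condor_config_val` remains the authoritative source.

```python
def parse_knobs(text):
    """Parse simple 'NAME = value' lines into a dict keyed by upper-case name."""
    knobs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        knobs[name.strip().upper()] = value.strip()
    return knobs


def effective(knobs, name, default):
    """Return the configured value, or the supplied default when the knob is unset."""
    return knobs.get(name.upper(), default)


# The DAG-related settings quoted in this message.
CONFIG = """\
DAGMAN_MAX_JOBS_IDLE = 25000
DAGMAN_MAX_JOB_HOLDS = 5000
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 100
DAGMAN_MAX_SUBMIT_ATTEMPTS = 1
DAGMAN_USE_CONDOR_SUBMIT = TRUE
"""

knobs = parse_knobs(CONFIG)
# DAGMAN_MAX_JOBS_SUBMITTED is absent, so the assumed default of "0" is reported.
max_jobs = effective(knobs, "DAGMAN_MAX_JOBS_SUBMITTED", "0")
print("DAGMAN_MAX_JOBS_SUBMITTED =", max_jobs)
```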
Sometimes I can see that the shared port daemon is under high load and refuses connections, but that does not happen at the same time as this issue.
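To check whether the warnings cluster around particular attributes or periods, it may help to count them in the DAGMan output. The following is a rough Python sketch, not an HTCondor tool: `count_missing_attributes` is a hypothetical helper, and it only assumes that each warning contains the text "Warning: failed to get attribute NAME" as quoted in this thread, whatever prefix the log adds.

```python
import re
from collections import Counter

# Matches the warning text quoted in this thread; any timestamp prefix is ignored.
WARNING_RE = re.compile(r"Warning: failed to get attribute (\w+)")


def count_missing_attributes(lines):
    """Count how often each attribute lookup failed in the given log lines."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Small in-memory sample; a real run would iterate over the DAGMan output file.
sample = [
    "Warning: failed to get attribute DAGMan_MaxIdle",
    "Warning: failed to get attribute DAGMan_MaxJobs",
    "Warning: failed to get attribute DAGMan_MaxJobs",
    "some unrelated line",
]
counts = count_missing_attributes(sample)
for attr, n in sorted(counts.items()):
    print(attr, n)
```

Because the function accepts any iterable of lines, an open file object for the DAG's output file can be passed in place of `sample`.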
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Mark Coatsworth <coatsworth@xxxxxxxxxxx>
Sent: 07 June 2021 19:19
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Dagman Issue V9
Can you describe the dag where this is happening (or better yet, send
me the .dag file)? When you mention a child dag, are you talking about
an external subdag or something different?
By default every dag is supposed to have these attributes in its
classad. I just did a quick test to verify this. So I'm wondering if
there's something special about your environment causing it to not be
Are you setting a custom value for max jobs? (either with the
DAGMAN_MAX_JOBS_SUBMITTED configuration knob or the -maxjobs submit
On Mon, Jun 7, 2021 at 6:54 AM <duduhandelman@xxxxxxxxxxx> wrote:
> Hi Again,
> I forgot to mention that it only happens on a DAG that submits other DAGs; only the child DAGs are affected.
> Also, the child DAGs do not have those ClassAd attributes, and I don't know whether that is OK.
> Thanks Again,
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of duduhandelman@xxxxxxxxxxx <duduhandelman@xxxxxxxxxxx>
> Sent: 07 June 2021 14:29
> To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] Dagman Issue V9
> Hi All,
> A week ago I upgraded from HTCondor 8.8 to 9.0.1, and I am now facing an issue with DAGMan jobs.
> Most of the jobs run as expected, but some DAGMan jobs stop submitting jobs after a while.
> It seems that the DAGMan job asks for DagMan_Max_jobs and sometimes gets a positive value but sometimes gets a negative number, which I assume is causing the issue.
> The Sched debug print:
> GetAttributeInt(968372, 0 , DAGMAN_MaxJobs) not found.
> The DAG output shows the following every few minutes:
> Warning: failed to get attribute DAGMan_MaxIdle
> Warning: failed to get attribute DAGMan_MaxJobs
> Warning: failed to get attribute DAGMan_MaxPreScripts
> Warning: failed to get attribute DAGMan_MaxPostScripts
> Warning: failed to get attribute DAGMan_MaxHoldScripts
> It seems like the value is garbage, probably not initialized.
> Any clues? Could it be a security issue?
> Many Thanks
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> The archives can be found at:
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison