Re: [HTCondor-users] Random job in DAGman throwing submitERROR



Sorry for the typo. The snippet was from the sched logs only, not the shadow logs.

On Thu, 9 May, 2019, 21:07 John M Knoeller, <johnkn@xxxxxxxxxxx> wrote:

The ShadowLog isn't where to look for this error. When submit fails, there will never be a shadow for that job.

You should look in the SchedLog at time 05/08/19 04:01:16 to see if it has a reason why the submit failed.
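
A minimal sketch of that lookup, assuming the SchedLog lines start with the same `MM/DD/YY HH:MM:SS` prefix as the excerpts in this thread. The function name `lines_near` and the default window are illustrative only and are not part of HTCondor.

```python
from datetime import datetime, timedelta

# Assumed line prefix, e.g. "05/08/19 04:01:16" (17 characters).
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"
STAMP_LENGTH = 17


def lines_near(lines, when, window_seconds=5):
    """Return the log lines stamped within window_seconds of `when`."""
    target = datetime.strptime(when, STAMP_FORMAT)
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            # Continuation line or a different timestamp format: skip it.
            continue
        if abs(stamp - target) <= window:
            matches.append(line.rstrip("\n"))
    return matches
```

To use it, open your SchedLog (the path is site-specific; `condor_config_val SCHEDD_LOG` should print it) and call `lines_near(log_file, "05/08/19 04:01:16")` on the open file object.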

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Vikrant Aggarwal
Sent: Thursday, May 9, 2019 3:15 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Random job in DAGman throwing submitERROR

Hello Team,

We are facing a weird issue with a DAGMan DAG consisting of 54 jobs. One of the jobs in the DAG randomly throws an error, at no particular frequency. I am trying to debug the reason for this.

05/08/19 04:01:16 From submit: Submitting job(s)....
05/08/19 04:01:16 From submit: ERROR: Failed submission for job 147157.4 - aborting entire submit
05/08/19 04:01:16 From submit:
05/08/19 04:01:16 From submit: ERROR: Failed to queue job.
05/08/19 04:01:16 failed while reading from pipe.
05/08/19 04:01:16 Read so far: Submitting job(s)....ERROR: Failed submission for job 147157.4 - aborting entire submitERROR: Failed to queue job.
05/08/19 04:01:16 ERROR: submit attempt failed

The shadow logs show no indication for this job, but they do show the "status in check_zombie" message for another job of the same DAG. Most of the time I see this zombie message in the sched logs around the time of the issue, but not every time.

05/08/19 04:01:15 (pid:1316698) Shadow pid 1054140 for job 147147.1 reports job exit reason 100.
05/08/19 04:01:15 (pid:1316698) ERROR fetching job (147154.6) status in check_zombie !
05/08/19 04:01:15 (pid:1316698) Shadow pid 1053429 for job 147147.4 exited with status 100


Condor version details:

$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781 $

$CondorPlatform: x86_64_RedHat6 $

Has anyone else seen this issue?

Thanks & Regards,

Vikrant Aggarwal
