
[HTCondor-users] JobRouter debug info



Hi all,

I am debugging a few jobs whose assigned routes fail, as in [1,2]: the
DESYPRIO route fails first, and then the LRMS Condor route has issues as
well (the LRMS failure is probably fallout from the failure of DESYPRIO,
the actual route). On the Condor side, the routed(?) job IDs do not show
up [3], so I have no insight yet into why the submission actually failed.

I suspect that it might be a user:group issue, but I am still trying to
get more output to understand it better. I have already set the job
router to full debug output
  JOBROUTER_DEBUG = D_ALL:2
assuming that the job router also uses the daemon debug level syntax.
However, I have not gotten much more output.

Is there maybe another knob to dig a bit deeper into the router internals?

Cheers and thanks,
  Thomas


[1]
06/07/22 13:47:14 JobRouter
(src=2895713.0,dest=8180603.0,route=Local_Condor): finalized job
06/07/22 13:47:20 WARNING: Saw slow DNS query, which may impact entire
system: getaddrinfo(grid-htc-master02.desy.de) took 5.006753 seconds.
06/07/22 13:47:20 ERROR (schedd grid-htcondorce0.desy.de at pool
condor01.desy.de:9618,grid-htc-master02.desy.de:9618) (8192573.0) Failed to
 commit job submission
06/07/22 13:47:20 JobRouter failure (src=2824628.0,route=DESYPRIO):
failed to submit job
06/07/22 13:47:20 ERROR (schedd grid-htcondorce0.desy.de at pool
condor01.desy.de:9618,grid-htc-master02.desy.de:9618) (8192574.0) Failed
to commit job submission


[2]
06/07/22 13:49:32 JobRouter failure (src=2824628.0,route=Local_Condor):
failed to submit job
06/07/22 13:49:33 ERROR (schedd grid-htcondorce0.desy.de at pool
condor01.desy.de:9618,grid-htc-master02.desy.de:9618) (8192686.0) Failed to
 commit job submission
06/07/22 13:49:33 JobRouter failure (src=2881683.0,route=Local_Condor):
failed to submit job
06/07/22 13:49:33 ERROR (schedd grid-htcondorce0.desy.de at pool
condor01.desy.de:9618,grid-htc-master02.desy.de:9618) (8192687.0) Failed to
 commit job submission
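
To check whether the Local_Condor failures in [2] hit the same source jobs that already failed on DESYPRIO in [1], the failure lines can be grouped by source job ID. This is only a minimal sketch: it assumes each log entry sits on a single line in the on-disk log (the excerpts here are wrapped by the mail client), and the function name `failures_by_source` and the regular expression are illustrative, not part of HTCondor.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "... JobRouter failure (src=2824628.0,route=DESYPRIO): failed to submit job"
# The pattern is derived from the excerpts in [1] and [2]; it is an assumption
# that other JobRouter failure lines follow the same shape.
FAILURE_RE = re.compile(
    r"JobRouter failure \(src=(?P<src>\d+\.\d+),route=(?P<route>[^)]+)\):\s*(?P<reason>.+)"
)


def failures_by_source(lines):
    """Map each source job ID to the (route, reason) failures logged for it."""
    result = defaultdict(list)
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            result[match.group("src")].append(
                (match.group("route"), match.group("reason").strip())
            )
    return dict(result)


# Failure lines taken from [1] and [2], unwrapped to one entry per line.
sample = [
    "06/07/22 13:47:20 JobRouter failure (src=2824628.0,route=DESYPRIO): failed to submit job",
    "06/07/22 13:49:32 JobRouter failure (src=2824628.0,route=Local_Condor): failed to submit job",
    "06/07/22 13:49:33 JobRouter failure (src=2881683.0,route=Local_Condor): failed to submit job",
]

for src, failures in failures_by_source(sample).items():
    print(src, failures)
```

For the lines above, source job 2824628.0 appears under both DESYPRIO and Local_Condor, while 2881683.0 appears only under Local_Condor in these excerpts.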

[3]
> grep -r 8192686 /var/lib/condor*/spool/history
> echo $?
1
> grep -r 8192573 /var/lib/condor*/spool/history
> echo $?
1
