Mailing List Archives
Public Access
|
|
|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Condor-users] Help for problem debugging needed
- Date: Tue, 21 Oct 2008 20:10:27 +0200
- From: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
- Subject: Re: [Condor-users] Help for problem debugging needed
Hi Robert,
Robert Futrick wrote:
> You can try looking at the ShadowLog ( present on the submit/schedd
> machine the job 4205801.25 was scheduled on). The Shadow is the process
> that the starter is attempting to communicate with, and might give more
> information about the issue.
I'm copying all I could gather (omitting the user, ignoring 72+ characters/line):
egotiatorLog:10/16 11:35:26 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: PREEMPTION_REQUIREMENTS == False
NegotiatorLog:10/16 11:35:54 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: no match found
NegotiatorLog:10/16 11:35:55 Matched 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301> preempting none <10.10.2.92:50928> slot2@xxxxxxxxxxxxxxxxx
NegotiatorLog:10/18 01:58:54 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: no match found
NegotiatorLog:10/18 01:58:57 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: PREEMPTION_REQUIREMENTS == False
NegotiatorLog:10/18 01:59:36 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: no match found
NegotiatorLog:10/18 01:59:39 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: PREEMPTION_REQUIREMENTS == False
NegotiatorLog:10/18 02:00:35 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: no match found
NegotiatorLog:10/18 02:00:38 Rejected 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301>: PREEMPTION_REQUIREMENTS == False
NegotiatorLog:10/18 02:56:21 Matched 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301> preempting none <10.10.7.18:56735> slot1@xxxxxxxxxxxxxxxxx
NegotiatorLog:10/18 06:34:16 Matched 4205801.25 XXX@xxxxxxxxxxx <10.20.30.2:37301> preempting none <10.10.8.37:53203> slot4@xxxxxxxxxxxxxxxxx
SchedLog.old:10/17 11:44:10 (pid:11045) OwnerCheck(YYY) failed in SetAttribute for job 4205801.25
SchedLog.old:10/17 17:05:30 (pid:11045) OwnerCheck(ZZZ) failed in SetAttribute for job 4205801.25
SchedLog.old:10/17 17:05:30 (pid:11045) OwnerCheck(ZZZ) failed in SetAttribute for job 4205801.25
SchedLog.old:10/17 17:05:34 (pid:11045) OwnerCheck(ZZZ) failed in SetAttribute for job 4205801.25
SchedLog.old:10/18 02:25:33 (pid:11045) Starting add_shadow_birthdate(4205801.25)
SchedLog.old:10/18 02:25:33 (pid:11045) Started shadow for job 4205801.25 on "<10.10.12.73:53004>", (shadow pid = 20484)
SchedLog.old:10/18 02:25:34 (pid:11045) Shadow pid 20484 for job 4205801.25 exited with status 108
SchedLog.old:10/18 03:17:54 (pid:11045) Starting add_shadow_birthdate(4205801.25)
SchedLog.old:10/18 03:17:54 (pid:11045) Started shadow for job 4205801.25 on "<10.10.7.18:56735>", (shadow pid = 18449)
SchedLog.old:10/18 06:23:06 (pid:11045) Starting add_shadow_birthdate(4205801.25)
SchedLog.old:10/18 06:23:06 (pid:11045) Started shadow for job 4205801.25 on "<10.10.1.20:38771>", (shadow pid = 11212)
SchedLog.old:10/18 06:33:06 (pid:11045) Shadow pid 11212 for job 4205801.25 exited with status 108
SchedLog.old:10/18 06:34:20 (pid:11045) Starting add_shadow_birthdate(4205801.25)
SchedLog.old:10/18 06:34:20 (pid:11045) Started shadow for job 4205801.25 on "<10.10.8.37:53203>", (shadow pid = 30738)
ShadowLog:10/18 10:02:33 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 10:17:40 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 10:32:40 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 10:47:42 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 11:02:42 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 11:17:43 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 11:32:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 11:47:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 12:02:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 12:17:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 12:32:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 12:47:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 13:02:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 13:17:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 13:32:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 13:47:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 14:02:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 14:17:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 14:32:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog:10/18 14:47:47 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 06:23:06 Initializing a VANILLA shadow for job 4205801.25
ShadowLog.old:10/18 06:23:06 (4205801.25) (11212): Request to run on <10.10.1.20:38771> was REFUSED
ShadowLog.old:10/18 06:23:06 (4205801.25) (11212): Job 4205801.25 is being evicted
ShadowLog.old:10/18 06:23:06 (4205801.25) (11212): logEvictEvent with unknown reason (108), aborting
ShadowLog.old:10/18 06:24:50 (4205801.25) (11212): ZKM: setting default map to (null)
ShadowLog.old:10/18 06:24:50 (4205801.25) (11212): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 108
ShadowLog.old:10/18 06:34:20 Initializing a VANILLA shadow for job 4205801.25
ShadowLog.old:10/18 06:34:20 (4205801.25) (30738): Request to run on <10.10.8.37:53203> was ACCEPTED
ShadowLog.old:10/18 06:52:30 (4205801.25) (30738): attempt to connect to <10.20.30.2:53100> failed: Connection timed out (connect errno = 110). Will keep trying for 300 total seconds (111 to go).
ShadowLog.old:10/18 06:56:55 (4205801.25) (30738): condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
ShadowLog.old:10/18 06:56:55 (4205801.25) (30738): IO: Failed to read packet header
ShadowLog.old:10/18 07:02:06 (4205801.25) (30738): condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
ShadowLog.old:10/18 07:02:06 (4205801.25) (30738): IO: Failed to read packet header
ShadowLog.old:10/18 07:02:06 (4205801.25) (30738): AUTHENTICATE: handshake failed!
ShadowLog.old:10/18 07:02:06 (4205801.25) (30738): Authentication Error: AUTHENTICATE:1002:Failure performing handshake
ShadowLog.old:10/18 07:17:06 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 07:32:06 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 07:47:13 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 08:02:25 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 08:17:26 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 08:32:26 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 08:47:26 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 09:02:27 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 09:17:32 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 09:32:32 (4205801.25) (30738): ZKM: setting default map to (null)
ShadowLog.old:10/18 09:47:33 (4205801.25) (30738): ZKM: setting default map to (null)
Albeit, this does not help me much, anyone with more experience?
Thanks a lot!
Carsten