Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] termination with signal 66

Date: Thu, 27 Oct 2005 12:09:17 -0500
From: Erik Paulson <epaulson@xxxxxxxxxxx>
Subject: Re: [Condor-users] termination with signal 66

On Thu, Oct 27, 2005 at 12:50:47PM -0400, Ian Chesal wrote:
> 32 seconds
> 10/27 01:24:48 (18.1) (13737): get_file(): Failed to open file
> /ttcbatch/experiments3/tvanderh/condor/armstrong2/run2/sipo40/job_done,
> errno = 13.
> 
> And in our schedd log I've got nothing useful at all around that time
> frame:
> 

The Shadow and the Schedd logs are about an hour and 10 minutes apart :)

How close are the clocks on your execute and submit machines? I'm wondering
about scenarios with disconnected shadows/starters: maybe the shadow
got disconnected from the starter and now can't reconnect, and the schedd
does the wrong thing on the shadow exit. It'd be helpful to see all of
the shadow logs involving job 18.0. I'd like to see both instances
of it connecting, and then trying to reconnect, and the schedd log for the
whole interval.
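
When lining up shadow and schedd excerpts, it can help to compute the gap
between two log lines rather than eyeball it. The following is a minimal
sketch, not part of Condor: parse_stamp and gap are hypothetical helper
names, and the year is an assumption (the log lines carry only month/day,
so 2005 is taken from the date of this message). It measures only how far
apart two log lines are; it says nothing by itself about clock skew between
the machines.

```python
from datetime import datetime, timedelta


def parse_stamp(line: str, year: int = 2005) -> datetime:
    """Parse the leading 'MM/DD HH:MM:SS' stamp of a Condor-style log line.

    The year is an assumption supplied by the caller, because the log
    lines quoted in this thread do not include one.
    """
    # Drop mail quoting ("> ") and surrounding whitespace, if present.
    cleaned = line.lstrip("> ").strip()
    # The first two whitespace-separated fields are the date and the time.
    stamp = " ".join(cleaned.split()[:2])
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")


def gap(line_a: str, line_b: str) -> timedelta:
    """Return the absolute time difference between two log lines."""
    return abs(parse_stamp(line_a) - parse_stamp(line_b))


if __name__ == "__main__":
    shadow_line = "10/27 01:24:48 (18.1) (13737): get_file(): Failed to open file"
    schedd_line = "> 10/27 00:01:23 Shadow pid 13303 for job 18.0 exited with status 107"
    print(gap(shadow_line, schedd_line))
```

Running the same helper over the first and last lines of each excerpt shows
whether the two logs cover an overlapping interval at all.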

Thanks,

-Erik

> 10/27 00:00:14 Sent ad to central manager for Priority1@xxxxxxxxxx
> 10/27 00:00:14 Sent ad to 1 collectors for Priority1@xxxxxxxxxx
> 10/27 00:01:14 Sent ad to central manager for Priority1@xxxxxxxxxx
> 10/27 00:01:14 Sent ad to 1 collectors for Priority1@xxxxxxxxxx
> 10/27 00:01:23 Shadow pid 13303 for job 18.0 exited with status 107
> 10/27 00:01:23 Sent RELEASE_CLAIM to startd on <137.57.142.38:1029>
> 10/27 00:01:23 Match record (<137.57.142.38:1029>, 18, 0) deleted
> 10/27 00:01:23 DaemonCore: Command received via TCP from host
> <137.57.142.38:4604>
> 10/27 00:01:23 DaemonCore: received command 443 (VACATE_SERVICE),
> calling handler (vacate_service)
> 10/27 00:01:23 Got VACATE_SERVICE from <137.57.142.38:4604>
> 10/27 00:02:14 Sent ad to central manager for Priority1@xxxxxxxxxx
> 
> - Ian
> 
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
