Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] SIGQUIT / debugging

Date: Tue, 19 Feb 2013 11:32:21 -0600
From: Jaime Frey <jfrey@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] SIGQUIT / debugging

On Feb 19, 2013, at 7:34 AM, "Shrum, Donald C" <DCShrum@xxxxxxxxxxxxx> wrote:

> I periodically see jobs that fail with a SIGQUIT.
> 
> In the scheduler:
> SchedLog:02/18/13 19:47:35 (pid:25985) match (slot3@xxxxxxxxxxxxxxxxxx <10.178.6.101:54726> for nmg11) switching to job 5911.734
> SchedLog:02/18/13 19:47:35 (pid:25985) Started shadow for job 5911.734 on slot3@xxxxxxxxxxxxxxxxxx <10.178.6.101:54726> for nmg11, (shadow pid = 14851)
> SchedLog:02/18/13 19:47:37 (pid:25985) Negotiating for owner: nmg11@xxxxxxxxx
> SchedLog:02/18/13 19:47:37 (pid:25985) Finished negotiating for nmg11 in local pool: 0 matched, 1 rejected
> 
> On the processing node (slot3@xxxxxxxxxxxxxxxxxx in this case) I see:
> 02/18/13 19:47:36 Create_Process succeeded, pid=5788                 
> 02/18/13 21:10:27 Process exited, pid=5788, status=0   
> 02/18/13 21:10:27 Got SIGQUIT.  Performing fast shutdown.
> 02/18/13 21:10:27 ShutdownFast all jobs.             
> 02/18/13 21:10:27 **** condor_starter (condor_STARTER) pid 5785 EXITING WITH STATUS 0
> 
> 
> I'm inclined to think the job crashed or failed and the SIGQUIT was sent to condor as a result of the crash.  Is there something else going on that I should debug?  Google has not been much help thus far :)


These logs show that the job completed normally with exit code 0. The SIGQUIT is sent to the condor_starter process as part of the cleanup after the job completes. There's no sign here of anything unusual. Are there any other indications you're seeing that suggest that the job crashed?
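
For anyone who wants to check this across many starter logs, here is a minimal sketch of the same reading in Python. It assumes the log lines look exactly like the excerpt quoted above (other HTCondor versions may word them differently), and the function name summarize_starter_log is made up for illustration, not part of HTCondor. It records the status reported for each job process and flags a SIGQUIT only if it shows up before any job process has exited.

```python
import re

# Sample starter log lines copied from the quoted message above.
STARTER_LOG = """\
02/18/13 19:47:36 Create_Process succeeded, pid=5788
02/18/13 21:10:27 Process exited, pid=5788, status=0
02/18/13 21:10:27 Got SIGQUIT.  Performing fast shutdown.
02/18/13 21:10:27 ShutdownFast all jobs.
02/18/13 21:10:27 **** condor_starter (condor_STARTER) pid 5785 EXITING WITH STATUS 0
"""

# Assumption: the job's exit is logged in this exact form.
EXIT_RE = re.compile(r"Process exited, pid=(\d+), status=(\d+)")


def summarize_starter_log(text):
    """Hypothetical helper: return (job_exits, sigquit_before_exit).

    job_exits maps each job pid to the status the starter logged for it.
    sigquit_before_exit is True only if "Got SIGQUIT" appears before any
    job process has been reported as exited.
    """
    job_exits = {}
    sigquit_before_exit = False
    for line in text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            job_exits[int(match.group(1))] = int(match.group(2))
        elif "Got SIGQUIT" in line and not job_exits:
            sigquit_before_exit = True
    return job_exits, sigquit_before_exit


exits, early_quit = summarize_starter_log(STARTER_LOG)
print(exits, early_quit)  # prints: {5788: 0} False
```

For the excerpt above, the job process is reported with status=0 and the SIGQUIT arrives afterward, which matches the normal cleanup sequence. A SIGQUIT logged before the job's exit line, or a nonzero status, would be worth a closer look.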

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project

Follow-Ups:
- Re: [HTCondor-users] SIGQUIT / debugging
  - From: Shrum, Donald C

References:
- [HTCondor-users] SIGQUIT / debugging
  - From: Shrum, Donald C
