[HTCondor-users] SIGQUIT / debugging



I periodically see jobs that fail with a SIGQUIT.

In the scheduler:
SchedLog:02/18/13 19:47:35 (pid:25985) match (slot3@xxxxxxxxxxxxxxxxxx <10.178.6.101:54726> for nmg11) switching to job 5911.734
SchedLog:02/18/13 19:47:35 (pid:25985) Started shadow for job 5911.734 on slot3@xxxxxxxxxxxxxxxxxx <10.178.6.101:54726> for nmg11, (shadow pid = 14851)
SchedLog:02/18/13 19:47:37 (pid:25985) Negotiating for owner: nmg11@xxxxxxxxx
SchedLog:02/18/13 19:47:37 (pid:25985) Finished negotiating for nmg11 in local pool: 0 matched, 1 rejected

On the processing node (slot3@xxxxxxxxxxxxxxxxxx in this case) I see:
02/18/13 19:47:36 Create_Process succeeded, pid=5788                 
02/18/13 21:10:27 Process exited, pid=5788, status=0   
02/18/13 21:10:27 Got SIGQUIT.  Performing fast shutdown.
02/18/13 21:10:27 ShutdownFast all jobs.             
02/18/13 21:10:27 **** condor_starter (condor_STARTER) pid 5785 EXITING WITH STATUS 0


I'm inclined to think the job crashed or failed and that the SIGQUIT was sent to condor as a result of the crash. Is there something else going on that I should debug? Google has not been much help thus far :)
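
To check whether a SIGQUIT really follows a failed job, something like the rough sketch below could pair each "Got SIGQUIT" line in the starter log with the most recent "Process exited" line before it. The regular expressions assume the line shapes shown in the excerpt above, and the function name classify_shutdowns is my own invention, not anything that ships with HTCondor.

```python
import re

# Line shapes assumed from the starter log excerpt above; other
# HTCondor versions or debug levels may format these differently.
EXIT_RE = re.compile(r"Process exited, pid=(\d+), status=(\d+)")
SIGQUIT_RE = re.compile(r"Got SIGQUIT")


def classify_shutdowns(lines):
    """Pair each 'Got SIGQUIT' line with the last job exit seen before it.

    Returns a list with one entry per SIGQUIT: a (pid, status) tuple,
    or None if no 'Process exited' line preceded that SIGQUIT.
    """
    results = []
    last_exit = None
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            last_exit = (int(match.group(1)), int(match.group(2)))
        elif SIGQUIT_RE.search(line):
            results.append(last_exit)
            last_exit = None  # do not reuse this exit for a later SIGQUIT
    return results


sample = [
    "02/18/13 19:47:36 Create_Process succeeded, pid=5788",
    "02/18/13 21:10:27 Process exited, pid=5788, status=0",
    "02/18/13 21:10:27 Got SIGQUIT.  Performing fast shutdown.",
    "02/18/13 21:10:27 ShutdownFast all jobs.",
]

for entry in classify_shutdowns(sample):
    if entry is None:
        print("SIGQUIT with no preceding job exit")
    else:
        pid, status = entry
        print(f"SIGQUIT after pid {pid} exited with status {status}")
```

Running it over the log file line by line (for example with open(path) as the lines argument) would show whether the SIGQUITs line up with nonzero statuses or with status=0, as in the excerpt.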

Thanks,
Don
FSU Research Computing Center