Re: [HTCondor-users] starter segfault
- Date: Tue, 11 Dec 2012 15:37:21 -0600
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] starter segfault
On Dec 7, 2012, at 9:17 AM, Michael John Breza <mjb04@xxxxxxxxxxxx> wrote:
> Has anybody had a problem with starter segfaulting while executing
> jobs in the standard universe?
...
> So, does anyone know what is causing this error from the information I
> have supplied here? Is it a problem with starter, or is it a
> configuration problem? One thought I have had is that the starter
> cannot communicate with the submitter's shadow, and so segfaults.
>
> Jobs submitted using the vanilla universe execute with no problems. It
> is only the standard universe which has these problems.
>
> Any help or suggestions would be appreciated.
I have a couple of suggestions:
* Try running 'condor_starter.std -classad' on the command line on the affected machine. It should print something like this:
    IsDaemonCore = False
    HasRemoteSyscalls = True
    HasCheckpointing = True
    CondorVersion = "$CondorVersion: 7.8.6 Oct 24 2012 BuildID: 73238 $"
* Try looking in the StartLog. It should show when the starter is spawned and when and how it exits (for example, on signal 11, which is a segmentation fault).
* Try setting CREATE_CORE_FILES to True in the config file and look for core files in the HTCondor log directory. The backtrace from a core file can help indicate what's going wrong.
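If it helps to script the first check, here is a rough Python sketch that parses the 'Name = Value' lines printed by 'condor_starter.std -classad' and reports which capabilities are missing. The function names (parse_classad_output, missing_capabilities, starter_classad) are made up for illustration and are not part of HTCondor. The list of expected attributes is only taken from the sample output above, and the sketch assumes the binary is on the PATH.

```python
import subprocess

# Attributes expected to be True, taken from the sample output above.
# This list is an assumption for illustration, not an official requirement.
EXPECTED_TRUE = ("HasRemoteSyscalls", "HasCheckpointing")


def parse_classad_output(text):
    """Parse 'Name = Value' lines into a dict of raw string values."""
    attrs = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values containing '=' stay intact.
        name, sep, value = line.partition("=")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def missing_capabilities(attrs, expected=EXPECTED_TRUE):
    """Return the expected attributes that are absent or not True."""
    return [name for name in expected
            if attrs.get(name, "").lower() != "true"]


def starter_classad(binary="condor_starter.std"):
    """Run the starter with -classad; return (attributes, return code).

    On POSIX a negative return code means the process was killed by a
    signal, so -11 would indicate a segfault in the starter itself.
    """
    result = subprocess.run([binary, "-classad"],
                            capture_output=True, text=True)
    return parse_classad_output(result.stdout), result.returncode
```

On the affected machine, calling starter_classad() and passing the returned attributes to missing_capabilities() should give an empty list if the starter reports the same capabilities as in the sample output. A return code of -11 would suggest the binary crashes even before it runs a job.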
Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project