
Re: [HTCondor-users] starter segfault



On Dec 7, 2012, at 9:17 AM, Michael John Breza <mjb04@xxxxxxxxxxxx> wrote:

> Has anybody had a problem with starter segfaulting while executing
> jobs in the standard universe?
...
> So, does anyone know what is causing this error from the information I
> have supplied here? Is it a problem with starter, or is it a
> configuration problem? One thought I have had is that the starter
> cannot communicate with the submitter's shadow, and so segfaults. 
> 
> Jobs submitted using the vanilla universe execute with no problems. It
> is only the standard universe which has these problems.
> 
> Any help or suggestions would be appreciated.


I have a couple of suggestions:

* Try running 'condor_starter.std -classad' on the command line on the affected machine. It should print something like this:
IsDaemonCore = False
HasRemoteSyscalls = True
HasCheckpointing = True
CondorVersion = "$CondorVersion: 7.8.6 Oct 24 2012 BuildID: 73238 $"

* Try looking in the StartLog. It should show when the starter is spawned and when and how it exits (signal 11, etc.).

* Try setting CREATE_CORE_FILES to True in the config file and look for core files in the HTCondor log directory. The backtrace from a core file can help indicate what's going wrong.
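
If it helps, the first check can be scripted. What follows is only a minimal sketch: it parses the 'Name = Value' lines that 'condor_starter.std -classad' prints and reports which capability attributes are not True. The function names and the choice of HasRemoteSyscalls and HasCheckpointing as the attributes to require are my assumptions for illustration, not something HTCondor defines as a formal check.

```python
# Sketch only: parse_classad and missing_std_capabilities are hypothetical
# helper names, not part of HTCondor.

def parse_classad(text):
    """Parse 'Name = Value' lines into a dict of raw string values."""
    attrs = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values containing '=' survive.
        name, sep, value = line.partition("=")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def missing_std_capabilities(attrs):
    """Return the capability attributes that are absent or not True.

    Assumption: these two attributes are the ones worth checking for
    standard universe jobs.
    """
    required = ("HasRemoteSyscalls", "HasCheckpointing")
    return [name for name in required
            if attrs.get(name, "").lower() != "true"]


# The sample output quoted earlier in this message.
SAMPLE = '''IsDaemonCore = False
HasRemoteSyscalls = True
HasCheckpointing = True
CondorVersion = "$CondorVersion: 7.8.6 Oct 24 2012 BuildID: 73238 $"
'''
```

To use it on the affected machine, capture the output of 'condor_starter.std -classad' (for example with subprocess.run and capture_output=True), pass the text to parse_classad, and look at what missing_std_capabilities returns. An empty list means both attributes are advertised as True; if the command itself crashes before printing anything, that is already useful information.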

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project