[Condor-users] Startd segment violation



Hi,

A user's application keeps exiting with the following message in the SchedLog on the submitting machine:

2/2 16:56:10 Shadow pid 12591 for job 2605.0 exited with status 4
2/2 16:56:10 ERROR: Shadow exited with job exception code!

The job then gets rescheduled immediately, leading to a perpetual cycle. The StarterLog on the execute machine shows nothing unusual, but the StartLog reports:

2/2 16:56:10 Starter pid 19086 died on signal 11 (signal 11).

That's a segmentation violation (SIGSEGV). My question is: is that Condor's way of telling me that the user's application is segfaulting, or is it the start daemon itself? We see this behaviour on a number of Linux boxes, all running dynamically linked versions of Condor 6.6.8 (we have seen it with 6.6.7 too), for glibc 2.2 and 2.3.
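
In case it helps anyone looking for the same pattern, here is a rough sketch of how such lines could be pulled out of a StartLog and the signal number turned into a name. The regular expression is only an assumption based on the single line quoted above, and the function names are hypothetical; nothing here comes from Condor itself.

```python
import re
import signal

# Pattern assumed from the one StartLog line quoted above; other Condor
# versions may word this message differently.
DIED_RE = re.compile(r"Starter pid (\d+) died on signal (\d+)")


def signal_name(num):
    """Return the symbolic name for a signal number, or 'UNKNOWN'."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return "UNKNOWN"


def find_starter_deaths(lines):
    """Return (pid, signal number, signal name) for each matching line."""
    deaths = []
    for line in lines:
        match = DIED_RE.search(line)
        if match:
            pid, num = int(match.group(1)), int(match.group(2))
            deaths.append((pid, num, signal_name(num)))
    return deaths


if __name__ == "__main__":
    sample = ["2/2 16:56:10 Starter pid 19086 died on signal 11 (signal 11)."]
    for pid, num, name in find_starter_deaths(sample):
        print(f"starter {pid} killed by signal {num} ({name})")
```

This only confirms which process was killed and by which signal; it does not say whether the fault originated in the user's code or in Condor.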

Help please, chaps.

Cheers,
Mark