
Re: [Condor-users] checkpointing produces segfault



Patrick,

> I tried to run this job on that machine by hand and it works - no
> segfaults. Thus I looked in more detail and tried to make it
> checkpoint by sending SIGTSTP and voila I get a segfault. If I
> look at the core dump and the stack I find it always looks like
> that:
> 
> > #0  0x08102788 in adler32 ()
> > #1  0x080fde76 in fill_window ()
> > #2  0x080fdc61 in deflate_slow ()
> > #3  0x080fcc87 in deflate ()
> > #4  0x080c704b in SegMap::Write ()
> > #5  0x080c682c in Image::Write ()
> > #6  0x080c6503 in Image::Write ()
> > #7  0x080c6382 in Image::Write ()
> > #8  0x080c7751 in Checkpoint ()
> > #9  <signal handler called>
> 
> It seems that 'adler32' is the last thing called. Searching the
> list archive I found one message stating a similar problem, but
> no solution. Any help would be much appreciated.

Does your program modify the "extern char **environ;" array?  This
includes calling "putenv()".

Condor uses the contents of environ[0] to determine the end address of
the stack.  If this value has been modified all bets are off.
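To show how a program can change environ[0] without ever assigning to it directly, here is a minimal C sketch (not Condor code). It assumes a POSIX system, and the helper name redirect_first_env_var() is hypothetical. When putenv() is given the name of the first variable, the C library stores the caller's pointer in environ[0], so it no longer points at the string that the kernel placed on the initial stack.

```c
/* putenv() is an XSI function; request it explicitly. */
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: overwrite the value of the FIRST environment
   variable via putenv() and report whether environ[0] now points at
   our own buffer instead of the original string.
   Returns 1 if environ[0] was redirected, 0 if not, -1 on error. */
int redirect_first_env_var(const char *new_value)
{
    /* putenv() keeps the pointer it is given (it does not copy the
       string), so the buffer must outlive this call. */
    static char buf[4096];

    if (environ == NULL || environ[0] == NULL)
        return -1;                      /* empty environment */

    const char *eq = strchr(environ[0], '=');
    if (eq == NULL)
        return -1;                      /* malformed entry */

    size_t name_len = (size_t)(eq - environ[0]);
    if (name_len + 1 + strlen(new_value) + 1 > sizeof buf)
        return -1;                      /* does not fit in buf */

    /* Build "NAME=new_value" from the first variable's name.
       memmove, because environ[0] may already be buf on a 2nd call. */
    memmove(buf, environ[0], name_len);
    buf[name_len] = '=';
    strcpy(buf + name_len + 1, new_value);

    if (putenv(buf) != 0)
        return -1;

    /* The library replaced the existing entry in place. */
    return environ[0] == buf ? 1 : 0;
}
```

After one such call, environ[0] points into the program's static data rather than the stack. Calling setenv() on the first variable typically has the same effect, except that the library allocates the replacement string itself.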

If you have the core dump, can you examine the values of both
__environ and __environ[0]?
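If the core dump is not handy, the same two values can be checked from inside the running job. The C sketch below is a hypothetical diagnostic (check_environ() is not part of Condor). It assumes a downward-growing stack, as on x86 Linux, where the kernel places the environment strings near the top of the initial stack, so an unmodified environ[0] should lie above any local variable.

```c
#include <stdint.h>
#include <stdio.h>

extern char **environ;

/* Hypothetical diagnostic: print environ and environ[0] (the values
   you would inspect as __environ and __environ[0] in the core) and
   report whether environ[0] still lies above this stack frame.
   ASSUMPTION: the stack grows downward (x86 Linux).
   Returns 1 if environ[0] is above a local variable, 0 if not,
   -1 if the environment is empty. */
int check_environ(FILE *out)
{
    int local = 0;  /* only its address is used: a point in this frame */

    if (environ == NULL || environ[0] == NULL)
        return -1;

    fprintf(out, "environ    = %p\n", (void *)environ);
    fprintf(out, "environ[0] = %p\n", (void *)environ[0]);
    fprintf(out, "&local     = %p\n", (void *)&local);

    /* Relational comparison of unrelated pointers is undefined in C;
       converting to uintptr_t is implementation-defined but behaves as
       expected on flat-address platforms such as x86 Linux. */
    return (uintptr_t)environ[0] > (uintptr_t)&local ? 1 : 0;
}
```

A return value of 0 suggests environ[0] has been redirected off the stack, for example to a static or heap buffer.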

-- 
Daniel K. Forrest	Laboratory for Molecular and
forrest@xxxxxxxxxxxxx	Computational Genomics
(608) 262 - 9479	University of Wisconsin, Madison