
[Condor-users] standalone checkpointing segmentation fault



Hi,

I'm trying to use Condor (6.6.7, dynamic) in "standalone"
mode to checkpoint jobs.  I have a very simple test program
that I compiled with condor_compile on both Solaris and Red
Hat Linux.  During a run I send the job "kill -TSTP pid".
On Solaris this works fine and produces a *.ckpt file that I
can use to restart the job by passing the "-_condor_restart"
flag and the *.ckpt file.  On Linux, however, I get a
segmentation fault upon "kill -TSTP pid", and only a core
file and a *ckpt.tmp file are generated.  For what it's
worth, opening the core file in a debugger shows:

Program terminated with signal 11, Segmentation fault.
#0  0x0809ac28 in adler32 ()

(gdb) where
#0  0x0809ac28 in adler32 ()
#1  0x08096316 in fill_window ()
#2  0x08096101 in deflate_slow ()
#3  0x08095127 in deflate ()
#4  0x0804f717 in SegMap::Write (this=0x8188464, fd=3, pos=1024) at image.C:1446
#5  0x0804eef8 in Image::Write (this=0x81880a0, fd=3) at image.C:1097
#6  0x0804ebcf in Image::Write (this=0x81880a0, ckpt_file=0x99879c8 "./condor_test.ckpt") at image.C:1003
#7  0x0804ea4e in Image::Write (this=0x81880a0) at image.C:928
#8  0x0804fdef in Checkpoint (sig=20, code=0, scp=0x0) at image.C:1694
#9  <signal handler called>
#10 0x08048220 in main ()



Thanks in advance for any advice,
Jason