
[Condor-users] Two Problems with Condor



I have a Fedora Core 8 installation running on an x86 machine under Xen.  After installing Condor 7.0 from the .rpm and relinking my test program with condor_compile, the program segfaults upon receiving a Ctrl-Z (SIGTSTP) or a SIGUSR2.  This happens when trying to checkpoint the program under both the generic kernel and the Xen kernel.  A debug trace reveals the following:

#0  0x080bd3c4 in adler32 ()
(gdb) up
#1  0x080b8ba2 in fill_window ()
(gdb) up
#2  0x080b8861 in deflate_slow ()
(gdb) up
#3  0x080b6f24 in deflate ()
(gdb) up
#4  0x080504e5 in SegMap::Write ()
(gdb) up
#5  0x0804fca6 in Image::Write ()
(gdb) up
#6  0x0804f97d in Image::Write ()
(gdb) up
#7  0x0804f7fc in Image::Write ()
(gdb) up
#8  0x08050beb in Checkpoint ()
(gdb) up
#9  <signal handler called>
(gdb) up
#10 0x080e4e9a in nanosleep ()

I've searched through the mailing list archive, and none of the solutions mentioned in it work.

Since this is occurring during the portion of the code that compresses the checkpoint, and since I'm not particularly concerned with checkpoint size given our setup, I decided to try to recompile and install Condor with compression disabled.  I modified the configure.ac file to do so, passed it to autoconf, then ran ./configure followed by make.  During the make process, I get the following error:

cd tmp_dir; ar x /home/dkesler2/condor_src-7.0.0/externals/install sigsuspend.o;
ar: /home/dkesler2/condor_src-7.0.0/externals/install: Is a directory

And indeed, the path it's trying to extract sigsuspend.o from is a directory containing more directories, but no archives as far as I can find.
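
For anyone wanting to repeat the "no archives" check, a search along the following lines could be used. This is only a minimal sketch: the function names `ar_members` and `find_archives_with` are made up for illustration, and it assumes the archives are ordinary Unix `ar` files ending in `.a` whose member names are short enough to be stored inline (long GNU or BSD style names are ignored).

```python
import os

AR_MAGIC = b"!<arch>\n"  # global header of a Unix ar archive


def ar_members(path):
    """Return the short member names stored in an ar archive.

    Returns an empty list if the file does not start with the ar magic.
    """
    names = []
    with open(path, "rb") as f:
        if f.read(8) != AR_MAGIC:
            return names
        while True:
            header = f.read(60)  # fixed-size per-member header
            if len(header) < 60:
                break
            name = header[:16].decode("ascii", "replace").rstrip()
            size = int(header[48:58].decode("ascii").strip())
            # Names starting with "/" are the GNU symbol table, the
            # long-name table, or references into it; skip those.
            if not name.startswith("/"):
                names.append(name.rstrip("/"))  # GNU ar appends "/"
            # Member data is padded to an even number of bytes.
            f.seek(size + (size % 2), os.SEEK_CUR)
    return names


def find_archives_with(root, member):
    """Walk root and return the .a files that contain the given member."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(".a"):
                full = os.path.join(dirpath, filename)
                if member in ar_members(full):
                    hits.append(full)
    return sorted(hits)
```

Calling `find_archives_with("/home/dkesler2/condor_src-7.0.0/externals/install", "sigsuspend.o")` would return an empty list if no archive under that directory holds the object file.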

So do you have any idea what's breaking in the ./configure or make process?

Thanks,
David Kesler