
Re: [Condor-users] Two Problems with Condor



Running under setarch i386 -R fixed the segfault, but I think it may have introduced another problem. I start my test program, then send it a SIGTSTP. To restart, I run "setarch i386 -R ./test -_condor_restart test.ckpt" and it restarts fine, but if I then send it a SIGUSR2, it crashes. It also crashes if I send it a SIGTSTP and then attempt to restart it a second time. With -_condor_D_ALL on, this is what I see:

...
RestoreStack() Exit!
About to restore file state
CondorFileTable::resume
working dir =
Condor: Error: Couldn't move to '' (No such file or directory).  Please fix it.

Killed

In addition, when I run my program for the first time, it appears as ./test in the ps output, but after a restart it is listed as "          i686".

Thanks,
David Kesler

On Feb 12, 2008 5:19 PM, David Kesler <dkesler2@xxxxxxxx> wrote:
Aha, I had tried setarch i386 before, as suggested in other posts on the mailing list, but those posts didn't specify -R. It works now. Thanks.

David Kesler


On Feb 12, 2008 2:58 PM, Daniel Forrest <forrest@xxxxxxxxxxxxx> wrote:
David,

> I have a Fedora Core 8 installation running on an x86 machine under
> Xen.  After installing Condor 7.0 from the .rpm and relinking my
> test program with condor_compile, the program will segfault upon
> receiving a ctrl-Z or a SIGUSR2.  This happens when trying to
> checkpoint the program both in the generic kernel and in the kernel
> running under Xen.  A debug trace reveals the following:
>
> #0  0x080bd3c4 in adler32 ()
> (gdb) up
> #1  0x080b8ba2 in fill_window ()
> (gdb) up
> #2  0x080b8861 in deflate_slow ()
> (gdb) up
> #3  0x080b6f24 in deflate ()
> (gdb) up
> #4  0x080504e5 in SegMap::Write ()
> (gdb) up
> #5  0x0804fca6 in Image::Write ()
> (gdb) up
> #6  0x0804f97d in Image::Write ()
> (gdb) up
> #7  0x0804f7fc in Image::Write ()
> (gdb) up
> #8  0x08050beb in Checkpoint ()
> (gdb) up
> #9  <signal handler called>
> (gdb) up
> #10 0x080e4e9a in nanosleep ()
>
> I've searched through the mailing list archive, and none of the
> solutions mentioned in it work.

You don't say which solutions don't work, but this is most certainly
a problem with address space randomization.  Have you tried:

$ setarch i386 -R <myprog> <myargs>
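
As a rough sketch of the check behind this suggestion, the snippet below reports whether the kernel is randomizing address space layout and wraps the setarch call. It assumes a Linux system that exposes /proc/sys/kernel/randomize_va_space; the function names aslr_status and run_without_aslr are hypothetical helpers for illustration, not part of Condor or util-linux.

```shell
# Report the kernel's address space randomization setting.
# Optional argument: path to the sysctl file (defaults to the usual Linux location).
aslr_status() {
    f="${1:-/proc/sys/kernel/randomize_va_space}"
    if [ -r "$f" ]; then
        case "$(cat "$f")" in
            0) echo "disabled" ;;
            1) echo "partial" ;;   # stack, mmap base, and VDSO are randomized
            2) echo "full" ;;      # heap (brk) is randomized as well
            *) echo "unknown" ;;
        esac
    else
        echo "unknown"
    fi
}

# Hypothetical wrapper: run a command with randomization turned off
# for that process only, using the setarch invocation quoted above.
run_without_aslr() {
    setarch i386 -R "$@"
}
```

Calling aslr_status with no argument prints the current setting; if it prints anything other than "disabled", something like run_without_aslr ./test would start the program with randomization off for that process, leaving the system-wide setting untouched.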

--
Daniel K. Forrest       Laboratory for Molecular and
forrest@xxxxxxxxxxxxx   Computational Genomics
(608) 262 - 9479        University of Wisconsin, Madison