Re: [Condor-users] 7.0.3 Debian 4 dynamic clipped?



Hello,

> On Wed, Jul 16, 2008 at 05:09:14PM -0500, David A. Kotz wrote:
> > 
> > Initial testing of a trivial app condor_compiled under Ubuntu 8.04 with 
> > the RHEL 5 version of Condor works.  It's just a "hello, world" program, 
> > so it doesn't actually get a chance to checkpoint.
> 
> Good news that is :)
> You might add a one-hour sleep() to have plenty of time for a forced ckpt ;)
> 
> Please keep us updated, and have a look for (and at) the Condor ports table
> which should give some more information (if it's updated properly)

On a slightly related note, I've compiled Condor with checkpointing on
Debian Etch, based on Andreas Hirczy's Debian package.

Remote syscalls seem to work (well, a simple 'fread' does anyway).

Checkpointing claims to work, but when I try to restart from a standalone
checkpoint of a simple sleep() programme, it fails with:

condor-vm-3:/home/condor$ ./simple.std  -_condor_D_ALL -_condor_restart simple.std.ckpt
User Job - $CondorPlatform: X86_64-LINUX_DEBIAN40 $
User Job - $CondorVersion: 7.1.1 Jul 23 2008 $
Condor: Notice: Will restart from simple.std.ckpt
Read headers OK
Read SegMap[0](DATA) OK
Read SegMap[1](STACK) OK
Read all SegMaps OK
Restoring a DATA segment
Found a DATA block, increasing heap from 0x6b2000 to 0x6b3000
About to overwrite 696320 bytes starting at 0x609000(DATA)
simple.std[11000]: segfault at 00007fffa774a488 rip 00000000004633b6 rsp 000000000068f110 error 4
Segmentation fault (core dumped)

A stack trace gives:

This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `./simple.std -_condor_D_ALL -_condor_restart simple.std.ckpt'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004633b6 in getenv ()
(gdb) bt
#0  0x00000000004633b6 in getenv ()
#1  0x000000000045f14a in __dcigettext ()
#2  0x000000000047f9b8 in strerror_r ()
#3  0x000000000047f81e in strerror ()
#4  0x0000000000403ec5 in SegMap::Read ()
#5  0x000000000040498d in Image::RestoreSeg ()
#6  0x0000000000404a4f in RestoreStack ()
#7  0x0000000000405d94 in ExecuteOnTmpStk ()
#8  0x0000000000404b8e in Image::Restore ()
#9  0x0000000000404bf8 in restart ()
#10 0x0000000000400ca0 in MAIN ()
#11 0x000000000045dd08 in __libc_start_main ()
#12 0x00000000004001ba in _start () at ../sysdeps/x86_64/elf/start.S:113

Could this be something to do with mmap?

regards,

Richard.

-- 
Richard Palmer
Systems Administration Officer / Centre for E-Research
King's College London          / Centre for Computing in the Humanities
Tel: 0207 848 1973