
[Condor-users] standalone checkpointing segmentation fault





---------- Forwarded message ----------
From: Jesús Coll <jesuskoll@xxxxxxxxx>
Date: 20-nov-2007 23:36
Subject: standalone checkpointing segmentation fault
To: condor-users@xxxxxxxxxxx

Hello,

I'm trying to use Condor (6.8.6, dynamic) in "standalone" mode to checkpoint jobs. This works on Fedora Core 2, but I want to use it on Fedora Core 4.

I have a very simple test program that I compiled with condor_compile on Fedora Core 2, 3, and 4. On Fedora Core 2 everything works, but on the other two I get a segmentation fault when the program writes a checkpoint.

I downloaded the Condor package for Fedora Core 2, 3 and 4 (condor-6.8.6-linux-x86-rhel3-dynamic.tar.gz), and it is the same file for all three. Is that a mistake?



Machine details:
[condor@negro ~]$ uname -a
Linux negro.casa 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 athlon i386 GNU/Linux

The very simple test program:
[condor@negro ~]$ cat p.c
int main(){
  int i=0;
  for(;;)
    i++;
  return i;
}

[condor@negro ~]$ condor_compile gcc p.c -o p
LINKING FOR CONDOR : /usr/bin/ld -L/home/condor/dir_install/lib -Bstatic --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o p /home/condor/dir_install/lib/condor_rt0.o /usr/lib/gcc/i386-redhat-linux/4.0.0/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.0.0/crtbeginT.o -L/home/condor/dir_install/lib -L/usr/lib/gcc/i386-redhat-linux/4.0.0 -L/usr/lib/gcc/i386-redhat-linux/4.0.0 -L/usr/lib/gcc/i386-redhat-linux/4.0.0/../../.. /tmp/cceyq9Sp.o /home/condor/dir_install/lib/libcondorzsyscall.a /home/condor/dir_install/lib/libcondor_z.a /home/condor/dir_install/lib/libcomp_libstdc++.a /home/condor/dir_install/lib/libcomp_libgcc.a /home/condor/dir_install/lib/libcomp_libgcc_eh.a --as-needed --no-as-needed -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv -lcondor_c /home/condor/dir_install/lib/libcomp_libgcc.a /home/condor/dir_install/lib/libcomp_libgcc_eh.a --as-needed --no-as-needed /usr/lib/gcc/i386-redhat-linux/4.0.0/crtend.o /usr/lib/gcc/i386-redhat-linux/4.0.0/../../../crtn.o
/home/condor/dir_install/lib/libcondorzsyscall.a(condor_file_agent.o)(.text+0x250): In function `CondorFileAgent::open(char const*, int, int)':
/home/condor/execute/dir_13565/userdir/src/condor_ckpt/condor_file_agent.C:99: warning: the use of `tmpnam' is dangerous, better use `mkstemp'
/home/condor/dir_install/lib/libcondorzsyscall.a(switches.o)(.text+0x5fc5): In function `__gets_chk':
/home/condor/execute/dir_13565/userdir/src/condor_syscall_lib/switches.remap-LINUX.h:435: warning: the `gets' function is dangerous and should not be used.


I suppose the warnings are normal. I have also disabled address space randomization.
[condor@negro ~]$ cat /proc/sys/kernel/exec-shield
0
[condor@negro ~]$ setarch i386 ./p -_condor_D_ALL
User Job - $CondorPlatform: I386-LINUX_RHEL3 $
User Job - $CondorVersion: 6.8.6 Sep 13 2007 $
Condor: Notice: Will checkpoint to ./p.ckpt
Condor: Notice: Remote system calls disabled.
Got SIGTSTP
Saved signal state.
About to save file state
CondorFileTable::checkpoint

OPEN FILE TABLE:
fd 0
        logical name: default stdin
        offset:       0
        dups:         1
        open flags:   0x0
        not currently bound to a url.
fd 1
        logical name: default stdout
        offset:       0
        dups:         1
        open flags:   0x1
        not currently bound to a url.
fd 2
        logical name: default stderr
        offset:       0
        dups:         1
        open flags:   0x1
        not currently bound to a url.
working dir = /home/condor
Done saving file state
About to update MyImage
Adding a DATA segment: start[0x814d000], end [0x9e85000]
Image::AddSegment: name=[DATA], start=[814d000], end=[9e85000], len=[0x1d38000], prot=[0x0]
Adding a STACK segment: start[0xbfb66000], end [0xbfb6bfff]
Image::AddSegment: name=[STACK], start=[bfb66000], end=[bfb6bfff], len=[0x5fff], prot=[0x0]
Pos: 30639104
Pos: 30663679
Size of ckpt image = 30663679 bytes
About to write checkpoint
Image::Write(): fd -1 file_name ./p.ckpt
Checkpoint name is "./p.ckpt"
Tmp name is "./p.ckpt.tmp"
Wrote headers OK
Wrote all SegMaps OK
Writing compressed segments...
Segmentation fault

It creates a p.ckpt.tmp file.
[condor@negro ~]$ ls
condor-6.8.6                                 Desktop      local  p.c
condor-6.8.6-linux-x86-rhel3-dynamic.tar.gz  dir_install  p      p.ckpt.tmp

I don't know what the problem is. Can somebody help me?

Thanks.