
[Condor-users] Standalone checkpoint error ...



Hello everybody,

I'm trying to use the standalone checkpointing feature provided by Condor on 
our cluster. Here are the details of our machines:

[goncalo@lflip02 ~]$ uname -a
Linux lflip02.lip.pt 2.4.21-32.0.1.ELsmp #1 SMP Wed May 25 15:42:26 CDT 2005 i686 i686 i386 GNU/Linux

I have downloaded the condor-6.6.10-linux-x86-glibc23.tar package and 
installed it in personal mode just to have access to the compiler:

[goncalo@lflip02 ~]$ ./condor_configure \
    --install=/home/na50/goncalo/condor-6.6.10/release.tar \
    --install-dir=/home/na50/goncalo/local/condor-6.6.10 \
    --make-personal-condor

For testing, I'm using a very simple program (ever.c):

[goncalo@lflip02 ~]$ cat ever.c
#include <stdio.h>
int main(void)
{
    float x;
    long  i;
    for (;;)
    {
        for (i=0;i<=100000;i++)
            x=3.1415926*i+i+i*i*2.7182818;
    }
    return 0;
}

I have compiled the ever.c program: 

[goncalo@lflip02 ~]$ condor_compile gcc ever.c -o ever
LINKING FOR CONDOR : /usr/bin/ld 
-L/home/na50/goncalo/local/condor-6.6.10/lib -Bstatic --eh-frame-hdr -m 
elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o ever 
/home/na50/goncalo/local/condor-6.6.10/lib/condor_rt0.o 
/usr/lib/gcc-lib/i386-redhat-linux/3.2.3/../../../crti.o 
/usr/lib/gcc-lib/i386-redhat-linux/3.2.3/crtbeginT.o 
-L/home/na50/goncalo/local/condor-6.6.10/lib 
-L/usr/lib/gcc-lib/i386-redhat-linux/3.2.3 
-L/usr/lib/gcc-lib/i386-redhat-linux/3.2.3/../../.. /tmp/cciyPAqZ.o 
/home/na50/goncalo/local/condor-6.6.10/lib/libcondorzsyscall.a 
/home/na50/goncalo/local/condor-6.6.10/lib/libz.a 
/home/na50/goncalo/local/condor-6.6.10/lib/libcomp_libstdc++.a 
/home/na50/goncalo/local/condor-6.6.10/lib/libcomp_libgcc.a 
/home/na50/goncalo/local/condor-6.6.10/lib/libcomp_libgcc_eh.a 
/home/na50/goncalo/local/condor-6.6.10/lib/libcomp_libgcc_eh.a -lc 
-lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv -lc 
/home/na50/goncalo/local/condor-6.6.10/lib/libcomp_libgcc.a 
/home/na50/goncalo/local/condor-6.6.10/lib/libcomp_libgcc_eh.a 
/home/na50/goncalo/local/condor-6.6.10/lib/libcomp_libgcc_eh.a 
/usr/lib/gcc-lib/i386-redhat-linux/3.2.3/crtend.o 
/usr/lib/gcc-lib/i386-redhat-linux/3.2.3/../../../crtn.o
/home/na50/goncalo/local/condor-6.6.10/lib/libcondorzsyscall.a(condor_file_agent.o)(.text+0x250): 
In function `CondorFileAgent::open(char const*, int, int)':
/home/condor/execute/dir_16550/userdir/src/condor_ckpt/condor_file_agent.C:99: 
the use of `tmpnam' is dangerous, better use `mkstemp'

There is no error message, only the tmpnam warning, so I guess this is normal.
When I test the program interactively, it starts running with 
the right messages:

[goncalo@lflip02 ~]$ ./ever
Condor: Notice: Will checkpoint to ./ever.ckpt
Condor: Notice: Remote system calls disabled.

Then, after logging in on another console, I run "kill -s USR2 <pid>".
The program stops with a segmentation fault and creates an 
ever.ckpt.tmp file.
 
[goncalo@lflip02 ~]$ ./ever
Condor: Notice: Will checkpoint to ./ever.ckpt
Condor: Notice: Remote system calls disabled.
Segmentation fault (core dumped)


Then I try to restart the program using the ever.ckpt.tmp file, but it is 
immediately killed.

[goncalo@lflip02 ~]$ ./ever -_condor_restart ever.ckpt.tmp
Condor: Notice: Will restart from ever.ckpt.tmp
Killed

I guess this is not the expected behaviour. Maybe there is an obvious 
reason for this that I am overlooking.

Thanks in advance.
Goncalo