
[Condor-users] Segfault during basic use of standalone checkpointing



Hi, I'm new to Condor. I just installed Condor 7.4.4 on CentOS 5.5,
and I'm trying out standalone checkpointing for the first time.
Unfortunately, I get a segmentation fault when I try to restart
a program from a checkpoint file.

I've been following the instructions in section 4.2.1 of the manual
(http://www.cs.wisc.edu/condor/manual/v6.4/4_2Condor_s_Checkpoint.html).
Details are below:

I have a program called toy.c, which I compile and run like this:

$ condor_compile gcc -o toy toy.c
LINKING FOR CONDOR ......(some more output).....

$ ./toy
...(TOY PROGRAM OUTPUT)....

(control-Z)
...(PROGRAM STOPS)...

$ ./toy -_condor_restart ./toy.ckpt.tmp
Condor: Notice: Will restart from ./toy.ckpt.tmp
Segmentation fault


My eventual goal is to use Condor for transparent checkpointing of
jobs running under SGE (Sun Grid Engine). But at the moment I can't
even get this toy standalone example to work. (For reference, the
source for toy.c is below.)

If anyone has any tips or pointers, or links to good tutorials on the
use of standalone checkpointing, I'd be much obliged.

Thanks,
Lane


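//toy.c
#include <stdio.h>

int main(int argc, char **argv) {

   int i;
   int n;

   /* 2^30 iterations */
   n=1024*1024*1024;

   for (i=0; i<n; i+=1) {
      /* Square in long long: i*i overflows int once i exceeds 46340. */
      long long sq = (long long)i * i;
      printf("We calculated: %d^2=%lld\n", i, sq);
   }

   return 0;
}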