
Re: [Condor-users] Segfault during basic use of standalone checkpointing



Lane Schwartz wrote:

> Hi, I'm new to Condor. I just installed Condor 7.4.4 on CentOS 5.5,
> and I'm trying out standalone checkpointing for the first time.
> Unfortunately, I get a segmentation fault when I try to restart
> a program from a checkpoint file.
> 
> I've been following the instructions in section 4.2.1 of the manual
> (http://www.cs.wisc.edu/condor/manual/v6.4/4_2Condor_s_Checkpoint.html).
> Details are below:
> 
> I have a program called toy.c:
> 
> $ condor_compile gcc -o toy toy.c
> LINKING FOR CONDOR ......(some more output).....
> 
> $ ./toy
> ...(TOY PROGRAM OUTPUT)....
> 
> (control-Z)
> ...(PROGRAM STOPS)...
> 
> $ ./toy -_condor_restart ./toy.ckpt.tmp
> Condor: Notice: Will restart from ./toy.ckpt.tmp
> Segmentation fault
> 
> 
> My eventual goal is to use Condor for transparent checkpointing of
> jobs using SGE (Sun Grid Engine). But at the moment I can't even get
> this toy standalone example to work. (For reference, the source for
> toy.c is below.)
> 
> If anyone has any tips or pointers, or links to good tutorials on the
> use of standalone checkpointing, I'd be much obliged.

Look in the archives here:

https://lists.cs.wisc.edu/archive/condor-users/2010-September/msg00026.shtml

And here:

https://lists.cs.wisc.edu/archive/condor-users/2011-January/msg00060.shtml


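Short answer: run the program under setarch, with address space
randomization disabled (-R) and the legacy address space layout (-L):

$ setarch i386 -R -L ./toy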

Or, better, add -_condor_D_ALL so you get some debugging output:

$ setarch i386 -R -L ./toy -_condor_D_ALL


And then to restart:

$ setarch i386 -R -L ./toy -_condor_restart toy.ckpt
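
If you end up typing these a lot, a small shell function can save some
effort. This is only a sketch: run_ckpt and the RUN variable are names
made up for illustration, not part of Condor. By default it just prints
the command line it would run; set RUN=1 to actually execute it.

```shell
#!/bin/sh
# Sketch only: run_ckpt is a hypothetical helper, not part of Condor.
# It builds the setarch command lines shown above.
# By default it prints the command; with RUN=1 it executes it.

run_ckpt() {
    prog=$1          # path to the condor_compile'd binary, e.g. ./toy
    ckpt=${2:-}      # optional checkpoint file to restart from

    if [ -n "$ckpt" ]; then
        # Restart from an existing checkpoint file.
        set -- setarch i386 -R -L "$prog" -_condor_restart "$ckpt"
    else
        # Fresh run with debugging output enabled.
        set -- setarch i386 -R -L "$prog" -_condor_D_ALL
    fi

    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

For example, "run_ckpt ./toy" prints the debug-enabled command, and
"RUN=1 run_ckpt ./toy toy.ckpt" restarts from toy.ckpt.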

-- 
Dan