Re: [Condor-users] Segfault during basic use of standalone checkpointing
- Date: Tue, 8 Mar 2011 13:38:27 -0600
- From: Daniel Forrest <dan.forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Segfault during basic use of standalone checkpointing
Lane Schwartz wrote:
> Hi, I'm new to Condor. I just installed Condor 7.4.4 on CentOS 5.5,
> and I'm trying out standalone checkpointing for the first time.
> Unfortunately, I'm getting a segmentation fault when I try to restart
> a program using a checkpoint file.
>
> I've been following the instructions in section 4.2.1 of the manual
> (http://www.cs.wisc.edu/condor/manual/v6.4/4_2Condor_s_Checkpoint.html).
> Details are below:
>
> I have a program called toy.c:
>
> $ condor_compile gcc -o toy toy.c
> LINKING FOR CONDOR ......(some more output).....
>
> $ ./toy
> ...(TOY PROGRAM OUTPUT)....
>
> (control-Z)
> ...(PROGRAM STOPS)...
>
> $ ./toy -_condor_restart ./toy.ckpt.tmp
> Condor: Notice: Will restart from ./toy.ckpt.tmp
> Segmentation fault
>
>
> My eventual goal is to use Condor for transparent checkpointing of
> jobs using SGE (Sun Grid Engine). But at the moment I can't even get
> this toy standalone example to work. (For reference, the source for
> toy.c is below)
>
> If anyone has any tips or pointers, or links to good tutorials on the
> use of standalone checkpointing, I'd be much obliged.
Look in the archives here:
https://lists.cs.wisc.edu/archive/condor-users/2010-September/msg00026.shtml
And here:
https://lists.cs.wisc.edu/archive/condor-users/2011-January/msg00060.shtml
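Short answer: run the program through setarch, with address space
randomization disabled (-R) and the legacy address layout (-L):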
$ setarch i386 -R -L ./toy
Or better, to get some debugging output as well:
$ setarch i386 -R -L ./toy -_condor_D_ALL
And then to restart:
$ setarch i386 -R -L ./toy -_condor_restart toy.ckpt
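
If typing the setarch prefix every time gets tedious, a small shell
function can carry it. This is only a sketch: ckpt_run and CKPT_DRY_RUN
are hypothetical names made up for illustration, not part of Condor or
setarch. Only the setarch flags themselves come from the commands above.

```shell
# Hypothetical helper (not part of Condor): prefix a command with the
# setarch flags used above.
#   -R  disable address space randomization
#   -L  use the legacy (compat) address space layout
ckpt_run() {
    if [ -n "$CKPT_DRY_RUN" ]; then
        # Dry run: print the command line instead of executing it.
        printf '%s\n' "setarch i386 -R -L $*"
    else
        setarch i386 -R -L "$@"
    fi
}
```

The first run would then be "ckpt_run ./toy -_condor_D_ALL" and the
restart "ckpt_run ./toy -_condor_restart toy.ckpt". Setting
CKPT_DRY_RUN=1 just prints the command that would be run.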
--
Dan