
Re: [Condor-users] Segfault during basic use of standalone checkpointing



Thanks, that's exactly what I was looking for. Further reading
suggests that I can apply these settings system-wide by adding the
following lines to /etc/sysctl.conf:

vm.legacy_va_layout = 1
kernel.randomize_va_space = 0

I haven't tried this yet, but I plan to. Have others also used this
as a fix?
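
Before editing /etc/sysctl.conf, it may help to check what the running
kernel is currently using. Below is a minimal sketch, assuming a Linux
system that exposes the setting under /proc/sys. The function name
aslr_status and its optional file argument are made up for illustration
and are not part of Condor.

```shell
#!/bin/sh
# Hypothetical helper (not a Condor tool): report the kernel's
# address space randomization mode.
# By default it reads the live kernel value; an alternate file
# can be passed as $1, which is mainly useful for testing.
aslr_status() {
    file=${1:-/proc/sys/kernel/randomize_va_space}
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$file")
    case "$value" in
        0) echo "disabled" ;;   # matches kernel.randomize_va_space = 0
        1) echo "partial" ;;    # stack, mmap and shared libraries randomized
        2) echo "full" ;;       # as above, plus the heap (brk)
        *) echo "unknown" ;;
    esac
}
```

Calling aslr_status with no argument prints the current mode; "disabled"
corresponds to the kernel.randomize_va_space = 0 line above. After
editing /etc/sysctl.conf, running "sysctl -p" as root should load the
new values without a reboot.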

Thanks,
Lane



On Tue, Mar 8, 2011 at 2:38 PM, Daniel Forrest
<dan.forrest@xxxxxxxxxxxxx> wrote:
> Lane Schwartz wrote:
>
>> Hi, I'm new to Condor. I just installed Condor 7.4.4 on CentOS 5.5,
>> and I'm trying out standalone checkpointing for the first time.
>> Unfortunately, I get a segmentation fault when I try to restart
>> a program from a checkpoint file.
>>
>> I've been following the instructions in section 4.2.1 of the manual
>> (http://www.cs.wisc.edu/condor/manual/v6.4/4_2Condor_s_Checkpoint.html).
>> Details are below:
>>
>> I have a program called toy.c:
>>
>> $ condor_compile gcc -o toy toy.c
>> LINKING FOR CONDOR ......(some more output).....
>>
>> $ ./toy
>> ...(TOY PROGRAM OUTPUT)....
>>
>> (control-Z)
>> ...(PROGRAM STOPS)...
>>
>> $ ./toy -_condor_restart ./toy.ckpt.tmp
>> Condor: Notice: Will restart from ./toy.ckpt.tmp
>> Segmentation fault
>>
>>
>> My eventual goal is to use Condor for transparent checkpointing of
>> jobs running under SGE (Sun Grid Engine). But at the moment I can't
>> even get this toy standalone example to work. (For reference, the
>> source for toy.c is below.)
>>
>> If anyone has any tips or pointers, or links to good tutorials on the
>> use of standalone checkpointing, I'd be much obliged.
>
> Look in the archives here:
>
> https://lists.cs.wisc.edu/archive/condor-users/2010-September/msg00026.shtml
>
> And here:
>
> https://lists.cs.wisc.edu/archive/condor-users/2011-January/msg00060.shtml
>
>
> Short answer:
>
> $ setarch i386 -R -L ./toy
>
> Or, better, so that you get some debugging output:
>
> $ setarch i386 -R -L ./toy -_condor_D_ALL
>
>
> And then to restart:
>
> $ setarch i386 -R -L ./toy -_condor_restart toy.ckpt
>
> --
> Dan
>



-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
                -- R.A. Heinlein, "Time Enough For Love"