Re: [Condor-users] Problem restarting from standalone checkpointed programs.
- Date: Thu, 2 Sep 2010 17:15:20 -0500
- From: Daniel Forrest <dan.forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Problem restarting from standalone checkpointed programs.
Tomas Oppelstrup wrote:
> Hi,
> I have a problem restarting programs with standalone checkpointing.
> I hope this may be the correct forum for this type of question,
> otherwise I apologize and wish to be pointed to an appropriate
> place.
>
> I have a workstation running 32-bit RedHat Enterprise Linux 5.
> (e.g.
>
> sh-3.2$ uname -a
> Linux sinclaire.llnl.gov 2.6.18-128.el5 #1 SMP \
> Wed Dec 17 11:42:39 EST 2008 i686 i686 i386 GNU/Linux
> )
You are running a newer kernel, so...
> === RUNNING PROGRAM ===
>
> sh-3.2$ setarch i686 -R ./cotest1 -_condor_D_ALL
This is a problem. You need "setarch i686 -R -L ..."
The root problem is that in newer kernels system calls go through a
page called the VDSO, and its location is randomized as well.
The checkpointed program remembers the VDSO at a different location
than where it is when the program is restarted so it will fail on the
first system call after the checkpoint is restored.
Specifying the "-L" option puts the VDSO at a standard location.
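One way to confirm where the VDSO ends up is to look for the "[vdso]" line in /proc/<pid>/maps, both for the original run and for the restarted one. The following is only a rough sketch, assuming the kernel labels the mapping "[vdso]" in the maps file; the helper names find_vdso and vdso_of_pid are made up for illustration and are not part of Condor.

```python
def find_vdso(maps_text):
    """Return (start, end) of the [vdso] mapping found in the text of a
    /proc/<pid>/maps file, or None if no such mapping is listed."""
    for line in maps_text.splitlines():
        fields = line.split()
        # The mapping name, when present, is the last field on the line.
        if fields and fields[-1] == "[vdso]":
            start, end = (int(addr, 16) for addr in fields[0].split("-"))
            return start, end
    return None


def vdso_of_pid(pid="self"):
    """Hypothetical convenience wrapper: read /proc/<pid>/maps (Linux only)."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return find_vdso(maps_file.read())
```

If the addresses reported for the checkpointed run and the restarted run differ, the restart can be expected to fail as described above; with "-R -L" they should come out the same.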
Note that this limits the amount of memory that can be malloc'd to
around 1GB (the VDSO is mapped at 0x40000000-0x40001000 which limits
the top of the heap) so it isn't the greatest solution.
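To put a rough number on that limit: 0x40000000 is exactly 1 GiB, and the heap has to fit between the end of the program image and that address. The sketch below just does the arithmetic. The starting address 0x08048000 is my assumption (the conventional i386 load address for non-PIE executables); the real heap starts after the program's own code and data, so the usable figure is somewhat lower.

```python
VDSO_START = 0x40000000         # from above: VDSO mapped at 0x40000000-0x40001000
ASSUMED_IMAGE_BASE = 0x08048000  # assumption: conventional i386 ELF load address


def heap_ceiling_bytes(heap_start=ASSUMED_IMAGE_BASE, vdso_start=VDSO_START):
    """Upper bound on contiguous heap growth before reaching the VDSO page."""
    return vdso_start - heap_start


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / 2**20
```

With these assumed values the bound works out to a little under 900 MiB, which matches the "around 1GB" figure.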
--
Dan