Re: [Condor-users] checkpointing produces segfault



On Mon, Feb 27, 2006 at 02:57:22PM -0600, Patrick Huber wrote:
> Hi,
> 
> I have a somewhat strange problem. I linked my code with 
> condor_compile and everything worked just fine. Also 
> checkpointing worked fine.  Now, it stopped working, more 
> precisely: the program segfaults at random times, but runs fine 
> otherwise. It seems that only ca. 50% of jobs are affected.
> I have no clue what component in the system changed. The userlog 
> tells something like:
> 

Do you run on a pool with both regular and "bigmem" kernels? You can't
checkpoint on a regular kernel and then on a bigmem kernel, or
vice versa (i.e., once you checkpoint on one, you must checkpoint
on that same flavor for the rest of your job). The same applies
when moving between some 2.4 and 2.6 kernels.

Condor 6.7.15 should rewrite your requirements expressions so
you don't run into this. 
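
As a rough illustration of the rule above (not Condor's actual
implementation), the constraint amounts to: once a job has
checkpointed, it may only resume on a machine of the same kernel
flavor. The function names and platform strings below are hypothetical
and exist only for this sketch. It also treats every distinct string
as incompatible, which is stricter than the "some 2.4 and 2.6 kernels"
case.

```python
from typing import Dict, List, Optional


def can_resume_on(last_ckpt_platform: Optional[str], machine_platform: str) -> bool:
    """Return True if a job may run on a machine with this kernel flavor.

    last_ckpt_platform is None until the job has written its first
    checkpoint; before that, any machine is acceptable.
    """
    if last_ckpt_platform is None:
        return True
    # After the first checkpoint, only the same flavor is safe.
    return last_ckpt_platform == machine_platform


def eligible_machines(last_ckpt_platform: Optional[str],
                      machines: Dict[str, str]) -> List[str]:
    """List the machine names (sorted) on which the job may run."""
    return sorted(name for name, platform in machines.items()
                  if can_resume_on(last_ckpt_platform, platform))


# Hypothetical pool: machine name -> kernel flavor label.
MACHINES = {
    "node1": "linux-2.4-regular",
    "node2": "linux-2.4-bigmem",
    "node3": "linux-2.6-regular",
}
```

A job that has never checkpointed matches all three machines here; a
job that last checkpointed on "linux-2.4-bigmem" matches only node2.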

-Erik