
Re: [Condor-users] checkpointing produces segfault



Hi Erik,

thanks for that info. Just in case I need it, what would those
special requirements look like, since I am still running 6.7.14?


Do you run on a pool with both regular and "bigmem" kernels? You can't
checkpoint on a regular kernel and then checkpoint on a bigmem kernel, or
vice versa (i.e., once you checkpoint on one flavor, you must keep
checkpointing on that same flavor for the rest of your job). The same
applies when moving between some 2.4 and 2.6 kernels.
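To make the kernel-flavor rule concrete, here is a minimal sketch of the constraint. It does not show how Condor itself expresses or enforces it. The `Machine` and `Job` classes, the `kernel_series`/`bigmem` attributes and the `eligible` helper are all hypothetical. The sketch also conservatively treats every 2.4/2.6 difference as incompatible, even though only some such pairs actually are.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Flavor = Tuple[str, bool]  # (kernel series, is_bigmem); hypothetical key


@dataclass(frozen=True)
class Machine:
    name: str
    kernel_series: str  # hypothetical attribute, e.g. "2.4" or "2.6"
    bigmem: bool        # hypothetical attribute: True for a "bigmem" kernel

    def flavor(self) -> Flavor:
        # Conservative key: any change in series or bigmem-ness counts
        # as a different checkpoint flavor.
        return (self.kernel_series, self.bigmem)


@dataclass
class Job:
    # Unset until the first checkpoint; afterwards the job is pinned to it.
    ckpt_flavor: Optional[Flavor] = None

    def record_checkpoint(self, machine: Machine) -> None:
        if self.ckpt_flavor is None:
            self.ckpt_flavor = machine.flavor()
        elif self.ckpt_flavor != machine.flavor():
            raise RuntimeError(
                f"cannot checkpoint on {machine.name}: flavor "
                f"{machine.flavor()} differs from {self.ckpt_flavor}"
            )


def eligible(job: Job, machines: List[Machine]) -> List[Machine]:
    """Machines the job may run (and checkpoint) on next."""
    if job.ckpt_flavor is None:
        return list(machines)  # never checkpointed: any flavor is fine
    return [m for m in machines if m.flavor() == job.ckpt_flavor]
```

For example, after a first checkpoint on a 2.6 bigmem node, `eligible` only returns other 2.6 bigmem nodes, and `record_checkpoint` on a regular node raises.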

The segfaults also happen before the job is checkpointed for the first
time, so this can at best be part of the problem. I checked my code
with valgrind and did find something that may have caused heap
corruption. It seems that my problem is gone for now...

Thanks,
Patrick

--
Dr. Patrick Huber                       Physics Department
                                        University of Wisconsin
Tel.:+1 608 262 2886                    1150 University Avenue
http://pheno.physics.wisc.edu/~phuber   Madison, WI 53706, USA