
[Condor-users] Standalone checkpointing problem



Hello all,

I am new to Condor and am trying to use the Condor standalone checkpointing library (to integrate it later into our Grid Engine cluster). I have run into a problem whose solution I could not find in the documentation or in the mailing-list archives.

After successfully compiling a simple example program, "ever", I can launch it, send it a USR2 signal, and terminate it without any problem. But when I restart it from the checkpoint file, the name shown by "ps" is " i686 ./ever", which seems odd.

[acarrio@localhost ~] $ ./ever &
[1] 11254
Condor: Notice: Will checkpoint to ./ever.ckpt
Condor: Notice: Remote system calls disabled.
[acarrio@localhost ~] $ killall -s USR2 ever
[acarrio@localhost ~] $ killall ever
[acarrio@localhost ~] $ ./ever -_condor_restart ever.ckpt &
[2] 11257
[1] Terminated ./ever
Condor: Notice: Will restart from ever.ckpt
[acarrio@localhost ~] $ ps u | grep ever | grep -v grep
acarrio 11257 99.9 0.4 2028 1084 pts/3 R 11:30 0:05 i686 ./ever
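
To tell whether the odd name is a single argument or several separate ones, the restarted process's argument vector can be read directly from /proc. The following is only a minimal sketch, assuming a Linux system with /proc mounted; show_argv is a hypothetical helper name, not part of Condor.

```shell
# Hypothetical helper (not part of Condor): print each argv entry of a
# running process on its own line, wrapped in brackets so that leading
# or trailing spaces inside an entry become visible.
# Assumes Linux, where /proc/<pid>/cmdline holds argv as NUL-separated strings.
show_argv() {
    pid="$1"
    # Turn the NUL separators into newlines, then bracket each entry.
    tr '\0' '\n' < "/proc/$pid/cmdline" | sed 's/.*/[&]/'
}
```

For the session above it would be called as "show_argv 11257". A single bracketed line containing a space would indicate that the whole string is stored in argv[0], while several lines would indicate separate arguments.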


I am using Fedora Core 3 with the Condor 6.6.9 RPM and gcc 3.4.2 as provided by FC3. The CPU, in case it is relevant, is an AMD Sempron.

Also, in other tests, I have seen similarly strange process names for other programs, and some checkpoint file names contained strange characters.

Is it a configuration issue (I have not changed the Condor configuration file), or something else?

Thanks in advance for any idea or explanation (or link to a part of a doc I have not found),

Alan