
[Condor-users] Issues with checkpointing



This is a follow-up to the problem of jobs not checkpointing when they
are evicted. I would appreciate any insight.

The executable is weiweicase10. When I run the program from a terminal on
a local station, I get the following messages:

Condor: Notice: Will checkpoint to weiweicase10.ckpt
Condor: Notice: Remote system calls disabled.
...
<program runs a while>
<I press CONTROL-Z to suspend the job>

^ZKilled
unixlab03%
--------------------
and it is killed. I'm wondering whether the job is supposed to be
suspended rather than killed in order for it to checkpoint. The executable
was compiled from a Fortran 90 program.

In that case, is there something we are supposed to do to make the
executable suspendable?

Also, in which directory would the checkpoint files be created?

----------------------------------------
Brian C. Dandurand
Clemson University
Department of Mathematical Sciences
Ph.D. Student
Office: Martin Hall E-6
Office Phone: (864)656-4749
----------------------------------------