[Condor-users] Issues with checkpointing

This is again related to the problem with jobs not checkpointing when
evicted. If anyone has any insight, I would appreciate it.

The executable is weiweicase10. I get the following message when I run the
program on a local station from a terminal:

Condor: Notice: Will checkpoint to weiweicase10.ckpt
Condor: Notice: Remote system calls disabled.
<program runs a while>
<I press CONTROL-Z to suspend the job>

and it's killed. I'm wondering whether the job is supposed to be suspended,
rather than killed, in order to be able to checkpoint. This executable
was compiled from a Fortran 90 program.

If it is supposed to be suspended, is there something we need to do to make
the executable suspendable?

Also, in which directory would the checkpoint files be created?

Brian C. Dandurand
Clemson University
Department of Mathematical Sciences
Ph.D. Student
Office: Martin Hall E-6
Office Phone: (864)656-4749