Re: [Condor-users] Issues with checkpointing
- Date: Fri, 19 Oct 2007 10:38:34 -0500
- From: Daniel Forrest <forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Issues with checkpointing
> This is again related to the problem with jobs not checkpointing when
> evicted. If anyone has any insight, I would appreciate it.
> The executable is weiweicase10. I get the following message when I
> run the program on a local station from a terminal:
> Condor: Notice: Will checkpoint to weiweicase10.ckpt
> Condor: Notice: Remote system calls disabled.
> <program runs a while>
> <I press CONTROL-Z to suspend the job>
> and it's killed. I'm wondering if the job is supposed to be suspended
> rather than killed in order to be able to checkpoint. This executable
> was compiled from a Fortran 90 program.
> In that case, is there something we are supposed to do to make the
> executable suspendable?
> Where would the checkpoints be created, and in which directory?
Checkpoints should be created in the current directory.
To get some debugging output, try running it like this:
weiweicase10 -_condor_D_ALL [any other args]
Knowing your operating system and Condor version would be helpful too.
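As a rough sketch of that check (not a definitive recipe), something like the following could capture the debugging output and then look for the checkpoint in the current directory. The file name weiweicase10.ckpt is taken from the notice quoted above; the helper name check_ckpt and the log file name condor_debug.log are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a checkpoint file exists in a directory.
# Defaults: current directory, and the file name from the quoted Condor notice.
check_ckpt() {
    dir="${1:-.}"
    ckpt="${2:-weiweicase10.ckpt}"
    if [ -f "$dir/$ckpt" ]; then
        echo "found $dir/$ckpt"
        return 0
    fi
    echo "no checkpoint at $dir/$ckpt"
    return 1
}

# Only attempt the run if the executable is actually in this directory.
if [ -x ./weiweicase10 ]; then
    # Same invocation as above; condor_debug.log is an assumed log name.
    # Interrupt the program the same way as before, then the check runs.
    ./weiweicase10 -_condor_D_ALL > condor_debug.log 2>&1
    check_ckpt . || echo "see condor_debug.log for details"
fi
```

If the helper reports no checkpoint, the contents of condor_debug.log would be the thing to post back to the list.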
Daniel K. Forrest        Laboratory for Molecular and
forrest@xxxxxxxxxxxxx    Computational Genomics
(608) 262 - 9479         University of Wisconsin, Madison