
Re: [Condor-users] My problems



On Tue, Feb 15, 2005 at 02:59:17AM -0800, Yenke Blaise wrote:
> I'm using Condor and I'm facing some difficulties
> that can be summarized as follows:
> 1) After how many seconds is a checkpoint initiated
> while running an application in Condor?

It's configurable, and defined by the PERIODIC_CHECKPOINT expression.
Note that it's set on the EXECUTE machine, not the submit machine,
so some machines can be set to create periodic checkpoints of
the jobs running on them more or less frequently than others.
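
As a rough illustration, an execute machine's configuration might contain
something like the fragment below. This is only a sketch: the LastCkpt
macro name and the three-hour threshold are assumptions chosen for the
example, so check the condor_config on your own machines for the actual
expression in use.

```
# Seconds since this job's last periodic checkpoint (assumed macro name).
LastCkpt = (CurrentTime - LastPeriodicCheckpoint)
# Checkpoint a running job once at least three hours have passed
# (illustrative threshold, written out in seconds).
PERIODIC_CHECKPOINT = $(LastCkpt) > (3 * 60 * 60)
```

Because the expression is evaluated per execute machine, a machine with a
smaller threshold would checkpoint its jobs more often.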

> 2) Does Condor give a user the possibility to
> checkpoint his running program at a given moment? In
> other words, what can a user do to checkpoint his
> running program?

Send yourself a SIGUSR2, or call ckpt(). See
http://www.cs.wisc.edu/condor/manual/v6.6/4_2Condor_s_Checkpoint.html
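
A minimal sketch of the "send yourself a SIGUSR2" route, in C. The helper
names (install_stub_handler, request_checkpoint) and the stub handler are
hypothetical and exist only so the example runs without Condor; in a job
relinked with condor_compile, the checkpoint library is what responds to
the signal, and you would not install a handler of your own.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Counts checkpoint requests seen by the stand-in handler below. */
static volatile sig_atomic_t checkpoint_requests = 0;

/* Stand-in for the SIGUSR2 handling that the Condor checkpoint library
 * provides in a relinked job. It only records that a request arrived. */
static void stub_checkpoint_handler(int signo)
{
    (void)signo;
    checkpoint_requests++;
}

/* Hypothetical helper, for running this sketch outside Condor only.
 * A real relinked job should not install its own SIGUSR2 handler. */
int install_stub_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = stub_checkpoint_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Hypothetical helper: request a checkpoint by signalling ourselves.
 * Returns 0 on success, nonzero on failure. */
int request_checkpoint(void)
{
    return raise(SIGUSR2);
}
```

Inside a relinked job, only the raise(SIGUSR2) call would be needed, or a
direct call to ckpt(), which the checkpoint library supplies. A natural
place for it is right after finishing an expensive phase of the
computation.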

> 3) I'd like to know what kind of information is kept in
> the .ckpt file?
> 

The memory image of the process, its signal state, and a list of open file
descriptors along with the state of those descriptors. See the Dr. Dobb's
article from '97, or the Litzkow/Solomon paper:

http://www.cs.wisc.edu/condor/publications.html#checkpoint