
Re: [Condor-users] Checkpointing on Windows pool PCs: I need little help...



Hi,

Here is what I have observed about checkpointing on Windows:

A running program does indeed receive the CTRL_SHUTDOWN_EVENT when Windows shuts down, and there is enough time to write checkpoint files on the local machine. By then, however, Condor and/or the network are apparently already "dead enough" that communication with the condor master is no longer possible.
On boot, the Windows computer cleans up the remnants of previous jobs, so the job's history/checkpoint data is lost.

The only remedy I can see is to checkpoint at regular intervals.
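
To make clear what I mean by regular checkpointing, here is a minimal sketch in Python of a job that saves its own state every so often and resumes from the last saved state when restarted. This is only an illustration, not anything Condor provides: the function names, the JSON state layout and the file name job.ckpt are all made up for the example. It only covers writing the checkpoint on the local machine; it does not address getting the file back to the master.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically (hypothetical helper, not a Condor API).

    The data goes to a temporary file in the same directory first and is
    then renamed over the target, so a shutdown in the middle of a write
    cannot leave a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, n_steps, checkpoint_every=100):
    """Toy job: sum the integers below n_steps, checkpointing regularly."""
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step          # stand-in for the real work
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)        # final state at the end of the run
    return state


if __name__ == "__main__":
    print(run("job.ckpt", 1000))
```

If the job is killed and started again with the same checkpoint file, it continues from the last saved step instead of from zero. The interval is a trade-off: a shorter interval loses less work on shutdown but costs more I/O.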

But how can I tell Condor to transfer the checkpoint files from the pool PC to the master, without evicting the job?

Thanks,

Rob