Re: [Condor-users] Checkpointing on Windows pool PCs: I need little help...
- Date: Fri, 16 Sep 2011 07:52:24 -0700 (PDT)
- From: Rob <spamrefuse@xxxxxxxxx>
- Subject: Re: [Condor-users] Checkpointing on Windows pool PCs: I need little help...
Here are my observations on checkpointing under Windows:
A running program does indeed receive the CTRL_SHUTDOWN_EVENT when Windows shuts down, and there is enough time to write checkpoint files on the local machine. By then, however, Condor and/or the network are apparently already in a "dead-enough" state, so communication with the Condor master is no longer possible.
On the next boot, the Windows computer cleans up the remains of previous jobs, so the job's history and checkpoint data are lost.
The only remedy here is to checkpoint at regular intervals.
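As a rough illustration of what regular checkpointing inside the job could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the JSON state format, the 10-minute default interval and the toy "work" loop are hypothetical and not part of Condor. It only covers writing the checkpoint file locally and resuming from it; it does not address getting the file off the pool PC.

```python
import json
import os
import tempfile
import time


def save_checkpoint(path, state):
    """Write state to path atomically, so a shutdown mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk before renaming
    os.replace(tmp, path)     # atomically swap in the new checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path, n_steps, interval_s=600.0):
    """Toy job: sum 0..n_steps-1, checkpointing every interval_s seconds."""
    state = load_checkpoint(path)  # resume where the last run stopped
    last = time.monotonic()
    while state["step"] < n_steps:
        state["total"] += state["step"]  # placeholder for the real work
        state["step"] += 1
        if time.monotonic() - last >= interval_s:
            save_checkpoint(path, state)
            last = time.monotonic()
    save_checkpoint(path, state)  # final checkpoint on normal completion
    return state
```

Writing to a temporary file and renaming it means that an abrupt shutdown leaves either the previous checkpoint or the new one, never a half-written file.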
But how can I tell Condor to transfer the checkpoint files from the pool PC to the master, without evicting the job?