Hi,
Here are my observations on checkpointing under Windows:
A running program does indeed receive CTRL_SHUTDOWN_EVENT when Windows shuts down, and there is enough time to write checkpoint files on the local machine. By then, however, Condor and/or the network are apparently already "dead enough" that communication with the condor master is no longer possible.
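For reference, here is a minimal sketch of how a program can catch that event, written in Python with ctypes. SetConsoleCtrlHandler and the value 6 for CTRL_SHUTDOWN_EVENT come from the Windows API; make_handler, install and save_checkpoint are names made up for this illustration and are not necessarily what my job does. On non-Windows systems install() does nothing.

```python
import ctypes
import sys

CTRL_SHUTDOWN_EVENT = 6  # constant defined by the Windows console API


def make_handler(save_checkpoint):
    """Build a console control handler that checkpoints on shutdown.

    save_checkpoint is a hypothetical, user-supplied callable that writes
    the checkpoint files to local disk.
    """
    def handler(ctrl_type):
        if ctrl_type == CTRL_SHUTDOWN_EVENT:
            save_checkpoint()
            return 1  # TRUE: the event was handled
        return 0      # FALSE: let the next handler deal with it
    return handler


def install(save_checkpoint):
    """Register the handler; returns the callback object, or None off Windows."""
    if sys.platform != "win32":
        return None
    # BOOL WINAPI HandlerRoutine(DWORD dwCtrlType)
    handler_type = ctypes.WINFUNCTYPE(ctypes.c_int, ctypes.c_uint)
    callback = handler_type(make_handler(save_checkpoint))
    if not ctypes.windll.kernel32.SetConsoleCtrlHandler(callback, True):
        raise ctypes.WinError()
    # The caller must keep this object alive, otherwise the callback is
    # garbage-collected while Windows still holds a pointer to it.
    return callback
```

The handler only writes locally; as described above, nothing can be sent to the master at that point.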
On boot, the Windows machine cleans up the remains of previous jobs, so the job's history and checkpoint data are lost.
The only remedy seems to be to checkpoint at regular intervals.
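By regular checkpointing I mean something like the following sketch, in Python. The file layout, the JSON state and the names write_checkpoint, read_checkpoint and run are assumptions for illustration only; the loop body stands in for the real computation. The checkpoint is written to a temporary file and then renamed, so a shutdown in the middle of a write does not leave a half-written file behind.

```python
import json
import os
import tempfile
import time


def write_checkpoint(path, state):
    """Write state to path atomically (temp file in the same directory, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before renaming
        os.replace(tmp, path)     # replaces any older checkpoint in one step
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def read_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, interval_s=600.0):
    """Toy job: sums 0..total_steps-1, checkpointing every interval_s seconds."""
    state = read_checkpoint(path, {"step": 0, "acc": 0})
    last = time.monotonic()
    while state["step"] < total_steps:
        state["acc"] += state["step"]  # placeholder for the real work
        state["step"] += 1
        if time.monotonic() - last >= interval_s:
            write_checkpoint(path, state)
            last = time.monotonic()
    write_checkpoint(path, state)      # final state
    return state
```

This only covers writing the files on the pool PC; getting them to the master is the open question.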
But how can I tell Condor to transfer the checkpoint files from the pool PC to the master, without evicting the job?
Thanks,
Rob
_______________________________________________
Condor-users mailing list