
[Condor-users] Checkpointing on Windows pool PCs: I need little help...



Hello,

I have used section 6.2.8 of the Condor manual and other references to get a kind of artificial checkpointing working on Windows pool PCs. My program calls SetConsoleCtrlHandler() to catch the CTRL_CLOSE_EVENT, which lets me save the relevant data right before Condor evicts the job from the Windows machine.
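
To make the approach concrete, here is a minimal sketch of such a handler. The state variable g_iteration, the file name "job.ckpt" and the helpers save_checkpoint() and install_checkpoint_handler() are hypothetical names chosen for illustration, not my actual program. Only SetConsoleCtrlHandler() and CTRL_CLOSE_EVENT are real Win32 names. The Windows-specific part is guarded so the file still compiles elsewhere.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical job state and checkpoint file name (assumptions for this sketch). */
static volatile long g_iteration = 0;
static const char *CHECKPOINT_FILE = "job.ckpt";

/* Write the current state to disk; returns 0 on success, -1 on failure. */
static int save_checkpoint(const char *path, long iteration)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", iteration) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

#ifdef _WIN32
/* Windows calls this on a separate thread when a console control event arrives. */
static BOOL WINAPI ctrl_handler(DWORD ctrl_type)
{
    if (ctrl_type == CTRL_CLOSE_EVENT) {
        save_checkpoint(CHECKPOINT_FILE, g_iteration);
        return TRUE;   /* event handled */
    }
    return FALSE;      /* let the next handler deal with other events */
}
#endif

/* Register the handler; returns 0 on success. A no-op on non-Windows builds. */
static int install_checkpoint_handler(void)
{
#ifdef _WIN32
    return SetConsoleCtrlHandler(ctrl_handler, TRUE) ? 0 : -1;
#else
    return 0;
#endif
}
```

The handler should stay short, since Windows allows only a limited time before it terminates the process. In this sketch the main loop would update g_iteration as it goes, so the handler only has to write out one number.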

To make use of the checkpoint file, my program checks at startup whether the file exists. If it does, the program initializes itself from that data and continues where it left off at the previous eviction.
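
The startup check can be sketched roughly as follows. The checkpoint format (a single line holding the last completed iteration) and the names load_checkpoint() and run_job() are assumptions made for this example, not the real program.

```c
#include <stdio.h>

/* Hypothetical checkpoint format: one line holding the last completed iteration. */

/* Return the iteration to resume from, or 0 if no usable checkpoint exists. */
long load_checkpoint(const char *path)
{
    long iteration = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;                 /* first run: no checkpoint file yet */
    if (fscanf(f, "%ld", &iteration) != 1 || iteration < 0)
        iteration = 0;            /* unreadable or corrupt file: start over */
    fclose(f);
    return iteration;
}

/* Run the remaining iterations; returns how many were executed in this run. */
long run_job(const char *path, long total_iterations)
{
    long start = load_checkpoint(path);
    long done = 0;
    long i;

    for (i = start; i < total_iterations; ++i) {
        /* ... one unit of real work would go here ... */
        ++done;
    }
    return done;
}
```

Falling back to 0 on a missing or unreadable file means a damaged checkpoint costs a restart from scratch rather than a crash.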

All this works great!
And I can see the checkpoint files appear in the temporary spool directories of the master PC.


Now there is another issue that I'm unsure about:

All the Windows pool PCs are public computers in a university library.
At the end of the day, after the library has closed its doors, a library IT person shuts down all the PCs (the timing is not fixed; sometimes he does it before dinner, sometimes afterwards).
Around library closing time in particular, most library PCs are not in use and almost all of them are running Condor jobs. Upon shutdown, I expect Condor to simply be killed, leaving no time for checkpointing. Is that right?

My question here is: would it work to also catch the "CTRL_SHUTDOWN_EVENT" in my program? Or is it already too late by then? (By "too late" I mean: are the network interface and Condor already dead at that stage?)

Could somebody give me some insight and considerations on this issue?

Thank you.

Rob.