
[Condor-users] MS-Windows + VM universe: checkpointing always possible?




Hi,

I have a plan to create a large (~500) condor pool of Windows XP PCs.

On Windows pool PCs, which run jobs in the vanilla universe, it is generally
not possible to checkpoint jobs, unlike on Linux pool PCs.

(If you have access to the job's source code, you can modify the
code to intercept the condor suspend signal on Windows systems and
save certain data to disk before the job is removed; with commercial
executable-only code, however, this is not possible.)
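
To make clear what I mean by modifying the source, here is a minimal sketch
of such a self-checkpointing job. It is only an illustration under
assumptions: the file name job_state.json, the function names and the toy
"work" loop are hypothetical, and SIGTERM is a stand-in for whatever
notification condor actually delivers to a job on Windows before removing it
(I have not verified that mechanism).

```python
import json
import os
import signal

STATE_FILE = "job_state.json"      # hypothetical checkpoint file name
stop = {"requested": False}        # set by the handler, polled by the loop


def request_stop(signum, frame):
    """Signal handler: only record the request; the loop saves the state."""
    stop["requested"] = True


def load_state(path=STATE_FILE):
    """Resume from a saved checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_state(state, path=STATE_FILE):
    """Write to a temp file, then rename, so a kill mid-write leaves the old file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(steps, path=STATE_FILE):
    """Toy job: sum 0..steps-1, resuming from the last saved step."""
    state = load_state(path)
    for i in range(state["next_step"], steps):
        state["total"] += i
        state["next_step"] = i + 1
        if stop["requested"]:      # stop at a step boundary so the state is consistent
            break
    save_state(state, path)
    return state["total"]


def main():
    # Assumption: the job sees a catchable termination signal (SIGTERM here).
    signal.signal(signal.SIGTERM, request_stop)
    run(1000000)
```

Calling main() again after an interrupted run picks up at the saved
next_step instead of starting over. The handler only sets a flag, so the
state is always written at a step boundary rather than in the middle of an
update.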


I have two questions:

1.
What if VMware is installed on the Windows PCs and I use the condor
VM universe? Is it then ALWAYS possible to checkpoint jobs on Windows
PCs when they are suspended?


2.
The manual
(http://www.cs.wisc.edu/condor/manual/v7.2/2_11Virtual_Machine.html)
says that "vm universe jobs can not use a checkpoint server."

Does that mean the condor master also has to take care of the checkpoints?
If so, should the design considerations for the master also include those
that apply to checkpoint servers (i.e. high network load due to the
checkpoint traffic, huge disk space for large checkpoint files, ...)?


Thanks!
Rob.