
Re: [Condor-users] checkpointing in windows

On Fri, Feb 24, 2006 at 09:15:33AM +0000, Matt Hope wrote:
> On 2/23/06, Kerbel, Kit <kkerbel@xxxxxxxxxxxxxxx> wrote:
> > Does anyone know a timeline for when checkpointing might be possible in
> > Windows? As it is, it's a bit useless for my purposes: the cluster
> > could work for 2 weeks straight, crash, and then lose all the work that was done.
> > Any ideas are more than welcome.
> I am not a member of the Condor development team, but:
> Given how complex this is, don't expect it any time soon (how I would
> love to be proved wrong on this!). Indeed, I would be tempted to say
> that, unless you are capable of supplying serious amounts of funding
> (or have somebody who is willing to do it), ice skating to work
> will be the devil's way of avoiding fuel price rises before Windows
> gets a proper standard universe.

UNIX-style checkpointing (relinking with a new C library) isn't
likely anytime soon. What we're more likely to get in place before that
is stronger support for virtual machines, à la Xen/VMware/Virtual PC.
In that case, you'd just checkpoint the whole VM, and you could migrate
between "Linux" and "Windows" machines without much trouble. (Nearly
anything that won't work in such a checkpointing scheme wouldn't work
with UNIX-style checkpointing either.)

Running jobs in a VM instance of Windows raises licensing issues that
I don't fully understand, so Windows is a little harder than Linux here.
Linux VMs are also easier to build, since you can make them smaller.
Maybe we could run Condor jobs in a Windows PE virtual machine to keep
the Windows image small too.

Anyway, Condor 6.8 will have some basic virtual machine support, and
in 6.9 we'll be expanding on it.


P.S. We may rename the current "virtual machines" in Condor, since the
name is already confusing, and it could get worse if
vm1@xxxxxxxxxxxxxxxxxx is running a VM job... we're still thinking about it.