
Re: [Condor-users] checkpointing in windows



On Fri, Feb 24, 2006 at 09:15:33AM +0000, Matt Hope wrote:
> On 2/23/06, Kerbel, Kit <kkerbel@xxxxxxxxxxxxxxx> wrote:
> > Does anyone know a timeline for when checkpointing might be possible in
> > windows...as it is a bit useless to me for my purposes as is...the cluster
> > could work for 2 weeks straight, crash and then lose all work that was done.
> > Any ideas are more than welcome.
> 
> I am not a member of the Condor development team, but:
> 
> Given how complex this is don't expect it any time soon (how I would
> love to be proved wrong on this!), indeed I would be tempted to say
> that, unless you are capable of supplying serious amounts of funding
> (or have some body which is willing to do it) then ice skating to work
> will be the devil's way of avoiding fuel price rises before windows
> gets a proper standard universe.
> 

UNIX-style checkpointing (relinking with a new C library) isn't
likely anytime soon. What we're more likely to get in place before that
is stronger support for virtual machines, à la Xen/VMware/Virtual PC.
In that case, you'd just checkpoint the whole VM, and you could migrate
between "Linux" and "Windows" machines with no big deal. (Nearly anything
that won't work in such a checkpointing scheme wouldn't work with
UNIX-style checkpointing either.)

I don't know all of the licensing issues with running jobs in a VM
instance of Windows, so it's a little harder than with Linux. It's also
easier to build Linux VMs, since you can make them smaller - maybe
we could run Condor jobs in just a Windows PE virtual machine to make
the Windows case smaller too.

Anyway, 6.8 will have some basic Virtual Machine support, and then
in 6.9 we'll be expanding on it. 

-Erik

P.S. We may rename the current "virtual machines" in Condor, since
the name is confusing even now, and it might just get worse if
vm1@xxxxxxxxxxxxxxxxxx is running a VM job... we're still thinking...