
Re: [Condor-users] checkpointing in windows



On 2/23/06, Kerbel, Kit <kkerbel@xxxxxxxxxxxxxxx> wrote:
> Does anyone know a timeline for when checkpointing might be possible in
> Windows? As it is, it is a bit useless to me for my purposes: the cluster
> could work for 2 weeks straight, crash and then lose all work that was done.
> Any ideas are more than welcome.

I am not a member of the Condor development team, but:

Given how complex this is, don't expect it any time soon (how I would
love to be proved wrong on this!). Indeed, I would be tempted to say
that unless you can supply serious amounts of funding (or have some
body willing to do so), ice skating to work will be the devil's way of
avoiding fuel price rises before Windows gets a proper standard
universe.

This applies only to standard-universe-style checkpointing, of course.
You can do your own checkpointing in response to the WM_CLOSE event.
This is rather trickier to set up (a lot more configuration must be
set correctly for it to actually work when you try it), but it is
perfectly possible. You just need to be able to save your state
somehow and restart from that saved state*.
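A minimal sketch of the save-and-restart idea follows. It is written in Python rather than a .NET language and assumes the job's whole state fits in one picklable dictionary. The checkpoint path, the `should_stop` callable and the checkpoint interval are all hypothetical. In a real Windows job, `should_stop` would report whether your WM_CLOSE handling has fired; that wiring is not shown here.

```python
import os
import pickle
import tempfile


def fresh_state():
    # Hypothetical job state: the next step to run and a running total.
    return {"next_i": 0, "total": 0}


def save_state(state, path):
    """Write a checkpoint to a temp file, then swap it in over the old one.

    A crash mid-write therefore leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # overwrites any existing checkpoint in one call
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_state(path):
    """Resume from the checkpoint if there is one, otherwise start fresh."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return fresh_state()


def run(n_steps, should_stop, path, every=1000):
    """Do n_steps units of work, checkpointing every `every` steps.

    Returns (state, finished). If should_stop() becomes true, the state is
    saved and the function returns early; calling run() again resumes.
    """
    state = load_state(path)
    while state["next_i"] < n_steps:
        i = state["next_i"]
        state["total"] += i  # stand-in for one unit of real work
        state["next_i"] = i + 1
        # Well-defined point: step i is complete, so the state is consistent.
        if should_stop():
            save_state(state, path)
            return state, False
        if state["next_i"] % every == 0:
            save_state(state, path)
    save_state(state, path)
    return state, True
```

If the job is interrupted, rerunning `run()` with the same path picks up at `next_i` instead of step 0. The periodic checkpoints also cover a hard crash, where no close request arrives at all.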

I am looking into whether binary serialization in .NET 2.0 is fully
sorted. Make *everything* in your app [Serializable], then just write
your entire object graph (event handlers, anonymous delegates and all)
to a file when you are at a well-defined point from which you can
'recreate' your position in the call stack.
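To get a feel for the "write the whole object graph" approach, here is a sketch that uses Python's `pickle` as a stand-in for .NET binary serialization. It therefore says nothing definite about .NET 2.0 itself. The `Counter` and `Simulation` classes are hypothetical. In Python, a bound method used as an event handler round-trips along with the object it belongs to. An anonymous function (a lambda) does not pickle, so a single one anywhere in the graph makes the whole snapshot fail.

```python
import pickle


class Counter:
    def __init__(self):
        self.count = 0

    def on_tick(self, amount):
        # Used as an "event handler" by Simulation below.
        self.count += amount


class Simulation:
    def __init__(self):
        self.step = 0
        self.counter = Counter()
        # Callbacks playing the role of event handlers; a bound method keeps
        # a reference to its object, so the counter is reachable twice.
        self.handlers = [self.counter.on_tick]

    def tick(self):
        self.step += 1
        for handler in self.handlers:
            handler(1)


def snapshot(obj):
    """Serialize everything reachable from obj into bytes."""
    return pickle.dumps(obj)


def restore(blob):
    """Rebuild the object graph; shared references stay shared."""
    return pickle.loads(blob)


def can_snapshot(obj):
    """True if the whole graph reachable from obj can be pickled."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    sim = Simulation()
    sim.tick()
    sim.tick()
    restored = restore(snapshot(sim))
    restored.tick()
    print(restored.step, restored.counter.count)  # 3 3
    print(restored.handlers[0].__self__ is restored.counter)  # True
    sim.handlers.append(lambda amount: None)  # an anonymous callback
    print(can_snapshot(sim))  # False
```

Restoring the objects does not restore the call stack. That is why the snapshot has to be taken at a point the program can re-enter from the top, such as the start of a loop iteration.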

Matt

* Oh, how that glosses over it, but there you go :)