
Re: [Condor-users] Checkpointing Condor's vanilla universe jobs.



Thanks for your contribution.

I have used it on my Condor cluster and it works very well. One suggestion:
you can trap Condor's signals so that your programs write a checkpoint.
When Condor vacates a job, it sends it a signal (the job's KillSig, see
http://www.cs.wisc.edu/condor/manual/v7.0/2_7Priorities_Preemption.html#SECTION00373000000000000000).
By trapping this signal, a program can write a checkpoint before it stops.

Cheers,
José



On Monday 04 February 2008 12:37:43, Mark Calleja wrote:
> Hi,
>
> In case it's of use or interest to anyone else on this mailing list,
> I've written some notes on how one can use Parrot and the BLCR kernel
> modules to transparently checkpoint Condor's vanilla universe jobs. The
> link is:
>
> http://www.escience.cam.ac.uk/projects/camgrid/blcr.html
>
> This is recent/ongoing work, so feedback and/or bug reports back to me
> please.
>
> Cheers,
> Mark