
Re: [Condor-users] Checkpointing Condor's vanilla universe jobs.



Thanks, Mark

I have a question about your code. Why do you stop the job before
checkpointing it? I tried that, but the checkpoint crashes, so I have removed
those commands. I can't find any instructions about that on the BLCR web page.

Regards,
José
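
For readers following the thread: the signal-trapping idea discussed in the
quoted messages below could look roughly like the wrapper sketched here. This
is only an illustration, not the code from Mark's page. The function names are
hypothetical, SIGTERM is assumed to be the signal Condor sends on vacate, the
job is assumed to have been started so that BLCR can checkpoint it (for
example under cr_run), and the cr_checkpoint options used (--term, -f) should
be checked against your BLCR version.

```python
import signal
import subprocess
import sys


def build_checkpoint_cmd(pid, context_file):
    """Return a BLCR command line that checkpoints `pid` into `context_file`.

    Assumption: cr_checkpoint accepts -f FILE and --term (terminate the
    process once the checkpoint has been written).
    """
    return ["cr_checkpoint", "--term", "-f", context_file, str(pid)]


def make_vacate_handler(pid, context_file, cmd_builder=build_checkpoint_cmd):
    """Build a signal handler that checkpoints `pid` when it is invoked."""
    def on_vacate(signum, frame):
        try:
            subprocess.call(cmd_builder(pid, context_file))
        except OSError as exc:
            # For example, the checkpoint tool is not installed on this node.
            sys.stderr.write("checkpoint failed: %s\n" % exc)
    return on_vacate


def run_with_checkpoint_on_vacate(job_cmd, context_file,
                                  vacate_signal=signal.SIGTERM):
    """Run `job_cmd` and checkpoint it if `vacate_signal` arrives.

    Returns the job's exit status (negative if it was killed by a signal).
    """
    child = subprocess.Popen(job_cmd)
    handler = make_vacate_handler(child.pid, context_file)
    previous = signal.signal(vacate_signal, handler)
    try:
        return child.wait()
    finally:
        # Restore whatever handler was installed before the job started.
        signal.signal(vacate_signal, previous)
```

A wrapper script would call something like
run_with_checkpoint_on_vacate(["cr_run", "./my_job"], "my_job.ckpt"), where
./my_job and my_job.ckpt are placeholder names, and the wrapper itself would
be what Condor runs as the job's executable.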


On Wednesday 27 February 2008 10:15:12 Mark Calleja wrote:
> Hi José,
>
> I'm glad you've got a version to suit your needs. Just in case, I've
> updated my online version (with documentation to reflect it) to perform
> a similar function.
>
> Cheers,
> Mark
>
> José M. Martín wrote:
> > Actually, I have adapted your version to my environment and added
> > this "feature". We are now testing it on a production cluster with
> > about 15 users and 150 jobs.
> >
> > Thanks for your time.
> > --
> > José.
> >
> > On Tuesday 26 February 2008 10:48:20 Mark Calleja wrote:
> >> Hi José,
> >>
> >> Thanks for your feedback. What you suggest should not be too difficult
> >> to add, so if you're interested I can send you a modified version of the
> >> code for you to test.
> >>
> >> Cheers,
> >> Mark
> >>
> >> José M. Martín wrote:
> >>> Thanks for your contribution.
> >>>
> >>> I used it on my Condor cluster and it works very well. Just one
> >>> suggestion: you could trap Condor's signals to force your programs to
> >>> make a checkpoint. When Condor vacates a program, it sends it a signal
> >>> (killsig):
> >>> http://www.cs.wisc.edu/condor/manual/v7.0/2_7Priorities_Preemption.html#SECTION00373000000000000000
> >>> By trapping this signal, programs could make a checkpoint before
> >>> stopping.
> >>>
> >>> Cheers,
> >>> José
> >>>
> >>> On Monday 04 February 2008 12:37:43 Mark Calleja wrote:
> >>>> Hi,
> >>>>
> >>>> In case it's of use or interest to anyone else on this mailing list,
> >>>> I've written some notes on how one can use Parrot and the BLCR kernel
> >>>> modules to transparently checkpoint Condor's vanilla universe jobs.
> >>>> The link is:
> >>>>
> >>>> http://www.escience.cam.ac.uk/projects/camgrid/blcr.html
> >>>>
> >>>> This is recent/ongoing work, so feedback and/or bug reports back to me
> >>>> please.
> >>>>
> >>>> Cheers,
> >>>> Mark
> >>>
> >>> _______________________________________________
> >>> Condor-users mailing list
> >>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with
> >>> a subject: Unsubscribe
> >>> You can also unsubscribe by visiting
> >>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>>
> >>> The archives can be found at:
> >>> https://lists.cs.wisc.edu/archive/condor-users/