
Re: [Condor-users] Is it possible to use condor checkpoint library with MPI


Thanks for your suggestion. I will certainly look into LAM as
you mentioned.

But I am interested in checkpointing at the individual
process level instead of the job level which you have
suggested.

I am also aware of the extra work you mentioned, since
the Condor checkpoint library deals with standalone
programs and does not take care of in-transit data.
Assuming that coordination and in-transit messages are
handled at the application level, is the Condor
checkpoint library suitable for checkpointing an
individual MPI process (as if it were a standalone program)?
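
To make that assumption concrete, here is a rough sketch of what
"handled at the application level" could look like. It is a toy
simulation in plain Python, not real MPI or Condor code: the Rank
class, channels_quiet, coordinated_checkpoint and the save callback
are all hypothetical names made up for illustration. Each rank drains
the messages addressed to it, the ranks confirm that nothing is left
on the wire, and only then is each process checkpointed on its own.

```python
from collections import deque


class Rank:
    """Toy stand-in for one MPI process (hypothetical, not an MPI API)."""

    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.inbox = deque()  # messages still "in transit" to this rank
        self.sent = 0         # number of messages this rank has sent
        self.received = 0     # number of messages this rank has consumed
        self.state = 0        # application state that would be saved

    def send(self, other, value):
        # Put a message on the simulated wire towards another rank.
        other.inbox.append(value)
        self.sent += 1

    def drain(self):
        # Consume every message currently in transit to this rank.
        while self.inbox:
            self.state += self.inbox.popleft()
            self.received += 1


def channels_quiet(ranks):
    # Nothing is in transit when global sends equal global receives.
    # In a real MPI program these totals could be obtained with a
    # reduction such as MPI_Allreduce; here they are summed directly.
    return sum(r.sent for r in ranks) == sum(r.received for r in ranks)


def coordinated_checkpoint(ranks, save):
    # Step 1: every rank drains its incoming messages.
    for r in ranks:
        r.drain()
    # Step 2: refuse to checkpoint while data is still on the wire.
    if not channels_quiet(ranks):
        raise RuntimeError("messages still in transit")
    # Step 3: checkpoint each process individually. `save` is a
    # placeholder for whatever per-process checkpoint call is used.
    return [save(r) for r in ranks]
```

For example, with two ranks that have each sent one message, calling
coordinated_checkpoint([a, b], lambda r: (r.rank_id, r.state)) would
first deliver both messages and then save each rank separately.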

Thanks in advance.


--- Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

> jakadeesan gopinatha wrote:
> > Hi,
> > 
> > I would like to know is it possible to use condor
> > checkpoint library with MPI application.
> > 
> > I am using HP XC cluster and its provided MPI
> > library.
> > I would like to know is it possible to use condor
> > checkpoint library to checkpoint individual MPI
> > processes?
> > 
> No, it is not going to work out of the box, sorry. There is a lot of
> extra work that would need to happen that the Condor checkpoint library
> does not deal with (like flushing data in transit on the wire, and
> coordination with the other ranks).
> You may want to check out the LAM MPI implementation. IIRC, LAM
> supports checkpointing with help from the Berkeley checkpoint library.
> And Condor's parallel universe has been used to manage LAM jobs...
> Good luck,
> Todd
