
Re: [Condor-users] Is it possible to use condor checkpoint library with MPI



jakadeesan gopinatha wrote:
Hi,

I would like to know whether it is possible to use the Condor
checkpoint library with an MPI application.

I am using an HP XC cluster and the MPI library provided with it.
Specifically, is it possible to use the Condor checkpoint library
to checkpoint individual MPI processes?


No, it is not going to work out of the box, sorry. Checkpointing an MPI job takes a lot of extra work that the Condor checkpoint library does not deal with, such as flushing data that is in transit on the wire and coordinating the checkpoint with the other ranks.
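
To show why the in-transit data matters, here is a toy sketch in Python. It is not Condor or MPI code: the Rank class and the two checkpoint functions are made-up names for illustration, and the "wire" is just an in-memory queue. It only demonstrates that a per-process snapshot misses a message that has been sent but not yet received, while draining the channels first captures it.

```python
from collections import deque


class Rank:
    """Toy stand-in for one MPI process (hypothetical, for illustration only)."""

    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.received_total = 0
        # Messages that were sent to this rank but not yet received,
        # i.e. data still "on the wire".
        self.inbox = deque()

    def send(self, other, value):
        # Sending only places the value in transit; the peer has not seen it yet.
        other.inbox.append(value)

    def receive_all(self):
        # Drain everything currently in transit to this rank.
        while self.inbox:
            self.received_total += self.inbox.popleft()

    def snapshot(self):
        # A per-process checkpoint saves local state only, not the wire.
        return {"rank": self.rank_id, "received_total": self.received_total}


def naive_checkpoint(ranks):
    # Each rank is checkpointed on its own, with no coordination.
    return [r.snapshot() for r in ranks]


def coordinated_checkpoint(ranks):
    # Assumption of this sketch: no rank sends while the checkpoint runs.
    # First flush in-transit data everywhere, then snapshot every rank.
    for r in ranks:
        r.receive_all()
    return [r.snapshot() for r in ranks]


def run_demo():
    a, b = Rank(0), Rank(1)
    a.send(b, 5)  # the value 5 is now in transit to rank 1
    naive = naive_checkpoint([a, b])
    coordinated = coordinated_checkpoint([a, b])
    return naive, coordinated


if __name__ == "__main__":
    naive, coordinated = run_demo()
    print("naive:      ", naive)
    print("coordinated:", coordinated)
```

In the naive snapshot, rank 1 has no record of the message, so a restart from it would silently lose that data. A real MPI checkpoint also has to deal with sockets, shared memory, and the MPI library's own internal state, which this sketch ignores.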

You may want to check out the LAM/MPI implementation. IIRC, LAM supports checkpointing with help from the Berkeley checkpoint library (BLCR). And Condor's parallel universe has been used to manage LAM jobs...

Good luck,
Todd