
[Condor-users] Fwd: Checkpointing in Condor's MPI universe

Hi all,
Does Condor support checkpointing in the MPI universe? I have a simple MPI application that I want to run under Condor, taking checkpoints periodically.
It may also be a vanilla universe job, in which a shell script executes mpirun. I have a few naive questions; please feel free to point me to any documentation that answers them. So far I have read about various checkpointing libraries for MPI applications, but I have found little about the underlying checkpointing scheme that Condor uses for MPI applications.

1. Which MPI library should I use to compile my MPI application so that the executable is checkpointable?
2. Has anyone used MPICH-V with the checkpoint library that Condor provides? I could not even get mpich2-1.2.1p1 to install on my Ubuntu machine, so I thought there might be another way to compile MPI applications so that the executable is checkpointable. My gcc version is 4.4.1, by the way.

I have used condor_compile on my serial jobs and taken checkpoints by sending them a signal, and that works fine. Now it's MPI's turn. I would appreciate any help.

Thank you,
--Tanzima