On Thu, Apr 8, 2010 at 10:34 AM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
I'm sorry, but the Condor checkpointing technology cannot checkpoint processes that use almost any kind of inter-process communication. It can't checkpoint MPI codes today.
Tanzima Zerin Islam wrote:
To follow up on what I have done to link 3 compiled files into 1 checkpointable MPI executable:
It is the "IS" benchmark from the NAS Parallel Benchmarks (NPB), written in C.
1. cc -g -o setparams setparams.c
2. cc -g -c -I/tmp/NPB3.3/NPB3.3-MPI/common is.c
3. cc -g -c -I/tmp/NPB3.3/NPB3.3-MPI/common c_print_results.c
4. cc -g -c -I/tmp/NPB3.3/NPB3.3-MPI/common c_timers.c
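The four commands above could be collected into one small script so they are easy to re-run. This is only a sketch: `CC`, `NPB_COMMON`, `DRY_RUN`, `run` and `build_objects` are names introduced here for illustration and are not part of NPB or Condor. The final link step is left out because it is not listed above.

```shell
#!/bin/sh
# Sketch only: repeats the four compile steps listed above.
# CC, NPB_COMMON and DRY_RUN are illustrative names, not part of NPB or Condor.
set -e

CC=${CC:-cc}
NPB_COMMON=${NPB_COMMON:-/tmp/NPB3.3/NPB3.3-MPI/common}

# Print a command, then execute it unless DRY_RUN is set to 1.
run() {
    echo "$@"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

build_objects() {
    # Step 1: build the setparams helper program.
    run "$CC" -g -o setparams setparams.c
    # Steps 2-4: compile each source file to an object file.
    for src in is.c c_print_results.c c_timers.c; do
        run "$CC" -g -c -I"$NPB_COMMON" "$src"
    done
}
```

Calling `build_objects` from the directory that holds the source files runs the steps in order and stops at the first failure. Setting `DRY_RUN=1` beforehand only prints the commands.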
If all the MPI nodes are going to run on the same machine, you might want to investigate the DMTCP checkpointing libraries on SourceForge.
Sorry, and good luck,