To follow up on what I have done to link 3 compiled files into 1 checkpointable MPI executable:
The application is "IS" from the NAS Parallel Benchmarks, written in C.
1. cc -g -o setparams setparams.c
2. cc -g -c -I/tmp/NPB3.3/NPB3.3-MPI/common is.c
3. cc -g -c -I/tmp/NPB3.3/NPB3.3-MPI/common c_print_results.c
4. cc -g -c -I/tmp/NPB3.3/NPB3.3-MPI/common c_timers.c
-- Steps 2-4 create the is.o, c_print_results.o and c_timers.o files. Then I run:
5. condor_compile cc -g -fPIC -lmpich -static -I/tmp/NPB3.3/NPB3.3-MPI/common -L/tmp/NPB3.3/NPB3.3-MPI/common -o ../bin/is.S.4 is.o ../common/c_print_results.o ../common/c_timers.o
Steps 1-4 work fine. The MPI build is mpich2-1.1.1p1/64/nemesis-gcc-4.4.0/ on Red Hat Linux, kernel 2.6.18-164.9.1.el5 (I think). But step 5 fails: whatever I do, the link fails with errors like:
/tmp/NPB3.3/NPB3.3-MPI/IS/is.c:1120: undefined reference to `MPI_Finalize'
../common/c_timers.o: In function `timer_start':
/tmp/NPB3.3/NPB3.3-MPI/common/c_timers.c:20: undefined reference to `MPI_Wtime'
../common/c_timers.o: In function `timer_stop':
/tmp/NPB3.3/NPB3.3-MPI/common/c_timers.c:31: undefined reference to `MPI_Wtime'
But /tmp/NPB3.3/NPB3.3-MPI/common contains both mpi.h and libmpich.a.
Anybody been through this?
On Thu, Apr 1, 2010 at 6:07 PM, Tanzima Zerin Islam <tislam@xxxxxxxxxx> wrote:
Does Condor support checkpointing in the MPI universe? I have a simple MPI application that I want to run in Condor and checkpoint periodically.
It may be a vanilla universe job where I will have a shell script executing mpirun. I have a few naive questions to ask. Please feel free to point me to any document you feel is going to answer my questions. So far, I have read about different checkpointing libraries for MPI apps, but have not found much on the core checkpointing scheme that Condor uses for MPI applications.
1. Which MPI library should be used to compile my MPI application so that the executable is checkpointable?
2. Has anyone used mpich-V with the checkpoint library that Condor provides? I could not even get mpich2-1.2.1p1 to install on my Ubuntu machine, so I thought there might be some other way to compile MPI apps so that the executable is checkpointable. My gcc version is 4.4.1, by the way.
I have used condor_compile and taken checkpoints of my serial jobs by sending them a signal, and that works just fine. Now it's MPI's turn... I will appreciate any help I get.
Tanzima Zerin Islam
School of Electrical & Computer Engineering
web.ics.purdue.edu/~tislam/