[Condor-users] MPI problem



Hi all,

I'm trying to get LAM/MPI working under Condor, but I'm running into some issues. The program is a simple MPI hello world. It runs fine under LAM directly:

[bgoncalves@underdark temp]$ lamboot hostfile.txt

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

[bgoncalves@underdark temp]$ mpirun -np 5 ./hello.x
Hello World! I am 0 of 5
Hello World! I am 2 of 5
Hello World! I am 4 of 5
Hello World! I am 1 of 5
Hello World! I am 3 of 5
[bgoncalves@underdark temp]$ lamhalt

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

[bgoncalves@underdark temp]$ 

but when I submit it to Condor using:

universe = parallel
executable = lamscript
arguments = /home/bgoncalves/progs/temp/hello.x
Output = paralle.out.$(CLUSTER).$(NODE)
machine_count=5
queue 1

all I get in the "Output" files is:

error 0 chirp putting identity keys back

and condor email says:

Here are the machines that ran your MPI job.
They are listed in the order they were started
in, which is the same as MPI_Comm_rank.

   Machine Name               Result
 ------------------------    -----------
pumpkin110.physics.emory.edu    exited normally with status 255
pumpkin108.physics.emory.edu    was removed by the user
pumpkin207.physics.emory.edu    was removed by the user
pumpkin109.physics.emory.edu    exited normally with status 255
pumpkin205.physics.emory.edu    was removed by the user

Have a nice day.

What am I doing wrong?
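
In case it helps narrow things down, here is a sketch of the same submit file with stderr and the Condor user log captured as well, since the "Output" files alone don't say much. I have not run this variant. The file names parallel.err.* and parallel.log are placeholders of mine and were not part of the job above.

```
universe = parallel
executable = lamscript
arguments = /home/bgoncalves/progs/temp/hello.x
Output = paralle.out.$(CLUSTER).$(NODE)
# placeholder name: one stderr file per node
Error = parallel.err.$(CLUSTER).$(NODE)
# placeholder name: a single job event log for the whole cluster
Log = parallel.log
machine_count = 5
queue 1
```

The idea is only to see whatever lamscript or lamboot writes to stderr on each node, plus the eviction and exit events in the log.
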
Thanks!

Bruno

--
*******************************************
Bruno Miguel Tavares Goncalves, MS
PhD Candidate
Emory University
Department of Physics
Office No. N117-C
400 Dowman Drive
Atlanta, Georgia 30322
Homepage: www.bgoncalves.com
Email: bgoncalves@xxxxxxxxx
Phone: (404) 712-2441
Fax:   (404) 727-0873
*******************************************