
[Condor-users] Condor and MPICH2



Hi,

I have MPICH2 and Condor 6.8 (set up to run parallel jobs) on Linux.

Suppose we want to submit an MPI job to Condor.

In the job specification, is it enough to specify the MPI executable directly? (Option 1)

Or do we need to create a shell script wrapper that calls the MPI executable with the mpiexec command, and specify that shell script as the executable? (Option 2)

 

Option 1

*******

Universe = parallel

executable = cpi

output  = cpi$(NODE).out

error   = cpi$(NODE).error

Log     = cpi.log

machine_count = 4

should_transfer_files = yes

when_to_transfer_output = on_exit

queue

 

Option 2

*******

Universe = parallel

executable = jobfile.sh

output  = cpi$(NODE).out

error   = cpi$(NODE).error

Log     = cpi.log

machine_count = 4

should_transfer_files = yes

when_to_transfer_output = on_exit

queue

 

jobfile.sh

********

#!/bin/sh

mpiexec -np 2 cpi

 

 

When I ran Option 1, the job ran on only a couple of nodes and then went idle,

with this error in the log file: “UserPolicy Error: No signal/exit codes in job ad!”

 

When I ran Option 2, the job failed, complaining that the mpd.conf file is not available, even though the file is in the path. I also tried attaching it to the job, but nothing worked.
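
One possible cause, offered as an assumption rather than a diagnosis: mpd looks for its configuration in the home directory of the user running it (as .mpd.conf) or at the location named by the MPD_CONF_FILE environment variable, and it rejects a file that other users can read. A job started by Condor may not see the same home directory as an interactive login. A minimal sketch of the start of a wrapper that creates a private config in the job's scratch directory follows. The secret word is a placeholder, and starting the mpd ring and calling mpiexec are left out.

```shell
#!/bin/sh
# Sketch only: give mpd a private config file inside the job's scratch directory.
# _CONDOR_SCRATCH_DIR is set by Condor for a running job; fall back to the
# current directory when the script is run by hand.
scratch="${_CONDOR_SCRATCH_DIR:-$PWD}"

# Point mpd at the file explicitly instead of relying on $HOME/.mpd.conf.
MPD_CONF_FILE="$scratch/mpd.conf"
export MPD_CONF_FILE

# Placeholder secret (an assumption for illustration). Every node that joins
# the same mpd ring needs the same value.
( umask 077; printf 'MPD_SECRETWORD=%s\n' "placeholder-secret" > "$MPD_CONF_FILE" )

# mpd refuses a config file that is accessible to anyone but the owner.
chmod 600 "$MPD_CONF_FILE"
```

With the config in place, the wrapper would still have to start mpd on the allocated nodes before calling mpiexec.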

 

Could you please let me know how to submit parallel jobs that use MPICH2 to Condor?

Thanks,

Senthil