
Re: [Condor-users] Parallel MPI Job

On Sun, 2012-09-02 at 10:29 -0700, patrick cadelina wrote:
> Hi,
> I'm trying to run a simple parallel MPI hello world on condor but I
> keep getting errors. My code works using mpirun. Here's my submit
> file:
> universe = parallel
> requirements = (TARGET.OpSys=="LINUX" && TARGET.Arch=="INTEL")
> executable = mp2script
> arguments = hello
> log = hello.log
> output = hello.out
> error = hello.err
> machine_count = 2
> should_transfer_files = yes
> when_to_transfer_output = on_exit
> transfer_input_files = hello
> +ParallelShutdownPolicy = "WAIT_FOR_ALL"
> queue
> And here's the error that I get from the generated files:
> mpd.out.0:
> /var/lib/condor/execute/dir_3282/condor_exec.exe:
> 60: /var/lib/condor/execute/dir_3282/condor_exec.exe: mpd: not found
> mpd.out.1:
> /var/lib/condor/execute/dir_5103/condor_exec.exe:
> 101: /var/lib/condor/execute/dir_5103/condor_exec.exe: mpd: not found
You say your code works outside of Condor using mpirun, yet mp2script
reports that mpd is not installed. That tells me mpirun is using a
different process manager than mpd (which is a good thing IMHO).

Before pursuing installation of mpd, I would look to see which other
process managers are available. As I recall, some MPI implementations
have a mechanism to run in MPICH1 mode, which doesn't use mpd. You might
want to look at your mpirun or mpiexec man page to see if you have that
option or the option to use hydra.
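
As a quick first check before reading the man pages, something like the
sketch below shows which launchers are on the PATH of a given machine.
The list of command names is my assumption about common MPICH-style
installs, not something taken from this thread; adjust it for your MPI.

```shell
#!/bin/sh
# Report which MPI launchers / process managers are on PATH.
# The candidate names are assumed common ones; edit the list as needed.
find_launchers() {
    for cmd in mpd mpdboot mpiexec.hydra mpiexec mpirun; do
        if path=$(command -v "$cmd" 2>/dev/null); then
            echo "$cmd: $path"
        else
            echo "$cmd: not found"
        fi
    done
}

find_launchers
```

Run it on the execute nodes as well as the submit node, since the "mpd:
not found" message comes from the execute side.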

Here's a script (to replace mp2script) that I've used with Intel MPI to
avoid using mpd. I've also replaced the mpirun line in the same script
with

mpiexec -launcher ssh  -n $_CONDOR_NPROCS -f ${MACHINE_FILE} $EXECUTABLE

for MPICH2 (MPDIR=/usr/lib64/mpich2/bin), where the launcher was hydra.



export PATH


# Remove the contact file, so if we are held and released
# it can be recreated anew


PATH=`condor_config_val libexec`/:$PATH

if [ $_CONDOR_PROCNO -eq 0 ]
      echo "trying"

	echo "setting up "
	SLOTS=$($(condor_config_val libexec)/condor_chirp get_job_attr

	echo $SLOTS | sed -e 's/\"\(.*\)\".*/\1/' -e 's/,/\n/g' | tr "@" "\n" | grep -v slot >> ${MACHINE_FILE}
        echo "---"
	echo "---"

        echo "running job"
	## run the actual mpijob in mpich1 mode
       	mpirun  -f ${MACHINE_FILE} -machinefile ${MACHINE_FILE} -n

	sleep 20
	echo "first node out"
	echo $e
	echo "second node out"

> Any help would be appreciated. Thanks!
> Regards,
> Pat