
Re: [HTCondor-users] Cannot use multiple arguments to run MPI application in "parallel" universe



In the submit description you could even add custom job attributes to define the arguments, and then have the script pull them out of the .job.ad file, for a bit more flexibility:

MY.Node0Args = "args1"
MY.OtherNodeArgs = "args2"

#!/bin/sh
# ...etc...
if [ "$_CONDOR_PROCNO" -eq 0 ]; then
	exec ./mympiapp $(condor_q -jobads "$_CONDOR_JOB_AD" -af Node0Args)
else
	exec ./mympiapp $(condor_q -jobads "$_CONDOR_JOB_AD" -af OtherNodeArgs)
fi
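
Putting that together with Jason's example below, the submit description would look something like this (a sketch; the two MY. lines are the only addition to his submit file):

universe = parallel
executable = /public/openmpiscript
arguments = mpi_wrapper.sh
transfer_input_files = mympiapp, mpi_wrapper.sh
machine_count = 4
MY.Node0Args = "args1"
MY.OtherNodeArgs = "args2"
queue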

Michael V. Pelletier
Information Technology
Digital Transformation & Innovation
Integrated Defense Systems
Raytheon Company

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Jason Patton
Sent: Tuesday, October 30, 2018 4:27 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] Re: [HTCondor-users] Cannot use multiple arguments to run MPI application in "parallel" universe

hufh, 

We have a workaround for submitting your job with the current openmpiscript. You can write a wrapper script for your MPI application that decides which arguments to pass:

---
#!/bin/sh
# mpi_wrapper.sh

# use whatever logic you like to decide which application gets which arguments
if [ "$_CONDOR_PROCNO" -lt 2 ]; then
  exec ./mympiapp args1
else
  exec ./mympiapp args2
fi
---

HTCondor provides the $_CONDOR_PROCNO variable, which is the node number. There may also be useful values in the job ClassAd, which is in a file named .job.ad in the same directory that the wrapper script runs.
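
For example, a wrapper could pull a custom attribute out of the job ad like this (a sketch; MyArgs is a hypothetical attribute you would set in the submit file, and $_CONDOR_JOB_AD is the path HTCondor sets to the .job.ad file):

---
#!/bin/sh
# sketch: look up a hypothetical custom attribute "MyArgs" in the job
# ClassAd; $_CONDOR_JOB_AD holds the path to the .job.ad file
my_args=$(condor_q -jobads "$_CONDOR_JOB_AD" -af MyArgs)
# $my_args is left unquoted so multiple arguments split as separate words
exec ./mympiapp $my_args
---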

Then you can submit the job with:

---
universe = parallel
executable = /public/openmpiscript
arguments = mpi_wrapper.sh
transfer_input_files = mympiapp, mpi_wrapper.sh
machine_count = 4
queue
---


I ran a job on my test cluster using a wrapper script with a "hello world" MPI application that prints out its arguments, and got this output:
Hello world from processor condor-el7.test, rank 1 out of 4 processors
I was given argument _CONDOR_PROCNO=1
I was given argument args1
Hello world from processor condor-el7.test, rank 0 out of 4 processors
I was given argument _CONDOR_PROCNO=0
I was given argument args1
Hello world from processor condor-el7-clone.test, rank 2 out of 4 processors
I was given argument _CONDOR_PROCNO=2
I was given argument args2
Hello world from processor condor-el7-clone.test, rank 3 out of 4 processors
I was given argument _CONDOR_PROCNO=3
I was given argument args2
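
For reference, the wrapper in that test was essentially the one above, with the node number passed along as an extra argument. A guess at what it looked like (not the exact script):

---
#!/bin/sh
# reconstruction of the test wrapper: echo the node number as an
# argument, then split args1/args2 across the ranks as shown above
if [ "$_CONDOR_PROCNO" -lt 2 ]; then
  exec ./mympiapp "_CONDOR_PROCNO=$_CONDOR_PROCNO" args1
else
  exec ./mympiapp "_CONDOR_PROCNO=$_CONDOR_PROCNO" args2
fi
---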


Will this work for you?

Jason

On Tue, Oct 30, 2018 at 8:43 AM Jason Patton <jpatton@xxxxxxxxxxx> wrote:
As it's currently written, the openmpiscript helper script for running Open MPI jobs in the parallel universe cannot support MPMD jobs. I have a plan for how to add support, though, and have created a ticket to track progress: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6811

Jason