
Re: [HTCondor-users] Cannot use multiple arguments to run MPI application in "parallel" universe



hufh,

There is a workaround for submitting your job with the current openmpiscript: write a wrapper script for your MPI application that uses some logic to decide which arguments should be passed:

---
#!/bin/sh
# mpi_wrapper.sh

# use whatever logic you like to decide which application gets which arguments
if [ "$_CONDOR_PROCNO" -lt 2 ]; then
    exec ./mympiapp args1
else
    exec ./mympiapp args2
fi
---
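To see what the branching above does before submitting anything, you can simulate the node numbers in a plain loop (this is just a local sketch; "mympiapp" is the application name from the submit file, and nothing is actually executed):

```shell
#!/bin/sh
# Simulate the wrapper's branching for node numbers 0..3,
# echoing the command instead of exec'ing it.
for n in 0 1 2 3; do
    if [ "$n" -lt 2 ]; then
        args="args1"
    else
        args="args2"
    fi
    echo "node $n would run: ./mympiapp $args"
done
```

Nodes 0 and 1 get "args1" and nodes 2 and 3 get "args2", which matches the test output further down.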

HTCondor provides the $_CONDOR_PROCNO environment variable, which holds the node number. There may also be useful values in the job ClassAd, which is written to a file named .job.ad in the same directory where the wrapper script runs.
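The .job.ad file is plain text, one "Attribute = value" pair per line, so a wrapper can pull values out of it with standard tools. A minimal sketch, using a made-up sample ClassAd (in a real job, HTCondor writes this file for you, and the attribute values here are only illustrative):

```shell
#!/bin/sh
# Create a small sample .job.ad for illustration; a real one is
# written by HTCondor into the job's scratch directory.
cat > .job.ad <<'EOF'
ClusterId = 1234
ProcId = 0
JobUniverse = 11
EOF

# Extract the value of a ClassAd attribute (simple "Attr = value" lines)
CLUSTER_ID=$(sed -n 's/^ClusterId *= *//p' .job.ad)
echo "ClusterId is $CLUSTER_ID"
```

Note that string-valued ClassAd attributes are quoted in the file, so you may need to strip quotes depending on which attribute you read.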

Then you can submit the job with:

---
universe = parallel
executable = /public/openmpiscript
arguments = mpi_wrapper.sh
transfer_input_files = mympiapp, mpi_wrapper.sh
machine_count = 4
queue
---


I ran a job on my test cluster using a wrapper script with a "hello world" MPI application that prints its arguments, and got this output:

---
Hello world from processor condor-el7.test, rank 1 out of 4 processors
I was given argument _CONDOR_PROCNO=1
I was given argument args1
Hello world from processor condor-el7.test, rank 0 out of 4 processors
I was given argument _CONDOR_PROCNO=0
I was given argument args1
Hello world from processor condor-el7-clone.test, rank 2 out of 4 processors
I was given argument _CONDOR_PROCNO=2
I was given argument args2
Hello world from processor condor-el7-clone.test, rank 3 out of 4 processors
I was given argument _CONDOR_PROCNO=3
I was given argument args2
---


Will this work for you?

Jason

On Tue, Oct 30, 2018 at 8:43 AM Jason Patton <jpatton@xxxxxxxxxxx> wrote:
As it's currently written, the openmpiscript helper script for running Open MPI jobs in parallel universe cannot support MPMD jobs. I have a plan for how to add support, though, and have created a ticket to track progress: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6811

Jason

On Tue, Oct 30, 2018 at 6:33 AM hufh <hufh2004@xxxxxxxxx> wrote:
Hi Jason,

I tried to google for information about Condor's MPMD support, but only found one unanswered email thread on the Condor users mailing list. Could you tell me whether Condor supports MPMD?

In addition, I need to schedule my MPI jobs using "condor_submit_dag" because of a workflow requirement.

Thanks.

hufh


On Tue, Oct 30, 2018 at 7:34 AM Jason Patton <jpatton@xxxxxxxxxxx> wrote:
Are you wanting something like this?


Jason

On Mon, Oct 29, 2018, 4:51 PM Jason Patton <jpatton@xxxxxxxxxxx> wrote:
hufh,

The way openmpiscript works is that mpirun is executed on a head node, which then sets up worker processes on all of the nodes. Setting up the worker processes includes passing along the arguments that were given to mpirun on the head node. Thus, only arguments passed to the head node ("args1") will be reflected in your MPI application. I believe this is how mpirun works in general; I'm not aware of a way to pass per-worker argument lists to MPI applications.

What is it that your job is trying to accomplish? Is "mympiapp" meant to be run with a controlling head process?

Jason

On Fri, Oct 26, 2018 at 11:04 AM hufh <hufh2004@xxxxxxxxx> wrote:
Dear all,

I tried to run an Open MPI application with different arguments by writing two "queue" sections in the "parallel" universe.

Here is my submission file:

---
universe = parallel
executable = /public/openmpiscript
machine_count = 2
arguments = "mympiapp args1"
queue

machine_count = 2
arguments = "mympiapp args2"
queue
---

I expected two "mympiapp" instances to run, with "args1" and "args2" respectively, each on two machines. In reality only the first one ("mympiapp args1") runs, but on four machines instead of two; it looks like it is using the resources claimed by both queue statements.

Can anyone give me a hand? Thanks a lot.

hufh


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/