
Re: [HTCondor-users] Cannot use multiple arguments to run MPI application in "parallel" universe



As it's currently written, the openmpiscript helper script for running Open MPI jobs in the parallel universe cannot support MPMD jobs. I do have a plan for adding support, though, and have created a ticket to track progress: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6811

Jason

On Tue, Oct 30, 2018 at 6:33 AM hufh <hufh2004@xxxxxxxxx> wrote:
Hi Jason,

I searched for information about Condor's MPMD support, but found only one unanswered thread on the Condor users mailing list:
https://lists.cs.wisc.edu/archive/htcondor-users/2005-July/msg00284.shtml
Could you tell me whether Condor supports MPMD?

In addition, I need to schedule my MPI jobs with "condor_submit_dag" because my workflow requires it.
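
If each MPI run is kept in its own submit file, a DAG could schedule the runs as separate nodes. The following is only a minimal sketch: the DAG file name, the node names, and the submit file names run_args1.sub and run_args2.sub are all hypothetical, and how the parallel universe behaves under DAGMan in a given pool has not been verified here.

```
# mpi_runs.dag (hypothetical file name)
# Each node points at its own parallel universe submit file.
JOB run1 run_args1.sub
JOB run2 run_args2.sub

# With no dependency line, the two nodes are independent.
# Uncomment to make run2 wait until run1 has finished:
# PARENT run1 CHILD run2
```

Such a file would be submitted with "condor_submit_dag mpi_runs.dag".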

Thanks.

hufh


On Tue, Oct 30, 2018 at 7:34 AM Jason Patton <jpatton@xxxxxxxxxxx> wrote:
Are you wanting something like this?


Jason

On Mon, Oct 29, 2018, 4:51 PM Jason Patton <jpatton@xxxxxxxxxxx> wrote:
hufh,

The way openmpiscript works is that mpirun is executed on a head node, which then sets up worker processes on all of the nodes. Setting up the worker processes includes passing along the arguments that were given to mpirun on the head node. Thus, only the arguments passed to the head node ("args1") will be reflected in your MPI application. I believe this is how mpirun works in general; I'm not aware of a way to pass per-worker argument lists to MPI applications.

What is it that your job is trying to accomplish? Is "mympiapp" meant to be run with a controlling head process?

Jason

On Fri, Oct 26, 2018 at 11:04 AM hufh <hufh2004@xxxxxxxxx> wrote:
Dear,

I tried to run an Open MPI application with different arguments by writing two "queue" sections in the "parallel" universe.

Here is my submission file:
universe=parallel
executable=/public/openmpiscript
machine_count=2
arguments="mympiapp args1"
queue

machine_count=2
arguments="mympiapp args2"
queue

I expected two "mympiapp" instances to run with "args1" and "args2" respectively, each on two machines. In reality, only the first one ("mympiapp args1") runs, and it runs on four machines rather than two. It looks like it is using the resources claimed by both "queue" statements.
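
Since the two "queue" statements appear to have been combined into a single four-machine job, one possible workaround is to give each argument list its own submit file, so that each submission becomes a separate parallel universe job. This is a sketch under that assumption, not a confirmed fix; the file names run_args1.sub and run_args2.sub are hypothetical, while the executable path and arguments are taken from the submit file above.

```
# run_args1.sub (hypothetical file name)
universe      = parallel
executable    = /public/openmpiscript
machine_count = 2
arguments     = "mympiapp args1"
queue
```

```
# run_args2.sub (hypothetical file name)
universe      = parallel
executable    = /public/openmpiscript
machine_count = 2
arguments     = "mympiapp args2"
queue
```

Running "condor_submit" once per file should then produce two independent jobs, each requesting two machines.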

Can anyone give me a hand? Thanks a lot.

hufh


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/