
Re: [HTCondor-users] Cannot use multiple arguments to run MPI application in "parallel" universe



On Thu, Nov 15, 2018 at 10:31 AM hufh <hufh2004@xxxxxxxxx> wrote:
Hi Jason, 

Is "a.out" in your script a MPI program?

Yes. It has to be referenced in both the submit file (to be transferred to the execute node) and the wrapper script (to be exec'd).

Here's my code for reference:

---
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>  // for sleep()

int main(int argc, char** argv) {
  MPI_Init(NULL, NULL);

  // number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // rank of this process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // name of this processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // print hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n",
         processor_name, world_rank, world_size);

  // print arguments, one on each line
  for (int i = 1; i < argc; ++i) {
    printf("I was given argument %s\n", argv[i]);
  }

  sleep(5);

  MPI_Finalize();
}
---

Jason
 

hufh

On Thu, Nov 15, 2018 at 11:03 PM Jason Patton <jpatton@xxxxxxxxxxx> wrote:
Here's my submit file:

---
universe = parallel

executable = openmpiscript
arguments = mpi_wrapper.sh
transfer_input_files = a.out, mpi_wrapper.sh
getenv = true

should_transfer_files = yes
when_to_transfer_output = on_exit_or_evict
+ParallelShutdownPolicy = "WAIT_FOR_ALL"

output = out.$(NODE)
error  = err.$(NODE)
log    = log

request_cpus = 1
machine_count = 4

queue
---

Here's mpi_wrapper.sh:

---
#!/bin/sh

if [ "$_CONDOR_PROCNO" -lt 2 ]; then
    exec ./a.out "_CONDOR_PROCNO=$_CONDOR_PROCNO" args1
else
    exec ./a.out "_CONDOR_PROCNO=$_CONDOR_PROCNO" args2
fi
---

I'm using $_CONDOR_PROCNO to determine which node of my MPI job the wrapper is running on, and I pass arguments to my MPI application (a.out) based on its value. With machine_count = 4, nodes 0 and 1 get args1 and nodes 2 and 3 get args2.
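
If you want to check the node-to-argument mapping before submitting, something like the sketch below may help. It is not part of my actual job: select_args is a hypothetical helper name, and falling back to node 0 when _CONDOR_PROCNO is unset is only an assumption so the script can be run by hand outside of HTCondor. It prints the command line instead of exec'ing a.out.

```shell
#!/bin/sh

# Hypothetical helper: map a node number to the argument it should receive.
# Mirrors the "-lt 2" test in mpi_wrapper.sh: nodes 0 and 1 get args1,
# every other node gets args2.
select_args() {
    case "$1" in
        0|1) echo "args1" ;;
        *)   echo "args2" ;;
    esac
}

# Assumption: default to node 0 when run by hand (variable not set).
procno="${_CONDOR_PROCNO:-0}"

# Dry run: show the command instead of exec'ing it.
echo "would run: ./a.out _CONDOR_PROCNO=$procno $(select_args "$procno")"
```

Running it with _CONDOR_PROCNO set to each node number in turn shows what every node would execute; replacing the final echo with exec gives a wrapper equivalent to the one above.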

Jason



On Thu, Nov 15, 2018 at 6:12 AM hufh <hufh2004@xxxxxxxxx> wrote:
Hi Jason,

Sorry for the late reply. I tried your method, but it didn't work. Could you please send me your submit file and the other files so that I can try it on my machines?

Thanks for your help!

hufh
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/