Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

[HTCondor-users] htcondor + gpudirect + openmpi

Date: Tue, 05 Sep 2017 21:04:20 +0200
From: Harald van Pee <pee@xxxxxxxxxxxxxxxxx>
Subject: [HTCondor-users] htcondor + gpudirect + openmpi

Dear all,

We want to use HTCondor 8.6.5 on a GPU cluster with Open MPI in the parallel 
universe.
Our main task will be to run Open MPI jobs with up to 16 GPUs on nodes with 4 
or 8 GPUs installed.
To profit from the peer-to-peer (P2P) connections on the board, we want 4 or 8 
MPI processes running on one machine rather than distributed over the whole 
cluster.

If we use, for example,
universe = parallel
executable = /mpi/openmpiscript
arguments = a.out 
machine_count = 2
request_cpus = 4
request_gpus = 4

the slots are reserved correctly, but openmpiscript ignores the CPU request and 
starts 2 MPI processes in total, not 4 on each node used.

If I just copy the hosts 4 times 
sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $1}' > machines
sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $1}' >> machines
sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $1}' >> machines
sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $1}' >> machines
and use 
mpirun ... -n 8  -hostfile machines ...

the a.out processes are started, 4 on each machine, but all 4 processes are 
bound to the same core.
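
The four appended copies above could also be written as a single pass that 
emits one line per host with an Open MPI "slots=" count. This is only a sketch: 
make_hostfile is a hypothetical helper name, and it assumes, as in the commands 
above, that the host name is the first field of each line in 
$CONDOR_CONTACT_FILE. It only changes how the hostfile is written; it is not 
known here whether that affects the core binding.

```shell
#!/bin/sh
# Hypothetical helper (not part of openmpiscript): write an Open MPI
# hostfile with a fixed number of slots per host.
#   $1 = contact file (assumption: host name is field 1, as in the
#        commands above)
#   $2 = slots to advertise per host (e.g. 4 for 4 GPUs per node)
make_hostfile() {
    contact_file=$1
    slots_per_host=$2
    # Same sort as above, then emit "<host> slots=<n>" once per line
    # instead of repeating each host n times.
    sort -n -k 1 < "$contact_file" |
        awk -v n="$slots_per_host" '{print $1 " slots=" n}'
}

# Possible use inside the job script (left commented out here):
#   make_hostfile "$CONDOR_CONTACT_FILE" 4 > machines
```

Adding Open MPI's --report-bindings option to the mpirun call makes it print 
where each rank is bound, which helps to check whether the processes still 
share one core.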

How can I arrange for 4 a.out processes to run on each machine and use 4 cores 
in total, or even more if each of them uses threads?

Best
Harald

Follow-Ups:
- Re: [HTCondor-users] htcondor + gpudirect + openmpi
  - From: Jason Patton
