
[HTCondor-users] multicore and multinode run



Hi,
On a pool with two compute nodes, one with 2 cores and one with 4, I submitted an MPI job with this submit file:

universe = parallel
executable = /opt/openmpi/bin/mpirun
arguments = mpihello
log = hellompi.log
output = hellompi.out
error = hellompi.err
machine_count = 2
queue


After submitting, I see this in the output file:

--------------------------------------------------------------------------
Hello world from processor compute-0-1.local, rank 0 out of 4 processors
Hello world from processor compute-0-1.local, rank 1 out of 4 processors
Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-1.local, rank 3 out of 4 processors
Hello world from processor compute-0-1.local, rank 0 out of 4 processors
Hello world from processor compute-0-1.local, rank 1 out of 4 processors
Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-1.local, rank 3 out of 4 processors



So it seems that the scheduler places the job on compute-0-1 only and runs it twice because of machine_count = 2. Is that right? If so, why?
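
One way to check that reading of the output is to tally how often each (host, rank) pair appears. This is only a minimal Python sketch, assuming the lines have exactly the format shown above; the helper name `tally_ranks` is made up for illustration and is not part of HTCondor or Open MPI.

```python
import re
from collections import Counter

# Matches lines such as:
#   Hello world from processor compute-0-1.local, rank 0 out of 4 processors
LINE_RE = re.compile(
    r"Hello world from processor ([^,\s]+), rank (\d+) out of (\d+) processors"
)

def tally_ranks(text):
    """Count how many times each (host, rank) pair appears in the output."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

# The eight lines from hellompi.out quoted above.
sample = """\
Hello world from processor compute-0-1.local, rank 0 out of 4 processors
Hello world from processor compute-0-1.local, rank 1 out of 4 processors
Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-1.local, rank 3 out of 4 processors
Hello world from processor compute-0-1.local, rank 0 out of 4 processors
Hello world from processor compute-0-1.local, rank 1 out of 4 processors
Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-1.local, rank 3 out of 4 processors
"""

counts = tally_ranks(sample)
hosts = sorted({host for host, _rank in counts})
for (host, rank), n in sorted(counts.items()):
    print(f"{host} rank {rank}: seen {n} time(s)")
```

Every rank from 0 to 3 shows up twice and only one host name appears, which is what I would expect from two separate 4-rank launches on the same node rather than one run spread over both machines.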

I also used

machine_count = 2
request_cpus = 1

to allocate two machines with one CPU on each. However, the output then contains only four lines, again all from compute-0-1:

Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-1.local, rank 0 out of 4 processors
Hello world from processor compute-0-1.local, rank 3 out of 4 processors
Hello world from processor compute-0-1.local, rank 1 out of 4 processors



Can someone shed some light on this? For reference, these are the slots in the pool:

# condor_status -af:h Machine DedicatedScheduler
Machine           DedicatedScheduler                         
compute-0-0.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-0.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
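
The listing above can be summarised as slots per machine. This is a small Python sketch, assuming the two-column layout printed by the command above; `slots_per_machine` is a hypothetical helper, and the masked scheduler name is copied as-is.

```python
from collections import Counter

def slots_per_machine(listing):
    """Count rows per machine name in two-column condor_status output."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the shell prompt line, and the column header.
        if not fields or fields[0] in ("#", "Machine"):
            continue
        counts[fields[0]] += 1
    return counts

listing = """\
# condor_status -af:h Machine DedicatedScheduler
Machine           DedicatedScheduler
compute-0-0.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-0.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
compute-0-1.local DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx
"""

slots = slots_per_machine(listing)
for machine, n in sorted(slots.items()):
    print(f"{machine}: {n} slot(s)")
```

This gives 2 slots on compute-0-0 and 4 on compute-0-1, matching the core counts, so both nodes do advertise slots for the dedicated scheduler.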




Regards,
Mahmood