[Condor-users] Whole node affinity spanning multiple nodes



Hi All,
     I am looking for suggestions on the best way to schedule MPI jobs that spawn a large number of processes but should be placed on as few multi-core nodes as possible, to keep inter-node communication between processes to a minimum.

For example:
A user would like to request 64 MPI processes. With no selection criteria, Condor allocates 64 "slots", each corresponding to one core on the specified architecture. We have a 40-node cluster with dual quad-core CPUs (8 cores per node) and 1 GbE interconnects, which gives 320 slots across 40 nodes. In principle, the job's 64 slots could be scattered across all 40 nodes. The algorithm requires more and more inter-process communication as it runs, so the user would like Condor to use the minimum number of nodes needed to hold his 64 processes. (On the cluster described above, with 8 slots per node, that means spanning only 8 nodes.)
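The node arithmetic above can be checked with a short Python sketch. The constant and function names are illustrative only, and it assumes one Condor slot per core with nothing else running on the cluster.

```python
import math

# Cluster described above: 40 nodes, each with dual quad-core CPUs.
NODES = 40
CORES_PER_NODE = 2 * 4
REQUESTED = 64  # MPI processes requested by the user


def min_nodes(requested: int, cores_per_node: int) -> int:
    """Fewest identical nodes whose cores can hold `requested` processes, one per core."""
    return math.ceil(requested / cores_per_node)


total_slots = NODES * CORES_PER_NODE   # assumes one slot per core
worst_case = min(REQUESTED, NODES)     # processes spread as widely as possible
best_case = min_nodes(REQUESTED, CORES_PER_NODE)

print(total_slots, worst_case, best_case)  # 320 40 8
```

So the same 64-process job can touch anywhere from 8 to 40 nodes, and only the lower end keeps most communication inside a node.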

I have found that if we use a construct such as the following in the Condor submit file:

requirements = (Subnet == "192.168.45")
+RequiresWholeMachine = True

we can have the MPI job reserve and use one whole machine (i.e., one node). That works for jobs needing up to as many cores as a single node has (our largest cluster nodes have 8 cores each), but we have not found a way to write a job submission that spans multiple whole machines. With the construct above, if the MPI job needs more processes than the node has cores, all of them are started on that one node instead of being spread to other nodes, which raises the load on that node.

Please let me know whether I am using the wrong requirement parameter and there are others I should be looking at, or whether this is simply not possible. I believe it should be possible, since other batch submission systems offer this feature, and I suspect I am just not looking at the right Condor submit constructs.
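The placement the user is after amounts to filling whole nodes first. The Python sketch below illustrates only that goal; it is not how Condor's scheduler works, and the `free_cores` mapping and node names are hypothetical. Given a snapshot of free cores per node, it takes nodes with the most free cores first. This minimizes the node count, because no set of k nodes can offer more cores than the k nodes with the most free cores.

```python
from typing import Dict, List


def pack_fewest_nodes(free_cores: Dict[str, int], requested: int) -> List[str]:
    """Choose the fewest nodes whose free cores add up to at least `requested`.

    `free_cores` is a hypothetical snapshot {node_name: free_core_count}.
    Raises ValueError if the whole pool cannot hold the request.
    """
    chosen: List[str] = []
    remaining = requested
    # Largest free-core counts first; ties keep the mapping's insertion order.
    for node, free in sorted(free_cores.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= 0:
            break
        if free <= 0:
            continue  # node is full, skip it
        chosen.append(node)
        remaining -= free
    if remaining > 0:
        raise ValueError("not enough free cores for the request")
    return chosen


if __name__ == "__main__":
    # Hypothetical idle cluster: 40 nodes with 8 free cores each.
    idle = {f"node{i:02d}": 8 for i in range(1, 41)}
    picked = pack_fewest_nodes(idle, 64)
    print(len(picked), picked[0], picked[-1])  # 8 node01 node08
```

By contrast, starting all 64 processes on a single 8-core node leaves about 8 runnable processes per core, which matches the raised load described above.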

Thank you for any help you can provide.
Cheers,
--Brandon