
Re: [Condor-users] How to allocate sequential slots for an MPI job



On Jul 1, 2010, at 5:12 PM, David Kotz wrote:

> I don't think this gets you quite what you want, but if you look at the
> docs for parallel scheduling groups here:
> 
> http://www.cs.wisc.edu/condor/manual/v7.4/3_13Setting_Up.html#SECTION0041310000000000000000

Thank you. A private email response pointed that section out to me; I had overlooked those notes.

> 
> you can see how to set up each machine as a parallel scheduling group,
> ParallelSchedulingGroup = "$(HOSTNAME)", then setting
> +WantParallelSchedulingGroups=True in your submit scripts, but by my
> reading of the docs, that would limit your jobs to using only the 8
> cores on a single node,

As experiment has shown. 

> which I'm guessing is not what you want.

You are correct: the user would like their 64-slot job to fully pack 8 nodes (8 cores each).
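
As a rough way to check whether a given allocation meets that requirement, one could count the slots per host in the generated machine file. The sketch below is only illustrative: the function names and the default of 8 cores per node are assumptions based on the cluster described in the original message, and nothing here is part of Condor itself.

```python
from collections import Counter


def slots_per_node(machine_lines):
    """Count how many slots each host received in a machine file.

    machine_lines is any iterable of host names, one per slot,
    e.g. the lines of the machine file (blank lines are ignored).
    """
    return Counter(line.strip() for line in machine_lines if line.strip())


def is_fully_packed(machine_lines, cores_per_node=8):
    """Return True if every host that appears holds exactly cores_per_node slots.

    cores_per_node=8 is an assumption matching the 8-core nodes in this thread.
    """
    counts = slots_per_node(machine_lines)
    return bool(counts) and all(n == cores_per_node for n in counts.values())
```

Usage would be along the lines of `is_fully_packed(open(path))`, where `path` is wherever the machine file ends up for the job; for the 16-slot listing quoted below it would report that the nodes are not fully packed.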

> 
> You might be able to set the ParallelSchedulingGroup, have your master
> process write out its ParallelSchedulingGroup to a file, then script it
> so that your slave processes are submitted with RANK based on that
> ParallelSchedulingGroup.
> 
> I think someone has posted information about a similar problem in the
> past.  They needed to get a group of jobs running together on one
> machine.  I believe they did some slot definition trickery to achieve
> it.  Unfortunately, I don't recall any good keywords to use in searching
> the list archives.

OK, I will poke around.

> 
> Obviously, parallel is not my area of expertise, but maybe someone will
> be inspired by my bad suggestions to give you a better one.

Many thanks for your time and effort,

Steve

> 
> - dave
> 
> 
> On Wed, 2010-06-30 at 11:56 -0400, Steve Lidie wrote:
>> Our cluster of 8-core nodes is configured for dedicated jobs using MPI in the parallel universe.  Slots are assigned more or less randomly, so that if one looks at the "machine" file generated by mp1script it might look like this:
>> 
>> node1
>> node2
>> node6
>> node6
>> node7
>> node4
>> node6
>> node5
>> node7
>> node5
>> node8
>> node6
>> node7
>> node8
>> node5
>> node6
>> 
>> There is now a requirement for the slots to be assigned in sequence, so that all 8 cores are used in one node, etc.  Is this possible?
>> 
>> Thanks,
>> 
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>> 
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/