
Re: [Condor-users] Getting closer with Parallel Universe on Dynamic slots



On Fri, Nov 25, 2011 at 09:12:49AM -0500, Ian Chesal wrote:
> > > > RANK = 0
> > > > NEGOTIATOR_PRE_JOB_RANK = 1000000000 + 1000000000 * (TARGET.JobUniverse =?= 11) * (TotalCpus+TotalSlots) - 1000 * Memory
> > > > 
> > > > universe = parallel
> > > > initialdir = /home/steffeng/tests/mpi/
> > > > executable = /home/steffeng/tests/mpi/mpitest
> > > > arguments = $(Process) $(NODE)
> > > > output = out.$(NODE)
> > > > error = err.$(NODE)
> > > > log = log
> > > > notification = Never
> > > > on_exit_remove = (ExitBySignal == False) || ((ExitBySignal == True) && (ExitSignal != 11))
> > > > should_transfer_files = yes
> > > > when_to_transfer_output = on_exit
> > > > Requirements = ( TotalCpus == 4 )
> > > > request_memory = 500
> > > > machine_count = 10
> 
> See the section labeled 'Macros' in the condor_submit manual:
> 
> http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html#74467
> 
> Specifically:
> request_cpus = $$(totalcpus)
> 
> I'm not saying this is going to work for you, but just that it might be worth trying.

Thanks Ian, for pointing me to that.

It turns out that request_cpus=n, independent of n, results in one slot
claimed per machine: with "request_cpus=4" and "machine_count=4" I got a
single slot claimed on each of four machines, exactly as "request_cpus=1"
or "request_cpus=2" would have done.

"machine_count" obviously gets translated into the number of individual MPI jobs (nodes),
and "request_cpus" would define the number of CPU cores assigned to each of them.
It's my problem if the nodes don't know about multi-core on their own.
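If that reading is correct, one way to pass the core count on to each node
would be something like the following (a sketch, assuming mpitest were an
OpenMP-enabled binary; OMP_NUM_THREADS is the standard OpenMP variable and
has to be kept in sync with request_cpus by hand):

  universe      = parallel
  executable    = /home/steffeng/tests/mpi/mpitest
  machine_count = 10
  request_cpus  = 4
  environment   = OMP_NUM_THREADS=4
  queue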

Apparently, dynamic slot provisioning doesn't work well with parallel universe yet.

As soon as I return to old-style slot splitting (four static slots per machine,
each with one CPU and 25% of the memory) I get the "proximity" I'm looking for:
of machine_count=10, the first 4 nodes get sent to one machine, 4 to the next,
and 2 to a third.
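The static-slot setup I mean is roughly this condor_config fragment (a
sketch; SLOT_TYPE_<N> and NUM_SLOTS_TYPE_<N> are the standard knobs from
the manual's slot-type section):

  SLOT_TYPE_1      = cpus=1, memory=25%
  NUM_SLOTS_TYPE_1 = 4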

So I can either do hard partitioning and get proper MPI behaviour, or
dynamic partitioning and be able to run memory-hungry jobs.
Unfortunately, the users have been asking for both (and the mix is unpredictable).

To add to the inconvenience, for each such reconfiguration Condor has to be
stopped completely on the affected machines.
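In practice that means something like this for every execute machine whose
slot layout changes (condor_restart is the standard tool; execnode01 is a
hypothetical host name, and a plain condor_reconfig is not enough since the
startd's slot layout is only read at startup):

  condor_restart -startd execnode01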

Are there plans to make Condor more flexible here?
Being able to pack as many dynamic slots of one parallel job as possible onto
the same machine would help a lot.
In the manual, and everywhere else I looked, "dynamic slots" and "parallel universe"
seem to be disjoint concepts...

BTW:
*If* there were proper co-existence of dynamic slots and parallel universe, one
would have to look for a NEGOTIATOR_PRE_JOB_RANK expression that yields the best
result for the parallel job while harming as few other jobs as possible - perhaps
such a thing doesn't even exist if preemption is allowed?
Without preemption things should be easier:
- Rank by the number of unclaimed CPUs?
  How would one do that - introduce another machine ClassAd attribute, UnclaimedCpus?
  I vaguely remember someone had come up with a huge ifThenElse construction to
  sum up the resources "bound" by claimed dynamic slots, but there should be a
  solution that still works for 64 cores... (a rough attempt follows below)
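For what it's worth, on a partitionable slot the machine ClassAd attribute Cpus
already counts the cores not yet carved off into dynamic slots, so a candidate
expression might look like this (an untested sketch; PartitionableSlot and Cpus
are standard machine-ad attributes, the scale factor is arbitrary):

  NEGOTIATOR_PRE_JOB_RANK = 1000000000 * (TARGET.JobUniverse =?= 11) * \
      ifThenElse(MY.PartitionableSlot =?= True, MY.Cpus, 0)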

S