
Re: [Condor-users] Getting closer with Parallel Universe on Dynamic slots



On Friday, 25 November, 2011 at 8:55 AM, Steffen Grunewald wrote:
On Fri, Nov 25, 2011 at 01:12:01PM +0100, Lukas Slebodnik wrote:
On Fri, Nov 25, 2011 at 12:14:19PM +0100, Steffen Grunewald wrote:
... but still no cigar.

The setup consists of five 4-core machines and some additional 2-core machines.
All of them have been configured as single, partitionable slots.
Preemption is forbidden completely.
The rank definitions are as follows:
RANK = 0
NEGOTIATOR_PRE_JOB_RANK = 1000000000 + 1000000000 * (TARGET.JobUniverse =?= 11) * (TotalCpus+TotalSlots) - 1000 * Memory

I'd expect this to favour big machines over small ones (for Parallel jobs),
and partially occupied ones over empty ones.
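
To make that expectation concrete, here is a minimal Python sketch that evaluates the same arithmetic for a few machines. The machine values (CPU counts, slot counts, memory in MB) are hypothetical and chosen only for illustration, and the sketch assumes the `=?=` comparison contributes 1 when true and 0 otherwise, as the expression intends. It is plain arithmetic, not a model of how the negotiator actually behaves.

```python
PARALLEL_UNIVERSE = 11  # JobUniverse value the expression tests for


def pre_job_rank(job_universe, total_cpus, total_slots, memory_mb):
    """Mirror the arithmetic of the NEGOTIATOR_PRE_JOB_RANK expression above."""
    # Assumption: the =?= comparison acts as 1 (true) or 0 (false).
    is_parallel = 1 if job_universe == PARALLEL_UNIVERSE else 0
    return (1000000000
            + 1000000000 * is_parallel * (total_cpus + total_slots)
            - 1000 * memory_mb)


# Hypothetical machines: TotalCpus, TotalSlots, Memory (MB)
empty_big = pre_job_rank(11, 4, 1, 8000)    # untouched 4-core machine
busy_big = pre_job_rank(11, 4, 2, 7500)     # 4-core machine, one dynamic slot carved off
empty_small = pre_job_rank(11, 2, 1, 4000)  # untouched 2-core machine
vanilla_big = pre_job_rank(5, 4, 1, 8000)   # same big machine, non-parallel job

for name, rank in [("busy_big", busy_big), ("empty_big", empty_big),
                   ("empty_small", empty_small), ("vanilla_big", vanilla_big)]:
    print(name, rank)
```

With these made-up numbers the ordering is busy_big > empty_big > empty_small, because the 1000000000 * (TotalCpus+TotalSlots) term outweighs the memory term for any realistic Memory value.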

What I see with the following submit file is quite different:

universe = parallel
initialdir = /home/steffeng/tests/mpi/
executable = /home/steffeng/tests/mpi/mpitest
arguments = $(Process) $(NODE)
output = out.$(NODE)
error = err.$(NODE)
log = log
notification = Never
== False) || ((ExitBySignal == True) && (ExitSignal != 11))
should_transfer_files = yes
when_to_transfer_output = on_exit
Requirements = ( TotalCpus == 4 )
request_memory = 500
machine_count = 10

(mpitest is the ubiquitous "MPI hello world" program trying to get rank and
size from MPI_COMM_WORLD)

- if I leave the Requirements out, the 10 MPI nodes end up on the 5 big
machines (one per machine) plus 5 small ones

If you did not specify request_cpus, then the default value (1) will be used.

Yes.
I cannot specify "request_cpus=4" as this would let my jobs idle if the big nodes
were taken by someone else.
And AFAICT, there's no "request_cpus=all" or "request_cpus=TARGET.TotalCpus".

See the section labeled 'Macros' in the condor_submit manual:

http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html#74467

Specifically:


----

In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows the substitution of a ClassAd attribute value defined on the resource machine itself (gotten after a match to the machine has been made) into specific commands within the submit description file. The substitution macro is of the form:

$$(attribute)
A common use of this macro is for the heterogeneous submission of an executable:

executable = povray.$$(opsys).$$(arch)

Values for the opsys and arch attributes are substituted at match time for any given resource. This allows Condor to automatically choose the correct executable for the matched machine.

----

So in your case:

request_cpus = $$(totalcpus)

I'm not saying this is going to work for you, but just that it might be worth trying.
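
To illustrate what the substitution amounts to, here is a rough Python sketch of the idea. It is not Condor's implementation: the function name and the machine ad values are made up for the example, and it only mimics replacing $$(attribute) with a value from the matched machine's ClassAd, looking attribute names up case-insensitively.

```python
import re


def expand_substitution_macros(text, machine_ad):
    """Replace each $$(attr) in text with the value from a matched machine ad.

    Illustrative only: a hypothetical helper, not part of Condor.
    """
    # ClassAd attribute names are case-insensitive, so normalise the keys.
    lookup = {name.lower(): value for name, value in machine_ad.items()}

    def replace(match):
        attr = match.group(1).lower()
        if attr not in lookup:
            raise KeyError("machine ad has no attribute %r" % match.group(1))
        return str(lookup[attr])

    return re.sub(r"\$\$\(([A-Za-z_][A-Za-z0-9_]*)\)", replace, text)


# Hypothetical ad for a matched 4-core machine
machine_ad = {"OpSys": "LINUX", "Arch": "X86_64", "TotalCpus": 4}

print(expand_substitution_macros("executable = povray.$$(opsys).$$(arch)", machine_ad))
print(expand_substitution_macros("request_cpus = $$(totalcpus)", machine_ad))
```

The point to notice is that the value is only known after a match has been made, so whether it can still influence how many CPUs the match itself claims is exactly the part that needs testing.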

Regards,
- Ian



---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing