
Re: [Condor-users] Getting closer with Parallel Universe on Dynamic slots

On Friday, 25 November, 2011 at 8:55 AM, Steffen Grunewald wrote:
On Fri, Nov 25, 2011 at 01:12:01PM +0100, Lukas Slebodnik wrote:
On Fri, Nov 25, 2011 at 12:14:19PM +0100, Steffen Grunewald wrote:
... but still no cigar.

The setup consists of five 4-core machines and several more 2-core machines.
All of them have been configured as single, partitionable slots.
Preemption is forbidden completely.
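(For reference, the per-machine configuration amounts to roughly the following -- a sketch from memory, not a verbatim copy of the config files:

    NUM_SLOTS = 1
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = 100%
    SLOT_TYPE_1_PARTITIONABLE = True
    PREEMPTION_REQUIREMENTS = False

)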
The rank definitions are as follows:
RANK = 0
NEGOTIATOR_PRE_JOB_RANK = 1000000000 + 1000000000 * (TARGET.JobUniverse =?= 11) * (TotalCpus+TotalSlots) - 1000 * Memory

I'd expect this to favour big machines over small ones (for Parallel jobs),
and partially occupied ones over empty ones.
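If I read the expression right, for a parallel job ((TARGET.JobUniverse =?= 11) evaluates to 1) on otherwise empty machines it works out as:

    4-core machine (TotalCpus=4, TotalSlots=1):
        1000000000 + 1000000000 * (4 + 1) - 1000 * Memory = 6000000000 - 1000 * Memory
    2-core machine (TotalCpus=2, TotalSlots=1):
        1000000000 + 1000000000 * (2 + 1) - 1000 * Memory = 4000000000 - 1000 * Memory

and each dynamic slot already carved off a partitionable slot raises TotalSlots, and so the rank, by another 1000000000, which is what should make partially occupied machines beat empty ones.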

What I see with the following submit file, is quite different:

universe = parallel
initialdir = /home/steffeng/tests/mpi/
executable = /home/steffeng/tests/mpi/mpitest
arguments = $(Process) $(NODE)
output = out.$(NODE)
error = err.$(NODE)
log = log
notification = Never
on_exit_remove = (ExitBySignal == False) || ((ExitBySignal == True) && (ExitSignal != 11))
should_transfer_files = yes
when_to_transfer_output = on_exit
Requirements = ( TotalCpus == 4 )
request_memory = 500
machine_count = 10

(mpitest is the ubiquitous "MPI hello world" program trying to get rank and
size from MPI_COMM_WORLD)

- if I leave the Requirements out, the 10 MPI nodes will end up on the five
big machines (one node per machine) plus five small ones
If you do not specify request_cpus, the default value (1) is used.

I cannot specify "request_cpus=4", as that would leave my jobs idle whenever the big
nodes are taken by someone else.
And AFAICT, there's no "request_cpus=all" or "request_cpus=TARGET.TotalCpus".
See the section labeled 'Macros' in the condor_submit manual:

In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows the substitution of a ClassAd attribute value defined on the resource machine itself (gotten after a match to the machine has been made) into specific commands within the submit description file. The substitution macro is of the form:

$$(MachineAdAttribute)

A common use of this macro is for the heterogeneous submission of an executable:

executable = povray.$$(opsys).$$(arch)

Values for the opsys and arch attributes are substituted at match time for any given resource. This allows Condor to automatically choose the correct executable for the matched machine.


So in your case:

request_cpus = $$(totalcpus)

I'm not saying this is going to work for you, but just that it might be worth trying.
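In your submit file that would look something like this (untested on my end, and the requirement is just an example):

    requirements = ( TotalCpus >= 2 )
    request_cpus = $$(TotalCpus)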

- Ian

Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools