
[Condor-users] Guiding machine choices for the parallel universe



Hi,

Is there a way to "guide" condor's choice of nodes to satisfy a
parallel universe job automatically? Basically the cost of
communication between all nodes in one of my pools is not equal
because of network topology. Given this I'm looking for a way to make
the Dedicated Scheduler aware of this and prefer to match nodes that
are close to each other on the network, but not prevent larger
parallel jobs using all the machines.

Clearly users could write a requirements = or rank = line in their
job submission file, but I don't think it's reasonable or fair to
expect users to be doing this.

I was thinking of doing something along the lines of logically
dividing the nodes into n sub-pools (within which connectivity is good
between all the nodes), and giving each sub-pool a number. This would
then mean that an expression something like:

NEGOTIATOR_PRE_JOB_RANK = (MY.Universe == PARALLEL) *
((free_nodes_in_my_subpool - MY.Machine_count) * my_subpool_id)

would achieve this, provided the sub-pool IDs were suitably large.
Obviously this isn't syntactically correct just yet!
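
To make the intended arithmetic concrete, here is a rough Python sketch
of the scoring I have in mind. It is not Condor syntax; the function
name and the sample values (including 1000 as a sub-pool ID) are made up
purely for illustration.

```python
def pre_job_rank(is_parallel, free_nodes_in_subpool, machine_count, subpool_id):
    """Toy model of the rank expression above (illustrative, not ClassAd syntax)."""
    # Jobs outside the parallel universe get a neutral rank of 0.
    if not is_parallel:
        return 0
    # Nodes left over in the sub-pool after placing the job,
    # weighted by the (hypothetical) sub-pool ID.
    return (free_nodes_in_subpool - machine_count) * subpool_id


# Example: a 4-node parallel job against a sub-pool with 8 free nodes.
score = pre_job_rank(True, 8, 4, 1000)
```

With positive IDs, a sub-pool too small to hold the whole job scores
negative, so it ranks below any sub-pool that can hold it.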

Actually in practice doing this is slightly harder than I'd hoped.

Firstly, is it true that (False * x) == 0 and (True * x) == x?

Secondly, how would I go about writing an expression that maps machine
names to some (pre-defined) sub-pool IDs? Or am I better off
putting that as a custom attribute in the startd ads?
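
The mapping itself is simple enough when written as a lookup table. A
minimal Python sketch of what I mean, assuming machines are named with a
per-rack prefix (the prefixes, hostnames and IDs here are hypothetical):

```python
# Hypothetical hostname prefixes and the sub-pool ID assigned to each.
SUBPOOL_BY_PREFIX = {
    "rack1-": 1000,
    "rack2-": 2000,
}


def subpool_id(hostname):
    """Return the sub-pool ID for a hostname, or 0 if it matches no prefix."""
    for prefix, sid in SUBPOOL_BY_PREFIX.items():
        if hostname.startswith(prefix):
            return sid
    return 0
```

What I don't know is how best to express the same table in Condor.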

Thirdly, is MachineCount an attribute in parallel universe job
classads? I can't see it listed in
http://www.cs.wisc.edu/condor/manual/v7.0/Appendix_A_ClassAd.html

Fourthly, how could free_nodes_in_my_subpool be implemented?
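
Conceptually it is just a per-sub-pool count of idle machines. A Python
sketch of the quantity I am after, assuming each machine is described by
a (sub-pool ID, state) pair and that "Unclaimed" marks a free node (the
sample data is invented):

```python
from collections import Counter


def free_nodes_by_subpool(machines):
    """Count machines in the Unclaimed state for each sub-pool ID."""
    free = Counter()
    for subpool, state in machines:
        if state == "Unclaimed":
            free[subpool] += 1
    return free


# Invented sample: two sub-pools, three free nodes in total.
sample = [
    (1000, "Unclaimed"),
    (1000, "Claimed"),
    (1000, "Unclaimed"),
    (2000, "Unclaimed"),
]
free = free_nodes_by_subpool(sample)
```

The difficulty is that this is a property of the pool as a whole, not of
any single machine ad.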

Or, more generally, is there a nicer way to solve this without topological
changes to the network or intervention from each user of the parallel
universe?

Thanks,
Alan