
Re: [Condor-users] Submitting Rank strategy



On 5/25/07, Finch, Ralph <rfinch@xxxxxxxxxxxx> wrote:
condor -version
$CondorVersion: 6.8.1 Sep 18 2006  $
$CondorPlatform: INTEL-WINNT50 $

We have about 15 dual-CPU Windows machines in our Condor pool.
The machines are comparable in speed.  Some machines are used
during the day, some sit idle all the time.

I would like to set up a Rank to prefer:

1. Both CPUs on machines that sit idle.
2. Then use 1 CPU on machines that are used interactively.
3. Then use the 2nd CPU on machines that are used interactively.

I tried this, using 3 days inactive as a breakpoint, but it doesn't
work. It does prefer both CPUs of inactive machines, but when it
starts using active machines it uses both CPUs.

Can someone advise how to achieve the Rank strategy outlined above?

# prefer unused machines but avoid using both CPUs of active machines
# this doesn't quite work
Rank = KeyboardIdle - ( (State=="Claimed")*3*24*60 + \
       ((vm1_Activity == "Busy") + (vm2_Activity == "Busy")) * \
        3*24*60 )

There are limitations in Condor's handling of SMP machines during
negotiation that cause problems when a machine enters a negotiation
cycle with multiple VMs free.
The negotiator/collector's view during the negotiation is not updated
to reflect any change in state caused by job assignments made during
that cycle, so within a single cycle your Rank still sees both VMs of
an active machine as idle even after one of them has been matched.

I would very much like this to change but appreciate it would be a
complex change.

I suspect some serious architectural changes are required under
the hood to handle SMP machines more efficiently, and that we may be
waiting until 6.11 or later for any such changes :)

As a rough hack for a 6.9 change, forcing only one node per machine
to negotiate at a time would work around several of these issues (at
the cost of others if your pool is non-homogeneous and/or you do not
prefer breadth-first over depth-first filling). You could possibly
achieve something similar by making use of the VirtualMachineId in
the Start expression so that, while the machine is in use, vm2 only
becomes available once the Condor load goes above some threshold
(indicating that the machine is already running a Condor job). This
would delay the start of jobs on the second node, thus forcing the
negotiation to split things properly over two passes.
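
A rough, untested sketch of what that might look like in the startd
configuration. VM2_GATE is just a name I made up, the 0.5 threshold is
an arbitrary placeholder, and I am assuming TotalCondorLoadAvg is
visible to the Start expression on your version:

# vm1 follows the normal policy; vm2 additionally waits until there
# is already Condor load on the machine (i.e. vm1 has picked up a job)
VM2_GATE = (VirtualMachineID == 1) || (TotalCondorLoadAvg > 0.5)
START = <your existing START policy> && ($(VM2_GATE))

Note that as written this gates vm2 on every machine, idle or not, so
it gives you breadth-first filling everywhere rather than only on the
interactively used machines.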

I suggest this so you *can* do it if you really want to, I in no way
recommend it...

Matt