
Re: [Condor-users] Re: split processors among multiple jobs



On Fri, 18 Mar 2005 18:32:27 -0800  John Wheez wrote:

> ok i see what is happening..by using priority = -$(Process) all the 
> frames in each job in a cluster are being given an order in which to be 
> computed...

sort of.  it's saying that the *first* job in each cluster is equally
interesting to the schedd.  if given the choice of two idle jobs in
the queue, the schedd (in this case) would rather start the first job
in another cluster, instead of running more jobs in an already-running
cluster.  
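
to make that ordering concrete, here's a toy sketch in python.  it is
not schedd code: it just assumes the schedd starts the idle job with
the highest priority value first, and it breaks ties by cluster name,
which is an assumption made only for illustration.  start_order() and
the cluster names are hypothetical.

```python
# toy model: with priority = -$(Process), job N of every cluster has
# priority -N, and the highest priority value is started first.
def start_order(clusters, jobs_per_cluster):
    idle = [(-proc, cluster, proc)
            for cluster in clusters
            for proc in range(jobs_per_cluster)]
    # highest priority first; ties broken by cluster name (an assumption)
    idle.sort(key=lambda job: (-job[0], job[1]))
    return [f"{cluster}.{proc}" for _, cluster, proc in idle]

print(start_order(["A", "B"], 3))
# -> ['A.0', 'B.0', 'A.1', 'B.1', 'A.2', 'B.2']
```

the clusters interleave instead of A running to completion before B
gets anything.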

> What is really needed is the ability to assign a priority to the
> cluster..and if two clusters have the same priority then each
> cluster will get some cpus.

that's basically what jaime's solution gives you.

> the method below does split cpus between all clusters that a user 
> submits but it does it in a nonintelligent fashion..for example..if i 
> submit cluster A & B at the same time then the cpus will be split..but, 
> if i enter a new cluster C five minutes after A & B..then the cpus will 
> all go to cluster C until its jobs have reached the same process number 
> as A or B.

sort of.  it means that cluster C will get idle resources first until
it "catches up" with clusters A and B and things are evenly split.
if another cluster shows up, it will get preference until it catches
up, too.
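
the catch-up behavior can be sketched with a small simulation.  this
is a simplified model under stated assumptions, not how the schedd is
implemented: a fixed number of slots frees up each tick, nothing is
preempted, each free slot goes to the idle job with the highest
priority, and ties go to the alphabetically first cluster.  simulate()
and all of its parameters are hypothetical names.

```python
def simulate(arrivals, jobs_per_cluster, slots_per_tick, ticks):
    """arrivals maps cluster name -> tick at which it is submitted.
    returns, for each tick, how many jobs each cluster has started."""
    next_proc = {}  # cluster -> process number of its next idle job
    history = []
    for tick in range(ticks):
        for cluster, when in arrivals.items():
            if when == tick:
                next_proc[cluster] = 0
        for _ in range(slots_per_tick):
            idle = [c for c, p in next_proc.items() if p < jobs_per_cluster]
            if not idle:
                break
            # a cluster's best idle job has priority -next_proc, so the
            # smallest process number wins; ties by name (an assumption)
            pick = min(idle, key=lambda c: (next_proc[c], c))
            next_proc[pick] += 1
        history.append(dict(next_proc))
    return history

# A and B are submitted at tick 0, C shows up at tick 2
for tick, started in enumerate(simulate({"A": 0, "B": 0, "C": 2}, 10, 2, 5)):
    print(tick, started)
```

in this model C takes both free slots at tick 2, and from then on the
three clusters take turns.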

> What would be nice is if we could have the option to assign
> priorities to clusters and have condor use that priority to decide
> what percent of resources should go to that cluster. that way even
> if a cluster is submitted 5 minutes later..it will not suck up all
> the resources.

what if A, B, C, and D, all submitted at different times, are supposed
to have an even share of resources (since they're all at the same
cluster priority), and you don't want to preempt jobs to enforce this
job-based priority stuff?  what would you expect the schedd to do?
exactly what jaime's solution gives you: while the number of jobs
differs across the 4 clusters, give all the resources you can to
the ones that are under their quota until things are equal again.
this solution appears to give you exactly the behavior you want, and
you don't have to wait for us to hack the schedd code some more and
make another release...

granted, without allowing the "priority" setting in the submit file to
be a real expression, there's no particularly easy way (short of
having a script generate your submit files) to use this basic trick to
get different clusters with different relative priorities.  i.e., if
you want A to have twice as many jobs running as B, you might want A's
jobs to have priorities: 0, -1, -2, ..., while B's should have: 0, -2,
-4, ...  that's easy to do with a script, at least.
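
a minimal sketch of such a script, in python.  it assumes one "queue"
statement per job with a literal "priority" line in front of each; the
executable name render_frame, the function name, and the stride
parameter are all hypothetical, so adjust them to your own setup.

```python
def submit_description(executable, n_jobs, stride):
    """build submit file text whose jobs get priorities
    0, -stride, -2*stride, ... (a larger stride means a smaller share)."""
    lines = [f"executable = {executable}",
             "arguments = $(Process)"]
    for proc in range(n_jobs):
        # literal integer priority, one queue statement per job
        lines.append(f"priority = {-proc * stride}")
        lines.append("queue")
    return "\n".join(lines) + "\n"

# cluster A: 0, -1, -2, ...   cluster B: 0, -2, -4, ...
with open("cluster_a.submit", "w") as f:
    f.write(submit_description("render_frame", 5, 1))
with open("cluster_b.submit", "w") as f:
    f.write(submit_description("render_frame", 5, 2))
```

each file would then be handed to condor_submit separately, giving one
cluster per file.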

good luck,
-derek