
Re: [Condor-users] Dynamic Slots & Parallel Universe



Hi all:

We have a very similar problem here, only more general. We not only have parallel jobs that may request a full machine (or several); we also face the following situation:

* Our cluster is at full usage with 1-cpu jobs (with dynamic partitioning, this means each slot has only one cpu).
* A user with high priority submits an n-cpu job, with n > 1. There is no slot with cpus > 1, so no slot is even considered for eviction. No slot can therefore be assigned to him, because the scheduler currently does not perform "multiple slot evictions" to free up resources for a higher-priority, more resource-hungry job.
* Other lower-priority, lower-requirement jobs keep getting queued and fill the slots as soon as they become free.

The high-priority, resource-hungry job is therefore starved and will never be able to run as long as the cluster remains at high usage.

The parallel universe case is then only one instance of this more general problem.

Are there any plans to address this inherent limitation of the dynamic slot model?

Thanks in advance.

Joan

On 31/08/10 17:04, David J. Herzfeld wrote:
Hi Erik:

Thanks for the response. From the remarks in the ticket, this looks to
be exactly what we want for #3! Is there any estimate of when this will
be incorporated into the stable release?

This is exciting.

David

On 08/31/2010 09:42 AM, Erik Erlandson wrote:
Regarding dynamic slots and parallel universe:  The dedicated scheduler
(used by PU jobs) does not currently handle dynamic slots correctly.   A
patch to correct this has been submitted and is pending review:

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=986,0


-Erik



On Tue, 2010-08-31 at 08:56 -0500, David J. Herzfeld wrote:
Hi All:

We are currently working on a 1024-core cluster (8 cores per
machine) using a fairly standard Condor config. Each core shows up as a
single slot, etc.

Users are starting to run multi-process jobs on the cluster - leading to
over-scheduling. One way to combat this problem is the "whole machine"
configuration presented on the Wiki at
<https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=WholeMachineSlots>.
However, most of our users don't require the full machine (they need
combinations of 2, 3, 4, 5... cores). We could modify this config to
supply slots for half a machine, etc.

So a couple of questions:
1) Does this seem like a job for dynamic slots? or should we modify the
"whole machine" config?
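
If dynamic slots turn out to be the right fit, the usual setup is a single partitionable slot per machine that hands out cpus/memory on demand. A minimal sketch of the condor_config changes (the exact percentages are assumptions; adjust to your machines):

  # One partitionable slot owning all resources on the machine;
  # jobs carve off dynamic slots sized by their request_cpus etc.
  NUM_SLOTS = 1
  NUM_SLOTS_TYPE_1 = 1
  SLOT_TYPE_1 = 100%
  SLOT_TYPE_1_PARTITIONABLE = TRUE

With this, an 8-core machine advertises one slot with Cpus = 8, and a job requesting 3 cpus leaves a 5-cpu partitionable remainder for other jobs.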

2) If dynamic slots are the way to go, has this shown to be stable in
production environments?

3) Can we combine dynamic slot allocations with the Parallel
Universe to provide PBS-like allocations? Something like
machine_count = 4
request_cpus = 8

to match 4 machines with 8 CPUs apiece? Similar to
#PBS -l nodes=4:ppn=8
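
For reference, a full submit description along those lines might look like the sketch below (the executable name is a placeholder, and per ticket #986 this depends on the dedicated scheduler's dynamic-slot support actually working):

  universe      = parallel
  executable    = my_mpi_job
  machine_count = 4
  request_cpus  = 8
  queue

That is, machine_count asks for 4 distinct slots, each carved from a partitionable slot with 8 cpus, which is the closest analogue to nodes=4:ppn=8.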

As always - thanks a lot!
David
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/




--
--------------------------------------------------------------------------
Joan Josep Piles Contreras -  Analista de sistemas
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 976 76 10 00 (ext. 5454)
http://i3a.unizar.es -- jpiles@xxxxxxxxx
--------------------------------------------------------------------------