
Re: [Condor-users] Dynamic Slots & Parallel Universe



Hi David,

I do not know how it will be prioritized relative to all the other
development in the queue.  It's a relatively significant change to the
dedicated scheduler, so I know the UW team expects to do a thorough
review and testing before approving it for inclusion.

Some other users are also interested in this enhancement, so I will make
sure it doesn't fall off the radar.

-Erik


On Tue, 2010-08-31 at 10:04 -0500, David J. Herzfeld wrote:
> Hi Erik:
> 
> Thanks for the response. From the remarks in the ticket, this looks to
> be exactly what we want for #3! Is there any estimate of when this will
> get incorporated into the stable release?
> 
> This is exciting.
> 
> David
> 
> On 08/31/2010 09:42 AM, Erik Erlandson wrote:
> > Regarding dynamic slots and parallel universe:  The dedicated scheduler
> > (used by PU jobs) does not currently handle dynamic slots correctly.   A
> > patch to correct this has been submitted and is pending review:
> > 
> > https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=986,0
> > 
> > 
> > -Erik
> > 
> > 
> > 
> > On Tue, 2010-08-31 at 08:56 -0500, David J. Herzfeld wrote:
> >> Hi All:
> >>
> >> We are currently working on a 1024-core cluster (8 cores per machine)
> >> using a pretty standard Condor config. Each core shows up as a single
> >> slot.
> >>
> >> Users are starting to run multi-process jobs on the cluster, which leads
> >> to overscheduling. One way to combat this problem is the "whole machine"
> >> configuration presented on the Wiki at
> >> <https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=WholeMachineSlots>.
> >> However, most of our users don't require the full machine; they need
> >> combinations of 2, 3, 4, 5, ... cores. We could modify this config to
> >> supply slots for 1/2 a machine, etc.
> >>
> >> So a couple of questions:
> >> 1) Does this seem like a job for dynamic slots, or should we modify the
> >> "whole machine" config?
> >>
> >> 2) If dynamic slots are the way to go, have they been shown to be stable
> >> in production environments?
> >>
> >> 3) Can we combine the dynamic slot allocations with the Parallel
> >> Universe to provide PBS-like allocations? Something like
> >> machine_count = 4
> >> request_cpus = 8
> >>
> >> to match 4 machines with 8 CPUs apiece, similar to
> >> #PBS -l nodes=4:ppn=8
> >>
> >> As always - thanks a lot!
> >> David
> >> _______________________________________________
> >> Condor-users mailing list
> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> >> subject: Unsubscribe
> >> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>
> >> The archives can be found at:
> >> https://lists.cs.wisc.edu/archive/condor-users/
> > 
> > 
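
For reference, the request in question 3 of the original message can be
sketched as a parallel universe submit description. This is only an
illustration, not a tested configuration: the executable name is
hypothetical, and asking for 8 CPUs per node this way presumes a Condor
version whose dedicated scheduler handles dynamic slots, which the reply
above says was not yet the case when this thread was written.

```
# Sketch of a submit description for "4 machines with 8 CPUs apiece".
# my_mpi_wrapper.sh is a hypothetical placeholder, not from this thread.
universe      = parallel
executable    = my_mpi_wrapper.sh
# Number of nodes, comparable to "nodes=4" in the PBS directive.
machine_count = 4
# CPUs requested on each node, comparable to "ppn=8".
request_cpus  = 8
queue
```

On the execute nodes this would presumably also require each machine to
advertise a partitionable slot (for example via
SLOT_TYPE_1_PARTITIONABLE = TRUE) rather than one static slot per core.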