
Re: [Condor-users] Condor and GPUs



> Exactly. Pre-defined static slots have to be replaced by
> something that's aware of the whole picture (sees the
> *machine* that hosts the slots which can be dynamically
> reconfigured, or even created on demand).

I'm at the point where this is becoming a need, not a want. We're
parallelizing code left, right and center to take advantage of multi-core
CPUs, and our admins are switching configurations almost daily to balance
parallel vs. serial jobs in our pools.

> Of course, with dynamic slot creation, another problem comes along:
> If a machine is already partially taken, how to define a
> ranking among machines to allow for maximum flexibility in
> the future?

Before we adopted Condor, our home-grown solution allowed dynamic
machine/slot allocation, *but* we handled the scenario you describe by
reducing each job to a handful of constraints: OS, number of CPUs
required, and memory. We always negotiated for the jobs in the queue with
the biggest CPU and memory requests first. The idea is that you fill the
jar with rocks first, then pebbles, then sand.
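For illustration, here is a minimal Python sketch of that "rocks first"
ordering. It is not our actual code and not anything in Condor: the
`Job`/`Machine` classes and `negotiate()` are hypothetical, and the
placement rule (best fit, i.e. pick the machine left with the fewest spare
CPUs) is my assumption, since only the largest-first ordering is described
above.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    os: str
    cpus: int
    memory_mb: int


@dataclass
class Machine:
    name: str
    os: str
    free_cpus: int
    free_memory_mb: int
    jobs: list = field(default_factory=list)


def negotiate(jobs, machines):
    """Place the biggest jobs first: rocks, then pebbles, then sand.

    Returns the jobs that could not be placed anywhere.
    """
    # Largest CPU request first, ties broken by largest memory request.
    queue = sorted(jobs, key=lambda j: (j.cpus, j.memory_mb), reverse=True)
    unplaced = []
    for job in queue:
        # Only the three constraints: OS, CPUs, memory.
        candidates = [
            m for m in machines
            if m.os == job.os
            and m.free_cpus >= job.cpus
            and m.free_memory_mb >= job.memory_mb
        ]
        if not candidates:
            unplaced.append(job)
            continue
        # Assumption: best fit, leaving whole machines free for later big jobs.
        target = min(
            candidates,
            key=lambda m: (m.free_cpus - job.cpus, m.free_memory_mb - job.memory_mb),
        )
        target.free_cpus -= job.cpus
        target.free_memory_mb -= job.memory_mb
        target.jobs.append(job.name)
    return unplaced
```

With an 8-core and a 6-core box, the 8-CPU job claims the big machine
before any serial job can fragment it, and the serial jobs fill in the
gaps left on the smaller one.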

I certainly don't envy the Condor Team. I know Derek has talked about
adaptive machine setups, but I can't imagine how that would work in the
face of all those constraints. Maybe it would make a good thesis? Whoever
does get this into Condor will be my hero, though. :)

> IMHO all boils down to dynamic slot definition. Something
> that would no longer happen on the execute node but on the master ...

Interesting. So the startds would tell the collector what they have in
total, and the negotiator would read this, assign a job, subtract what
the job estimates it will use (or what it says it wants), and update the
machine's ad in the collector. Sort of a "best guess" ad. The startd could
then correct anything the negotiator got wrong at a later point.
Interesting...
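To make sure I follow, here is a rough Python sketch of that flow. It
assumes a single collector-side record per machine; `MachineAd`,
`Collector`, `startd_advertise()`, `negotiator_assign()` and
`startd_correct()` are hypothetical names of my own, not Condor daemons'
real interfaces or ClassAd attributes.

```python
from dataclasses import dataclass


@dataclass
class MachineAd:
    """Hypothetical per-machine ad: totals plus a 'best guess' of what is free."""
    name: str
    total_cpus: int
    total_memory_mb: int
    est_free_cpus: int
    est_free_memory_mb: int


class Collector:
    def __init__(self):
        self.ads = {}

    def startd_advertise(self, name, cpus, memory_mb):
        # The startd reports the whole machine, not pre-carved static slots.
        self.ads[name] = MachineAd(name, cpus, memory_mb, cpus, memory_mb)

    def negotiator_assign(self, name, req_cpus, req_memory_mb):
        # The negotiator subtracts the job's request: the "best guess" update.
        ad = self.ads[name]
        if ad.est_free_cpus < req_cpus or ad.est_free_memory_mb < req_memory_mb:
            return False
        ad.est_free_cpus -= req_cpus
        ad.est_free_memory_mb -= req_memory_mb
        return True

    def startd_correct(self, name, used_cpus, used_memory_mb):
        # Later, the startd replaces the guess with what is actually in use.
        ad = self.ads[name]
        ad.est_free_cpus = ad.total_cpus - used_cpus
        ad.est_free_memory_mb = ad.total_memory_mb - used_memory_mb
```

For example, if a job asked for 4 CPUs and 4000 MB on a 16000 MB machine
but really uses 6000 MB, the negotiator's guess says 12000 MB free until
the startd's correction brings it down to 10000 MB.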

Count me among the Condor users who really, really need dynamic machine
slots. Multi-core machines and parallel software are the future of the
EDA industry.

- Ian
