Re: [HTCondor-users] Scheduling interactive and Batch use of GPUs



Hi Chris,

Hope this finds you well!

Your questions below ask about various knobs and mechanisms in HTCondor, but IMHO the first step is to decide what scheduling policy you want. I suggest you momentarily forget about HTCondor knobs and instead simply tell us in plain English what you want to happen, without any references to HTCondor. Once you know what policy you want (this can be hard to do, especially if you need lots of agreement within an organization!), the next step is to "implement" it by configuring various HTCondor knobs. Usually the implementation step is easier than figuring out what you really want to do :).

So, with that said, I understand you want to mix interactive and batch jobs on the same server. Do you want to prioritize interactive jobs over batch jobs, or vice versa? Should interactive jobs be removed if they cannot be started within X minutes? Can batch jobs be preempted (i.e. killed, and then restarted over again later)?

Assuming you don't want interactive jobs starting in the middle of the night, mixing interactive and batch work typically requires a fundamental decision: either 1) allow preemption of batch jobs, or 2) if preemption cannot be tolerated, reserve some percentage of resources exclusively for interactive use.

For instance, if preemption is not allowed, maybe you want a policy like "1 out of 4 GPU devices will be reserved to only run interactive jobs", or something like "GPU devices will be reserved for interactive jobs between 9am and 9pm, and batch jobs will only be allowed to run between 9pm and 9am". Or if you can tolerate preemption, you can increase utilization with a policy like "all 4 GPU devices prefer to run interactive jobs, but to maximize utilization, batch jobs may start whenever no interactive jobs are waiting; if an idle interactive job has waited more than 20 minutes to start, a batch job may be preempted to make room for it".
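To make that concrete, below is a rough startd config sketch of both options. It is not a drop-in recipe: it assumes a single 4-GPU execute node, and it assumes interactive jobs are tagged by users (or a submit wrapper) with a custom job attribute, e.g. "+IsInteractive = True" in the submit file; that attribute name is just a convention made up for illustration, not something built into HTCondor.

  # Sketch only -- illustrative values for a 4-GPU node
  use feature : GPUs

  # --- Option 1: no preemption, reserve 1 of the 4 GPUs for interactive ---
  SLOT_TYPE_1               = GPUs=3, cpus=75%, memory=75%, disk=75%
  SLOT_TYPE_1_PARTITIONABLE = True
  NUM_SLOTS_TYPE_1          = 1
  SLOT_TYPE_2               = GPUs=1, cpus=25%, memory=25%, disk=25%
  SLOT_TYPE_2_PARTITIONABLE = True
  NUM_SLOTS_TYPE_2          = 1
  # Only jobs carrying the (made-up) IsInteractive attribute may use slot type 2
  START = ( SlotTypeID != 2 ) || ( TARGET.IsInteractive =?= True )

  # --- Option 2: allow preemption, prefer interactive jobs everywhere ---
  # A higher startd RANK match can push a lower-ranked running job off the
  # machine, so interactive jobs would preempt batch jobs:
  #RANK = ( TARGET.IsInteractive =?= True ) * 100
  # Give a preempted batch job some time to exit gracefully (in seconds):
  #MAXJOBRETIREMENTTIME = 20 * 60

Interactive submits would then include "request_GPUs = 1" and "+IsInteractive = True" so the expressions above can see them.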

Every policy has pros and cons, and you cannot make everyone happy all the time (unless you have so many resources available that there is no contention!), so the trick is to understand your users and their typical job workload and let that guide your decisions.

Hope the above helps,
regards,
Todd

On 1/23/2018 6:26 AM, chris.brew@xxxxxxxxxx wrote:
Hi,

We've just been given some money to buy a nice shiny GPU test box.

I would like to make the resources available to local users for interactive use (it's a test box after all) but also for local and grid batch use (we want to test this too).

I know condor can manage the scheduling of the GPUs with the "use feature : GPUs" knob; I was wondering about how best to integrate the local interactive users.
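For reference, I mean enabling the metaknob in the execute node's configuration, roughly:

  use feature : GPUs

with jobs then asking for a device via "request_GPUs = 1" in their submit files.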

My initial thought is to get local users to submit interactive jobs. That should be fine as long as the resources are not too heavily loaded, but if (when) the system gets more loaded we may end up with some dead time if an interactive job does not get scheduled until the middle of the night or over the weekend.

Now maybe that's the sign to ask for more money to expand the resource, but in lieu of that I was looking at either "Job Deferral" or "Computing on Demand".

If a user submitted a deferred job on Friday evening, would the job block the resource over the weekend, or would it not attempt to match until its deferral time came up? And I assume I can use whether the job is interactive in the startd RANK expression to heavily prioritise the interactive jobs.
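To illustrate what I mean by a deferred job, a submit file along these lines (the executable name and times are just for illustration):

  executable      = gpu_test.sh
  request_GPUs    = 1
  # run no earlier than this Unix epoch time, e.g. Monday 9am
  deferral_time   = 1517216400
  # seconds of slack allowed after deferral_time
  deferral_window = 3600
  queue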

Or would the "Computing on Demand" feature work with GPUs? Is it even possible to suspend a GPU job and use the GPU for another job?

Is there another way to achieve this that I haven't thought of?

Many Thanks,
Chris.

--
Dr Chris Brew
Scientific Computing Manager
Particle Physics Department
STFC - Rutherford Appleton Laboratory
Harwell Oxford,
Didcot
OX11 0QX
+44 1235 446326





--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685