
Re: [HTCondor-users] Fractional GPU



Proceeding under the assumption that HTCondor does not directly support
fractional GPUs, I am trying the approach described here:
https://www-auth.cs.wisc.edu/lists/htcondor-users/2020-December/msg00018.shtml

>You can get HTCondor to do this just by having the same device show up more than once in the device enumeration.
>For instance, if you have two GPUs and your configuration is
>MACHINE_RESOURCE_GPUS = CUDA0, CUDA1
>You can run two jobs on each GPU by configuring
>MACHINE_RESOURCE_GPUS = CUDA0, CUDA1, CUDA0, CUDA1
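
To make the pattern in that quote concrete, here is a minimal Python sketch
that builds the repeated device list for a given number of GPUs and jobs per
GPU. The function name is hypothetical and the CUDA<n> naming is simply
copied from the quoted example; it only formats a string and is not part of
HTCondor.

```python
def repeated_gpu_list(num_devices: int, jobs_per_device: int) -> str:
    """Build a device list that names each GPU jobs_per_device times.

    Hypothetical helper for illustration only; it does not talk to HTCondor.
    """
    if num_devices < 1 or jobs_per_device < 1:
        raise ValueError("both counts must be at least 1")
    devices = [f"CUDA{i}" for i in range(num_devices)]
    # Repeat the whole list so the order matches the quoted example:
    # CUDA0, CUDA1, CUDA0, CUDA1
    return ", ".join(devices * jobs_per_device)


if __name__ == "__main__":
    # Two GPUs, two jobs each (the case in the quote)
    print("MACHINE_RESOURCE_GPUS =", repeated_gpu_list(2, 2))
    # One GPU, four jobs (my case)
    print("MACHINE_RESOURCE_GPUS =", repeated_gpu_list(1, 4))
```

For one GPU shared four ways this produces the same value I put in my config
below.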

I have 1 GPU and this is what I have in my config file:

#use feature:GPUs
#GPU_DISCOVERY_EXTRA = -extra
MACHINE_RESOURCE_GPUs = CUDA0, CUDA0, CUDA0, CUDA0

and this env setting: CUDA_VISIBLE_DEVICES="0"

But when I submit multiple jobs that each request a GPU, they run
serially, not in parallel.

Has anyone been able to get something like this working?

On Thu, Feb 22, 2024 at 3:53 PM Larry Martell <larry.martell@xxxxxxxxx> wrote:
>
> Does condor support fractional GPUs? I am setting request_GPUs = 0.25
> and it is matching (I can see that with -better-analyze and in the
> StartLog) but the job never runs, it stays in idle state.