
Re: [HTCondor-users] Fractional GPU



Thanks! Using that I was able to get this working.
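
For anyone who finds this thread later, the repeated-device approach quoted further down can be sketched in a few lines of Python. This is only an illustration that builds the configuration line as text; the helper name `repeated_gpu_resource` and its parameters are hypothetical and not part of HTCondor. It assumes each job still requests a whole GPU entry (request_GPUs = 1) rather than a fraction.

```python
def repeated_gpu_resource(device_ids, jobs_per_gpu):
    """Return a MACHINE_RESOURCE_GPUs line listing every device jobs_per_gpu times.

    Hypothetical helper: it only formats the config line quoted in this
    thread and does not talk to HTCondor in any way.
    """
    if jobs_per_gpu < 1:
        raise ValueError("jobs_per_gpu must be at least 1")
    # Repeat the whole device list, so two GPUs shared by two jobs each
    # come out as CUDA0, CUDA1, CUDA0, CUDA1 (the order used in the quote).
    entries = [dev for _ in range(jobs_per_gpu) for dev in device_ids]
    return "MACHINE_RESOURCE_GPUs = " + ", ".join(entries)


if __name__ == "__main__":
    # One physical GPU shared by up to four jobs.
    print(repeated_gpu_resource(["CUDA0"], 4))
```

The printed line would go into the execute node's configuration. Whether the jobs then actually run side by side also depends on the rest of the slot configuration, which this sketch does not cover.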

On Thu, Feb 22, 2024 at 11:40 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
>
> Hello,
>
> AFAIK condor doesn't support fractional GPUs.
>
> You will probably find the following response from John on a similar topic helpful.
>
> https://www-auth.cs.wisc.edu/lists/htcondor-users/2024-February/msg00037.shtml
>
> Thanks & Regards,
> Vikrant Aggarwal
>
>
> On Fri, Feb 23, 2024 at 4:38 AM Larry Martell <larry.martell@xxxxxxxxx> wrote:
>>
>> Proceeding under the assumption that condor does not directly support
>> fractional GPUs, I am trying what I read here:
>> https://www-auth.cs.wisc.edu/lists/htcondor-users/2020-December/msg00018.shtml:
>>
>> >You can get HTCondor to do this just by having the same device show up more than once in the device enumeration.
>> >For instance, if you have two GPUs and your configuration is
>> >MACHINE_RESOURCE_GPUS = CUDA0, CUDA1
>> >You can run two jobs on each GPU by configuring
>> >MACHINE_RESOURCE_GPUS = CUDA0, CUDA1, CUDA0, CUDA1
>>
>> I have 1 GPU and this is what I have in my config file:
>>
>> #use feature:GPUs
>> #GPU_DISCOVERY_EXTRA = -extra
>> MACHINE_RESOURCE_GPUs = CUDA0, CUDA0, CUDA0, CUDA0
>>
>> and this env setting: CUDA_VISIBLE_DEVICES="0"
>>
>> But when I run multiple jobs that each request a GPU, they run serially,
>> not in parallel.
>>
>> Has anyone been able to get something like this working?
>>
>> On Thu, Feb 22, 2024 at 3:53 PM Larry Martell <larry.martell@xxxxxxxxx> wrote:
>> >
>> > Does condor support fractional GPUs? I am setting request_GPUs = 0.25
>> > and it is matching (I can see that with -better-analyze and in the
>> > StartLog), but the job never runs; it stays in the idle state.