
Re: [HTCondor-users] GPUs detected but not assigned



That appears to have solved it for the 2 GPU machine, so thank you everyone! The 10 GPU machine is still shutting down peacefully, so it will take a while until I'm able to restart the service. I'll write again if it somehow is not fixed.

For completeness, this is now the output of the command condor_status -const "GPUs=!=UNDEFINED && SlotType==\"Partitionable\"" -af:ht Name AssignedGPUs (machine names are anonymized):

Name            AssignedGPUs
slot1@Machine1  GPU-26455510,GPU-7655ffe8
slot1@Machine2  GPU-5b862b1f,GPU-19994730,GPU-f51b9bb1,GPU-3f09d5c1,GPU-41fb1ac9,GPU-71245f0d,GPU-53044473,GPU-0ba8f63b,GPU-29af7eea
slot1@Machine3  GPU-9d54717e,GPU-c667b609,GPU-a9b262af,GPU-8cb5f972,GPU-9898b927,GPU-8675f80c

The mystery now is why the 6 GPU machine, which has the exact same configuration (it is actually shared through an NFS mount), didn't have this problem. By the same logic, 6 * 0.95 = 5.7, which would be rounded down to 5, right? Its HTCondor version is 9.9, the same as the other two.
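
To make the arithmetic concrete, here is a minimal Python sketch of the truncation described in the quoted message below. It only models that description (multiply the GPU count by the slot percentage, then convert to an integer); it is not HTCondor's actual code, and the function name truncated_gpus is hypothetical.

```python
def truncated_gpus(total_gpus: int, fraction: float) -> int:
    """Model of the behavior described in the thread: the slot gets
    total_gpus * fraction GPUs, and the fractional part is dropped.
    This is an illustrative assumption, not HTCondor source code."""
    return int(total_gpus * fraction)


if __name__ == "__main__":
    # GPU counts of the three machines mentioned in this thread.
    for n in (2, 6, 10):
        print(f"{n} GPUs at 95%  -> {truncated_gpus(n, 0.95)}")
        print(f"{n} GPUs at 100% -> {truncated_gpus(n, 1.0)}")
```

Under this model a 95% share gives 1, 5 and 9 GPUs for the 2, 6 and 10 GPU machines, while a 100% share gives all of them. The prediction of 5 for the 6 GPU machine is exactly what does not match the output above, so the model does not explain that case.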


    
On 19/9/22 at 17:16, John M Knoeller via HTCondor-users wrote:
It does matter for GPUs. If you give a slot 95% of the GPUs and you have 2 GPUs, the slot will be allocated 1.9 GPUs. Since fractional GPUs are not supported at this time, the slot will be assigned 1 GPU after the 1.9 is converted to an integer.

I think what you want is 

SLOT_TYPE_1 = 95%, GPUS=100%


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Todd L Miller via HTCondor-users
Sent: Monday, September 19, 2022 9:41 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Todd L Miller <tlmiller@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] GPUs detected but not assigned

SLOT_TYPE_1=95%
 	Pretty sure this isn't supposed to matter for GPUs, but what 
happens if you set this to 100%?

- ToddM
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
