
Re: [HTCondor-users] Removing / disabling GPUs without stopping jobs



Hi,

We have managed to get some promising results with this by manually setting `AssignedGPUs`, `GPUs`, `TotalGPUs` and `TotalSlotGPUs`.

If we set these to 0, the slot stops accepting new GPU jobs. The issue is that `condor_update_machine_ad` affects all the slots, which has a side effect: it also affects the "children" slots, and once the `GPUs` value of the children slots is changed, it seems to think each of them is using all the cards.

Is there any way to limit it to the parent slot?
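
For reference, a minimal sketch of how such an update ad could be generated. It only builds the ad text and the command line and does not run anything. The helper names, the file name `gpu-offline.ad`, the host name `gpu-node.example.org` and the empty-string value for `AssignedGPUs` are illustrative assumptions, and the `-name` option should be checked against the `condor_update_machine_ad` man page. It does not solve the per-slot problem described above.

```python
import shlex


def build_update_ad(attrs):
    """Render attribute/value pairs as one 'Name = value' line each."""
    return "".join(f"{name} = {value}\n" for name, value in attrs.items())


def build_command(ad_path, startd_name=None):
    """Build (but do not run) a condor_update_machine_ad command line."""
    cmd = ["condor_update_machine_ad"]
    if startd_name is not None:
        # Assumption: -name selects the startd to update.
        cmd += ["-name", startd_name]
    cmd.append(ad_path)
    return cmd


# The four attributes mentioned above. The numeric ones are zeroed;
# the empty string for AssignedGPUs is an assumption of this sketch.
gpu_attrs = {
    "GPUs": 0,
    "TotalGPUs": 0,
    "TotalSlotGPUs": 0,
    "AssignedGPUs": '""',
}

ad_text = build_update_ad(gpu_attrs)
# Hypothetical file and host names, for illustration only.
command = build_command("gpu-offline.ad", "gpu-node.example.org")

print(ad_text, end="")
print(shlex.join(command))
```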

Best,

Joan

On 14/5/20 20:28, Todd L Miller wrote:
Are you saying that changing OFFLINE_MACHINE_RESOURCE_<name> in the config and then running condor_reconfig does not take the GPU offline?

If it does not, I would consider that a bug.

If this doesn't work, condor_update_machine_ad might be a work-around.

- ToddM

--
Dr. Joan Josep Piles-Contreras
ZWE Scientific Computing
Max Planck Institute for Intelligent Systems
(p) +49 7071 601 1750
