Re: [HTCondor-users] Removing / disabling GPUs without stopping jobs


We have managed to get some promising results with this by manually setting `AssignedGPUs`, `GPUs`, `TotalGPUs` and `TotalSlotGPUs`.

If we set these to 0, the slot stops accepting new GPU jobs. The problem is that `condor_update_machine_ad` affects all the slots, including the "child" slots, which causes side effects: if we change `GPUs` on the child slots, each of them appears to think it is using all the cards.
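For illustration, the workaround described above amounts to pushing an update ClassAd that sets those four attributes to 0. The Python sketch below only assumes that `condor_update_machine_ad` accepts a path to a file containing the update ad. The helper names (`build_update_ad`, `push_update`) and the `dry_run` switch are hypothetical, added so the sketch can be tried without touching a live startd. As reported above, the update applies to every slot, so this does not solve the parent-slot-only problem.

```python
import subprocess
import tempfile

# The four attributes the poster reports setting to 0 (taken from the message).
GPU_ATTRS = ("AssignedGPUs", "GPUs", "TotalGPUs", "TotalSlotGPUs")


def build_update_ad(attrs=GPU_ATTRS):
    """Hypothetical helper: return ClassAd text with one 'Name = 0' line per attribute."""
    return "".join(f"{name} = 0\n" for name in attrs)


def push_update(ad_text, dry_run=True):
    """Hypothetical helper: write the ad to a temp file and, unless dry_run,
    pass that file to condor_update_machine_ad. Returns the command it built."""
    with tempfile.NamedTemporaryFile("w", suffix=".ad", delete=False) as f:
        f.write(ad_text)
        path = f.name
    cmd = ["condor_update_machine_ad", path]
    if not dry_run:
        # Assumption: the tool reads the update ad from the given file path.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(push_update(build_update_ad()))  # dry run: prints the command only
```

Keeping `dry_run=True` as the default means the script only writes the ad file and shows the command, which makes it easy to inspect the ad before running it for real.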

Is there any way to limit it to the parent slot?



On 14/5/20 20:28, Todd L Miller wrote:
Are you saying that changing OFFLINE_MACHINE_RESOURCE_<name> in the config and then running condor_reconfig does not take the GPU offline?

If it does not, I would consider that a bug.

If this doesn't work, condor_update_machine_ad might be a work-around.

- ToddM
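A minimal sketch of the config route Todd describes: write an `OFFLINE_MACHINE_RESOURCE_GPUs` line listing the GPU IDs to take offline, then run `condor_reconfig`. The GPU ID (`CUDA1`), the config file path, and the helper name `write_offline_gpu_config` are all hypothetical. Use the IDs your startd actually reports in `AssignedGPUs`, and a file in a directory your HTCondor configuration actually reads.

```python
import subprocess
from pathlib import Path


def write_offline_gpu_config(gpu_ids, config_file, dry_run=True):
    """Hypothetical helper: write an OFFLINE_MACHINE_RESOURCE_GPUs line for the
    given GPU IDs to config_file and, unless dry_run, run condor_reconfig."""
    line = "OFFLINE_MACHINE_RESOURCE_GPUs = " + ", ".join(gpu_ids) + "\n"
    Path(config_file).write_text(line)
    if not dry_run:
        # Ask the running daemons to re-read their configuration.
        subprocess.run(["condor_reconfig"], check=True)
    return line


if __name__ == "__main__":
    # Hypothetical path; point it at a file your HTCondor config includes.
    print(write_offline_gpu_config(["CUDA1"], "99-offline-gpus.conf"), end="")
```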
