
Re: [HTCondor-users] Preventing HTCondor assignment of a given GPU based on GPU state policy?



Thanks for that idea! I think that ought to work well enough until we can get the other GPU jobs pulled into HTCondor along with TensorFlow. It’s a surplus machine for a side project some folks are working on, so they’re stuck sharing it.

 

                -Michael Pelletier.

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Tuesday, May 16, 2017 5:46 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Preventing HTCondor assignment of a given GPU based on GPU state policy?

 

As for the negotiation, the only thing that matters there is the numeric value of GPUs.

Changing the AssignedGPUs value that the negotiator sees won’t have any effect.

 

I did a little digging, and it appears that a restart is required to set a GPU even for partitionable slots. I guess I remembered that wrong.

 

I can’t think of any way to do what you want dynamically other than something really hacky:

 

Use startd cron to set the attribute for in-use GPUs into all slots, and then add something to the START expression that makes it go to false when the AssignedGPUs value is the same as your offline GPUs. You want this to still be TRUE in the partitionable slot when there is at least one free GPU, but go to false when the dynamic slot ends up being assigned an in-use GPU.

 

You will get some false matches that get rejected at the last minute by the STARTD that way. But that’s the best I can think of.
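
The startd cron half of this could be sketched roughly as below. This is an untested sketch built on several assumptions, not a known-good recipe: the attribute name OfflineGPUs is made up for illustration, the CUDA<index> names are assumed to line up with what shows up in AssignedGPUs on the machine, and nvidia-smi is assumed to be on the PATH. The script prints one ClassAd attribute line on stdout, which is the form a startd cron job's output is expected to take.

```python
#!/usr/bin/env python3
"""Sketch of a startd cron hook that publishes the GPUs currently in use.

Assumptions (not from the original thread): the attribute name
"OfflineGPUs" is hypothetical, and device names are assumed to be
"CUDA<index>" so they can be compared against AssignedGPUs.
"""
import subprocess


def busy_gpu_names(gpu_csv, apps_csv, prefix="CUDA"):
    """Return sorted device names that have at least one compute process.

    gpu_csv:  lines of "index, uuid"   (one per GPU)
    apps_csv: lines of "gpu_uuid, pid" (one per compute process)
    """
    uuid_to_index = {}
    for line in gpu_csv.splitlines():
        if line.strip():
            index, uuid = [field.strip() for field in line.split(",", 1)]
            uuid_to_index[uuid] = int(index)

    busy = set()
    for line in apps_csv.splitlines():
        if line.strip():
            uuid = line.split(",", 1)[0].strip()
            if uuid in uuid_to_index:
                busy.add(uuid_to_index[uuid])

    return [prefix + str(i) for i in sorted(busy)]


def classad_line(names, attr="OfflineGPUs"):
    """Format the list as a single ClassAd string attribute assignment."""
    return '%s = "%s"' % (attr, ",".join(names))


def query(field_arg):
    """Run one nvidia-smi CSV query and return its stdout as text."""
    return subprocess.check_output(
        ["nvidia-smi", field_arg, "--format=csv,noheader"], text=True
    )


def main():
    try:
        gpus = query("--query-gpu=index,uuid")
        apps = query("--query-compute-apps=gpu_uuid,pid")
    except (OSError, subprocess.CalledProcessError):
        # If nvidia-smi is missing or fails, publish nothing rather than guess.
        return
    print(classad_line(busy_gpu_names(gpus, apps)))


if __name__ == "__main__":
    main()
```

With one process on the second of two GPUs, this would print OfflineGPUs = "CUDA1". Note that it cannot tell HTCondor's own GPUs jobs apart from outside ones, so the START clause that compares AssignedGPUs against this attribute would need to account for that, and the false matches described above would still occur between cron runs.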

 

-tj