
Re: [HTCondor-users] Selecting or partitioning GPUs



All issues have been resolved. There was a configuration problem on the repository side, and Tim quickly resolved it (thank you very much!).


All machines in our cluster are now running version 9.1.0 without a problem. I will be waiting for the next 9.1.x release and will report on the binding of GPUs on different slots.


On 12/7/21 at 14:37, Tim Theisen wrote:

The get_htcondor script fetches the "current" channel by default. This corresponds to the 9.1.x series. One may also select the "stable" channel. This corresponds to the 9.0.x series.


On 7/9/21 5:44 AM, Javier Barbero wrote:

Thank you for your response.


We have successfully updated from version 8.8 in order to be ready when the update is released.


Unfortunately, I did not realize that the Debian repository that the get_htcondor script used for installation contains the 9.0 branch instead of the 9.1 branch. Is there currently any way to obtain the 9.1 branch on Debian 10 (buster)? I have tried compiling from source on the V_9_1_0 branch, but I have run into a lot of problems there.


Thank you


Javier Barbero


On 30/6/21 at 21:06, John M Knoeller wrote:
 Is there a way to specify in the submit file a specific capability or device name while keeping it a single partitionable slot?

We are working on this, but HTCondor can't do this yet.  

The next 9.1 release (coming in a few weeks) will have a way to bind specific GPUs to specific static slots or p-slots at the time you configure the Startd, and each slot can then advertise only the properties of the GPUs that are assigned to it.

This will allow jobs to use Requirements expressions to match slots that have the desired GPU properties, but it would require that GPU properties be homogeneous within a slot.

That release will still not have any way for a job to request GPU properties and have the d-slot creation code pay attention to the job request. We are actively working on this, but do not have a release target for that feature at this time.
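
As an illustration of the kind of Requirements expression described above, a submit file might look like the following sketch. It assumes a slot whose GPUs are all of one type, so that GPU discovery with -properties advertises a single CUDACapability attribute for the slot rather than per-device CUDA0Capability, CUDA1Capability, and so on. The executable name and the requested resource amounts are hypothetical.

```text
# Sketch only: assumes the matched slot has homogeneous GPUs and
# advertises CUDACapability (requires GPU discovery with -properties).
# my_gpu_job.sh and the resource amounts below are placeholders.
universe       = vanilla
executable     = my_gpu_job.sh
request_cpus   = 4
request_memory = 16GB
request_gpus   = 1
# Match only slots whose GPUs have compute capability 7.5 or newer.
requirements   = (CUDACapability >= 7.5)
queue
```

With heterogeneous GPUs in one slot there is no single CUDACapability value to test, which is why this approach depends on the homogeneity condition.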

-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Javier Barbero Gómez <jbarbero@xxxxxx>
Sent: Wednesday, June 30, 2021 5:02 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Selecting or partitioning GPUs
 

We have a single machine with 40 CPU cores, 188 GiB of memory and 10 GPUs:

  • 4x GeForce RTX 3090 (Compute Capability 8.6)
  • 5x GeForce RTX 2080 Ti (Compute Capability 7.5)
  • 1x TITAN Xp (Compute Capability 6.1)

Ideally I would like to configure this machine to be a single partitionable slot, where cores, memory and GPUs are allocated as needed into dynamic slots.

Right now, this is the content of /etc/condor/config.d/30-dynamic:

NUM_SLOTS=1
NUM_SLOTS_TYPE_1=1
SLOT_TYPE_1=100%
SLOT_TYPE_1_PARTITIONABLE=true
JOB_DEFAULT_REQUESTMEMORY=4800M

And this is the content of /etc/condor/config.d/60-dynamic:

@use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra -properties

This mostly works, and everything is allocated accordingly, but it is impossible to select based on CUDADeviceName or CUDACapability because the GPUs are heterogeneous and several per-device attributes are assigned to the same slot (CUDA0Capability, CUDA1Capability, CUDA2Capability, ...). Is there a way to specify in the submit file a specific capability or device name while keeping it a single partitionable slot?

If that is not possible, I would like to create 3 different partitionable slots, each one including GPUs of the same type. How can I achieve this?

Thank you!

We are using HTCondor 8.8.


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736
