
Re: [HTCondor-users] Detecting GPU



There seems to be a little something missing somewhere ;)

I had similar problems when we started to use GPUs; the cause was an individual configuration entry overriding the feature config.

What does condor_config_val say? It should look roughly like this:

[root@batchg003 ~]# condor_config_val -dump | grep -i gpu
ENVIRONMENT_FOR_AssignedGPUs = GPU_DEVICE_ORDINAL=/(CUDA|OCL)//  CUDA_VISIBLE_DEVICES
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = 10000
MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
SLOT_TYPE_1 = GPUs=1, CPUs=2
SLOT_WEIGHT = GPUs
START = (NODE_IS_HEALTHY =?= True) && (StartJobs =?= True) && TARGET.RequestGpus && (RequestRuntime <= 12000)
STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
STARTD_CRON_GPUs_MONITOR_PERIOD = 1
STARTD_CRON_JOBLIST = NODEHEALTH GPUs_MONITOR GPUs_MONITOR
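
To check quickly whether a local setting has knocked out one of the GPU knobs, the dump can be compared against a list of names you expect to be set. This is only a minimal sketch: the helper names (parse_config_dump, missing_knobs) and the EXPECTED list are made up for illustration, and the sample text is copied from the dump above rather than read from a live host. It assumes configuration names are case-insensitive.

```python
def parse_config_dump(text):
    """Parse 'NAME = value' lines (as printed by condor_config_val -dump)
    into a dict keyed by the upper-cased knob name."""
    config = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values such as
        # '(StartJobs =?= True)' stay intact.
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            continue
        config[name.strip().upper()] = value.strip()
    return config


def missing_knobs(config, expected):
    """Return the expected knob names that are unset or empty."""
    return [name for name in expected if not config.get(name.upper())]


# Sample input: two lines taken from the dump shown above.
SAMPLE = """\
MACHINE_RESOURCE_INVENTORY_GPUs = $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
STARTD_CRON_JOBLIST = NODEHEALTH GPUs_MONITOR GPUs_MONITOR
"""

# Hypothetical checklist; adjust it to whatever your pool needs.
EXPECTED = [
    "MACHINE_RESOURCE_INVENTORY_GPUs",
    "STARTD_CRON_GPUs_MONITOR_EXECUTABLE",
]

if __name__ == "__main__":
    print(missing_knobs(parse_config_dump(SAMPLE), EXPECTED))
```

On a real machine you would feed it the full output of condor_config_val -dump instead of SAMPLE.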

Best
Christoph



--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Josef Mitlöhner" <josef.mitlohner@xxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Friday, 3 April 2020 11:32:48
Subject: Re: [HTCondor-users] Detecting GPU

Hello everyone,
I created a task that runs only "condor_gpu_discovery -extra", and its output was just "DetectedGPUs = 0". However, when I run the command manually, it returns:

C:\> condor_gpu_discovery -extra
DetectedGPUs = "CUDA1"
CUDACapability = 1.2
CUDAClockMhz = 1402.00
CUDAComputeUnits = 2
CUDACoresPerCU = 8
CUDADeviceName = "GeForce 210"
CUDADevicePciBusId = "0000:05:00.0"
CUDADeviceUuid = "00000000-0000-0000-0000-000000000000"
CUDADriverVersion = 6.50
CUDAECCEnabled = false
CUDAGlobalMemoryMb = 1024
CUDARuntimeVersion = 10.20

So in the configuration context, condor_gpu_discovery does not have access to any GPU information.
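
The two results can be compared mechanically by pulling the DetectedGPUs value out of each output. A rough sketch follows; detected_gpus is a made-up helper name, and the handling of several GPUs as a comma-separated quoted list is an assumption, since the outputs in this thread only ever show a single device.

```python
import re


def detected_gpus(discovery_output):
    """Return the GPU ids named in a DetectedGPUs line.

    A quoted value such as "CUDA1" gives ['CUDA1'];
    a bare 0 or a missing line gives [].
    """
    match = re.search(
        r'^[ \t]*DetectedGPUs[ \t]*=[ \t]*(.*)$',
        discovery_output,
        re.MULTILINE,
    )
    if match is None:
        return []
    value = match.group(1).strip()
    if value.startswith('"'):
        # Assumption: several devices would be comma-separated.
        return [g.strip() for g in value.strip('"').split(",") if g.strip()]
    return []


# Outputs as reported above: run by hand vs. run inside the task.
MANUAL = 'DetectedGPUs = "CUDA1"\nCUDACapability = 1.2\n'
IN_TASK = "DetectedGPUs = 0\n"

if __name__ == "__main__":
    print(detected_gpus(MANUAL), detected_gpus(IN_TASK))
```

If the list from the manual run is non-empty and the other one is empty, the difference lies in the environment the command runs in, not in the hardware.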

Best regards
Josef


On 2.4.2020 13:34, Josef Mitlöhner wrote:
Hi,

lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)

C:\>condor_status -l mitlohner-w764 | grep -i gpu
DetectedGPUs = 0
GPUs = 0
MachineResources = "Cpus Memory Disk Swap GPUs"
TotalGPUs = 0
TotalSlotGPUs = 0
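
Read together, these attributes suggest that the GPUs resource is defined (it appears in MachineResources) while discovery reported no device to the startd. A small sketch of that check is below; parse_classad_lines and gpu_diagnosis are illustrative names only, and the interpretation of MachineResources is my reading of the output, not something confirmed in this thread.

```python
def parse_classad_lines(text):
    """Parse 'Name = value' lines (as printed by condor_status -l)
    into a dict, dropping surrounding double quotes from values."""
    attrs = {}
    for line in text.splitlines():
        # ' = ' does not occur inside ' =?= ', so expressions stay whole.
        name, sep, value = line.partition(" = ")
        if sep:
            attrs[name.strip()] = value.strip().strip('"')
    return attrs


def gpu_diagnosis(attrs):
    """Classify the GPU state of a machine ad in one short phrase."""
    resources = attrs.get("MachineResources", "").split()
    if "GPUs" not in resources:
        return "GPU resource not configured"
    if attrs.get("DetectedGPUs", "0") == "0":
        return "GPU resource configured, but discovery reported none"
    return "GPUs detected"


# The attributes from the output above.
SAMPLE = """\
DetectedGPUs = 0
GPUs = 0
MachineResources = "Cpus Memory Disk Swap GPUs"
TotalGPUs = 0
TotalSlotGPUs = 0
"""

if __name__ == "__main__":
    print(gpu_diagnosis(parse_classad_lines(SAMPLE)))
```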

Best regards
Josef

On 2.4.2020 12:45, Beyer, Christoph wrote:
hmm,

what does

lspci | grep -i nvidia

say ?

condor_status should look roughly like this:

[root@batchg003 ~]# condor_status -l batchg003 | grep -i gpu
AssignedGPUs = "CUDA0"
DetectedGPUs = 1
GPUs = 1
MachineResources = "Cpus Memory Disk Swap GPUs"
SlotWeight = GPUs
Start = (NODE_IS_HEALTHY =?= true) && (StartJobs =?= true) && TARGET.RequestGpus && (RequestRuntime <= 12000)
TotalGPUs = 1
TotalSlotGPUs = 1
[root@batchg003 ~]# condor_status -l batchg003 | grep -i cuda
AssignedGPUs = "CUDA0"
CUDACapability = 6.1
CUDADeviceName = "GeForce GTX 1080 Ti"
CUDADevicePciBusId = "0000:65:00.0"
CUDADeviceUuid = "3f2d719f-7d89-c75c-1a71-94316a2fcd12"
CUDADriverVersion = 10.2
CUDAECCEnabled = false
CUDAGlobalMemoryMb = 11178

Best
Christoph



From: "Josef Mitlöhner" <josef.mitlohner@xxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 2 April 2020 12:08:40
Subject: Re: [HTCondor-users] Detecting GPU

Hi,
thank you for your reply.

The result is the same. The only change (after installing the CUDA package) is in the "condor_gpu_discovery -properties" listing:

DetectedGPUs="CUDA0"
CUDACapability=1.2
CUDADeviceName="GeForce 210"
CUDADevicePciBusId="0000:05:00.0"
CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
CUDADriverVersion=6.50
CUDAECCEnabled=false
CUDAGlobalMemoryMb=1024
CUDARuntimeVersion=10.20

Thanks for help,
Best regards
Josef

On 2.4.2020 10:24, Beyer, Christoph wrote:
Hi,

try
@use feature : GPUs
@use feature : GPUsMonitor

The second one is not mandatory of course but you will want it ;)

Install the cuda and nvidia-driver pkgs (I think those come with the cuda pkg though)

cuda.x86_64

Restart the host and check ...

Best
christoph



From: "Josef Mitlöhner" <josef.mitlohner@xxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 2 April 2020 10:13:53
Subject: [HTCondor-users] Detecting GPU

Hello,
when I run the command "condor_gpu_discovery -properties" on my computer, it detects one GPU:

DetectedGPUs="CUDA0"
can't open SOFTWARE\NVIDIA Corporation\GPU Computing Toolkit\CUDA
CUDACapability=1.2
CUDADeviceName="GeForce 210"
CUDADevicePciBusId="0000:05:00.0"
CUDADeviceUuid="00000000-0000-0000-0000-000000000000"
CUDADriverVersion=6.50
CUDAECCEnabled=false
CUDAGlobalMemoryMb=1024

In condor.config I have a line with "use feature : GPUs"


Why does my HTCondor server say (condor_status -l):
...
DetectedGPUs = 0
...

?
Thank you for reply
Josef


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
