
Re: [HTCondor-users] GPU discovery with CUDA 9.x and NVIDIA Volta cards?



condor_gpu_discovery depends on finding libcudart.so or libnvcuda.so in the library path.

It further depends on these libraries exporting the functions cudaGetDeviceCount or cuDeviceGetCount, respectively.

It will call one of these functions, and if it succeeds, will return the number of devices.  If it fails, or the 
function can't be found, it will report 0.
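
To make that concrete, here is a rough sketch of the same load / look up / call logic in Python with ctypes. This is not the condor_gpu_discovery source, only an illustration. The helper names count_devices and detect_gpus are made up for this example, the library names are the ones mentioned above, and the cuInit call is an assumption based on the CUDA driver API requiring initialization before other calls.

```python
import ctypes


def count_devices(lib_name, func_name, init_name=None):
    """Return the device count reported by func_name in lib_name.

    Returns 0 if the library can't be loaded, the function can't be
    found, or the call reports an error (hypothetical helper).
    """
    try:
        lib = ctypes.CDLL(lib_name)          # raises OSError if not found
    except OSError:
        return 0
    try:
        if init_name is not None:
            # Assumption: the driver API needs cuInit(0) before other calls.
            init = getattr(lib, init_name)
            init.argtypes = [ctypes.c_uint]
            init.restype = ctypes.c_int
            if init(0) != 0:
                return 0
        func = getattr(lib, func_name)       # raises AttributeError if missing
    except AttributeError:
        return 0
    func.argtypes = [ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int
    count = ctypes.c_int(0)
    if func(ctypes.byref(count)) != 0:       # nonzero return means an error
        return 0
    return count.value


def detect_gpus():
    """Try the runtime library first, then the driver library."""
    candidates = (
        ("libcudart.so", "cudaGetDeviceCount", None),
        ("libnvcuda.so", "cuDeviceGetCount", "cuInit"),
    )
    for lib_name, func_name, init_name in candidates:
        n = count_devices(lib_name, func_name, init_name)
        if n > 0:
            return n
    return 0


if __name__ == "__main__":
    print("DetectedGPUs=%d" % detect_gpus())
```

Note that every failure path collapses to 0, which is why a plain DetectedGPUs=0 doesn't tell you which step went wrong.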

You could try using

  condor_gpu_discovery -verbose -diag

If the library is loading, but doesn't have the right function, that will report which library failed to load.

By the way, there is a comment in the code indicating that if the CUDA driver and runtime are mismatched,
calling cudaGetDeviceCount will return an error, which we report as 0 devices.
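
One way to check for that kind of mismatch outside of HTCondor is to ask the runtime library for both version numbers. The sketch below is only an illustration: the helper names are made up, it assumes libcudart.so is on the library path, and it only tests for a driver older than the runtime, which may or may not be the exact case the code comment has in mind. cudaDriverGetVersion and cudaRuntimeGetVersion are CUDA runtime API calls, and CUDA encodes versions as 1000*major + 10*minor.

```python
import ctypes


def decode_version(v):
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    return v // 1000, (v % 1000) // 10


def driver_too_old(driver_version, runtime_version):
    """True if the driver supports an older CUDA version than the runtime."""
    return driver_version < runtime_version


def query_versions(lib_name="libcudart.so"):
    """Return (driver_version, runtime_version), or None if unavailable."""
    try:
        lib = ctypes.CDLL(lib_name)
        funcs = [getattr(lib, name)
                 for name in ("cudaDriverGetVersion", "cudaRuntimeGetVersion")]
    except (OSError, AttributeError):
        return None
    values = []
    for func in funcs:
        func.argtypes = [ctypes.POINTER(ctypes.c_int)]
        func.restype = ctypes.c_int
        v = ctypes.c_int(0)
        if func(ctypes.byref(v)) != 0:   # nonzero return means an error
            return None
        values.append(v.value)
    return tuple(values)


if __name__ == "__main__":
    versions = query_versions()
    if versions is None:
        print("could not query CUDA versions")
    else:
        driver, runtime = versions
        print("driver %d.%d, runtime %d.%d" %
              (decode_version(driver) + decode_version(runtime)))
        if driver_too_old(driver, runtime):
            print("driver is older than the runtime")
```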

-tj


-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael Pelletier
Sent: Friday, December 15, 2017 2:03 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] GPU discovery with CUDA 9.x and NVIDIA Volta cards?

I suppose no clue is a clue in and of itself:

[pelletm@prc-microway condor]$ ./condor_gpu_discovery -verbose
DetectedGPUs=0
[pelletm@prc-microway condor]$

It's not carping about the CUDA libs, and I see it open them when doing an strace...

	-Michael Pelletier.

