Re: [HTCondor-users] GPU discovery with CUDA 9.x and NVIDIA Volta cards?

condor_gpu_discovery depends on finding libcudart.so or libnvcuda.so in the library path.

It further depends on these libraries exporting the functions cudaGetDeviceCount or cuDeviceGetCount, respectively.

It will call one of these functions, and if it succeeds, will return the number of devices.  If it fails, or the 
function can't be found, it will report 0.
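
To make that logic concrete, here is a rough sketch of the same load / look up / call
sequence using Python's ctypes. This is not the actual condor_gpu_discovery code; the
function name count_cuda_devices and the default library name are my own choices for
illustration, and it only covers the runtime-library (cudaGetDeviceCount) path.

```python
import ctypes


def count_cuda_devices(libname="libcudart.so"):
    """Return the device count from cudaGetDeviceCount, or 0 on any failure.

    Hypothetical helper for illustration; it mirrors the behaviour described
    above, not the real condor_gpu_discovery source.
    """
    try:
        lib = ctypes.CDLL(libname)          # search the library path
    except OSError:
        return 0                            # library could not be loaded

    try:
        fn = lib.cudaGetDeviceCount         # symbol lookup happens here
    except AttributeError:
        return 0                            # loaded, but function not exported

    # cudaError_t cudaGetDeviceCount(int *count); cudaSuccess is 0
    fn.argtypes = [ctypes.POINTER(ctypes.c_int)]
    fn.restype = ctypes.c_int

    count = ctypes.c_int(0)
    if fn(ctypes.byref(count)) != 0:
        return 0                            # call failed, report 0 devices
    return count.value


if __name__ == "__main__":
    print("devices:", count_cuda_devices())
```

Note that all three failure cases collapse into the same answer of 0, which is why a
plain run cannot tell you which step went wrong.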

You could try using

  condor_gpu_discovery -verbose -diag

If the library loads but doesn't export the right function, that will report which library failed to load.

By the way, there is a comment in the code indicating that if the CUDA driver and runtime are mismatched,
calling cudaGetDeviceCount will return an error, which we report as 0 devices.
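
If you want to check for that kind of mismatch by hand, one option is to compare the
versions the runtime library itself reports. The sketch below is only a suggestion and
is not something condor_gpu_discovery does as far as this message goes; cuda_versions
and driver_too_old are hypothetical helper names, and it assumes libcudart.so is the
library being picked up. cudaDriverGetVersion and cudaRuntimeGetVersion are standard
CUDA runtime API calls that return versions encoded as 1000*major + 10*minor.

```python
import ctypes


def cuda_versions(libname="libcudart.so"):
    """Return (driver_version, runtime_version), or None if they can't be read.

    Hypothetical helper; versions are encoded as 1000*major + 10*minor.
    """
    try:
        lib = ctypes.CDLL(libname)
        driver_fn = lib.cudaDriverGetVersion
        runtime_fn = lib.cudaRuntimeGetVersion
    except (OSError, AttributeError):
        return None                         # library or symbols missing

    driver = ctypes.c_int(0)
    runtime = ctypes.c_int(0)
    for fn, out in ((driver_fn, driver), (runtime_fn, runtime)):
        # Both take an int* and return cudaError_t (0 on success)
        fn.argtypes = [ctypes.POINTER(ctypes.c_int)]
        fn.restype = ctypes.c_int
        if fn(ctypes.byref(out)) != 0:
            return None
    return driver.value, runtime.value


def driver_too_old(driver_version, runtime_version):
    """True if the driver supports an older CUDA version than the runtime needs."""
    return driver_version < runtime_version


if __name__ == "__main__":
    versions = cuda_versions()
    if versions is None:
        print("could not query CUDA versions")
    else:
        print("driver=%d runtime=%d too_old=%s"
              % (versions[0], versions[1], driver_too_old(*versions)))
```

A driver version lower than the runtime version would be consistent with the error
case mentioned in that code comment.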


I suppose no clue is a clue in and of itself:

[pelletm@prc-microway condor]$ ./condor_gpu_discovery -verbose
[pelletm@prc-microway condor]$

It's not carping about the CUDA libs, and I see it open them when doing an strace...

	-Michael Pelletier.

