
Re: [HTCondor-users] NVIDIA L40 Identified as an OCL Device





On Wed, Jul 5, 2023 at 10:45 Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> We recently deployed a machine with NVIDIA L40s. Condor identifies this
> as an OCL device rather than a CUDA device, see

[...]

> Is there a setting we missed, or a way to force it to be a "CUDA" device?

    Pretty much all cards will show up as OCL, but only if they don't
show up as a CUDA device. (Unless you've added the -opencl flag, in
which case, stop doing that.) My guess would be that there's some other
software problem: a missing driver or CUDA library, or the like. What
does nvidia-smi report? (What does condor_gpu_discovery -diagnostic
show?)

We did see an error with cuInit, but after restarting condor things showed up properly. Is there a way to have condor check these things from time to time?
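
For illustration, a rough sketch of what such a periodic check could look like if run outside of condor (say, from cron). It is only a sketch under assumptions: it assumes condor_gpu_discovery is on the PATH (it usually lives in the libexec directory, so a full path may be needed) and that it prints a line of the form DetectedGPUs="..." with OpenCL-only devices named OCL0, OCL1, and so on. The function names are made up for this example and are not part of HTCondor.

```python
import re
import subprocess


def detected_gpus(discovery_output: str) -> list[str]:
    """Return the device names from a DetectedGPUs="a, b" line.

    Assumption: the tool prints DetectedGPUs="..." when devices are found;
    anything else (for example DetectedGPUs=0) is treated as "no devices".
    """
    match = re.search(r'^DetectedGPUs\s*=\s*"([^"]*)"', discovery_output, re.MULTILINE)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def opencl_only(devices: list[str]) -> list[str]:
    """Return the devices reported under an OCL name instead of a CUDA one."""
    return [d for d in devices if d.upper().startswith("OCL")]


def check(tool: str = "condor_gpu_discovery") -> int:
    """Run the discovery tool once; return 0 if all devices look like CUDA devices."""
    try:
        result = subprocess.run([tool], capture_output=True, text=True, check=False)
    except FileNotFoundError:
        print(f"{tool} not found; pass its full path")
        return 2
    devices = detected_gpus(result.stdout)
    if not devices:
        print("no GPUs detected")
        return 1
    fallback = opencl_only(devices)
    if fallback:
        print("devices seen only as OpenCL:", ", ".join(fallback))
        return 1
    print("all devices detected as CUDA:", ", ".join(devices))
    return 0
```

Calling check() from a small wrapper script and alerting on a non-zero return value would flag a node that has fallen back to OCL detection. It only reports the problem; it does not restart anything.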

Benedikt




- ToddM
--
Benedikt Riedel
Global Computing Coordinator IceCube Neutrino Observatory
Technical Coordinator IceCube Neutrino Observatory
Computing Manager Wisconsin IceCube Particle Astrophysics Center
University of Wisconsin-Madison