
Re: [HTCondor-users] GPUsUsage



Try running

 

c:\condor\bin\condor_gpu_monitor

 

It may print out a message telling you what is wrong.  If all you see is

 

  Hanging to prevent process churn.

 

then neither nvcuda.dll nor cudart.dll is in the PATH.  If that happens, try running

 

c:\condor\bin\condor_gpu_discovery -verbose

 

We would expect that to fail also, and for the same reason. That would mean that you don't actually have the NVIDIA drivers or runtime installed properly.
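
If you want to check the PATH part by hand, here is a minimal Python sketch that looks for the two DLL names mentioned above in each PATH directory. The helper name find_on_path is made up for this example and is not part of HTCondor. Windows also searches system directories when it loads a DLL, so a "not found" result here is only a hint, not proof that the library cannot be loaded.

```python
import os
from pathlib import Path


def find_on_path(names, path=None):
    """Map each file name to the first PATH directory containing it, or None.

    Hypothetical helper for illustration; it only inspects PATH and does not
    reproduce the full Windows DLL search order.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    # Split PATH into directories, dropping empty entries and stray quotes.
    dirs = [d.strip('"') for d in path.split(os.pathsep) if d.strip()]
    found = {}
    for name in names:
        found[name] = None
        for d in dirs:
            if (Path(d) / name).is_file():
                found[name] = d
                break
    return found


if __name__ == "__main__":
    # DLL names as given in the message above.
    for name, where in find_on_path(["nvcuda.dll", "cudart.dll"]).items():
        print(f"{name}: {where or 'not found on PATH'}")
```

Run it in the same environment the HTCondor service uses if you can, since a service may see a different PATH than your interactive shell.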

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Martin Sajdl
Sent: Saturday, November 28, 2020 1:09 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] GPUsUsage

 

Hi,

 

we would like to monitor GPU load on the machines in our pool while jobs are running (or even when no job is running). We found a machine ClassAd attribute that shows this and started using it, but now it does not work on some machines. Those machines have the same GPU cards, the same drivers, and the same HTCondor configuration (just "use feature:GPUs").

Could someone tell me under what conditions this ClassAd attribute is provided, or whether there is another one we could use for GPU load monitoring? We are using the Windows version of HTCondor, 8.8.10. Unfortunately, the documentation hardly mentions this attribute.

 

Thank you in advance!

Masaj