
Re: [HTCondor-users] GPUsUsage



Oh.  I meant condor_gpu_utilization

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Martin Sajdl
Sent: Monday, November 30, 2020 1:17 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] GPUsUsage

 

Unfortunately, the mentioned utility (condor_gpu_monitor) is not a part of my installation. There are just the following two in the bin directory:

condor_gpu_discovery.exe

condor_gpu_utilization.exe

 

The output of condor_gpu_discovery -verbose is:

DetectedGPUs="CUDA0"

 

With the -extra parameter, it is:

DetectedGPUs="CUDA0"

CUDACapability=7.5

CUDAClockMhz=1845.00

CUDAComputeUnits=48

CUDADeviceName="GeForce RTX 2080 SUPER"

CUDADevicePciBusId="0000:05:00.0"

CUDADeviceUuid="132fe854-4afe-4e24-82ae-4eb1ef2dd963"

CUDADriverVersion=10.10

CUDAECCEnabled=false

CUDAGlobalMemoryMb=8192

CUDARuntimeVersion=10.10
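
Since the machines are reported to have the same cards and drivers, one way to check that claim is to capture the discovery output on a working and a non-working machine and compare the two. The following is a minimal sketch, assuming the output keeps the `Key=Value` form shown above; the function names `parse_discovery_output` and `diff_attrs` are hypothetical helpers, not part of HTCondor.

```python
def parse_discovery_output(text):
    """Parse Key=Value lines (as printed above) into a dict of strings.

    Surrounding double quotes on a value are removed; lines without
    an '=' are ignored.
    """
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        attrs[key.strip()] = value
    return attrs


def diff_attrs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Sample text copied from the output above; the second string is a
    # made-up stand-in for output captured on another machine.
    working = parse_discovery_output(
        'DetectedGPUs="CUDA0"\nCUDADriverVersion=10.10\nCUDAComputeUnits=48\n'
    )
    other = parse_discovery_output('DetectedGPUs="CUDA0"\nCUDAComputeUnits=48\n')
    for key, (left, right) in diff_attrs(working, other).items():
        print(f"{key}: {left!r} vs {right!r}")
```

Any key printed by the loop is a place where the two machines differ, which narrows down where to look next.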

 

Any hint?

 

Masaj

 


---------- Original e-mail ----------
From: John M Knoeller <johnkn@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Date: 30. 11. 2020 18:03:04
Subject: Re: [HTCondor-users] GPUsUsage

Try running

 

c:\condor\bin\condor_gpu_monitor

 

It may print out a message telling you what is wrong.  If all you see is

 

  Hanging to prevent process churn.

 

then neither nvcuda.dll nor cudart.dll is in the PATH.  If that happens, try running

 

c:\condor\bin\condor_gpu_discovery -verbose

 

We would expect that to fail also, and for the same reason. That would mean that you don't actually have the NVIDIA drivers or runtime installed properly.
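
The PATH condition described above can be checked directly. This is a rough sketch, assuming the two DLL names given in this message and assuming that scanning the PATH directories is a good enough approximation (Windows also searches other locations, such as the system directory, when loading a DLL). The helper name `find_on_path` is hypothetical.

```python
import os


def find_on_path(filename, path=None):
    """Return every PATH directory entry that contains `filename`.

    `path` defaults to the current PATH environment variable and is
    split on the platform's separator (';' on Windows).
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # DLL names as given in the message above.
    for dll in ("nvcuda.dll", "cudart.dll"):
        hits = find_on_path(dll)
        print(dll, "->", hits if hits else "not found on PATH")
```

Running it on a working machine and on a failing one, under the same account the HTCondor service uses, would show whether the two environments really match.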

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Martin Sajdl
Sent: Saturday, November 28, 2020 1:09 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] GPUsUsage

 

Hi,

 

we would like to monitor the GPU load on the machines in our pool while jobs are running (or even when no job is running). We found a machine ClassAd attribute that shows this and started using it, but now it does not work on some machines. They have the same GPU cards, the same drivers, and the same HTCondor configuration (just "use feature:GPUs").

Could someone tell me under what conditions the ClassAd attribute is provided, or whether there is another one we could use for GPU load monitoring? We are using the Windows version of HTCondor, 8.8.10. Unfortunately, the documentation barely mentions this ClassAd attribute.

 

Thank you in advance!

Masaj

 

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/