
Re: [HTCondor-users] GPUs_MONITOR resource usage



I was wondering if I was supposed to reply to that. :)

On the machine in question I believe we are running the nvidia drivers. I see the x-org nouveau drivers installed, but I'm not sure if that's an issue or not.

The "used by" counts on the nvidia module are a little eye-opening, but they are also high on other nodes that don't have the condor GPU monitor running (although condor itself is running there, if that matters).

I can provide any other output you want, but here's what I'm looking at:

root@gpu2:~# lsmod | grep nv
nvidia_uvm            757760  4
nvidia_drm             40960  0
nvidia_modeset       1114112  1 nvidia_drm
nvidia              14364672  632 nvidia_modeset,nvidia_uvm
drm_kms_helper        172032  2 ast,nvidia_drm
drm                   401408  5 ast,ttm,nvidia_drm,drm_kms_helper
ipmi_msghandler        53248  4 nvidia,ipmi_ssif,ipmi_devintf,ipmi_si
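
One caveat on the listing above: "grep nv" cannot match the string "nouveau", so this output does not show whether the nouveau module is loaded. Below is a minimal sketch of how both checks could be scripted for comparing nodes. The function names module_use_count and nouveau_loaded are made up for illustration, and both simply parse lsmod-style text on stdin.

```shell
# Hypothetical helpers (names are illustrative, not part of any tool).
# Both read lsmod-style text ("Module Size Used-by ...") on stdin.

# Print the "Used by" count (third column) for the module named in $1.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Exit status 0 if a module named exactly "nouveau" is listed, 1 otherwise.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Typical use would be "lsmod | module_use_count nvidia" and "lsmod | nouveau_loaded && echo nouveau is loaded", run on each node.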

root@gpu2:~# apt search nvidia | grep installed

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda-drivers/unknown,now 384.81-1 amd64 [installed]
cuda-libraries-9-0/unknown,now 9.0.176-1 amd64 [installed,automatic]
cuda-libraries-dev-9-0/unknown,now 9.0.176-1 amd64 [installed,automatic]
cuda-visual-tools-9-0/unknown,now 9.0.176-1 amd64 [installed,automatic]
libcuda1-384/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-cfg1-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-common-390/bionic,bionic,now 390.87-0ubuntu0~gpu18.04.1 all [installed,automatic]
libnvidia-compute-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-container-tools/bionic,now 1.0.0-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.0.0-1 amd64 [installed,automatic]
libnvidia-decode-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-encode-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-fbc1-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-gl-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libnvidia-ifr1-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
libvdpau1/bionic,now 1.1.1-3ubuntu1 amd64 [installed,automatic]
nvidia-384/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-384-dev/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-compute-utils-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-container-runtime-hook/bionic,now 1.4.0-1 amd64 [installed,automatic]
nvidia-dkms-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-driver-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-headless-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-headless-no-dkms-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-kernel-common-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-kernel-source-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-modprobe/bionic,now 384.111-2 amd64 [installed,automatic]
nvidia-opencl-icd-384/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
nvidia-prime/bionic-updates,bionic-updates,now 0.8.8.2 all [installed,automatic]
nvidia-utils-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
vdpau-driver-all/bionic,now 1.1.1-3ubuntu1 amd64 [installed,automatic]
xserver-xorg-video-nouveau/bionic,now 1:1.0.15-2 amd64 [installed,automatic]
xserver-xorg-video-nvidia-390/bionic,now 390.87-0ubuntu0~gpu18.04.1 amd64 [installed,automatic]
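
Given the apt warning about its unstable CLI, here is a rough sketch of getting the same list from dpkg-query, whose output format is meant for scripts. The function names are hypothetical, and the nvidia|cuda pattern is only my guess at which packages matter here.

```shell
# Hypothetical helpers (names are illustrative).

# Keep fully installed ("ii") packages whose name mentions nvidia or cuda.
# Input: tab-separated "status<TAB>package<TAB>version" lines on stdin.
filter_gpu_packages() {
    awk -F'\t' '$1 ~ /^ii/ && $2 ~ /nvidia|cuda/ { print $2 "\t" $3 }'
}

# Feed dpkg's package database through the filter above.
installed_gpu_packages() {
    dpkg-query -W -f='${db:Status-Abbrev}\t${Package}\t${Version}\n' |
        filter_gpu_packages
}
```

Running installed_gpu_packages prints one "package<TAB>version" line per match, which makes a mix of 384 and 390 versions easier to compare across nodes.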




On Fri, Jan 11, 2019 at 12:21 PM Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:
> *) This was on a totally idle system. On all 9 test machines I have 6 to 9%
> CPU consistently.

    Oops. I meant to ask sander@xxxxxxxxxxxx, sorry. Still
interesting, and still higher than we'd like, though.

> *) 'perf top' shows that it is indeed the nvml library; more in detail: (to
> 5% only)

    Thanks for the info. I'll have to see what can be done about
this.

- ToddM
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/