Re: [HTCondor-users] Startd crash w/ "Failed to bind local resource 'GPUs'"
- Date: Wed, 15 Nov 2017 06:13:16 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Startd crash w/ "Failed to bind local resource 'GPUs'"
On 11/15/2017 3:57 AM, bert.deknuydt@xxxxxxxxxxxxxxxx wrote:
> I've seen this with 8.6.5 and 8.6.6 (Fedora 26 RPMs). It's sporadic
> (a couple of times a week) on machines with just one GPU, but quite
> frequent (up to several per hour) on machines with 8 GPUs.
>
> Has anyone seen something similar? Any suggestions for figuring out
> what happens?
On a GPU-equipped machine that has lots of problems (i.e. one of your
8-GPU machines), what does

    condor_config_val -dump GPU

say? In other words, are you doing anything special to configure GPU
management beyond just having

    use feature:gpus

in your config?
regards,
Todd