
Re: [HTCondor-users] Startd fatal exception: Failed to bind GPUs



Ah, there was another bug in the same area fixed in 8.8.10.

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Kenyi Hurtado Anampa
Sent: Tuesday, September 15, 2020 4:31 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Irena Johnson <ijohnso1@xxxxxx>
Subject: Re: [HTCondor-users] Startd fatal exception: Failed to bind GPUs

 

Hi John,

 

Thanks for the help! We are using condor 8.8.8 for all condor components, including the workers.

Please let us know if we can do anything to help debug the issue.

 

Best regards,

Kenyi

 

On Tue, Sep 15, 2020 at 5:23 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

Failed to bind local resource 'GPUs' 

 

is the symptom of a known bug that was fixed in the 8.8.4 release:

https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7104

 

If I'm reading this part of the log correctly, you had a total of 4 GPUs, all of which were in use when a new slot was created requesting 3 GPUs. That attempt to bind 3 GPUs failed and the Startd aborted. The problem is that the Startd should have rejected the request for 3 GPUs before getting that far. This might be another symptom of the bug above, or it might be a new bug.
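
To make that reading of the log concrete, here is a minimal Python sketch that parses the two relevant StartLog lines from this thread and compares the number of requested GPUs with the number of unbound devices. It is only an illustration of the reasoning above, not how the Startd does it. The helper names (`split_list`, `parse_bindings`, `requested_gpus`) are hypothetical, the second brace list is assumed to give the owning slot of each device in order, and an empty owner field is assumed to mean "unbound" (the thread does not show what a free device looks like in this log line).

```python
import re

# The two log lines quoted in this thread.
REQUEST_LINE = ("09/14/20 11:44:45 (D_ALWAYS:2) slot1: Match requesting resources: "
                "cpus=1 memory=20480 disk=0.1% GPUs=3")
BIND_LINE = ("09/14/20 11:44:45 (D_ALWAYS:2) bind_DevIds for slot1.3 before : "
             "GPUs:{CUDA0, CUDA1, CUDA2, CUDA3, }{1_1, 1_1, 1_2, 1_2, }")


def split_list(body):
    """Split a '{a, b, c, }' body on commas, dropping only the trailing empty item."""
    items = [item.strip() for item in body.split(",")]
    if items and items[-1] == "":
        items.pop()  # the log format ends each list with a trailing comma
    return items


def parse_bindings(line):
    """Return {device: owner} from a bind_DevIds line.

    Assumption: the second brace list names the owning slot of each device,
    in the same order as the first list.
    """
    match = re.search(r"GPUs:\{([^}]*)\}\{([^}]*)\}", line)
    if match is None:
        raise ValueError("no GPUs device list found in line")
    devices = split_list(match.group(1))
    owners = split_list(match.group(2))
    if len(devices) != len(owners):
        raise ValueError("device and owner lists differ in length")
    return dict(zip(devices, owners))


def requested_gpus(line):
    """Return the GPUs=N count from a 'Match requesting resources' line."""
    match = re.search(r"\bGPUs=(\d+)", line)
    if match is None:
        raise ValueError("no GPUs request found in line")
    return int(match.group(1))


bindings = parse_bindings(BIND_LINE)
# Assumption: an empty owner field would mark an unbound device.
free = [dev for dev, owner in bindings.items() if owner == ""]
wanted = requested_gpus(REQUEST_LINE)
can_satisfy = wanted <= len(free)

print(f"total={len(bindings)} free={len(free)} requested={wanted} "
      f"can_satisfy={can_satisfy}")
```

Under those assumptions, all four devices are bound to slot1_1 or slot1_2 when the request for 3 GPUs arrives, so the request cannot be satisfied.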

 

What version of HTCondor are you running?  If it is older than 8.8.4 please upgrade and let us know.
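
When comparing release numbers like the ones mentioned in this thread (8.8.4, 8.8.8, 8.8.10), compare them numerically rather than as strings, because "8.8.10" sorts before "8.8.8" as text. A small Python sketch, with a hypothetical `parse_version` helper and assuming plain dotted-integer version strings:

```python
def parse_version(text):
    """Turn a dotted version string such as '8.8.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


running = parse_version("8.8.8")       # version reported by the site in this thread
first_fix = parse_version("8.8.4")     # release with the fix for ticket 7104
second_fix = parse_version("8.8.10")   # release with the later fix in the same area

has_first_fix = running >= first_fix
has_second_fix = running >= second_fix

# Plain string comparison gets the ordering wrong for 8.8.10 vs 8.8.8.
string_order_is_wrong = ("8.8.10" < "8.8.8") and (second_fix > running)

print(f"has 8.8.4 fix: {has_first_fix}, has 8.8.10 fix: {has_second_fix}")
```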

 

-tj

 

09/14/20 11:44:45 (D_ALWAYS:2) slot1: Match requesting resources: cpus=1 memory=20480 disk=0.1% GPUs=3
09/14/20 11:44:45 (D_ALWAYS:2) Got execute_dir = /var/condor/execute
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) bind_DevIds for slot1.3 before : GPUs:{CUDA0, CUDA1, CUDA2, CUDA3, }{1_1, 1_1, 1_2, 1_2, }

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Kenyi Hurtado Anampa
Sent: Monday, September 14, 2020 12:41 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Irena Johnson <ijohnso1@xxxxxx>
Subject: [HTCondor-users] Startd fatal exception: Failed to bind GPUs

 

Hello,

 

We are seeing a lot of errors with the Startd crashing on our GPU compute nodes (at Notre Dame).

Do you know what could be causing this?

 

Logs and details below:

 

"/opt/condor/RedHat7/sbin/condor_startd" on "qa-rtx6k-023.crc.nd.edu" exited with status 4.
Condor will automatically restart this process in 10 seconds.

*** Last 200 line(s) of file /var/condor/log/StartLog:
09/14/20 11:43:48 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:48 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Schedd addr = <10.32.8.21:9618?addrs=10.32.8.21-9618&noUDP&sock=1606_2930_3>
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Alive interval = 300
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Schedd sending 2 preempting claims.
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1: Canceled ClaimLease timer (27)
09/14/20 11:44:45 (D_ALWAYS) slot1_1: Changing state and activity: Claimed/Busy -> Preempting/Killing
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1[8070.0]: In Starter::kill() with pid 219022, sig 3 (SIGQUIT)
09/14/20 11:44:45 (D_ALWAYS:2) Send_Signal(): Doing kill(219022,3) [SIGQUIT]
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1[8070.0]: in starter:killHard starting kill timer
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2: Canceled ClaimLease timer (30)
09/14/20 11:44:45 (D_ALWAYS) slot1_2: Changing state and activity: Claimed/Busy -> Preempting/Killing
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2[8071.0]: In Starter::kill() with pid 219023, sig 3 (SIGQUIT)
09/14/20 11:44:45 (D_ALWAYS:2) Send_Signal(): Doing kill(219023,3) [SIGQUIT]
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2[8071.0]: in starter:killHard starting kill timer
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Received ClaimId from schedd (<10.32.89.19:9618?addrs=10.32.89.19-9618&noUDP&sock=217785_f951_3>#1600097425#6#...)
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Match requesting resources: cpus=1 memory=20480 disk=0.1% GPUs=3
09/14/20 11:44:45 (D_ALWAYS:2) Got execute_dir = /var/condor/execute
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) bind_DevIds for slot1.3 before : GPUs:{CUDA0, CUDA1, CUDA2, CUDA3, }{1_1, 1_1, 1_2, 1_2, }
09/14/20 11:44:45 (D_ALWAYS|D_FAILURE) ERROR "Failed to bind local resource 'GPUs'" at line 1272 in file /var/lib/condor/execute/slot10/dir_11497/sources/src/condor_startd.V6/ResAttributes.cpp
09/14/20 11:44:45 (D_ALWAYS:2) CronJobMgr: 1 jobs alive
09/14/20 11:44:45 (D_ALWAYS|D_FAILURE) startd exiting because of fatal exception.
*** End of file StartLog
