
Re: [HTCondor-users] Startd fatal exception: Failed to bind GPUs



Gotcha! We will update and let you know how things go. Thanks again!

On Tue, Sep 15, 2020 at 6:52 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

Ah, there was another bug in the same area fixed in 8.8.10.
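
For readers following along, the release numbers mentioned in this thread (8.8.4, 8.8.8, 8.8.10) are easy to mis-order by eye or by plain string comparison, because "8.8.10" sorts before "8.8.8" as text. The sketch below compares them numerically. The helper name `parse_version` is made up for illustration and is not part of HTCondor; it assumes purely numeric dotted versions.

```python
def parse_version(text):
    """Turn a dotted version such as '8.8.10' into a tuple of ints, e.g. (8, 8, 10)."""
    return tuple(int(part) for part in text.split("."))


# Versions taken from the thread.
running = parse_version("8.8.8")      # version reported on the pool
first_fix = parse_version("8.8.4")    # release with the first fix
second_fix = parse_version("8.8.10")  # release with the later fix in the same area

# Tuples compare element by element, so 10 > 8 is handled correctly.
has_first_fix = running >= first_fix
has_second_fix = running >= second_fix

# Plain string comparison gives the wrong answer for the second check,
# because the character "8" sorts after the character "1".
naive_second_check = "8.8.8" >= "8.8.10"

print("has 8.8.4 fix:", has_first_fix)
print("has 8.8.10 fix:", has_second_fix)
print("naive string check claims:", naive_second_check)
```

Under these assumptions, a pool on 8.8.8 already contains the 8.8.4 fix but not the 8.8.10 one, which matches the advice to update.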


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Kenyi Hurtado Anampa
Sent: Tuesday, September 15, 2020 4:31 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Irena Johnson <ijohnso1@xxxxxx>
Subject: Re: [HTCondor-users] Startd fatal exception: Failed to bind GPUs


Hi John,


Thanks for the help! We are using condor 8.8.8 for all condor components, including the workers.

Please let us know if we can do anything to help debug the issue.


Best regards,

Kenyi


On Tue, Sep 15, 2020 at 5:23 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

"Failed to bind local resource 'GPUs'"

is the symptom of a known bug fixed in the 8.8.4 release:

https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7104


If I'm reading this part of the log correctly, you had a total of 4 GPUs, all of which were in use

when a new slot was created requesting 3 GPUs. That attempt to bind 3 GPUs failed and the Startd aborted.

The problem is that the Startd should have rejected the request for 3 GPUs before getting that far. This

might be another symptom of the bug above, or it might be a new bug.


What version of HTCondor are you running? If it is older than 8.8.4 please upgrade and let us know.


-tj


09/14/20 11:44:45 (D_ALWAYS:2) slot1: Match requesting resources: cpus=1 memory=20480 disk=0.1% GPUs=3
09/14/20 11:44:45 (D_ALWAYS:2) Got execute_dir = /var/condor/execute
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) bind_DevIds for slot1.3 before : GPUs:{CUDA0, CUDA1, CUDA2, CUDA3, }{1_1, 1_1, 1_2, 1_2, }
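
The reading of the excerpt above (4 GPUs, all bound, 3 requested) can be checked mechanically. The following is only a rough sketch, not an HTCondor tool: it assumes the first brace list holds device IDs and the second holds the owning dynamic slot for each device, in the same order. The thread never shows how an unbound device is printed, so the `FREE_MARKERS` values are a guess, and all function names are hypothetical.

```python
import re
from collections import Counter

# Patterns for the two log lines quoted in the thread.
REQUEST_RE = re.compile(r"Match requesting resources:.*\bGPUs=(\d+)")
BIND_RE = re.compile(r"bind_DevIds for (\S+) before : (\w+):\{([^}]*)\}\{([^}]*)\}")

# ASSUMPTION: how an unbound device is printed is not shown in the thread.
FREE_MARKERS = {"0", "0_0"}


def split_list(body):
    """Split 'a, b, c, ' into ['a', 'b', 'c'], dropping the trailing empty item."""
    return [item.strip() for item in body.split(",") if item.strip()]


def parse_bind_line(line):
    """Return {device_id: owner} from a bind_DevIds log line."""
    match = BIND_RE.search(line)
    if match is None:
        raise ValueError("not a bind_DevIds line")
    devices = split_list(match.group(3))
    owners = split_list(match.group(4))
    if len(devices) != len(owners):
        raise ValueError("device list and owner list differ in length")
    return dict(zip(devices, owners))


def free_devices(bindings):
    """List devices whose owner entry looks unbound (see FREE_MARKERS)."""
    return [dev for dev, owner in bindings.items() if owner in FREE_MARKERS]


request_line = ("09/14/20 11:44:45 (D_ALWAYS:2) slot1: Match requesting "
                "resources: cpus=1 memory=20480 disk=0.1% GPUs=3")
bind_line = ("09/14/20 11:44:45 (D_ALWAYS:2) bind_DevIds for slot1.3 before : "
             "GPUs:{CUDA0, CUDA1, CUDA2, CUDA3, }{1_1, 1_1, 1_2, 1_2, }")

requested = int(REQUEST_RE.search(request_line).group(1))
bindings = parse_bind_line(bind_line)
free = free_devices(bindings)
per_owner = Counter(bindings.values())

print("requested GPUs:", requested)
print("bound per slot:", dict(per_owner))
print("free GPUs:", free)
print("request can be satisfied:", len(free) >= requested)
```

On the two quoted lines this reports two GPUs bound to each of slot1_1 and slot1_2 and none free, so a request for 3 cannot be satisfied at that moment, which is consistent with the analysis above.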


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Kenyi Hurtado Anampa
Sent: Monday, September 14, 2020 12:41 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Irena Johnson <ijohnso1@xxxxxx>
Subject: [HTCondor-users] Startd fatal exception: Failed to bind GPUs


Hello,


We are seeing a lot of errors with the Startd crashing on our GPU compute nodes (at Notre Dame).

Do you know what could be causing this?


Logs and details below:


"/opt/condor/RedHat7/sbin/condor_startd" on "qa-rtx6k-023.crc.nd.edu" exited with status 4.
Condor will automatically restart this process in 10 seconds.

*** Last 200 line(s) of file /var/condor/log/StartLog:
09/14/20 11:43:48 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:48 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:48 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:43:59 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:43:59 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:10 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:10 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:21 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:21 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:32 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:32 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot0'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot1'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot2'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) GPUs_MONITOR: 3 lines in Queue
09/14/20 11:44:43 (D_ALWAYS:2) Updating ClassAd for 'GPUs_MONITOR.GPUsSlot3'
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_2 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'kflops' to slot1_1 [InSlotList matches]
09/14/20 11:44:43 (D_ALWAYS:2) Publishing ClassAd 'mips' to slot1_1 [InSlotList matches]
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Schedd addr = <10.32.8.21:9618?addrs=10.32.8.21-9618&noUDP&sock=1606_2930_3>
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Alive interval = 300
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Schedd sending 2 preempting claims.
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1: Canceled ClaimLease timer (27)
09/14/20 11:44:45 (D_ALWAYS) slot1_1: Changing state and activity: Claimed/Busy -> Preempting/Killing
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1[8070.0]: In Starter::kill() with pid 219022, sig 3 (SIGQUIT)
09/14/20 11:44:45 (D_ALWAYS:2) Send_Signal(): Doing kill(219022,3) [SIGQUIT]
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1[8070.0]: in starter:killHard starting kill timer
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1_1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2: Canceled ClaimLease timer (30)
09/14/20 11:44:45 (D_ALWAYS) slot1_2: Changing state and activity: Claimed/Busy -> Preempting/Killing
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2[8071.0]: In Starter::kill() with pid 219023, sig 3 (SIGQUIT)
09/14/20 11:44:45 (D_ALWAYS:2) Send_Signal(): Doing kill(219023,3) [SIGQUIT]
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2[8071.0]: in starter:killHard starting kill timer
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1_2: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Received ClaimId from schedd (<10.32.89.19:9618?addrs=10.32.89.19-9618&noUDP&sock=217785_f951_3>#1600097425#6#...)
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Match requesting resources: cpus=1 memory=20480 disk=0.1% GPUs=3
09/14/20 11:44:45 (D_ALWAYS:2) Got execute_dir = /var/condor/execute
09/14/20 11:44:45 (D_ALWAYS:2) slot1: Total execute space: 451561480
09/14/20 11:44:45 (D_ALWAYS:2) bind_DevIds for slot1.3 before : GPUs:{CUDA0, CUDA1, CUDA2, CUDA3, }{1_1, 1_1, 1_2, 1_2, }
09/14/20 11:44:45 (D_ALWAYS|D_FAILURE) ERROR "Failed to bind local resource 'GPUs'" at line 1272 in file /var/lib/condor/execute/slot10/dir_11497/sources/src/condor_startd.V6/ResAttributes.cpp
09/14/20 11:44:45 (D_ALWAYS:2) CronJobMgr: 1 jobs alive
09/14/20 11:44:45 (D_ALWAYS|D_FAILURE) startd exiting because of fatal exception.
*** End of file StartLog

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
