
Re: [HTCondor-users] Trying to understand DedicatedScheduler related problems



Hi again,

We just had another job behave like this.

It was submitted (requesting 32 nodes, which were free at that point),
and one could watch

condor_status -const 'PartitionableSlot isnt true' -af ClientMachine RemoteUser Cpus JobId

report a rising number of slots with an undefined JobId until it reached
30. At that point condor_q showed the job as running, but within seconds
it went back to 'idle', and from condor_status' point of view 12 nodes
were left without a defined JobId.
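
In case it matters: that query was simply re-run every few seconds in a
loop; a minimal sketch of it (the extra State and Activity columns are
only an addition of mine here to make the claim state visible, they were
not part of the original query):

while sleep 5; do
    condor_status -const 'PartitionableSlot isnt true' \
        -af ClientMachine RemoteUser Cpus State Activity JobId
    echo '---'
done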

Looking further through the logs, not much can be seen, e.g. in the SchedLog:

10/02/20 14:43:44 (pid:1398) Starting add_shadow_birthdate(969.0)
10/02/20 14:43:44 (pid:1398) Started shadow for job 969.0 on slot1@xxxxxxxxxxxxxxxxx <10.10.82.1:9618?addrs=10.10.82.1-9618&noUDP&sock=2209_c4cc_3> for DedicatedScheduler, (shadow pid = 1864058)
10/02/20 14:43:45 (pid:1398) Received a superuser command
10/02/20 14:43:45 (pid:1398) Number of Active Workers 0
10/02/20 14:43:46 (pid:1398) In DedicatedScheduler::reaper pid 1864058 has status 27648
10/02/20 14:43:46 (pid:1398) Shadow pid 1864058 exited with status 108
10/02/20 14:43:46 (pid:1398) Dedicated job abnormally ended, releasing claim
10/02/20 14:43:46 (pid:1398) Dedicated job abnormally ended, releasing claim
[..]
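
(The 27648 reported by the reaper is just the raw wait status, i.e.
108 << 8, so it matches the shadow exit status 108 on the next line.)

The only further idea I have so far is to raise the debug levels on both
sides and try to reproduce; a rough sketch, assuming the standard
HTCondor knobs and log files:

# submit side (local config on the schedd host), then condor_reconfig:
SCHEDD_DEBUG = D_FULLDEBUG D_COMMAND
SHADOW_DEBUG = D_FULLDEBUG

# execute side (on the claimed nodes), then condor_reconfig:
STARTD_DEBUG = D_FULLDEBUG D_COMMAND
STARTER_DEBUG = D_FULLDEBUG

and then compare SchedLog/ShadowLog on the submit side with StartLog and
StarterLog on the released nodes around the timestamps above.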

So I am still puzzled by this. Does anyone have an idea where else to
dig for information about what may have gone wrong?

Cheers
Carsten
