
Re: [HTCondor-users] startd bug? Seems to be able to reliably kill startd with GPU preemption on 8.8.7



As a side note, while trying and debugging various preemption
requirements I faced this weird error message:

04/09/20 12:25:48 Classad debug: 52 --> 52
04/09/20 12:25:48 Classad debug: [0.00882ms] ClusterId --> 52
04/09/20 12:25:48 Classad debug: 7 --> 7
04/09/20 12:25:48 Classad debug: [0.00715ms] ProcId --> 7
04/09/20 12:25:48 Classad debug: "51.2" --> 51.2
04/09/20 12:25:48 Classad debug: [0.00715ms] JobId --> 51.2
04/09/20 12:25:48 Classad debug: [0.00119ms] RemoteUserPrio --> 2339.86
04/09/20 12:25:48 Classad debug: SubmitterUserPrio --> 1.56091e+07
04/09/20 12:25:48 Classad debug: [0.00119ms] 0 --> 0
04/09/20 12:25:48 Classad debug: [0.00906ms] TotalSlotGpus --> 0
04/09/20 12:25:48 Classad debug: 1 --> 1
04/09/20 12:25:48 Classad debug: [0.00906ms] RequestGpus --> 1
04/09/20 12:25:48 Classad debug: [0.10991ms] (ClusterId > 0 && ProcId >
0 && JobId isnt "") && (((RemoteUserPrio > SubmitterUserPrio *
1.200000000000000E+00) &&  -(TotalSlotGpus isnt 0) &&  -(RequestGpus is
0)) ||  -(TotalSlotGpus isnt 0) || ((RequestGpus is 0) &&
-(TotalSlotGpus isnt 0))) --> ERROR (attribute
LastNegotiationCycleMatchRateSustained99 not found to be deleted)

True, the first three conditions are just for me staying
sane^W^Wdebugging the expression, but this is not what ought to happen,
right?

Weird thing, this throws an error:

04/09/20 13:02:21 Classad debug: [0.17381ms] (((RemoteUserPrio >
SubmitterUserPrio * 1.200000000000000E+00) && ( -(TotalSlotGpus isnt 0))
&& ( -(RequestGpus is 0))) || ( -(TotalSlotGpus isnt 0)) ||
((RequestGpus is 0) && ( -(TotalSlotGpus isnt 0)))) && (ClusterId > 0 &&
ProcId > 0 && JobId isnt "") --> ERROR (attribute
LastNegotiationCycleMatchRateSustained99 not found to be deleted)

I have yet to find which expression is triggering this, any help
appreciated :)

Cheers

Carsten

-- 
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185
