[HTCondor-users] Parallel schedd starts two jobs on the same slot.


We have an HTCondor v8.9.11 cluster that runs parallel jobs on dynamic slots via the dedicated scheduler.

Sometimes the schedd crashes when it tries to release a claim for a match record that has already been deleted. I traced this to the DedicatedScheduler::createAllocations function and found that the schedd sometimes uses the match record of an already running job as the slot for a new job. This happens because the match record state is changed from M_ACTIVE to M_CLAIMED in schedd.cpp (https://github.com/htcondor/htcondor/blob/master/src/condor_schedd.V6/schedd.cpp#L7795). If I forbid the change away from M_ACTIVE, the schedd no longer crashes. However, I suspect this only hides the real source of the problem rather than fixing it. Can anyone advise where else I should look to trace this bug?
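To make the workaround concrete, here is a minimal C++ sketch of a guard that refuses the M_ACTIVE to M_CLAIMED transition. It rests on several assumptions. `MatchRecord` and `setMatchStatus` are hypothetical, simplified stand-ins, not the schedd's real match record class or API. Only M_ACTIVE and M_CLAIMED come from the post, and M_UNCLAIMED is an assumed placeholder. Besides blocking the transition, the guard logs the caller that requested it. That log may help narrow down which code path hands an already active record to a new allocation.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical, simplified match-record states (M_UNCLAIMED is a placeholder;
// M_CLAIMED and M_ACTIVE are the states discussed in the post).
enum class MrecStatus { M_UNCLAIMED, M_CLAIMED, M_ACTIVE };

// Hypothetical, simplified match record: only the fields this sketch needs.
struct MatchRecord {
    std::string slotName;
    MrecStatus status = MrecStatus::M_UNCLAIMED;
    int cluster = -1;  // job currently running on this claim, -1 if none
    int proc = -1;
};

// Sketch of the workaround: refuse to move a record from M_ACTIVE back to
// M_CLAIMED, and report who asked so the offending path can be traced.
// Returns true if the transition was applied, false if it was rejected.
bool setMatchStatus(MatchRecord& rec, MrecStatus next, const char* caller) {
    assert(caller != nullptr);  // caller name is required for the log line
    if (rec.status == MrecStatus::M_ACTIVE && next == MrecStatus::M_CLAIMED) {
        std::cerr << "refusing M_ACTIVE -> M_CLAIMED on " << rec.slotName
                  << " (running job " << rec.cluster << "." << rec.proc
                  << "), requested by " << caller << '\n';
        return false;
    }
    rec.status = next;
    return true;
}
```

As the post notes, a guard like this only suppresses the symptom. The log line is meant to point at the caller that picked the active record, not to fix the allocation logic.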

