
Re: [HTCondor-users] Job stays in queue for approx 20m before match making



Hello Experts,

Sorry, in my last email I said 20m, but the delay is actually approx. 10m.


On Fri, Oct 6, 2023 at 4:51 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

We are seeing an issue on a 9.0.17 submitter box (all-in-one) that has multiple pools in its flocking list. The flocking pools are running version 8.8.5.

A job was submitted, but the negotiator did not even consider it for matchmaking.

Logs from the submit node are below. I don't see any attempt in the Negotiator logs during this time to match the job.

10/06/23 10:18:30 (pid:1811906) job_transforms for 1129266.0: 5 considered, 5 applied
===== Lot of logs =====
10/06/23 10:29:50 (pid:1811906) Starting add_shadow_birthdate(1129266.0)
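
To put a number on the gap between the two SchedLog lines above, a minimal Python sketch follows. It assumes the timestamp layout MM/DD/YY HH:MM:SS seen in the excerpt; the helper name log_time is only illustrative and is not part of HTCondor.

```python
from datetime import datetime

# Timestamp layout assumed from the SchedLog excerpt: MM/DD/YY HH:MM:SS
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S"


def log_time(line: str) -> datetime:
    """Parse the leading timestamp of a SchedLog line (hypothetical helper)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TIME_FORMAT)


# The two lines quoted from the submit node's SchedLog
first = "10/06/23 10:18:30 (pid:1811906) job_transforms for 1129266.0: 5 considered, 5 applied"
last = "10/06/23 10:29:50 (pid:1811906) Starting add_shadow_birthdate(1129266.0)"

delay = log_time(last) - log_time(first)
print(delay)  # 0:11:20
```

So the job waited a little over 11 minutes between the job transforms being applied and the shadow being started.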

I do see "Rebuilt prioritized runnable job list" messages:
# awk '/10\/06\/23 10:18:30/,/10\/06\/23 10:29:51/ {print $0}' /var/log/condor/SchedLog | grep 'Rebuilt prioritized runnable job list in' | head
10/06/23 10:18:34 (pid:1811906) Rebuilt prioritized runnable job list in 0.014s.
10/06/23 10:18:52 (pid:1811906) Rebuilt prioritized runnable job list in 0.004s.
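
The same filtering can be done by comparing parsed timestamps instead of using an awk range pattern, which only starts and stops on lines that match the two boundary timestamps exactly. This is a rough Python sketch under the assumption that every relevant line begins with a 17-character MM/DD/YY HH:MM:SS timestamp; the function name rebuilds_in_window is hypothetical.

```python
from datetime import datetime

# Timestamp layout assumed from the SchedLog excerpt: MM/DD/YY HH:MM:SS
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S"
PHRASE = "Rebuilt prioritized runnable job list in"


def rebuilds_in_window(lines, start, end):
    """Return lines containing PHRASE whose timestamp lies in [start, end]."""
    lo = datetime.strptime(start, LOG_TIME_FORMAT)
    hi = datetime.strptime(end, LOG_TIME_FORMAT)
    hits = []
    for line in lines:
        try:
            # The first 17 characters hold the timestamp, e.g. "10/06/23 10:18:34"
            stamp = datetime.strptime(line[:17], LOG_TIME_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if lo <= stamp <= hi and PHRASE in line:
            hits.append(line.rstrip("\n"))
    return hits


# Sample input built from the lines quoted in this message
sample = [
    "10/06/23 10:18:30 (pid:1811906) job_transforms for 1129266.0: 5 considered, 5 applied",
    "10/06/23 10:18:34 (pid:1811906) Rebuilt prioritized runnable job list in 0.014s.",
    "10/06/23 10:18:52 (pid:1811906) Rebuilt prioritized runnable job list in 0.004s.",
    "10/06/23 10:29:50 (pid:1811906) Starting add_shadow_birthdate(1129266.0)",
]

matches = rebuilds_in_window(sample, "10/06/23 10:18:30", "10/06/23 10:29:51")
for line in matches:
    print(line)
```

Against the real log, the lines argument could be an open file handle on /var/log/condor/SchedLog rather than the sample list.
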
This bug [1] is already fixed in the version we are using on the submitter, and as far as I understand it only affects the submitter, not the master or worker nodes. Is there anything else that could cause this issue?

[1] https://opensciencegrid.atlassian.net/browse/HTCONDOR-769

Thanks & Regards,
Vikrant Aggarwal