
Re: [HTCondor-users] Jobs scheduling with flock_to

Hi Vikrant:

A couple of questions first: what version of HTCondor are you running? And do your jobs eventually make it into all the pools, or do they never make it into the later pools?

By default, flocking tries the pools in order. It assumes that the order of the flock list is significant, and it won't try to send jobs to the nth pool in the FLOCK_TO list until it has exhausted all the possibilities in the previous pools.

If you don't want this ordered behavior, you can set FLOCK_INCREMENT to the number of flocked pools in your FLOCK_TO list, which tells HTCondor to try all those pools in parallel.
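The ordered behavior and the effect of FLOCK_INCREMENT can be pictured with a small toy model. This is a hypothetical sketch, not HTCondor's actual schedd logic: it assumes the schedd first offers jobs only to the first FLOCK_INCREMENT pools in FLOCK_TO and widens that window by FLOCK_INCREMENT pools after each "round" in which the jobs stayed idle. The function name `rounds_until_match` and the notion of a round are illustrative only.

```python
from typing import Sequence, Set


def rounds_until_match(flock_to: Sequence[str], matching: Set[str],
                       increment: int = 1) -> int:
    """Toy model (hypothetical, not HTCondor code) of ordered flocking.

    Assumes the window of pools offered to the jobs starts empty and grows by
    `increment` pools (in FLOCK_TO order) after every round in which the jobs
    stayed idle. Returns the 1-based round in which the first pool that can
    match the jobs becomes reachable, or -1 if no pool in the list can match.
    """
    if increment < 1:
        raise ValueError("increment must be >= 1")
    level = 0   # number of FLOCK_TO pools currently being tried
    rounds = 0
    while level < len(flock_to):
        # Widen the window, but never past the end of the list.
        level = min(level + increment, len(flock_to))
        rounds += 1
        if any(pool in matching for pool in flock_to[:level]):
            return rounds
    return -1
```

With the pools from this thread, `rounds_until_match(list("ABCDEF"), {"C", "E"})` returns 3, because A and B must be exhausted before C is tried. Passing `increment=6` (the number of pools in the list) or reordering the list to C, D, E, F, A, B returns 1, which is in line with the speedup reported after reordering.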

Also, we just fixed a bug in HTCondor 8.9.7 (https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7549) where we weren't as aggressive about flocking as we should have been. This bug was triggered when USE_RESOURCE_REQUEST_COUNTS was true, which became the default early in the 8.9 series.


On 4/18/20 1:50 AM, ervikrant06@xxxxxxxxx wrote:
Hello Experts,

Any input on this issue would be highly appreciated.

Thanks & Regards,
Vikrant Aggarwal

On Tue, Apr 14, 2020 at 1:16 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

Looking for some assistance troubleshooting a scheduling issue with a flocking configuration.

Issue Description:

Jobs remain idle in the queue for a long time even though plenty of slots are available (in the pools configured in the scheduler's flock list) to accommodate them. However, jobs that require the primary cluster are scheduled without any issue.

The jobs' requirements only match pools C and E, and the flock list is:

FLOCK_TO = A, B, C, D, E, F


- Running with -better-analyze never helps identify the issue in this case:

condor_q <jobid> -better-analyze -p <poolname>

- The negotiator logs of pools C and E don't show the jobs being considered for matchmaking, which means the schedd is not presenting the jobs for matchmaking:

04/14/20 02:54:43  Negotiating with group_user1.testuser1@xxxxxxxxxxx at <xx.xx.xx.xx:9618?addrs=xx.xx.xx.xx-9618&noUDP&sock=6712_c42a_3>
04/14/20 02:54:43 0 seconds so far for this submitter
04/14/20 02:54:43 0 seconds so far for this schedd
04/14/20 02:54:43     Got NO_MORE_JOBS;  schedd has no more requests
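The check described above, whether the schedd ever presents jobs for a submitter or only answers NO_MORE_JOBS, can be automated over a whole log. A minimal sketch, assuming the NegotiatorLog lines look like the excerpt (the regex patterns and the `count_no_more_jobs` name are hypothetical): it remembers the submitter from the most recent `Negotiating with ... at` line and counts the `Got NO_MORE_JOBS` replies that follow it.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed NegotiatorLog wording, based only on the excerpt in this thread.
NEGOTIATING_RE = re.compile(r"Negotiating with (\S+) at ")
NO_MORE_JOBS_RE = re.compile(r"Got NO_MORE_JOBS")


def count_no_more_jobs(lines: Iterable[str]) -> Counter:
    """Count, per submitter, how often the schedd replied NO_MORE_JOBS."""
    counts: Counter = Counter()
    current: Optional[str] = None  # submitter of the block we are inside
    for line in lines:
        m = NEGOTIATING_RE.search(line)
        if m:
            current = m.group(1)
        elif current is not None and NO_MORE_JOBS_RE.search(line):
            counts[current] += 1
    return counts
```

Because the function accepts any iterable of lines, it can be run directly over an opened NegotiatorLog from pool C or E (the log's location depends on that pool's configuration). A steady stream of NO_MORE_JOBS for the submitter, while its jobs sit idle, supports the conclusion that the schedd is not offering those jobs to that pool.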

- As I understand it, the schedd shouldn't even consider the other pools for matchmaking, but the scheduler debug logs show it considering pools D and F more often than C and E. Why is that?

- To fix the issue, I had to change the flocking order and move A and B (cloud pools, which again are not mentioned in the jobs' requirements) to the end:

FLOCK_TO = C, D, E, F, A, B

Job matchmaking became very quick after making this change. Per-pool negotiation counts from the SchedLog were gathered with:

grep 'Finished negotiating for group_user1.testuser1@xxxxxxxxxxx in pool' /var/log/condor/SchedLog | grep '04/14/20 03:' | awk '{print $(NF-4)}' | sort | uniq -c
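The same per-pool count can be written in Python, which avoids depending on the field position `$(NF-4)`. This is a hedged sketch: it assumes each relevant SchedLog line contains `Finished negotiating for <submitter> in pool <pool>`, as the grep pattern suggests, and strips a trailing colon from the pool name in case one follows it. The name `negotiations_per_pool` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed SchedLog wording, taken from the grep pattern above.
FINISHED_RE = re.compile(r"Finished negotiating for (\S+)\s+in pool (\S+)")


def negotiations_per_pool(lines: Iterable[str], submitter: str,
                          time_prefix: str = "") -> Counter:
    """Count 'Finished negotiating' entries per pool for one submitter.

    `time_prefix` mimics `grep '04/14/20 03:'`: only lines containing it are
    counted. The default empty string disables the time filter.
    """
    counts: Counter = Counter()
    for line in lines:
        if time_prefix not in line:
            continue
        m = FINISHED_RE.search(line)
        if m and m.group(1) == submitter:
            # Drop a trailing ':' in case the pool name is followed by one.
            counts[m.group(2).rstrip(":")] += 1
    return counts
```

For example, iterating over an opened `/var/log/condor/SchedLog` with `submitter="group_user1.testuser1@xxxxxxxxxxx"` and `time_prefix="04/14/20 03:"` should give the same tallies as the pipeline, as long as the assumed log wording holds.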

How long does a job need to wait on one pool in the flock list before moving on to the next one? And if we pass requirements in the job, shouldn't the schedd only consider the pools mentioned in those requirements? Also, is there any timeout after which jobs give up on matchmaking?

Thanks & Regards,
Vikrant Aggarwal

HTCondor-users mailing list