Re: [HTCondor-users] Execute last DAGMan job as soon as possible



Hi Todd,

> In other words, I do not understand why you enabled autoregroup, surplus, etc. Just makes things unnecessarily complicated. With the above, group_jobendGrp jobs should get first crack at slots until that group has 1000000 cpu cores (unless you also edited SLOT_WEIGHT).

I was trying the attributes Michael suggested in an earlier reply. Groups are a great way to prioritize one type of job regardless of user and machine priority, but matchmaking was slow, and those settings were our attempt to fix that.

> Are there always idle jobs waiting in the queue submitting to group_jobendGrp? If not, what will happen is at times when there are no idle group_jobendGrp jobs, your regular non-jobendGrp jobs will claim the slots, and you will need to wait for the claim on those slots to be relinquished (unless you set up preemption of your non-jobendGrp jobs, which has its own drawbacks).

There are always idle jobs in the group. They accumulate for a while (the delay seems to correlate with the number of submitting users), and it takes more than ten minutes for them to match. In the meantime, lots of ungrouped jobs are matched and started.

Cheers,
Szabolcs