
[HTCondor-users] Matchmaking priority issue



Hi,

Sometimes I run into a strange matchmaking situation that I don't understand:
- A single user submits multiple jobs to the queue.
- Later, the priority of some of the earlier-submitted jobs is raised to force them to run first. Their priority is definitely higher than that of any other job in the queue.
- Both the machine rank and the job priority are higher for these jobs, and there are available slots (as allowed by the concurrency limits), yet the other jobs still get executed instead, for hours.
- The only way I can force the older jobs to run is to set attributes that give them a higher machine rank.
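For reference, the order the jobs were expected to follow can be sketched as below. This is a simplified model, assuming that among one user's idle jobs a higher JobPrio runs first and ties fall back to submission order (ClusterId, then ProcId). The Job class and schedd_order function are hypothetical names for illustration, not part of any HTCondor API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical stand-in for the few job ClassAd attributes used here."""
    cluster: int       # ClusterId
    proc: int          # ProcId
    job_prio: int = 0  # JobPrio, e.g. as raised with condor_prio


def schedd_order(jobs: List[Job]) -> List[Job]:
    """Order one user's idle jobs: higher JobPrio first, then submission order."""
    return sorted(jobs, key=lambda j: (-j.job_prio, j.cluster, j.proc))


# Usage: job 3.0 was submitted last but had its priority raised to 10.
queue = [Job(1, 0), Job(2, 0), Job(3, 0, job_prio=10)]
print([f"{j.cluster}.{j.proc}" for j in schedd_order(queue)])  # ['3.0', '1.0', '2.0']
```

The sketch deliberately ignores machine Rank, the negotiator and concurrency limits, so it describes the expected order rather than explaining the observed one.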

Which attribute (or attributes) might cause this behaviour? What can I do to solve it, or where should I look to debug what's going wrong?
(Using Condor 8.1.4)
By the way, I suspect it only happens when the jobs are limited by concurrency limits.
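In a simplified view, a concurrency limit is a gate checked separately from job priority: a job can only be matched if every limit it names still has room for the units it requests. The rough model below assumes a submit-file value such as concurrency_limits = abcd:2 (the job uses 2 units of limit "abcd"; omitting the count means 1) and a configured maximum such as ABCD_LIMIT = 10. The limit name "abcd", the helpers parse_limits and fits_limits, and the fallback for unconfigured limits are all hypothetical choices of this sketch.

```python
from typing import Dict


def parse_limits(spec: str) -> Dict[str, float]:
    """Parse a concurrency_limits value such as 'abcd:2, other' (count defaults to 1)."""
    wanted: Dict[str, float] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, count = item.partition(":")
        # This sketch treats limit names case-insensitively.
        wanted[name.strip().lower()] = float(count) if count else 1.0
    return wanted


def fits_limits(spec: str, in_use: Dict[str, float], maxima: Dict[str, float],
                default_max: float = float("inf")) -> bool:
    """True if every limit the job names still has room for the units it asks for.

    default_max is only this sketch's assumption for a limit with no configured maximum.
    """
    for name, count in parse_limits(spec).items():
        if in_use.get(name, 0.0) + count > maxima.get(name, default_max):
            return False
    return True


# Usage with a hypothetical limit "abcd" (ABCD_LIMIT = 10), 9 units already in use.
print(fits_limits("abcd", {"abcd": 9}, {"abcd": 10}))    # True
print(fits_limits("abcd:2", {"abcd": 9}, {"abcd": 10}))  # False
```

The gate is the same for every job that names the limit, so in this model it decides how many jobs can run at once, not which of them runs first.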

Cheers, 
Szabolcs

P.S. Here is the output of condor_q -better-analyze for one of the jobs:

--- 
20858347.000: Run analysis summary. Of 214 machines,
     105 are rejected by your job's requirements
      82 reject your job because of their own requirements
       0 match and are already running your jobs
       7 match but are serving other users
      20 are available to run your job

The Requirements expression for your job is:

    ( ( HAS_ABCD is true ) && ( OpSys == "LINUX" && Arch == "X86_64" &&
        Memory > 1024 ) && ( Name isnt LastRemoteHost ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

Your job defines the following attributes:

    DiskUsage = 7
    ImageSize = 7
    RequestDisk = 7
    RequestMemory = 1

The Requirements expression for your job reduces to these conditions:

          Slots
Step    Matched  Condition
-----  --------  ---------
[0]         109  HAS_ABCD is true
[1]         214  OpSys == "LINUX"
[2]         214  Arch == "X86_64"
[4]         214  Memory > 1024

Suggestions:

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( target.HAS_ABCD is true )       109
2   ( target.OpSys == "LINUX" && target.Arch == "X86_64" && target.Memory > 1024 ) 214
3   ( target.Name isnt target.LastRemoteHost ) 214
4   ( TARGET.Disk >= 7 )              214
5   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) ) 214
6   ( TARGET.HasFileTransfer )        214

The following attributes are missing from the job ClassAd:

CheckpointPlatform

---
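The counts in the analysis are internally consistent, which a quick check confirms: the five summary categories add up to the 214 machines, and step [0] (HAS_ABCD is true, 109 matches) is the only listed condition that does not match every slot, so 214 - 109 = 105 appears to account for all of the machines rejected by the job's requirements. The variable names below are just labels for the numbers copied from the output.

```python
# Counts copied from the "Run analysis summary" above.
TOTAL_MACHINES = 214
summary = {
    "rejected by the job's requirements": 105,
    "reject the job (their own requirements)": 82,
    "match, already running this user's jobs": 0,
    "match, serving other users": 7,
    "available to run the job": 20,
}

# The five categories should partition the pool exactly.
accounted_for = sum(summary.values())

# Step [0] (HAS_ABCD is true) is the only condition matching fewer than all
# 214 slots, so it alone should explain the job-side rejections.
HAS_ABCD_MATCHES = 109
rejected_by_has_abcd = TOTAL_MACHINES - HAS_ABCD_MATCHES

print(accounted_for, rejected_by_has_abcd)  # 214 105
```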
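To see how the Requirements expression above is evaluated against a single slot, here is a hand translation into Python. It is a simplified model, not the ClassAd library: None stands in for UNDEFINED, "is"/"isnt" are meta-comparisons that never yield UNDEFINED, ordinary comparisons with an UNDEFINED operand yield UNDEFINED, string == ignores case, and a slot counts as a match only if the whole expression is exactly TRUE. It assumes Name resolves in the slot ad and LastRemoteHost in the job ad; all function names and the example slot are hypothetical.

```python
import operator
from typing import Any, Callable, Optional

# Three-valued result: True, False, or None (standing in for ClassAd UNDEFINED).
Tri = Optional[bool]


def meta_eq(a: Any, b: Any) -> bool:
    """ClassAd 'is': same type and same value; never UNDEFINED."""
    return type(a) is type(b) and a == b


def compare(a: Any, b: Any, op: Callable[[Any, Any], bool]) -> Tri:
    """Ordinary comparison: UNDEFINED if either operand is UNDEFINED."""
    if a is None or b is None:
        return None
    return op(a, b)


def str_eq(a: str, b: str) -> bool:
    """ClassAd '==' on strings ignores case."""
    return a.lower() == b.lower()


def all_of(*terms: Tri) -> Tri:
    """ClassAd '&&': any FALSE wins, then any UNDEFINED, otherwise TRUE."""
    if any(t is False for t in terms):
        return False
    if any(t is None for t in terms):
        return None
    return True


def job_requirements(job: dict, slot: dict) -> Tri:
    """Hand translation of the job's Requirements expression from the analysis."""
    return all_of(
        meta_eq(slot.get("HAS_ABCD"), True),
        compare(slot.get("OpSys"), "LINUX", str_eq),
        compare(slot.get("Arch"), "X86_64", str_eq),
        compare(slot.get("Memory"), 1024, operator.gt),
        not meta_eq(slot.get("Name"), job.get("LastRemoteHost")),  # 'isnt'
        compare(slot.get("Disk"), job.get("RequestDisk"), operator.ge),
        compare(slot.get("Memory"), job.get("RequestMemory"), operator.ge),
        slot.get("HasFileTransfer"),
    )


def matches(job: dict, slot: dict) -> bool:
    """A slot is a candidate only if Requirements is exactly TRUE."""
    return job_requirements(job, slot) is True


# Usage with a hypothetical slot and the job attributes shown above.
job = {"RequestDisk": 7, "RequestMemory": 1}  # no LastRemoteHost yet
slot = {
    "HAS_ABCD": True, "OpSys": "LINUX", "Arch": "X86_64", "Memory": 2048,
    "Disk": 100000, "Name": "slot1@node01.example", "HasFileTransfer": True,
}
print(matches(job, slot))  # True
```

Because a job has no LastRemoteHost until it has run somewhere, "Name isnt LastRemoteHost" is true for every slot on the first attempt, whereas a slot without HAS_ABCD fails "HAS_ABCD is true" outright instead of evaluating to UNDEFINED.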