
Re: [Condor-users] Condor priority model



The scheduling logic is a little opaque.

The key things to remember are:

if you have a claim on a machine and nothing on the machine wants to
preempt that claim, then you can keep on sending jobs to it till the
cows come home, regardless of anyone else's priority.
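
Whether a held claim ever gets preempted is controlled on the
negotiator side by PREEMPTION_REQUIREMENTS. A sketch of what I believe
the stock 6.6-era example default looks like (this is from memory, so
check your own condor_config rather than trusting it):

  ## only preempt a running job if the claim is at least an hour old
  ## and the competing user's priority is significantly (20%) better
  PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) \
                            && RemoteUserPrio > SubmittorPrio * 1.2

  ## setting it to False disables priority preemption entirely,
  ## which makes the claim-reuse effect above permanent
  # PREEMPTION_REQUIREMENTS = False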

the action of claiming a machine for a user being disassociated from
the action of matching each individual job almost certainly comes
about because it makes the scheduling logic faster, and in most cases
it does not significantly impact users (esp. in a
preempt-and-checkpoint environment, where an unfairly held claim can
simply be preempted).

sadly, for a bunch of people (e.g. Windows users with job- rather than
user-level ranking requirements) this is not the case (see many
previous posts). The Condor guys have nicely put in some new
'retirement' functionality which should effectively work around this
(I'm going to test it out as soon as I get the time); see the sketch
below.
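
If the retirement functionality in question is the startd's
MaxJobRetirementTime expression (my assumption; as I said, I haven't
tested it), the config sketch is roughly:

  ## startd side: on preemption, let the running job keep going for up
  ## to two hours before it is actually kicked off, so jobs that can't
  ## checkpoint (e.g. on Windows) get a chance to finish cleanly
  MAXJOBRETIREMENTTIME = 2 * $(HOUR)

I believe a job can also ask for *less* retirement than the machine
offers by setting MaxJobRetirementTime in its own ad.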

If job latency is important to you, remember that Condor is geared up
for high throughput, not low latency...
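
On the actual question below: as I read the priority model
(paraphrasing the manual, so treat the formula as an assumption), the
negotiator aims to hand out machines in inverse proportion to
effective user priority (EUP), i.e. roughly

  \text{share}_u \propto \frac{1}{\mathrm{EUP}_u}
  \quad\Rightarrow\quad
  \frac{\text{share}_Y}{\text{share}_{Me}}
    = \frac{\mathrm{EUP}_{Me}}{\mathrm{EUP}_Y}
    = \frac{186124.47}{21.68} \approx 8600

so on paper Y should be getting essentially every machine that frees
up. That he only got them once the other cluster was put on hold
(footnote [1] below) is exactly the claim-reuse effect: the running
schedd already held the claims and nothing was preempting them.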

On Tue, 14 Sep 2004 21:26:32 +0200, Carsten Aulbert
<carsten@xxxxxxxxxxxxxxxx> wrote:
> Hi,
> 
> I've just got a brief question about the priority system. This afternoon the
> situation on our local cluster looked like this (I shrank the output a bit;
> the columns are: effective priority, real priority, number of jobs
> currently running, and hours used so far):
> 
> condor_userprio -all -allusers
> 
> Y@cluster      21.68  21.68   2  64348.11
> B@cluster      82.24  82.24   0  80052.14
> S@cluster   47254.34  47.25  53  63419.34
> Me@cluster 186124.47 186.12 304 129802.05
> 
> The strange thing which hit me (I'm Me@cluster) was that user Y had
> submitted a cluster with 500 jobs and only 2 of them were running (although
> all nodes in the cluster should match the job requirements [1]). If I
> understand the priority model correctly, as soon as one of my jobs
> finishes, a job from Y should be started, until the ratio between Y's
> running jobs and mine (together with the other users' jobs) reflects our
> effective priorities - but my jobs kept on being started instead of the
> queued jobs from Y.
> 
> Anyone reading this here, who can point me to a possible solution/trick?
> 
> Carsten
> 
> PS: I don't mind that my jobs are actually running, but it just struck me
> as unfair ;)
> 
> [1] After I put my cluster on hold, user Y got 310 VMs