
RE: [Condor-users] Adjusting machine RANK classad expr based on total queue time for a job



Hmm. So I went with the RANK expression:

RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime -
TARGET.EnteredCurrentStatus)/60))

My plan was to make sure jobs that are queued rank higher the longer
they've been in the queued state. In this case, +1 for every minute
they've been sitting idle.
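
As a rough illustration of what that expression is meant to compute, here is a minimal Python sketch. It is only a model, not Condor code: the function name machine_rank is made up for this example, and it assumes that the =?= comparison counts as 1 when true and 0 otherwise, and that dividing two integers in a ClassAd truncates to an integer.

```python
IDLE = 1  # the JobStatus value the RANK expression tests for (idle/queued)

def machine_rank(job_status, entered_current_status, current_time):
    """Model of: (TARGET.JobStatus =?= 1) *
                 ((CurrentTime - TARGET.EnteredCurrentStatus)/60)

    Assumes current_time >= entered_current_status.
    """
    # Assumption: a true comparison contributes 1, anything else 0
    # (including a missing JobStatus, modelled here as None).
    is_idle = 1 if job_status == IDLE else 0
    # Assumption: integer division, so partial minutes are dropped.
    minutes_in_state = (current_time - entered_current_status) // 60
    return is_idle * minutes_in_state

# A job idle for 600 seconds would score 10; a non-idle job scores 0.
print(machine_rank(1, 1000, 1600))
print(machine_rank(2, 1000, 1600))
```

Under these assumptions a job gains nothing for its first 59 seconds in the idle state, and two jobs that entered the state within the same minute can tie.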

To test this I submitted some jobs in the held state. Jobs are simple:
go to the machine and sleep for an hour.

I released three of the held jobs. My machine immediately picked up 44.0
from the cluster and started running it.

I let the other two released jobs build up some queue time while 44.0
slept on a machine. At one point I did see condor_status show my 44.0 as
being in the "Retiring" state instead of the "Busy" state -- that is
good news. We have a long MaxJobRetirementTime so this is expected.

I let about 8 minutes elapse and then issued the commands:

condor_hold 44.1
condor_release 44.1

So this reset the EnteredCurrentStatus time on 44.1. I now have 44.0
running but retiring, and the remaining two jobs have the following
EnteredCurrentStatus values:

44.1 1098910859
44.2 1098910279
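
To make the arithmetic concrete, here is a small Python sketch that plugs the two timestamps above into the "minutes idle" part of the RANK expression. The evaluation time `now` is a hypothetical value chosen for illustration, both jobs are assumed to be idle, and integer division is assumed. It only checks the numbers; it does not model how the machine or negotiator actually picks a job.

```python
# EnteredCurrentStatus values reported for the two idle jobs.
entered = {"44.1": 1098910859, "44.2": 1098910279}

def idle_minutes(entered_current_status, now):
    # Assumption: integer division, as in (CurrentTime - EnteredCurrentStatus)/60
    return (now - entered_current_status) // 60

# Hypothetical evaluation time: two minutes after 44.1 was released.
now = 1098910859 + 120

ranks = {job: idle_minutes(t, now) for job, t in entered.items()}
gap_seconds = entered["44.1"] - entered["44.2"]

print(gap_seconds)  # seconds by which 44.2 has been idle longer
print(ranks)        # modelled rank per job at the chosen time
```

44.2 entered the idle state 580 seconds before 44.1, so under these assumptions its modelled rank stays 9 or 10 points ahead of 44.1 at any later evaluation time.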

By this output I expect 44.2 to have the higher rank. 44.0 is still
running so I removed it with:

condor_rm 44.0

I expected the machine to pick up 44.2 as the next job because its rank
is higher, having been queued for a longer time than 44.1.

Not so. The machine picked up 44.1. I'm the only user in the system so
it's not a matter of EUP. What's up? Why didn't 44.2 rank higher?
Can anyone see how I messed up my prediction for the next job to run? I'm
stumped. I thought I had it all figured out.

Thanks!

Ian

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
> Sent: October 27, 2004 11:34 AM
> To: Condor-Users Mail List
> Subject: [Condor-users] Adjusting machine RANK classad expr 
> based on total queue time for a job
> 
> I'm toying with adjusting the RANK expression to achieve a 
> more FIFO-like consideration when condor runs jobs. The idea 
> is to rank jobs on machines based on their time in the queue. 
> I wanted to bounce the rank expression and idea off the list. 
> The rank expression for machines I'm thinking of using is:
> 
> RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime -
> TARGET.EnteredCurrentStatus)/600))
> 
> This would give a job queued 10 minutes longer than another 
> job a higher rank on the machine.
> 
> The other option is:
> 
> RANK = ((CurrentTime - TARGET.QDate)/600)
> 
> But this would track cumulative queue time (even if the job 
> queued, ran for a bit, then got sent back to the queue), 
> right? Or is QDate reset every time a job returns to the 
> queue, not just the first time it's queued up by condor_submit?
> 
> Comments? Opinions? Much appreciated.
> 
> Ian
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> http://lists.cs.wisc.edu/mailman/listinfo/condor-users
>