RE: [Condor-users] Adjusting machine RANK classad expr based on total queue time for a job
- Date: Wed, 27 Oct 2004 17:18:11 -0400
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: RE: [Condor-users] Adjusting machine RANK classad expr based on total queue time for a job
Hmm. So I went with the RANK expression:
RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime -
My plan was to make sure jobs that are queued rank higher the longer
they've been in the queued state. In this case, +1 for every minute
they've been sitting idle.
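To make the intended arithmetic concrete, here is a rough Python sketch of the "+1 per idle minute" idea. It is not how Condor evaluates ClassAds. It assumes idle time is measured from EnteredCurrentStatus and that the division truncates to an integer; the function name idle_rank and its parameters are made up for illustration.

```python
# Hypothetical model of the intended ranking, not Condor's ClassAd evaluator.
IDLE = 1  # JobStatus value the RANK expression tests for (idle/queued)


def idle_rank(job_status, entered_current_status, current_time,
              seconds_per_point=60):
    """Return one rank point per full `seconds_per_point` seconds spent idle.

    Jobs that are not idle get 0, mirroring the (JobStatus =?= 1) multiplier.
    Integer division is an assumption made for this sketch.
    """
    if job_status != IDLE:
        return 0
    return (current_time - entered_current_status) // seconds_per_point


# Example with made-up timestamps (seconds since the epoch).
now = 10_000
rank_long_wait = idle_rank(IDLE, now - 480, now)   # idle for 8 minutes
rank_short_wait = idle_rank(IDLE, now - 60, now)   # idle for 1 minute
rank_running = idle_rank(2, now - 480, now)        # not idle
```

Under these assumptions, the job that has been idle longer gets the larger value, which is the behaviour I was expecting from the machine.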
To test this I submitted some jobs in the held state. Jobs are simple:
go to the machine and sleep for an hour.
I released three of the held jobs. My machine immediately picked up 44.0
from the cluster and started running.
I let the other two released jobs build up some queue time while 44.0
slept on a machine. At one point I did see condor_status show my 44.0 as
being in the "Retiring" state instead of the "Busy" state -- that is
good news. We have a long MaxJobRetirementTime so this is expected.
I let about 8 minutes elapse, then I issued the command:
So this reset the EnteredCurrentStatus time on 44.1. I now have 44.0
running but retiring, and the remaining two jobs each have
EnteredCurrentStatus as follows:
By this output I expect 44.2 to have the higher rank. 44.0 is still
running so I removed it with:
I expected the machine to pick up 44.2 as the next job because its rank
is higher, having been queued for a longer time than 44.1.
Not so. The machine picked up 44.1. I'm the only user in the system so
it's not a matter of EUP. What's up? Why didn't 44.2 rank higher?
Can anyone see how I messed up my prediction for next job to run? I'm
stumped. I thought I had it all figured out.
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
> Sent: October 27, 2004 11:34 AM
> To: Condor-Users Mail List
> Subject: [Condor-users] Adjusting machine RANK classad expr
> based on total queue time for a job
> I'm toying with adjusting the RANK expression to achieve a
> more FIFO-like consideration when condor runs jobs. The idea
> is to rank jobs on machines based on their time in the queue.
> I wanted to bounce the rank expression and idea off the list.
> The rank expression for machines I'm thinking of using is:
> RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime -
> This would give a job queued 10 minutes longer than another
> job a higher rank on the machine.
> The other option is:
> RANK = ((CurrentTime - TARGET.QDate)/600)
> But this would track cumulative queue time (for example, if the
> job queued, ran for a bit, then got sent back to the queue),
> right? Or is QDate reset every time a job returns to the
> queue, not just the first time it's queued up by condor_submit?
> Comments? Opinions? Much appreciated.