It looks like my use of condor_rm was what threw off my prediction. I
continued the experiment, but this time I made sure the running 44.1
process finished normally instead of being prematurely terminated by
condor_rm.
I had two queued jobs with their EnteredCurrentStatus times:
44.2 1098912677
44.3 1098910808
I expected 44.2 to rank lower than 44.3 by ~31. So 44.3 should be the
next job picked up.
And this was the case. My rank expression worked this time. Excellent.
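As a quick check on the ~31 figure, here is a minimal Python sketch of the arithmetic. It assumes the /60 RANK expression from the earlier message quoted below; the helper name rank_gap_minutes is made up for illustration. CurrentTime appears in both ranks, so it cancels out of the difference.

```python
# EnteredCurrentStatus values reported above (Unix timestamps, in seconds).
entered = {"44.2": 1098912677, "44.3": 1098910808}


def rank_gap_minutes(later, earlier, seconds_per_point=60):
    """Rank advantage of the job that became idle earlier.

    CurrentTime cancels when two ranks are subtracted, so only the two
    EnteredCurrentStatus values matter.
    """
    return (later - earlier) / seconds_per_point


gap = rank_gap_minutes(entered["44.2"], entered["44.3"])
print(f"44.3 outranks 44.2 by about {gap:.0f} points")
```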
So here's a question for the Condor team: if I were a "sneaky user" I
could write a job that, after processing was complete, sent me an email
and then went to sleep for a long, long time. Upon receiving that email,
if I used condor_rm to terminate the job, I'd be able to hang on to the
resource it was using and run another job on it, even if another job
from another user had a higher rank, because condor_rm seems to prevent
the machine from re-negotiating. This would give me infinite access to a
resource. Can this happen?
Ian
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: October 27, 2004 5:18 PM
To: Condor-Users Mail List
Subject: RE: [Condor-users] Adjusting machine RANK classad
expr based on totalqueue time for a job
Hmm. So I went with the RANK expression:
RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime -
TARGET.EnteredCurrentStatus)/60))
My plan was to make sure jobs that are queued rank higher the
longer they've been in the queued state. In this case, +1 for
every minute they've been sitting idle.
To test this I submitted some jobs in the held state. Jobs are simple:
go to the machine and sleep for an hour.
I released three of the held jobs. My machine immediately
picked up 44.0 from the cluster and started running.
I let the other two released jobs build up some queue time
while 44.0 slept on a machine. At one point I did see
condor_status show my 44.0 as being in the "Retiring" state
instead of the "Busy" state -- that is good news. We have a
long MaxJobRetirementTime so this is expected.
I let about 8 minutes elapse and then I issued the commands:
condor_hold 44.1
condor_release 44.1
So this reset the EnteredCurrentStatus time on 44.1. I now
have 44.0 running but retiring, and the remaining two jobs
have EnteredCurrentStatus values as follows:
44.1 1098910859
44.2 1098910279
By this output I expect 44.2 to have the higher rank. 44.0 is
still running so I removed it with:
condor_rm 44.0
I expected the machine to pick up 44.2 as the next job
because its rank is higher, having been queued for a longer
time than 44.1.
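The prediction can be sketched in Python as below. This models only what the RANK expression alone would suggest, not what Condor actually does when it picks a job. The value of now is a hypothetical CurrentTime chosen for illustration, and the function name rank is likewise made up.

```python
IDLE = 1  # the JobStatus value the RANK expression tests for


def rank(job_status, entered_current_status, current_time):
    """Mirror of (JobStatus =?= 1) * ((CurrentTime - EnteredCurrentStatus)/60)."""
    is_idle = 1 if job_status == IDLE else 0
    return is_idle * ((current_time - entered_current_status) / 60)


now = 1098911000  # hypothetical CurrentTime, not taken from the experiment

# (JobStatus, EnteredCurrentStatus) for the two idle jobs listed above.
jobs = {
    "44.1": (IDLE, 1098910859),
    "44.2": (IDLE, 1098910279),
}

ranks = {
    job_id: rank(status, entered, now)
    for job_id, (status, entered) in jobs.items()
}
expected_next = max(ranks, key=ranks.get)
print(ranks, "expected next:", expected_next)
```

Under these assumptions 44.2 leads 44.1 by 580 seconds of idle time, a bit under 10 rank points, whatever value CurrentTime takes.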
Not so. The machine picked up 44.1. I'm the only user in the
system so it's not a matter of EUP. What's up? Why didn't 44.2
rank higher?
Can anyone see how I messed up my prediction for next job to
run? I'm stumped. I thought I had it all figured out.
Thanks!
Ian
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: October 27, 2004 11:34 AM
To: Condor-Users Mail List
Subject: [Condor-users] Adjusting machine RANK classad expr
based on totalqueue time for a job
I'm toying with adjusting the RANK expression to achieve a more
FIFO-like consideration when condor runs jobs. The idea is to rank
jobs on machines based on their time in the queue.
I wanted to bounce the rank expression and idea off the list.
The rank expression for machines I'm thinking of using is:
RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime -
TARGET.EnteredCurrentStatus)/600))
This would give a job queued 10 minutes longer than another job a
higher rank on the machine.
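To illustrate the granularity of the /600 divisor, here is a small Python sketch. It assumes the division truncates to an integer, which this thread does not confirm for ClassAd expressions; rank_points is a hypothetical helper name.

```python
def rank_points(idle_seconds, seconds_per_point):
    # Assumption: integer (truncating) division, so a job earns a whole
    # rank point only after a full interval has elapsed.
    return idle_seconds // seconds_per_point


for idle_minutes in (5, 10, 25):
    idle_seconds = idle_minutes * 60
    coarse = rank_points(idle_seconds, 600)  # one point per 10 minutes
    fine = rank_points(idle_seconds, 60)     # one point per minute
    print(f"{idle_minutes:>2} min idle: /600 gives {coarse}, /60 gives {fine}")
```

If the division truncates, two jobs queued less than 10 minutes apart can tie under the /600 form, which is one reason a smaller divisor may be easier to test with.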
The other option is:
RANK = ((CurrentTime - TARGET.QDate)/600)
But this would track cumulative queue time (even if the job queued,
ran for a bit, then got sent back to the queue), right? Or is QDate
reset every time a job returns to the queue, not just the first time
it's queued up by condor_submit?
Comments? Opinions? Much appreciated.
Ian
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users