
Re: [Condor-users] Trouble with job priority and job retirement




The rounding off of resource share is known to cause a resource to go unused under certain circumstances. I don't understand how this could happen with only one submitter, however. Is she also submitting nice-user jobs?


--Dan

Ian Chesal wrote:

I really think this has to do with the fact that my one user had
received 0 resources from the system during the negotiation cycle. Even
though there were no other users vying for resources, her effective user
priority was high and netted her 0 resources, so the negotiator ignored
her new job that had higher priority than her old jobs. Does this seem
plausible?

- Ian



-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: December 14, 2004 12:49 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Trouble with job priority and job retirement



I cannot reproduce any problems with a match record not getting deleted when a claim timeout happens. If you are still having a problem, please send the relevant StartLog, NegotiatorLog, and SchedLog to condor-admin and I'll try to see what is going on.


--Dan

Dan Bradley wrote:



Ian,

In a case such as the one you describe, where job 2.0 preempts job 1.0 and has to wait around for 1.0 to finish, there are two possible cases. One is that 1.0 finishes and 2.0 claims the machine. The other is that the schedd times out waiting for 2.0 to get an active claim (controlled by REQUEST_CLAIM_TIMEOUT), and it tries getting a new match for 2.0. From your description of what is happening, I am concerned that when the timeout happens, the previous match is not getting correctly removed. I will double-check this case and get back to you. If you set REQUEST_CLAIM_TIMEOUT to a very large number, you should be able to remove this case from even being a possibility.
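
A minimal sketch of that workaround as a configuration fragment, assuming the value is interpreted in seconds; the one-week figure is an arbitrary illustration, not a recommended setting:

```
# Illustrative value only: wait up to 7 days (7 * 24 * 3600 seconds) for the
# claim to become active before the schedd gives up and asks for a new match.
REQUEST_CLAIM_TIMEOUT = 604800
```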

You also asked about the meaning of, "Over submitter resource limit (0) ... only consider startd ranks." This means that when Condor sliced up the resource pie between job submitters, this user got a slice of size 0.
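
To illustrate how a slice can come out as 0, here is a toy model in Python. It is not the negotiator's actual code: it assumes each submitter's share is proportional to the inverse of their effective user priority (a lower number is a better priority) and that fractional shares are rounded down. The function name `pie_slices` and the user names are hypothetical.

```python
import math


def pie_slices(total_slots, effective_priority):
    """Toy model of splitting a pool between submitters.

    Assumption: share is proportional to 1 / effective priority,
    and any fractional part of a share is rounded down.
    """
    weights = {user: 1.0 / prio for user, prio in effective_priority.items()}
    total_weight = sum(weights.values())
    return {
        user: math.floor(total_slots * weight / total_weight)
        for user, weight in weights.items()
    }


# Three submitters with equal priority competing for two machines:
# each is entitled to 2/3 of a machine, which rounds down to 0.
print(pie_slices(2, {"user_a": 1.0, "user_b": 1.0, "user_c": 1.0}))

# A single submitter is entitled to the whole pool, so rounding alone
# does not explain a slice of 0 in that case.
print(pie_slices(2, {"user_a": 1.0}))
```

Under these assumptions, rounding can leave machines unassigned when there are more submitters than machines, but a lone submitter should still receive the full pool.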


--Dan

Ian Chesal wrote:



I'm trying to get a better handle on job retirement. I'm observing a strange situation in our current 6.7.2 system, which uses the retirement feature. We have a fairly long retirement time set (2 days). I have a user that has 100 jobs queued as cluster 1. Two of the jobs are running on the available resources. She queues up a 101st job, at a higher priority than the 100 previously queued jobs, as cluster 2.


The negotiator log at time t indicates that it has matched her 2.0 job and is preempting job 1.0 running on machine-A. At negotiation cycle t+1, job 1.1 finishes running on machine-B. Rather than the high priority job, 2.0, being assigned to the now free machine-B at negotiation cycle t+2, I'm seeing a lower priority job, 1.11, get assigned to the machine.


My question is this: once a job is moved to retirement on behalf of a queued, higher priority job, is that waiting job bound to be assigned to that particular machine? Can it not use the next available resource? I get the feeling that the job is exempted from future negotiation cycles because once I see a message saying job 1.0 is being preempted for job 2.0 I don't see any more negotiator messages for job 2.0 in subsequent negotiation cycles. Is there a point in time when the 2.0 job will give up waiting for the 1.0 job to retire and be renegotiated?


I am also seeing this very odd message in my NegotiatorLog printed at the start of her portion of the negotiation cycle:

12/13 16:00:02 Over submitter resource limit (0) ... only consider startd ranks

This is printed for the user "bchan" who is experiencing the inability to get her higher priority job running before her lower priority jobs.


What does this message mean? Unfortunately, I couldn't find an answer searching the archives, although I did notice this question has been asked a few times.

Another user and I tested that priority works, and for us it wasn't a problem. But in the NegotiatorLog file there were no "Over submitter" messages for our sections of the negotiation cycle. I suspect her problems relate to this message.

Thanks!

- Ian Chesal




-- Ian R. Chesal <ichesal@xxxxxxxxxx> Senior Software Engineer

Altera Corporation
Toronto Technology Center
Tel: (416) 926-8300


_______________________________________________ Condor-users mailing list Condor-users@xxxxxxxxxxx http://lists.cs.wisc.edu/mailman/listinfo/condor-users



