
Re: [Condor-users] Jobs License Management



(inline, brief)

Jason Stowe wrote:
Matthew Farrellee wrote:
(inline)

Jason Stowe wrote:
Stuart,
This limiting is happening at matchmaking time, and lasts the life of
a job. This is why there is no preemption based upon decreasing limits
below current usage.
Actually, that's not true. There are few technical reasons why no
preemption happens. The Negotiator already plays a central role in deciding
preemptions in other cases. The real reason is more philosophical: it comes
from how Condor has worked in the past and continues to work.

So obviously for preemption to be considered, we would need a method
for picking which currently constrained job to preempt, which is
problematic given each job could have different limits, etc. etc.

Again, more philosophical than technical reasons.


For the sake of clarity, and to help answer the original question, which
asked whether a job can "drop a constraint",
it is correct to say of 7.1.3:
The limiting occurs during the match-making process (i.e. by the
negotiator) and lasts the life of a job (i.e. a job isn't allowed to
drop or acquire a constraint while running and no mechanism exists to
reconsider all the limited jobs for preemption), so, for example,
currently there is no preemption based upon decreasing limits.

To make sure it is clear, the above statement breaks into two parts:
1- limiting occurs during the match-making/negotiation process (i.e.
by the negotiator), and
2- the limit lasts the life of the job, i.e. is not reconsidered from
a preemption/updating standpoint while the job runs, so there is no
preemption.

Walking through (1), the NegotiatorLog below captures match-making for
a set of jobs that have limits. It indicates the limiting is occurring
when the negotiator considers the job as part of its matchmaking
cycle, filtering which jobs it will consider further based upon the
limits.

Here the negotiator rejects further consideration of a job (29.1)
because the concurrency limit for lic_a has already been reached:
10/15 5:28:51     Getting reply from schedd ...
10/15 5:28:51     Got JOB_INFO command; getting classad/eom
10/15 5:28:51     Request 00029.00001:
10/15 5:28:51 Concurrency Limit: lic_a is 3
10/15 5:28:51 Concurrency Limit lic_a is 3, cannot exceed 2
10/15 5:28:51       Rejected 29.1 jstowe@xxxxxxxxxxxxxxxxxx
<128.200.3.94:3261>: concurrency limit reached
10/15 5:28:51     Sending SEND_JOB_INFO/eom

Now, a job's (29.2) limit is acceptable, so the negotiator continues
with the matchmaking process, then notifies the accountant when the
job with a limit matches a slot.
10/15 5:28:51     Getting reply from schedd ...
10/15 5:28:51     Got JOB_INFO command; getting classad/eom
10/15 5:28:51     Request 00029.00002:
10/15 5:28:51 Concurrency Limit: lic_b is 0
10/15 5:28:51 Start of sorting MatchList (len=13)
10/15 5:28:51 Finished sorting MatchList
10/15 5:28:51       Connecting to startd
slot12@xxxxxxxxxxxxxxxxxxxxxxxxxxx at <128.200.3.94:3262>
10/15 5:28:51       Sending PERMISSION, capability, startdAd to schedd
10/15 5:28:51       Matched 29.2 jstowe@xxxxxxxxxxxxxxxxxx
<128.200.3.94:3261> preempting none <128.200.3.94:3262>
slot12@xxxxxxxxxxxxxxxxxxxxxxxxxxx
10/15 5:28:51       Notifying the accountant

This is the limiting occurring during match-making.
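
As a rough illustration of the check the log excerpts show, here is a minimal Python sketch. It is not Condor's negotiator code. The names negotiate, usage and max_limits are hypothetical, and the maximum of 2 for lic_b is an assumption; the log only shows a maximum for lic_a.

```python
from collections import Counter

def negotiate(jobs, max_limits, usage):
    """Toy model of limiting at matchmaking time (not Condor's code).

    jobs:       list of (job_id, [limit names the job requests])
    max_limits: dict of limit name -> configured maximum (assumed values)
    usage:      Counter of limit name -> units held by running jobs
    """
    matched, rejected = [], []
    for job_id, limits in jobs:
        needed = Counter(limits)
        # Reject if granting the job would push any limit past its maximum.
        if any(usage[name] + n > max_limits[name] for name, n in needed.items()):
            rejected.append(job_id)
            continue
        # Usage is only ever added here, when a job is matched. Nothing in
        # this loop revisits or preempts jobs that are already running.
        usage.update(needed)
        matched.append(job_id)
    return matched, rejected

# State taken from the log: lic_a is at 3 with a maximum of 2, lic_b is at 0.
usage = Counter({"lic_a": 3, "lic_b": 0})
max_limits = {"lic_a": 2, "lic_b": 2}  # the lic_b maximum is an assumption
jobs = [("29.1", ["lic_a"]), ("29.2", ["lic_b"])]
matched, rejected = negotiate(jobs, max_limits, usage)
```

In this sketch 29.1 is rejected and 29.2 is matched, and the lic_a usage stays at 3 even though it is above the maximum, since the check only gates new matches.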

For the second half:
2- the limit lasts the life of the job, i.e. is not reconsidered from
a preemption/updating standpoint while the job runs, so there is no
preemption.

Here I think we're saying the same thing because in a later post, you said that:
Right now the limits exist for the lifetime of the job. It is
conceivable that jobs able to modify their ad, via chirp, would be able
to update the limits they use. However, this is currently not part of
the implementation.

and in another post:
Condor will not actively preempt or otherwise stop jobs when a limit is
exceeded, such as if you lower it. When a limit is reached or exceeded,
no new jobs requiring the limit are matched. They will be rejected with

You're mostly accurate, but none of the details mentioned above precludes a job from dropping a limit during its lifecycle, which was the question. Your (2) is too constraining.


Now, with regard to the feature itself:
This would enable more load-based limits like:
concurrency_limits = LOAD_LIMIT:1.25, APPNAME
and on other jobs:
concurrency_limits = LOAD_LIMIT:0.5

This kind of fractional specification would help increase the number
of applicable use cases for this feature.
You can already do this, just not concisely if you want precision out to
the hundredths place.
So currently, if we wanted to express a load-based limit of 37.14 for a
certain job in 7.1.3, then unless some other way exists we'd have to
multiply it out to a whole number (3714), multiply the other jobs'
values by the same scale, and then specify concurrency_limits =
LOAD_LIMIT_A, LOAD_LIMIT_A, LOAD_LIMIT_A, LOAD_LIMIT_A,
LOAD_LIMIT_A....

Yup, that's what I said.
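
For anyone wanting to script that workaround, a minimal Python sketch of the scaling step follows. The helper name repeated_limit and the default of two decimal places are hypothetical. It only builds the repeated name list for the submit file; the pool-wide maximum for the same limit would have to be scaled by the same factor.

```python
from decimal import Decimal

def repeated_limit(name, load, places=2):
    """Expand a fractional load into a limit name repeated a whole number of times.

    Hypothetical helper: scales `load` by 10**places (37.14 -> 3714) and
    repeats `name` that many times, since only whole units can be expressed.
    """
    # Decimal avoids binary floating-point rounding when scaling.
    scaled = Decimal(str(load)) * (10 ** places)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{load} needs more than {places} decimal places")
    return ", ".join([name] * int(scaled))

# A job with a load of 1.25 at two decimal places requests the name 125 times.
line = "concurrency_limits = " + repeated_limit("LOAD_LIMIT", "1.25")
```

Every job sharing the limit has to use the same value of places, otherwise the units are not comparable.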


I'm a big fan of this capability and am hoping we can enable as many
use cases as possible by allowing an arbitrary number for the limit
by the time this hits 7.2. Limits could apply to more
than just whole-number license resources (e.g. bandwidth resources,
storage resources, etc.).

Best,
Jason

Best,


matt