
Re: [HTCondor-users] Execution job with no latency



On 2/11/2013 2:55 AM, Slimane Amar wrote:
Hi,

We have set up a small (2 machines), internal (private) pool
and we want to execute jobs with no delay (immediately after submission).
Currently, we see a latency of a few seconds even when machines are
available.

To that end, we have changed some values in the configuration file
(attached),
but we still see a latency of several seconds (> 15).


Your config is really close. The latency you see when jobs start on UNCLAIMED machines comes from waiting for the condor_negotiator (matchmaker) to give the machine to your schedd. I would suggest adding the following to your config:

   NEGOTIATOR_CYCLE_DELAY = 2

and doing a condor_reconfig. With the settings you already have plus the above, you should only ever see a latency of a second or two.

Do you have only one user submitting from just one schedd? If it is only one user/one schedd, I think you could get the latency consistently well below 1 second by adding the following to your job submit files:

  KeepClaimIdle = 50000000

This tells the schedd that once it has CLAIMED a machine, it should not give it back to the negotiator when there are no more jobs. Instead, the schedd will "keep" the machine CLAIMED for the user, so that if the user submits another job [ within 50000000 seconds :) ] the schedd will launch it right away, since it will not need to wait for the negotiator. With this change, when you run condor_status after all jobs in the queue complete, you should see "CLAIMED/Idle" instead of UNCLAIMED/Idle.

If editing your job submit files is a pain, the following can go at the end of your condor_config file to automatically add the above to every job submitted:

    KeepClaimIdle = 50000000
    SUBMIT_EXPRS = $(SUBMIT_EXPRS) KeepClaimIdle
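
For the per-job route, a submit description file might look roughly like the sketch below. The file name, executable, arguments, and output names are placeholders, not taken from the original question. The sketch also assumes the keep_claim_idle submit command as documented in the condor_submit manual, which sets the KeepClaimIdle job attribute; that spelling is an assumption relative to the line shown earlier, and like the rest of this message it is untested here.

    # sleep.sub (hypothetical file name; the job itself is a placeholder)
    universe   = vanilla
    executable = /bin/sleep
    arguments  = 10
    output     = sleep.$(Cluster).$(Process).out
    error      = sleep.$(Cluster).$(Process).err
    log        = sleep.log

    # Ask the schedd to hold on to the claim for this many seconds
    # after the job exits, instead of releasing the machine.
    # Assumption: keep_claim_idle is the submit-command spelling of KeepClaimIdle.
    keep_claim_idle = 50000000

    queue

If the setting was picked up, KeepClaimIdle should appear in the job ad printed by condor_q -long while the job is in the queue.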

Hope the above makes sense, I haven't had enough caffeine yet this morning.

Yeah, the default config knob tuning is set up more for a pool with a few hundred machines and lots of submit points. The above is from memory (I didn't actually test it myself), but I think it will get you what you want.

Let us know how it goes,
Todd