
Re: [HTCondor-users] HTCondor 8.6.3 - Jobs evicted even if other slots are free



That setting for NEGOTIATOR_PRE_JOB_RANK should make the negotiator prefer unused machines.
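For reference, the value reported in the quoted message below would look like this as a configuration line. The comments are my reading of the expression, offered as a sketch rather than as official documentation:

```
# Negotiator configuration (value as reported by condor_config_val below).
# RemoteOwner is undefined in the ad of a slot that no job has claimed,
# so the expression is true for idle slots and false for busy ones.
# A higher pre-job rank sorts first, so idle slots are tried first.
NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
```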

Are your execute machines configured to have static slots or partitionable slots?
Is ALLOW_PSLOT_PREEMPTION set in your configuration?

Have you tried running condor_q -analyze to ensure that the jobs match all of the idle machines?

The first error message suggests that COLLECTOR_HOST isn't set on the central manager.
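If that turns out to be the case, a minimal sketch of the missing setting might look like the following. The hostname is a placeholder I made up, not your actual central manager; substitute the real name, and make sure it resolves from the central manager itself:

```
# Local configuration on the central manager.
# cm.example.org is a hypothetical hostname used only for illustration.
COLLECTOR_HOST = cm.example.org
```

You can check what the daemons currently see with condor_config_val COLLECTOR_HOST, in the same way you queried NEGOTIATOR_PRE_JOB_RANK.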

For the second error message, it sounds like a client timed out trying to connect to the negotiator while the negotiator was busy. The two things that would talk to the negotiator are condor_userprio and condor_schedd.

 - Jaime

On Apr 3, 2019, at 2:07 AM, Giuseppe Di Biase <giuseppe.dibiase@xxxxxxxxx> wrote:

Hi Jaime,

on Negotiator machines:

[root@condorcl1 condor]# condor_config_val NEGOTIATOR_PRE_JOB_RANK
RemoteOwner =?= UNDEFINED


We didn't touch it, and jobs still don't use all the free slots.

I can add that in the NegotiatorLog we see many entries like:

"Failing attempt to update or invalidate collector ad because of missing daemon address (probably an unresolved hostname; daemon name is '""')."

and

"DaemonCore: Can't receive command request from 90.147.139.40 (perhaps a timeout?)"

where 90.147.139.40 is the negotiator's IP address.

We have not been able to fix either of these.


How can we force jobs to use the free slots, then?

Thanks again

Giuseppe


On 4/2/19 11:48 PM, Jaime Frey wrote:
Because of your machine RANK expression, DetChar jobs will always preempt NoEMiTest jobs on any machine. So the key to preventing the undesirable preemptions is to ensure that when the negotiator sorts all of the machines that match a DetChar job, the idle machines appear at the top of the list. With the default settings and the local config changes listed, that should be happening.

The configuration parameter NEGOTIATOR_PRE_JOB_RANK is how you enforce a particular ordering of machines for each job being matched. If all of your machines have the same RANK expression, then the default value for NEGOTIATOR_PRE_JOB_RANK should make the negotiator prefer matching idle machines. Have you changed this setting?