
Re: [Condor-users] Jobs being preempted with default rank settings



Thanks Todd & Steven for the tips.

I'm definitely going to mess with the preemption setting, but I'm still curious about the greedy over-matching going on.

condor_status -available shows 73 available unclaimed nodes, and yet a 44-job submission will match against occupied nodes.

buster:~/work/desolv_data_old_p_i/benchmark dynerman$ condor_config_val NEGOTIATOR_PRE_JOB_RANK
RemoteOwner =?= UNDEFINED

I take it that this is the default - prefer nodes where the RemoteOwner doesn't exist?

How "fuzzy" is this matching? I've been running a lot of jobs so my priority is really low. Is it possible that even with the correct NEGOTIATOR_PRE_JOB_RANK it's still matching to nodes I'm using because my priority is so low?

Todd Tannenbaum wrote:
David Dynerman wrote:
To update some info on my own question.

I think what's going on is the standard dynamic user priority. I had a number of jobs running, so the new user evicted some of my jobs to run hers.

While I could see this being a cool feature,

Yep...

it kind of sucks for us since we're losing 6 hours of computation when this happens.


You can disable it per node, per job, or pool-wide. Perhaps you want to put this into your central manager's config file:
   PREEMPTION_REQUIREMENTS = False
It sounds like you never want jobs preempted, and you already don't use machine RANK.

The thing that seems broken is that there should be enough resources for everything to get matched. We have 128 unclaimed slots. My jobs take up 23, and the next user's jobs take up 44 slots.

However, instead of co-existing peacefully, her 44 jobs are evicting mine. Has anyone seen anything like this?

The slots are all on a cluster of identical 1U's, so there shouldn't be any preferential matching.

Also, we're using really simple submit ClassAds (our only requirement is FileSystemDomain so we wind up on the cluster...)


Strange. I would double check that these machines really are reporting as identical. The default setup for the config setting NEGOTIATOR_PRE_JOB_RANK is to prefer unclaimed resources whenever possible. Has the value for this setting been messed with?
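
As a rough illustration of what that default rank is meant to achieve, here is a toy Python model. It is not the negotiator's actual code: the Slot class and the match_jobs function are made up for this sketch, and it ignores user priority, job Requirements, and every other ranking stage. It only shows that if candidates are ordered by RemoteOwner =?= UNDEFINED, a 44-job submission against 73 unclaimed and 23 claimed slots should never need to touch a claimed one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    # Hypothetical stand-in for a machine ClassAd.
    name: str
    remote_owner: Optional[str] = None  # None models an UNDEFINED RemoteOwner


def pre_job_rank(slot: Slot) -> float:
    # Models "RemoteOwner =?= UNDEFINED": true (1.0) for an unclaimed slot,
    # false (0.0) for a slot that already has a RemoteOwner.
    return 1.0 if slot.remote_owner is None else 0.0


def match_jobs(slots: List[Slot], n_jobs: int) -> List[Slot]:
    # Highest rank first; sorted() is stable, so ties keep their input order.
    ranked = sorted(slots, key=pre_job_rank, reverse=True)
    return ranked[:n_jobs]


# Numbers taken from this thread: 23 slots running existing jobs,
# 73 reported as available, and a new 44-job submission.
slots = [Slot(f"claimed{i}", "dynerman") for i in range(23)]
slots += [Slot(f"free{i}") for i in range(73)]

matched = match_jobs(slots, 44)
preempted = [s for s in matched if s.remote_owner is not None]
print(len(matched), len(preempted))
```

In this simplified model claimed slots are only reached once the unclaimed ones run out, so if real matches are landing on occupied nodes while unclaimed ones remain, something else (the rank value actually in effect, or differences between the machine ads) is probably involved.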

Or just disable the preemption and be done...

regards,
Todd
