
Re: [HTCondor-users] CondorCE: job transform for normalizing jobs' core/mem ratio?

Hi Thomas, all,

I can wholeheartedly recommend *not* to scale the jobs in the CE. This will only lead to wasting cores or memory, as others have pointed out.
Do so as a last resort, if CPUs are the "currency" users are billed by *and* you do not have enough memory.

As long as jobs are averaging around or below the available memory/core, PartitionableSlots will naturally attract a mix of jobs to balance mem/core requirements. Simply put, there are only so many high-mem jobs to start before only low-mem jobs fit.
Be aware you need *some* extra memory for this, or you end up with fragmentation similar to the Multi-Core/Single-Core problems.
Groups abusing the mem/core lenience still get penalised by having to wait longer if resources are scarce. So there is still incentive to send well-behaved jobs to you.

If you want to help things along, use a RANK that selects the Startds with the best memory/core ratio after a match. Our setup for this is explained briefly in [0].
In my experience, any policy based on the actual machine features works better than a global constant. E.g. if you have some machines with 2GB/core and some with 3GB/core (sooner or later you will) there is no point enforcing a global ratio.
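As a rough illustration of such a rank (a hypothetical sketch, not a production expression; the 2048 MB/core target and the choice of negotiator knob are assumptions), one could prefer machines whose leftover memory/core ratio after the match stays closest to the target:

```
# Hypothetical sketch: rank candidate startds by how close the slot's
# leftover memory/core ratio would be to an assumed 2048 MB/core target
# after subtracting the job's request. In a partitionable slot ad,
# Memory and Cpus are the remaining resources. Guard against dividing
# by zero when the job would take all remaining cores.
NEGOTIATOR_PRE_JOB_RANK = ifThenElse( (Cpus - RequestCpus) > 0, \
    0 - abs( (Memory - RequestMemory) * 1.0 / (Cpus - RequestCpus) - 2048 ), \
    0 )
```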

In case you are worried about getting jagged leftovers or fragmentation, but have on average enough memory, consider quantizing requests into comfortable chunks [1]. For example, we quantize memory to 512MB (versus the default 128MB) with a minimum of 2GB, to avoid very small memory requests from skewing the usage ratio and to enforce that a "standard" WLCG job always fits when a slot is freed.
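Expressed as configuration, this quantization might look like the sketch below. The knob name matches HTCondor's startd request-rewriting mechanism (whose stock default is quantize to 128MB steps), but the exact expression here is an assumed reconstruction, not our literal config:

```
# Sketch: round memory requests up to 512 MB steps with a 2 GB floor
# (values in MB). HTCondor's stock default is quantize(RequestMemory,{128}).
MODIFY_REQUEST_EXPR_REQUESTMEMORY = max({2048, quantize(RequestMemory, {512})})
```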


[0] See Section 3.3 RemainderScheduling


On 31. Jul 2020, at 12:02, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:

Hi all,

on your CondorCEs, do you normalize incoming jobs for their core/memory ratio?

Thing is, we normally assume a ratio of ~1 core / 2GB memory.
Now let's say a user/VO submits jobs with a skewed ratio like
1 core / 8GB, which would probably lead to draining for memory and leave a
few cores idle.
So I had been thinking whether it might make sense to rescale a job's core
or memory requirements in a transform to get the job close to the
implicitly assumed core/mem ratio.

Does that make sense? :)
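For concreteness, the kind of CE-side transform being asked about might look like the sketch below. This is hypothetical (the 2GB/core target, the transform name, and the policy itself are assumptions), and the reply above recommends against deploying it:

```
# Hypothetical sketch of a normalizing transform: bump a job's core
# request up to match an assumed 2 GB/core ratio. Shown only to make
# the question concrete; the reply above argues against doing this.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) NormalizeRatio
JOB_TRANSFORM_NormalizeRatio @=end
    REQUIREMENTS RequestMemory > 2048 * RequestCpus
    EVALSET RequestCpus int( ceiling( RequestMemory / 2048.0 ) )
@end
```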

