
Re: [HTCondor-users] CondorCE: job transform for normalizing jobs' core/mem ratio?



Hi Thomas,

See my comment below:

On Mon, Aug 3, 2020 at 10:50 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
Hi Brian,

yes, from the technical view you are absolutely right.

My worries just go into the 'political direction' ;)

So far, if a VO wants to run highmem jobs, i.e., core/mem < 1 core/2 GB, they
have to scale by cores.
With cores and memory decoupled, I worry that we could become
more attractive to VOs for running their highmem jobs - and that in the
end we starve on memory and have cores idling that are not accounted for
(and that cause discussions later on...)
Probably the primary 'issue' is that, AFAIS, cores are somewhat the base
currency - in the end the 'relevant' pie charts are just about the
delivered core-scaled walltime :-/
We have discussed in CMS several times the option of updating the "currency", as you named it, from CPU cores to the number of "unit cells" occupied by each job, where each "cell" is a multidimensional unit, e.g. in 2D, CPU x memory, the unit cell being 1 CPU core x 2 GB. Each user would then be charged on the basis of the maximum of the number of CPU cores and the number of 2 GB quanta used. In condor terms (correct me if I'm wrong), that is managed by the slot weight, which can take such an expression as its formula.

In fact, what we had in mind was somehow charging the "extra cost" to the user requesting more memory, to discourage such requests (=CPU is consumed faster => lower priority), but still keep the CPU core available for potential matchmaking, as Brian explained, to improve the overall utilization of the resources.
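
A minimal sketch of the unit-cell charging described above, written in Python rather than as an actual slot weight expression. The function name `charged_units` and the choice to round partial 2 GB quanta up are assumptions made for illustration; the thread only fixes the nominal cell at 1 CPU core x 2 GB and the "max of cores and memory quanta" rule.

```python
import math

# Nominal unit cell from the thread: 1 CPU core x 2 GB of memory.
NOMINAL_GB_PER_CORE = 2.0


def charged_units(cores: int, memory_gb: float,
                  gb_per_core: float = NOMINAL_GB_PER_CORE) -> int:
    """Return the number of unit cells a job would be charged for.

    The charge is the larger of the requested cores and the number of
    memory quanta. Rounding partial quanta up is an assumption here.
    """
    memory_quanta = math.ceil(memory_gb / gb_per_core)
    return max(cores, memory_quanta)


if __name__ == "__main__":
    # (cores, memory in GB) for a few hypothetical job requests
    for cores, mem in [(1, 2), (1, 8), (8, 16), (4, 6)]:
        print(f"{cores} core(s) / {mem} GB -> {charged_units(cores, mem)} unit(s)")
```

With these assumptions, the 1 core / 8 GB job from Brian's example is charged 4 units while still occupying only one core, so the remaining cores stay available for matchmaking.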

Despite discussions, we have not (yet) taken the steps to put this into effect, as in the end the cases where jobs do require higher than standard memory/core are generally marginal. If they became more frequent, we'd look into this possibility.

I somehow feel the political side of things as you described it would still be complicated ;-)

Cheers,
Antonio.

Cheers,
 Thomas

On 31/07/2020 20.58, Bockelman, Brian wrote:
> Hi Thomas,
>
> We do not normalize incoming requirements.
>
> In your example, I'm not sure if I'm following the benefit. You are suggesting changing:
>
> 1 core / 8GB -> 4 core / 8 GB
>
> Right? To me, in that case, you now have 3 idle cores inside the job - guaranteed to not be used - rather than 3 idle cores in condor which possibly are not used unless another VO comes in with odd requirements.
>
> Now, some sites *do* charge for jobs according to both memory and CPU. So, in your case of 1 core / 2GB being nominal, they would charge the user's fairshare for 4 units if the user submitted a 1 core / 8 GB job.
>
> Or am I looking at this from the wrong direction?
>
> Brian
>
>> On Jul 31, 2020, at 5:02 AM, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
>>
>> Hi all,
>>
>> on your CondorCEs, do you normalize incoming jobs for their core/memory
>> requirements?
>>
>> Thing is, that we normally assume a ratio of ~ 1core/2GB memory.
>> Now let's say a user/VO submits jobs with a skewed ratio like
>> 1core/8GB, which would probably lead to draining for memory and leave a
>> few cores idle.
>> So, I had been thinking, if it might make sense to rescale a job's core
>> or memory requirements in a transform to get the job close to the
>> implicitly assumed core/mem ratio.
>>
>> Does that make sense?
>>
>> Cheers,
>>  Thomas
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
>
>


--
Antonio Perez-Calero Yzquierdo, PhD
CIEMAT & Port d'Informació Cientifica, PIC.
Campus Universitat Autonoma de Barcelona, Edifici D, E-08193 Bellaterra, Barcelona, Spain.
Phone: +34 93 170 27 21