Re: [HTCondor-users] CondorCE: job transform for normalizing jobs' core/mem ratio?

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

On Wed, Aug 5, 2020 at 4:07 PM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:

OK - I just noticed, that also for my own trace jobs the actual Condor
job gets various Glidein job ads and more elaborate requirement ads

I had assumed from the glideins, that it would be a CMS feature ð

Sorry for the noise
Â Thomas

On 05/08/2020 10.45, Thomas Hartmann wrote:
> Hi again,
>
> I just stumbled over a (CMS ð) job, that looks somewhat odd [1]
> regarding its requirements.
> For one, the memory requirement seems not really limited with
> RequestMemory derived from MemoryUsage, i.e., the a priori limit
> depending on the later mem usage?
>
> For the core requirements, I wonder why the values change between the
> CondorCE view and the Condor view of the same job (especially since
> condor_ce_history is just a wrapper around condor_history - I guess that
> there is some transformation happening here somewhere, or?)
>
> in the [condorce] view, the job comes with a cpu request of 1 - but then
> the [condor] view of the same job morphed to 8 cores AFAIS? Glidein??
> At the moment I do not see, how the RequestCpus_ce = 1 becomes
> OriginalCpus = 8 (getting fed into RequestCpus_batch)
>
> tbh I would prefer to strip of such dynamic behaviour in favour of a
> one-to-one matching of resources.
>
> Cheers,
>Â ÂThomas
>
>
> [1]
>Â ÂRequestMemory = ifthenelse(MemoryUsage =!=
> undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>Â ÂRequestCpus = 1
>Â ÂRequestDisk = DiskUsage
>Â ÂMemoryUsage = ((ResidentSetSize + 1023) / 1024)
>
> [condorce]
>> grep Cpus
> RequestCpus = 1
>
> [condor]
>> grep Cpus
> CpusProvisioned = 8
> GlideinCpusIsGood =Â !isUndefined(MATCH_EXP_JOB_GLIDEIN_Cpus) &&
> (int(MATCH_EXP_JOB_GLIDEIN_Cpus) =!= error)
> JOB_GLIDEIN_Cpus = "$$(ifThenElse(WantWholeNode is true,
> !isUndefined(TotalCpus) ? TotalCpus : JobCpus, OriginalCpus))"
> JobCpus = JobIsRunning ? int(MATCH_EXP_JOB_GLIDEIN_Cpus) : OriginalCpus
> JobIsRunning = (JobStatus =!= 1) && (JobStatus =!= 5) && GlideinCpusIsGood
> OriginalCpus = 8
> RequestCpus = ifThenElse(WantWholeNode =?= true, !isUndefined(TotalCpus)
> ? TotalCpus : JobCpus,OriginalCpus)
> orig_RequestCpus = 1
>
>
> On 04/08/2020 02.44, Antonio Perez-Calero Yzquierdo wrote:
>> Hi Thomas,
>>
>> See my comment below:
>>
>> On Mon, Aug 3, 2020 at 10:50 AM Thomas Hartmann <thomas.hartmann@xxxxxxx
>> <mailto:thomas.hartmann@xxxxxxx>> wrote:
>>
>>Â Â ÂHi Brian,
>>
>>Â Â Âyes, from the technical view you are absolutely right.
>>
>>Â Â ÂMy worries just go into the 'political direction' ;)
>>
>>Â Â ÂSo far, if a VO want's to run highmem jobs, i.e., core/mem < 1/2GB, they
>>Â Â Âhave to scale by cores.
>>Â Â ÂWith cores and memory decoupled, I might worry, that we could become
>>Â Â Âmore attractive to VOs to run their highmem jobs - and we starve in the
>>Â Â Âend there and have cores idleing, that are not accounted (and cause
>>Â Â Âdiscussions later on...)
>>Â Â ÂProbably the primary 'issue' is, that AFAIS cores are somewhat the base
>>Â Â Âcurrency - in the end the 'relevant' pi charts are just about the
>>Â Â Âdelivered core scaled walltime :-/
>>
>> We have discussed in CMS several times the option of updating the
>> "currency" as you named it, from CPU cores to the number of "unit cells"
>> occupied by each jobs, when each "cell" is a multidimensional unit, e.g
>> in 2D, CPU x memory, the unitÂcell being 1 CPU core x 2 GB. So each user
>> would be charged on the basis of the max between the number of CPU cores
>> and the number of 2 GB quanta employed. I condor terms (correct me if
>> I'm wrong), that is managed by the slot weight, which can take such an
>> _expression_ as formula.Â
>>
>> In fact, what we had in mind was somehow charging the "extra cost" to
>> the user requesting more memory, to discourage such requests (=CPU is
>> consumed faster => lower priority), but still keep the CPU core
>> available for potential matchmaking, as Brian explained, to improve the
>> overall utilization of the resources.
>>
>> Despite discussions, we have not (yet) taken the steps to put this into
>> effect as in the end the cases where jobs do require higher than
>> standard memory/core are generally marginal. If they became more
>> frequent, we'd look into this possibility.Â
>>
>> I somehow feel the political side of things as you described it would
>> still be complicated ;-)
>>
>> Cheers,
>> Antonio.Â
>>
>>
>>Â Â ÂCheers,
>>Â Â ÂÂ Thomas
>>
>>Â Â ÂOn 31/07/2020 20.58, Bockelman, Brian wrote:
>>Â Â Â> Hi Thomas,
>>Â Â Â>
>>Â Â Â> We do not normalize incoming requirements.
>>Â Â Â>
>>Â Â Â> In your example, I'm not sure if I'm following the benefit.Â You
>>Â Â Âare suggesting changing:
>>Â Â Â>
>>Â Â Â> 1 core / 8GB -> 4 core / 8 GB
>>Â Â Â>
>>Â Â Â> Right?Â To me, in that case, you now have 3 idle cores inside the
>>Â Â Âjob - guaranteed to not be used - rather than 3 idle cores in condor
>>Â Â Âwhich possibly are not used unless another VO comes in with odd
>>Â Â Ârequirements.
>>Â Â Â>
>>Â Â Â> Now, some sites *do* charge for jobs according to both memory and
>>Â Â ÂCPU.Â So, in your case of 1 core / 2GB being nominal, they would
>>Â Â Âcharge the user's fairshare for 4 units if the user submitted a 1
>>Â Â Âcore / 8 GB job.
>>Â Â Â>
>>Â Â Â> Or am I looking at this from the wrong direction?
>>Â Â Â>
>>Â Â Â> Brian
>>Â Â Â>
>>Â Â Â>> On Jul 31, 2020, at 5:02 AM, Thomas Hartmann
>>Â Â Â<thomas.hartmann@xxxxxxx <mailto:thomas.hartmann@xxxxxxx>> wrote:
>>Â Â Â>>
>>Â Â Â>> Hi all,
>>Â Â Â>>
>>Â Â Â>> on your CondorCEs, do you normalize incoming jobs for their
>>Â Â Âcore/memory
>>Â Â Â>> requirements?
>>Â Â Â>>
>>Â Â Â>> Thing is, that we normally assume a ratio of ~ 1core/2GB memory.
>>Â Â Â>> Now let's say a user/VO submits jobs with a skewed ration like
>>Â Â Â>> 1core/8GB. Which would probably lead to draining for memory and
>>Â Â Âleave a
>>Â Â Â>> few cores idle.
>>Â Â Â>> So, I had been thinking, if it might make sense to rescale a
>>Â Â Âjob's core
>>Â Â Â>> or memory requirements in a transform to get the job close to the
>>Â Â Â>> implicitly assumed core/mem ratio.
>>Â Â Â>>
>>Â Â Â>> Does that make sense? ð
>>Â Â Â>>
>>Â Â Â>> Cheers,
>>Â Â Â>>Â Thomas
>>Â Â Â>>
>>Â Â Â>> _______________________________________________
>>Â Â Â>> HTCondor-users mailing list
>>Â Â Â>> To unsubscribe, send a message to
>>Â Â Âhtcondor-users-request@xxxxxxxxxxx
>>Â Â Â<mailto:htcondor-users-request@xxxxxxxxxxx> with a
>>Â Â Â>> subject: Unsubscribe
>>Â Â Â>> You can also unsubscribe by visiting
>>Â Â Â>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>Â Â Â>>
>>Â Â Â>> The archives can be found at:
>>Â Â Â>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>Â Â Â>
>>Â Â Â>
>>Â Â Â> _______________________________________________
>>Â Â Â> HTCondor-users mailing list
>>Â Â Â> To unsubscribe, send a message to
>>Â Â Âhtcondor-users-request@xxxxxxxxxxx
>>Â Â Â<mailto:htcondor-users-request@xxxxxxxxxxx> with a
>>Â Â Â> subject: Unsubscribe
>>Â Â Â> You can also unsubscribe by visiting
>>Â Â Â> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>Â Â Â>
>>Â Â Â> The archives can be found at:
>>Â Â Â> https://lists.cs.wisc.edu/archive/htcondor-users/
>>Â Â Â>
>>
>>Â Â Â_______________________________________________
>>Â Â ÂHTCondor-users mailing list
>>Â Â ÂTo unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>>Â Â Â<mailto:htcondor-users-request@xxxxxxxxxxx> with a
>>Â Â Âsubject: Unsubscribe
>>Â Â ÂYou can also unsubscribe by visiting
>>Â Â Âhttps://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>>Â Â ÂThe archives can be found at:
>>Â Â Âhttps://lists.cs.wisc.edu/archive/htcondor-users/
>>
>>
>>
>> --
>> Antonio Perez-Calero Yzquierdo, PhD
>> CIEMAT & Port d'InformaciÃ Cientifica, PIC.
>> Campus Universitat Autonoma de Barcelona, Edifici D, E-08193 Bellaterra,
>> Barcelona, Spain.
>> Phone: +34 93 170 27 21
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

Mailing List Archives

Public Access

Re: [HTCondor-users] CondorCE: job transform for normalizing jobs' core/mem ratio?