
Re: [HTCondor-users] CondorCE: job transform for normalizing jobs' core/mem ratio?



OK - I just noticed that also for my own trace jobs the actual Condor
job gets various glidein job ads and more elaborate requirement ads.

I had assumed from the glideins that it was a CMS feature :)

Sorry for the noise
  Thomas


On 05/08/2020 10.45, Thomas Hartmann wrote:
> Hi again,
> 
> I just stumbled over a (CMS :)) job that looks somewhat odd [1]
> regarding its requirements.
> For one, the memory requirement does not seem to be really limited,
> since RequestMemory is derived from MemoryUsage - i.e., the a-priori
> limit depends on the job's later memory usage? (Before the job has
> run, MemoryUsage is undefined, so RequestMemory falls back to the
> image size in MB.)
> 
> For the core requirements, I wonder why the values change between the
> CondorCE view and the Condor view of the same job (especially since
> condor_ce_history is just a wrapper around condor_history) - I guess
> there is some transformation happening here somewhere, right?
> 
> In the [condorce] view, the job comes with a CPU request of 1 - but in
> the [condor] view the same job has morphed to 8 cores, as far as I can
> see? Glidein??
> At the moment I do not see how RequestCpus = 1 on the CE side becomes
> OriginalCpus = 8 (which gets fed into the batch-side RequestCpus).
> 
> To be honest, I would prefer to strip off such dynamic behaviour in
> favour of a one-to-one matching of resources - see the sketch after
> the ad dump below.
> 
> Cheers,
>   Thomas
> 
> 
> [1]
>   RequestMemory = ifthenelse(MemoryUsage =!= undefined, MemoryUsage, (ImageSize + 1023) / 1024)
>   RequestCpus = 1
>   RequestDisk = DiskUsage
>   MemoryUsage = ((ResidentSetSize + 1023) / 1024)
> 
> [condorce]
>> grep Cpus
> RequestCpus = 1
> 
> [condor]
>> grep Cpus
> CpusProvisioned = 8
> GlideinCpusIsGood = !isUndefined(MATCH_EXP_JOB_GLIDEIN_Cpus) && (int(MATCH_EXP_JOB_GLIDEIN_Cpus) =!= error)
> JOB_GLIDEIN_Cpus = "$$(ifThenElse(WantWholeNode is true, !isUndefined(TotalCpus) ? TotalCpus : JobCpus, OriginalCpus))"
> JobCpus = JobIsRunning ? int(MATCH_EXP_JOB_GLIDEIN_Cpus) : OriginalCpus
> JobIsRunning = (JobStatus =!= 1) && (JobStatus =!= 5) && GlideinCpusIsGood
> OriginalCpus = 8
> RequestCpus = ifThenElse(WantWholeNode =?= true, !isUndefined(TotalCpus) ? TotalCpus : JobCpus, OriginalCpus)
> orig_RequestCpus = 1
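> 
> For reference, a rough sketch of the one-to-one pinning I have in mind
> for the job route - my own untested assumption (route name and the
> TargetUniverse line are placeholders, on top of whatever else the
> route sets), not anything our CE currently runs:
> 
>   JOB_ROUTER_ENTRIES @=jre
>     [
>       name = "Pin_Cpus";
>       TargetUniverse = 5;  # vanilla, as usual for a local batch route
>       # keep the batch job's core request identical to what arrived at
>       # the CE (orig_RequestCpus, cf. the ad dump above), instead of
>       # the WantWholeNode/OriginalCpus machinery:
>       eval_set_RequestCpus = orig_RequestCpus;
>       eval_set_OriginalCpus = orig_RequestCpus;
>     ]
>   @jre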
> 
> 
> On 04/08/2020 02.44, Antonio Perez-Calero Yzquierdo wrote:
>> Hi Thomas,
>>
>> See my comment below:
>>
>> On Mon, Aug 3, 2020 at 10:50 AM Thomas Hartmann <thomas.hartmann@xxxxxxx
>> <mailto:thomas.hartmann@xxxxxxx>> wrote:
>>
>>     Hi Brian,
>>
>>     yes, from the technical view you are absolutely right.
>>
>>     My worries just go into the 'political direction' ;)
>>
>>     So far, if a VO wants to run high-memory jobs, i.e., less than one
>>     core per 2 GB, they have to scale up by cores.
>>     With cores and memory decoupled, I might worry that we could become
>>     more attractive for VOs to run their high-memory jobs on - and in
>>     the end we starve there and have cores idling that are not
>>     accounted for (and cause discussions later on...).
>>     Probably the primary 'issue' is that, as far as I can see, cores
>>     are somewhat the base currency - in the end the 'relevant' pie
>>     charts are just about the delivered core-scaled walltime :-/
>>
>> We have discussed in CMS several times the option of updating the
>> "currency", as you named it, from CPU cores to the number of "unit
>> cells" occupied by each job, where each "cell" is a multidimensional
>> unit, e.g. in 2D, CPU x memory, the unit cell being 1 CPU core x 2 GB.
>> So each user would be charged on the basis of the max between the
>> number of CPU cores and the number of 2 GB quanta employed. In HTCondor
>> terms (correct me if I'm wrong), that is managed by the slot weight,
>> which can take such an expression as its formula.
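>>
>> A minimal sketch of such a weight, assuming the nominal cell of 1 core
>> x 2 GB (2048 MB; Memory in a slot ad is in MB) - untested, so take it
>> as an illustration rather than a recipe:
>>
>>   # charge a slot by whichever is larger: its cores or its 2 GB quanta
>>   SLOT_WEIGHT = ifThenElse(Cpus >= Memory / 2048.0, Cpus, ceiling(Memory / 2048.0))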
>>
>> In fact, what we had in mind was somehow charging the "extra cost" to
>> the user requesting more memory, to discourage such requests (=CPU is
>> consumed faster => lower priority), but still keep the CPU core
>> available for potential matchmaking, as Brian explained, to improve the
>> overall utilization of the resources.
>>
>> Despite discussions, we have not (yet) taken the steps to put this
>> into effect, as in the end the cases where jobs require more than the
>> standard memory per core are generally marginal. If they became more
>> frequent, we'd look into this possibility.
>>
>> I somehow feel the political side of things as you described it would
>> still be complicated ;-)
>>
>> Cheers,
>> Antonio.
>>
>>
>>     Cheers,
>>      Thomas
>>
>>     On 31/07/2020 20.58, Bockelman, Brian wrote:
>>     > Hi Thomas,
>>     >
>>     > We do not normalize incoming requirements.
>>     >
>>     > In your example, I'm not sure if I'm following the benefit. You
>>     are suggesting changing:
>>     >
>>     > 1 core / 8GB -> 4 core / 8 GB
>>     >
>>     > Right? To me, in that case, you now have 3 idle cores inside the
>>     job - guaranteed to not be used - rather than 3 idle cores in condor
>>     which possibly are not used unless another VO comes in with odd
>>     requirements.
>>     >
>>     > Now, some sites *do* charge for jobs according to both memory and
>>     CPU. So, in your case of 1 core / 2GB being nominal, they would
>>     charge the user's fairshare for 4 units if the user submitted a 1
>>     core / 8 GB job.
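>>     >
>>     > (That is, charge = max(requested cores, requested memory / 2 GB)
>>     > = max(1, 8 / 2) = 4 units in that example.)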
>>     >
>>     > Or am I looking at this from the wrong direction?
>>     >
>>     > Brian
>>     >
>>     >> On Jul 31, 2020, at 5:02 AM, Thomas Hartmann
>>     <thomas.hartmann@xxxxxxx <mailto:thomas.hartmann@xxxxxxx>> wrote:
>>     >>
>>     >> Hi all,
>>     >>
>>     >> on your CondorCEs, do you normalize incoming jobs for their
>>     core/memory
>>     >> requirements?
>>     >>
>>     >> Thing is that we normally assume a ratio of ~ 1 core / 2 GB memory.
>>     >> Now let's say a user/VO submits jobs with a skewed ratio like
>>     >> 1 core / 8 GB, which would probably lead to draining for memory
>>     >> and leave a few cores idle.
>>     >> So I had been thinking whether it might make sense to rescale a
>>     >> job's core or memory requirements in a transform to get the job
>>     >> close to the implicitly assumed core/mem ratio - something like
>>     >> the sketch below.
>>     >>
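>>     >> A rough sketch of what I mean, as a schedd job transform on the
>>     >> CE (the transform name and the 2048 MB nominal ratio are just
>>     >> placeholders, nothing deployed; it also assumes RequestMemory
>>     >> evaluates to a number at submit time):
>>     >>
>>     >>   JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) NormalizeRatio
>>     >>   JOB_TRANSFORM_NormalizeRatio @=end
>>     >>      # only touch jobs asking for more than 2 GB per requested core
>>     >>      REQUIREMENTS RequestMemory > 2048 * RequestCpus
>>     >>      # scale the core request up to the nominal 1 core / 2 GB
>>     >>      EVALSET RequestCpus int(ceiling(RequestMemory / 2048.0))
>>     >>   @end
>>     >>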
>>     >> Does that make sense? :)
>>     >>
>>     >> Cheers,
>>     >>  Thomas
>>
>>
>>
>> -- 
>> Antonio Perez-Calero Yzquierdo, PhD
>> CIEMAT & Port d'Informació Científica, PIC.
>> Campus Universitat Autonoma de Barcelona, Edifici D, E-08193 Bellaterra,
>> Barcelona, Spain.
>> Phone: +34 93 170 27 21
