
Re: [HTCondor-users] CPU share handling changed with 8.6.5-1?



Hi Michael,

OK - many thanks for cross-checking!
I will have to dig in - it might be that something else has changed (I
also have to check our grid ARC CE; we had updated that recently as well).
Currently I find nodes with a basic CPU share of 100 as well as nodes
with 1024.
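As a side note, a quick way to spot which convention a slot follows is to compare its cpu.shares value against both candidates. This is only a sketch with sample numbers from this thread; `classify_share` is a hypothetical helper, not anything shipped with HTCondor:

```shell
# Hypothetical helper: classify a slot's cpu.shares value against the
# two conventions seen in this thread -- RequestCpus x 100 (the behavior
# Michael sees under 8.6.6), and the kernel's default cpu.shares of 1024.
classify_share() {
  local requestcpus=$1 shares=$2
  if [ "$shares" -eq $((requestcpus * 100)) ]; then
    echo "requestcpus-x-100"     # e.g. Michael's 32-core job with 3200
  elif [ "$shares" -eq 1024 ]; then
    echo "kernel-default-1024"   # e.g. the 8-core job in [1] with 1024
  else
    echo "unknown"
  fi
}

classify_share 32 3200   # -> requestcpus-x-100
classify_share 8 1024    # -> kernel-default-1024
```

In practice one would feed it the job's RequestCpus from condor_q and the cpu.shares read from the slot's cgroup directory.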

Another thing that comes to mind is the recent kernel bugfix. It might
be that the reboot after applying the kernel update - i.e., the remount
of the cgroups - finally brought to the top some changes that we had
already made some time ago? (However, the basic CPU share unit does not
seem to be necessarily coupled to the kernel/uptime.)

At the moment I am quite confused - if I find something, I will send an
update.

Cheers,
  Thomas


On 2017-09-28 17:50, Michael Pelletier wrote:
> Hey Tom,
> 
> I'm not seeing this under 8.6.6-1:
> 
> [condor_data_condor_execute_slot1_1@...]# condor_q 2361 -af requestcpus
> 32
> [condor_data_condor_execute_slot1_1@...]# cat cpu.shares
> 3200
> 
> $CondorVersion: 8.6.6 Sep 12 2017 BuildID: 416237 $
> $CondorPlatform: x86_64_RedHat7 $
> 
> 
> 	-Michael Pelletier.
> 
> 
>> -----Original Message-----
>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
>> Of Thomas Hartmann
>> Sent: Thursday, September 28, 2017 11:35 AM
>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>> Subject: [External] [HTCondor-users] CPU share handling changed with
>> 8.6.5-1?
>>
>> Hi all,
>>
>> after updating to ~8.6.5 it seems that the scaling factor for the CPU
>> share cgroup has changed.
>>
>> So far I had assumed the CPU share to be requested CPUs x 100 for the
>> jobs' child cgroups, with the condor parent share normally equivalent
>> to the global share.
>> Now the basic scaling factor within the child cgroups seems to be
>> 1024 - or is it?
>>
>> It might also be a local problem on our side - explicit 8-core jobs
>> now end up with the same CPU share of 1024 as single-core jobs,
>> rather than 8192 [1]?
>>
>>
>> Cheers,
>>   Thomas
>>
>>
>> [1]
>>  353503 ?        Ss     0:00      \_ condor_starter -f -a slot1_3 grid-arcce0.desy.de
>>  369890 ?        S      0:00      |   \_ condor_starter -f -a slot1_1 vocms0250.cern.ch
>>  370188 ?        S      0:00      |   \_ condor_starter -f -a slot1_2 vocms0250.cern.ch
>>  371000 ?        S      0:00      |   \_ condor_starter -f -a slot1_3 vocms0250.cern.ch
>>  371002 ?        S      0:00      |   \_ condor_starter -f -a slot1_4 vocms0250.cern.ch
>>  371135 ?        S      0:00      |   \_ condor_starter -f -a slot1_7 vocms0250.cern.ch
>>  371137 ?        S      0:00      |   \_ condor_starter -f -a slot1_8 vocms0250.cern.ch
>>  371138 ?        S      0:00      |   \_ condor_starter -f -a slot1_5 vocms0250.cern.ch
>>  371139 ?        S      0:00      |   \_ condor_starter -f -a slot1_6 vocms0250.cern.ch
>>
>>> cat /cgroup/cpu/htcondor/condor_var_lib_condor_execute_slot1_3\@batch0930.desy.de/cpu.shares
>>
>> 1024
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
> 
