
[HTCondor-users] ResidentSetSize report on Almalinux 9 and condor 10.5.0//23.0.0//23.0.1



Dear all,

The ResidentSetSize values reported for jobs running on AlmaLinux 9 WNs (we are testing versions 10.5.0, 23.0.0 and 23.0.1) are anomalously high, causing the jobs to be put on hold by our rules. The same type of jobs running on CentOS 7 WNs do not show this high memory report.

Here is an example:

[root@ce13 ~]# condor_q 21200475 21200476 -af ClusterId ProcId Owner RemoteHost ResidentSetSize/1024
21200475 0 lhpilot001 slot1_43@xxxxxxxxxxxx 12207
21200476 0 lhpilot001 slot1_51@xxxxxxxxxxxx 1708

td805 is an AlmaLinux 9 // HTCondor 10.5.0 WN, while td827 is CentOS 7 // HTCondor. Just with a top command, we can check that the RES memory consumed by the lhpilot001 jobs on the AlmaLinux 9 WNs is as expected (between 1.5 and 1.7 GB).
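
The top cross-check described above can also be scripted. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper names parse_vmrss_kib and rss_by_user_mib are hypothetical and not part of HTCondor. Summing VmRSS across processes counts shared pages more than once, so the total is an upper bound rather than an exact figure.

```python
import os
import pwd


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def rss_by_user_mib(username):
    """Sum VmRSS over all processes owned by username, in MiB."""
    uid = pwd.getpwnam(username).pw_uid
    total_kib = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            with open(f"/proc/{entry}/status") as fh:
                total_kib += parse_vmrss_kib(fh.read()) or 0
        except OSError:
            continue  # process exited (or is unreadable) while scanning
    return total_kib / 1024


if __name__ == "__main__":
    # lhpilot001 is the job owner shown in the condor_q output above.
    print(f"lhpilot001: {rss_by_user_mib('lhpilot001'):.0f} MiB")
```

Run on the WN itself, the printed total can be compared with the ResidentSetSize/1024 column from condor_q, which is also in MiB.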

I can see these lines repeated in the StarterLog from all AlmaLinux9 startd clients (10.5.0 or 23.0.X):

11/09/23 09:25:10 (pid:3124691) ProcFamilyDirectCgroupV2::get_usage cannot open /sys/fs/cgroup/htcondor/condor_home_execute_slot1_43@xxxxxxxxxxxx/memory.peak: 2 No such file or directory
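
To see what the kernel exposes for a slot's cgroup, the files in that directory can be inspected directly. This is a minimal sketch, assuming a cgroup v2 hierarchy; memory.current, memory.peak and memory.stat (with its anon and file keys) are standard cgroup v2 names, while the helper functions are hypothetical. The cgroup directory has to be passed on the command line, since the slot path differs per WN. How the starter derives ResidentSetSize when memory.peak is missing is not something this sketch can show; it only reports what is present.

```python
import sys
from pathlib import Path


def read_counter(path):
    """Return the integer in a single-value cgroup file, or None if absent."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def parse_memory_stat(text):
    """Parse the 'key value' lines of memory.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def summarize(cgroup_dir):
    """Collect memory counters (bytes) for one cgroup; missing files give None."""
    cg = Path(cgroup_dir)
    summary = {
        "memory.current": read_counter(cg / "memory.current"),
        "memory.peak": read_counter(cg / "memory.peak"),
    }
    try:
        stat = parse_memory_stat((cg / "memory.stat").read_text())
    except FileNotFoundError:
        stat = {}
    summary["anon"] = stat.get("anon")  # anonymous memory
    summary["file"] = stat.get("file")  # page cache
    return summary


if __name__ == "__main__":
    # Usage: python3 cgroup_mem.py /sys/fs/cgroup/htcondor/<slot cgroup>
    for name, value in summarize(sys.argv[1]).items():
        shown = "missing" if value is None else f"{value / 2**20:.0f} MiB"
        print(f"{name:15s} {shown}")
```

If memory.peak shows as missing while memory.current is much larger than anon, the difference is mostly page cache, which may be worth checking against the inflated ResidentSetSize.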

Has anyone seen similar behavior? I know there was a memory-related issue in versions prior to 10.6.0, but I see the same with versions 23.0.0 and 23.0.1.

Cheers,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es