
Re: [HTCondor-users] nodes without cgrouped jobs?



On 6/7/2018 10:44 AM, Thomas Hartmann wrote:
> Hi all,
> 
> I just noticed that a few of our nodes do not have their jobs confined in
> cgroups, i.e., there is no condor slice at all [1]. These nodes are set up the
> same way and run the same release [2] as the majority of the nodes, where the
> jobs are properly cgrouped.
> We are going to drain and reboot these nodes, but maybe somebody has an
> idea what might have gone wrong here?
> 
> Cheers,
>   Thomas
> 

Hi Thomas,

Unlike some others on this list, I am not a cgroup expert, but what does "condor_config_val BASE_CGROUP" report on these two machines?  The default value is "htcondor", so when poking around in /sys/fs/cgroup I would not look in the system.slice subdirectory (which holds the systemd settings), but would instead do something like:

# ls /sys/fs/cgroup/cpu,cpuacct/htcondor/condor_var_lib_condor_execute_slot1_slot1_*
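If it helps when comparing a good node against a bad one, the two checks above could be combined into a small script along these lines. This is only a rough sketch under some assumptions: cgroup v1 with the cpu,cpuacct controller mounted at /sys/fs/cgroup/cpu,cpuacct (as in the paths in this thread), job cgroup directory names beginning with "condor_", and BASE_CGROUP being a path relative to the controller mount. The function name list_job_cgroups is made up for illustration and is not part of HTCondor.

```shell
#!/bin/sh
# Sketch: list per-job cgroup directories under the configured BASE_CGROUP.
# Assumptions (not confirmed for every setup): cgroup v1 layout, job cgroup
# names starting with "condor_", BASE_CGROUP relative to the controller mount.

# list_job_cgroups ROOT BASE  (hypothetical helper, not an HTCondor tool)
# Prints every directory matching ROOT/BASE/condor_*.
# Returns 0 if at least one was found, 1 otherwise.
list_job_cgroups() {
    root=$1
    base=$2
    found=1
    for d in "$root/$base"/condor_*; do
        # An unmatched glob stays literal, so verify it is a real directory.
        if [ -d "$d" ]; then
            echo "$d"
            found=0
        fi
    done
    return $found
}

# Only query HTCondor when the tool is actually installed on this node.
if command -v condor_config_val >/dev/null 2>&1; then
    if base=$(condor_config_val BASE_CGROUP 2>/dev/null) && [ -n "$base" ]; then
        list_job_cgroups /sys/fs/cgroup/cpu,cpuacct "$base" ||
            echo "no job cgroups found under $base" >&2
    else
        echo "BASE_CGROUP is not set on this node" >&2
    fi
else
    echo "condor_config_val not found in PATH" >&2
fi
```

Running it as root on one affected and one healthy node while jobs are running should show whether the difference lies in the configured BASE_CGROUP value or in the job cgroups never being created.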

Hope the above helps
Todd



> 
> [1]
> [root@batch0202 ~]# ls
> /sys/fs/cgroup/cpu,cpuacct/system.slice/condor.service/condor_var_lib_condor_execute_slot1_*
> ls: cannot access
> /sys/fs/cgroup/cpu,cpuacct/system.slice/condor.service/condor_var_lib_condor_execute_slot1_*:
> No such file or directory
> 
> [root@batch0203 ~]# ls
> /sys/fs/cgroup/cpu,cpuacct/system.slice/condor.service/condor_var_lib_condor_execute_slot1_*
> /sys/fs/cgroup/cpu,cpuacct/system.slice/condor.service/condor_var_lib_condor_execute_slot1_10@xxxxxxxxxxxxxxxxx:
> cgroup.clone_children
> ...
> 
> [2]
> condor-classads-8.6.11-1.el7.x86_64
> condor-8.6.11-1.el7.x86_64
> condor-python-8.6.11-1.el7.x86_64
> condor-procd-8.6.11-1.el7.x86_64
> condor-external-libs-8.6.11-1.el7.x86_64
> 
> 
> 


-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685