
[HTCondor-users] [HTCondor-CE] Maximum number of established TCP connections


First of all, I hope you are all staying safe in these troubled times.

I am running a small HTCondor pool with an HTCondor-CE that accepts jobs from the ALICE VO. The pool can offer around 3,500 concurrent job slots. However, I recently noticed that once the number of established TCP connections (from condor_shadow) through port 9618 exceeds 3,000, the number of concurrent jobs seems to hit a ceiling. Specifically, the JobRouter keeps additional jobs waiting in the CE queue while there are no idle or held jobs in the local queue. Note that the JobRouter, Schedd, Collector and Negotiator all run on the same host as the HTCondor-CE.
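For reference, the connection count mentioned above can be reproduced with a short script. This is only a minimal sketch, assuming a Linux host where /proc/net/tcp and /proc/net/tcp6 are readable; the function names are made up for illustration and are not part of HTCondor.

```python
from pathlib import Path

ESTABLISHED = "01"  # hex state code for ESTABLISHED in /proc/net/tcp


def count_established(proc_text, port=9618):
    """Count ESTABLISHED sockets whose local or remote port equals `port`.

    `proc_text` is the content of /proc/net/tcp (or tcp6); addresses there
    are written as HEXADDR:HEXPORT.
    """
    count = 0
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if fields[3] == ESTABLISHED and port in (local_port, remote_port):
            count += 1
    return count


def count_on_host(port=9618):
    """Sum the counts over the IPv4 and IPv6 socket tables, if present."""
    total = 0
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_established(path.read_text(), port)
    return total


if __name__ == "__main__":
    print(count_on_host())
```

Running it periodically on the CE host shows whether the count really levels off near 3,000 at the same time the JobRouter stops routing jobs.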

From what I found on Google, the theoretical limit on the number of TCP connections on a Linux system is 64k (65,535), and there are several guides on tuning network performance. I have also consulted the system administrators in my team: the number of connections and open files is limited by system parameters such as sysctl settings and/or limits.conf (ulimit). A network expert says it is not advisable to raise the number of TCP connections above 4k, and recommends adding another host instead, e.g. a second HTCondor-CE.
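To make the limits above concrete, here is a minimal sketch that prints the open-file limit and a few related sysctl values. It assumes a Linux host with /proc/sys mounted; the helper names are hypothetical, and the selection of sysctl keys is only a guess at what might be relevant, not a list of settings known to cause the ceiling.

```python
import resource
from pathlib import Path


def ephemeral_port_count(range_text):
    """Number of ports in a 'low high' range such as ip_local_port_range."""
    low, high = (int(value) for value in range_text.split())
    return high - low + 1


def read_sysctl(name):
    """Read a sysctl value via /proc/sys; return None if the key is absent."""
    path = Path("/proc/sys") / name.replace(".", "/")
    return path.read_text().strip() if path.exists() else None


def report():
    # RLIMIT_NOFILE is the per-process open-file limit (what `ulimit -n` shows).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files (this process): soft={soft} hard={hard}")
    for key in ("fs.file-max", "net.core.somaxconn",
                "net.ipv4.ip_local_port_range"):
        print(f"{key} = {read_sysctl(key)}")
    port_range = read_sysctl("net.ipv4.ip_local_port_range")
    if port_range:
        print(f"ephemeral ports available: {ephemeral_port_count(port_range)}")


if __name__ == "__main__":
    report()
```

Note that the open-file limit printed here belongs to the process running the script; the limit that applies to the running daemons can differ and is listed in /proc/<pid>/limits for each of them.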

So I was wondering whether there is any guidance on performance tuning for HTCondor/HTCondor-CE with regard to networking, and whether there is a limit on the capacity of the JobRouter.

FYI, the HTCondor-CE is running on a virtual machine with 8 cores and 16 GB of memory. The versions are 3.4.0 for HTCondor-CE and 8.8.7 for HTCondor. Please let me know if you need more information.

Thank you.

Best regards,