
Re: [HTCondor-users] [HTCondor-CE] Maximum number of established TCP connections



Hi Sang-Un,

The theoretical limit on TCP connections is about 64K for a given IP address pair (to a fixed port), so 3.5k connections doesn't sound like it's stressing the system too much. In practice, we've been able to establish hundreds of thousands of connections (although that level requires quite a bit of tuning!).
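
As a rough illustration of where the 64K figure comes from, here is a minimal Python sketch. It assumes only that TCP port numbers are 16 bits wide, and it also reads the open-file limit and (on Linux) the ephemeral port range of the machine it runs on. The function names are made up for this example and are not part of HTCondor, and the limits reported belong to the process running the script, not to any HTCondor daemon.

```python
import resource

PORT_BITS = 16  # TCP port numbers are 16-bit values


def max_connections_per_address_pair(port_bits=PORT_BITS):
    """Upper bound on connections from one client IP to one server IP and port.

    Only the client-side port can vary, and port 0 is reserved,
    which leaves 2**16 - 1 = 65535 usable values.
    """
    return 2 ** port_bits - 1


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def ephemeral_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Return (low, high) of the local port range on Linux, or None if unavailable."""
    try:
        with open(path) as handle:
            low, high = handle.read().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    established = 3500  # connection count mentioned in this thread
    bound = max_connections_per_address_pair()
    print("per-address-pair bound:", bound)
    print("fraction of bound in use:", round(established / bound, 3))
    print("open-file limits (soft, hard):", open_file_limits())
    print("ephemeral port range:", ephemeral_port_range())
```

Running it on the CE host would show how far the observed connection count is from the theoretical bound, and whether the local open-file or port-range settings are the tighter constraint for that process.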

Could you provide a few log snippets?  That might allow us to look at the exact error messages and provide some advice and/or insights.

Happy to help,

Brian

> On Apr 2, 2020, at 7:36 PM, Sang Un Ahn <sahn@xxxxxxxxxxx> wrote:
> 
> Hello,
> 
> First of all, I hope you are all keeping safe in these troubled times.
> 
> I am running a small HTCondor pool plus an HTCondor-CE that takes jobs from the ALICE VO. The pool can offer around 3,500 concurrent job slots; however, I recently noticed that when the number of established TCP connections (by condor_shadow) through port 9618 goes above 3,000, the number of concurrent jobs seems to hit a ceiling. More precisely, the JobRouter keeps additional jobs in the CE queue while there are no idle or held jobs in the local queue. Note that the JobRouter, Schedd, Collector and Negotiator all run on the same host for HTCondor-CE.
> 
> From searching on Google, I found that the theoretical number of TCP connections on a Linux system is 64k (65,535) and that several guides on network performance tuning exist. I am also consulting the system administrators in my team; the number of connections and open files is limited by system settings such as sysctl and/or limits.conf (ulimit). A network expert says it is not advisable to raise the number of TCP connections above 4k and recommends adding another host instead, e.g. a second HTCondor-CE.
> 
> So I was wondering whether there is any guidance on performance tuning for HTCondor/HTCondor-CE with regard to networking, and whether there is a limit on the capacity of the JobRouter.
> 
> FYI, the HTCondor-CE is running on a virtual machine with 8 cores and 16 GB of memory. The versions are 3.4.0 and 8.8.7 for HTCondor-CE and HTCondor, respectively. Please let me know if you need more information.
> 
> Thank you.
> 
> Best regards,
> Sang-Un