
Re: [HTCondor-users] Collector daemon crashing on Windows due to file descriptor limit



Hi Peet

 

"As an aside, is there any documentation/literature for the maximum no. of nodes/slots a Windows and/or Linux CM can support?"

 

Not sure what the limits are, but we have Windows-only execute pools that all have Linux (Ubuntu 20.04) CMs. These VMs have 2 cores and 4 GB RAM. All our submit nodes are Windows Server 2016 with 8 cores and 32 GB RAM.

 

Total pool numbers = 1,900 machines with 10,000 slots/cores

 

Our largest individual pool = 630 machines with 3,300 slots/cores

 

All CMs also forward info to a condorview "supercollector", but this is just to get total numbers and is only running a collector (no negotiator).

 

Cheers

 

Greg

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of John M Knoeller
Sent: Wednesday, 1 June 2022 11:42 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Collector daemon crashing on Windows due to file descriptor limit

 

The 1024 comes from the compatibility layer that we use to make the Windows and Linux code bases the same, so we can't make it a knob unfortunately.  We would have to change the Windows code base to use a more Windows style mechanism for detecting hot sockets in order to go beyond this limit.
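
For readers unfamiliar with the term, "detecting hot sockets" means asking the OS which of many open connections currently have data waiting. The thread does not name the mechanism HTCondor uses; the sketch below assumes a select()-style call, where the caller passes in a set of descriptors and the size of that set is what is capped. The helper name hot_sockets and the socket pairs are hypothetical and are not taken from the HTCondor code base.

```python
import select
import socket


def hot_sockets(socks, timeout=0.0):
    """Return the sockets in `socks` that have data waiting to be read.

    Assumption: a select()-style poll. The number of descriptors that can
    be passed in one call is bounded by the platform's descriptor-set size.
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


# Two connected socket pairs stand in for two daemon connections.
a_recv, a_send = socket.socketpair()
b_recv, b_send = socket.socketpair()

# Only the first connection has traffic, so only it should be "hot".
a_send.sendall(b"update")
hot = hot_sockets([a_recv, b_recv], timeout=1.0)

for s in (a_recv, a_send, b_recv, b_send):
    s.close()
```

Here only a_recv is reported as hot, because nothing was written to the second pair.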

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Peet Whittaker
Sent: Wednesday, June 1, 2022 10:16 AM
To: Greg Thain <gthain@xxxxxxxxxxx>
Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Collector daemon crashing on Windows due to file descriptor limit

 

1014 comes from a hard coded limit of 1024 file descriptors for HTCondor on Windows (with a cushion), and there's no easy way to increase that, I fear.
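
As a rough illustration of the arithmetic, the two figures quoted above imply a cushion of 10 descriptors. The sketch below only restates those numbers; the names FD_LIMIT, USABLE_FDS and fits, and the sample counts passed to fits, are hypothetical, and how many descriptors a collector actually holds open per node is not stated in this thread.

```python
FD_LIMIT = 1024    # hard-coded limit quoted in the thread
USABLE_FDS = 1014  # figure quoted in the thread (limit minus a cushion)

# Cushion implied by the two quoted figures.
cushion = FD_LIMIT - USABLE_FDS


def fits(open_descriptors, usable=USABLE_FDS):
    """Return True if this many open descriptors stays within the usable limit."""
    return open_descriptors <= usable


# Hypothetical descriptor counts, for illustration only.
small_ok = fits(600)
large_ok = fits(1500)
```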

 

Ah, fair enough. I wonder whether this could be made into a config value in the future, as I think Windows can actually support a greater number of file descriptors (up to ~8k); see: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=msvc-160

 

But yes, we will look to migrate our CM to Linux (well, just the collector/negotiator parts anyway 😉).

 

As an aside, is there any documentation/literature for the maximum no. of nodes/slots a Windows and/or Linux CM can support? I did find the following page, but it mainly talks about RAM requirements (there is nothing related to file descriptors, for example) and is obviously quite old now (2007):

 

https://research.cs.wisc.edu/htcondor/CondorWeek2007/large_condor_pools.html

 

Kind regards,

 

Peet Whittaker

Discipline Lead for DevOps | Principal Software Developer

 

From: Greg Thain <gthain@xxxxxxxxxxx>
Sent: 31 May 2022 22:37
To: Peet Whittaker <Peet.Whittaker@xxxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Collector daemon crashing on Windows due to file descriptor limit

 

On 5/31/22 16:23, Peet Whittaker wrote:

Hi Greg,

 

Thanks for the quick reply!

 

Ideally, yes (and indeed will be adding Linux execute nodes in the future). However, it would be quite a big task to shift the CM to Linux (there are other processes running on the CM too).

 

Note that the CM in our terminology is just the collector and negotiator processes, not the schedd.  I assume that if you split the collector & negotiator onto a Linux machine, the other processes on your CM would stay on the Windows schedd/submit machine?

1014 comes from a hard coded limit of 1024 file descriptors for HTCondor on Windows (with a cushion), and there's no easy way to increase that, I fear.

-greg

 

 

 
