
Re: [HTCondor-users] condor_shadow and file descriptors



On Sep 27, 2013, at 2:04 PM, Paul Brenner <paul.r.brenner@xxxxxx> wrote:

> RedHat 6.X with 128GB of RAM and 64 cores on the front end that most recently crashed.  I would certainly guess that Scientific Linux defaults are much higher for "non-enterprise workloads".  I guess if you figure that a "default" RedHat/CentOS box in the enterprise world may be a modest LAMP web server, the file descriptor defaults may not be that surprising.  As mentioned, we have used the current configuration for many years with Grid Engine running 10K+ concurrent jobs and never experienced a file descriptor limitation.  In the world of Condor/HTC the configuration tuning is definitely weighted differently.
> 
> Good to know that SL runs at a million plus as a "default".  Our smallest head node has 32GB of RAM so growing 10x from 65K to 650K should be reasonably low risk.  
> 

Hi Paul,

Actually, the behavior I described (the initial limit being sized from installed memory) is a kernel default, not an SL-specific tuning.  I'm surprised that RHEL would tune it lower!  I'm kinda scratching my head on that one.
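
For what it's worth, there are really two knobs in play here: the system-wide ceiling (fs.file-max, which the kernel sizes from installed memory at boot, roughly one handle per 10 kB of RAM on the kernels I've looked at) and the per-process limit the daemons inherit (RLIMIT_NOFILE, i.e. what ulimit -n / limits.conf control).  Here's a rough sketch, nothing official, that prints both on a node so you can see which one you're actually bumping into:

    #!/usr/bin/env python
    # Rough sketch: compare the kernel's system-wide file-handle ceiling
    # with installed memory and with the per-process limit.  The
    # "kB of RAM per handle" ratio is just the observed value, not a
    # documented guarantee.
    import resource

    def read_first_int(path):
        # /proc/sys/fs/file-max is a bare integer.
        with open(path) as f:
            for tok in f.read().split():
                if tok.isdigit():
                    return int(tok)
        raise ValueError("no integer found in %s" % path)

    file_max = read_first_int("/proc/sys/fs/file-max")   # system-wide ceiling

    mem_kb = None
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                mem_kb = int(line.split()[1])
                break

    # Per-process limit this shell (and anything it spawns) inherits.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    print("MemTotal:      %d kB" % mem_kb)
    print("fs.file-max:   %d  (~%.1f kB of RAM per handle)"
          % (file_max, float(mem_kb) / file_max))
    print("RLIMIT_NOFILE: soft=%d hard=%d" % (soft, hard))

If the 65K you're hitting turns out to be the per-process nofile limit rather than fs.file-max, raising it in /etc/security/limits.conf (or in whatever init script starts the condor_master) is probably the lower-risk change than touching the system-wide knob.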

I didn't know that running out of file descriptors could crash the kernel.  What do the tracebacks look like?

Brian