
Re: [HTCondor-users] condor_shadow and file descriptors



On Sep 27, 2013, at 11:34 AM, Paul Brenner <paul.r.brenner@xxxxxx> wrote:

> Thanks Mats, Dan, and Greg,
> 
> We were trying to count how many files each Condor job transferred/opened and could not justify the massive file descriptor requirement.  Now that we understand each shadow process can open 40-50 file descriptors, it is clear that 65K file descriptors is not enough for 2K concurrently running jobs.  The RHEL defaults are an order of magnitude lower than 65K.  It sounds like we will need to raise this by another order of magnitude.
> 
> We regularly run 10K+ concurrent jobs from the same submit hosts with Grid Engine, but the master/slave submission model is totally different.  We will do some quick research on any pitfalls of raising the file descriptor count even higher and then proceed accordingly (all of our cluster frontends [20+] have the same image, so we need to be careful with any base OS config changes).
> 
> 
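For reference, one rough way to get that per-shadow count on a Linux submit host is to tally the entries under /proc/<pid>/fd for every running condor_shadow.  The following is just an illustrative sketch, not an HTCondor tool; it assumes the shadow processes are visible to the user running it (run it as root or as the user that owns the shadows):

#!/usr/bin/env python
# Sketch: count open file descriptors per condor_shadow by walking /proc.
import os

total = 0
shadows = 0
for pid in os.listdir('/proc'):
    if not pid.isdigit():
        continue
    try:
        # First NUL-separated field of /proc/<pid>/cmdline is the executable path.
        with open('/proc/%s/cmdline' % pid, 'rb') as f:
            argv0 = f.read().split(b'\0')[0]
        if not argv0.endswith(b'condor_shadow'):
            continue
        # Each entry under /proc/<pid>/fd is one open file descriptor.
        nfds = len(os.listdir('/proc/%s/fd' % pid))
    except (IOError, OSError):
        # Process exited or we lack permission; skip it.
        continue
    shadows += 1
    total += nfds

print('%d condor_shadow processes, %d open fds total (avg %.1f per shadow)'
      % (shadows, total, total / float(shadows or 1)))

With roughly 2,000 shadows at 40-50 descriptors each, the total lands around 80,000-100,000, which is why a 65K system-wide limit falls short.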

Hi Paul,

What version of RHEL do you run?

I'm scratching my head a bit, because modern kernels set /proc/sys/fs/file-max according to the amount of memory on the machine (roughly one descriptor per 10 kB of RAM).  For example, on a machine with 8GB of RAM, this works out to roughly 0.8M file descriptors.  A machine with 32GB of RAM should have an out-of-the-box maximum of about 3.2M.
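As a quick sanity check of that heuristic (about MemTotal/10; the exact kernel formula may differ slightly between versions, so treat this as an approximation), something like the following compares the estimate against the live value:

#!/usr/bin/env python
# Sketch: estimate the kernel's default fs.file-max (~10% of RAM in kB)
# and compare it with the value currently in effect on this machine.

def mem_total_kb():
    # MemTotal line in /proc/meminfo, e.g. "MemTotal:  8057896 kB"
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemTotal:'):
                return int(line.split()[1])
    raise RuntimeError('MemTotal not found in /proc/meminfo')

mem_kb = mem_total_kb()
estimate = mem_kb // 10

with open('/proc/sys/fs/file-max') as f:
    actual = int(f.read().strip())

print('MemTotal:                         %d kB' % mem_kb)
print('Estimated file-max (MemTotal/10): %d' % estimate)
print('Actual /proc/sys/fs/file-max:     %d' % actual)

If the actual value comes out far below the estimate, something (an old default, a sysctl setting, or a site image) is overriding it.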

Of course, "out of the box" to me means SL rather than RHEL.  Is it possible that accounts for the difference?

Brian