
Re: [HTCondor-users] slow creation of condor_shadow processes



Thanks Todd. After your comment I found the following link with details:

https://stackoverflow.com/questions/56650579/why-should-i-close-all-file-descriptors-after-calling-fork-and-prior-to-callin

I was trying to understand further how HTCondor uses FDs by submitting a batch of 3k jobs. I reduced the open-file limit to 100 (soft) and 2k (hard). I thought I might not be able to run more than 2k jobs, but I did see 3k jobs running.

The number of file handles increased by approx. 60k and dropped back to 21k after I removed the jobs.

# cat /proc/sys/fs/file-nr
21568 0 6573632

# cat /proc/sys/fs/file-nr
81472 0 6573632
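
To make the arithmetic explicit, here is a minimal Python sketch that parses the two readings above. It assumes the usual Linux meaning of the three fields in /proc/sys/fs/file-nr (allocated handles, allocated-but-unused handles, system-wide maximum). The names FileNr and parse_file_nr are made up for illustration, and the per-job figure assumes that all 3k jobs were running and that nothing else on the machine changed between the two readings.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated system-wide
    unused: int     # allocated but unused handles (typically 0 on current kernels)
    maximum: int    # system-wide ceiling on file handles


def parse_file_nr(text: str) -> FileNr:
    """Parse the single line found in /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return FileNr(allocated, unused, maximum)


# The two readings quoted above.
low = parse_file_nr("21568 0 6573632")
high = parse_file_nr("81472 0 6573632")

delta = high.allocated - low.allocated
print(delta)                   # 59904, i.e. the "approx. 60k" increase
print(round(delta / 3000, 1))  # 20.0 handles per job, assuming 3k running jobs
```

Note that file-nr counts handles for the whole system, while the soft and hard open-file limits apply to each process separately, so the two numbers are not directly comparable.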

Enabling FD logging does not show many FDs in use by condor:

SCHEDD_DEBUG = D_FDS
SHADOW_DEBUG = D_FDS
SHARED_PORT_DEBUG = D_FDS

Basically, I am trying to understand where condor uses FDs. That would help me work out which limits condor can hit if we don't raise the descriptor limit.
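
One way to look at this per process, rather than system-wide, is to read /proc/<pid>/fd. The following is only a sketch: it is Linux-specific, the function names are hypothetical, and inspecting processes owned by another user needs suitable privileges.

```python
import os
from typing import Dict


def count_open_fds(pid: int) -> int:
    """Count the entries in /proc/<pid>/fd (Linux only).

    When a process inspects itself, the count includes the temporary
    descriptor that listdir() opens on the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def describe_open_fds(pid: int) -> Dict[int, str]:
    """Map each FD number of a process to the target of its /proc symlink."""
    fd_dir = f"/proc/{pid}/fd"
    targets: Dict[int, str] = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The FD was closed between listdir() and readlink(); skip it.
            continue
    return targets
```

Calling describe_open_fds() on the PID of a condor_schedd or condor_shadow process (found with, for example, pgrep) would show whether its descriptors are sockets, pipes, or log files.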

Thanks & Regards,
Vikrant Aggarwal


On Mon, Mar 8, 2021 at 10:22 PM Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:
> Finally able to get the parameters due to which it was happening but didn't
> understand why it's happening.

    IIRC, HTCondor closes (almost?) all FDs after fork()ing* but
before exec()ing the shadow. There was not, until relatively recently, a
way to close all the FDs associated with a process; you had to make a
system call for each FD. When you have to close 102,400 FDs, that's a lot
of system calls, and it takes a while.

- ToddM

*: On Linux, HTCondor actually calls clone().
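
The cost Todd describes can be sketched in Python as follows. This is not HTCondor's actual code, only an illustration of the "one system call per possible FD" pattern. It assumes that descriptors 0 to 2 are kept and that the loop runs up to the open-file limit; the function name and its close parameter are hypothetical.

```python
import os
from typing import Callable


def close_fds_one_by_one(
    first_fd: int,
    fd_limit: int,
    close: Callable[[int], None] = os.close,
) -> int:
    """Try to close every FD number in [first_fd, fd_limit).

    Returns the number of close attempts: one call per possible FD,
    whether or not that FD was actually open.
    """
    attempts = 0
    for fd in range(first_fd, fd_limit):
        try:
            close(fd)
        except OSError:
            # EBADF: this FD number was not open; the call still cost time.
            pass
        attempts += 1
    return attempts


# Count the calls for the limit Todd mentions without touching real FDs:
# a do-nothing stand-in replaces os.close, so this is safe to run anywhere.
calls = close_fds_one_by_one(3, 102_400, close=lambda fd: None)
print(calls)  # 102397
```

With the default close=os.close, the function would really close descriptors, so it should only run in a freshly forked child that is about to exec. Linux 5.9 added close_range(2), which closes a whole range in a single call and may be the more recent mechanism Todd alludes to.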