[HTCondor-users] dags and max open files



We recently upgraded to HTCondor 9.0.15 (which may or may not be relevant) and are now seeing some schedds report "too many open files". For example:

08/12/22 10:43:02 (pid:4627) Daemon::startCommand(INVALIDATE_SUBMITTOR_ADS,...) making connection to <10.13.5.25:9618?alias=ldas-condor.ldas.ligo-la.caltech.edu>
08/12/22 10:43:02 (pid:4627) Can't open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 24 (Too many open files)
08/12/22 10:43:02 (pid:4627) Can't open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 24 (Too many open files)
08/12/22 10:43:02 (pid:4627) Can't open directory "/etc/condor/tokens.d" as PRIV_ROOT, errno: 24 (Too many open files)
08/12/22 10:43:02 (pid:4627) getTokenSigningKey(): read_secure_file(/etc/condor/condor_cred) failed!
08/12/22 10:43:02 (pid:4627) TOKEN: No token found.
08/12/22 10:43:02 (pid:4627) SECMAN: required authentication with collector ldas-condori failed, so aborting command INVALIDATE_SUBMITTOR_ADS.

I'm able to work around this by increasing the file descriptor limit on the schedd from the default of 4096 with:

SCHEDD_MAX_FILE_DESCRIPTORS = 32768
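
One way to confirm that a running schedd actually picked up the higher limit is to read the kernel's view from /proc/<pid>/limits. The following is a minimal Python sketch, assuming a Linux host; the function names are made up for illustration and are not part of HTCondor.

```python
import sys


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    Each value is an int, or None where the kernel reports "unlimited".
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units
            soft, hard = line[len(label):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def max_open_files(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


# Only runs when a PID is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    soft, hard = max_open_files(int(sys.argv[1]))
    print(f"soft={soft} hard={hard}")
```

Pass the condor_schedd PID as the only argument. Reading another user's /proc entry may require root.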

Looking in /proc/$pid/fd for the condor_schedd process, I see that almost all of the open files are the .out, .err, and /dev/null file descriptors of users' dagman jobs.
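
The inspection of /proc/$pid/fd described above can be turned into a quick tally. This is a rough Python sketch, assuming Linux and enough privilege to read the schedd's fd directory (typically root). The category names and function names are illustrative choices, not anything HTCondor defines.

```python
import os
import sys
from collections import Counter


def classify(target):
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    # The kernel appends " (deleted)" to targets of unlinked files.
    if target.endswith(" (deleted)"):
        target = target[: -len(" (deleted)")]
    if target == "/dev/null":
        return "/dev/null"
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    # Otherwise group by file extension, e.g. ".out" or ".err".
    ext = os.path.splitext(target)[1]
    return ext if ext else "other"


def tally(targets):
    """Count fd targets per category."""
    return Counter(classify(t) for t in targets)


def fd_targets(pid):
    """Return the symlink target of every open fd of a process."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except FileNotFoundError:
            # The fd was closed between listdir() and readlink().
            continue
    return targets


# Only runs when a PID is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for category, count in tally(fd_targets(int(sys.argv[1]))).most_common():
        print(f"{count:6d}  {category}")
```

Pass the condor_schedd PID as the only argument. Running it a few times while DAGs are being submitted shows which category is growing.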

Is it expected that the schedd would hold this many open files for dagman jobs?

--Mike