Re: [Condor-users] DAGs and Error msg "Too open files"



On Fri, 9 Dec 2011, Michael O'Donnell wrote:

I am running a DAG in our pool (all machines, including the central manager,
run on a Windows OS (XP, 7, R2)) with about 200 available machines for this
specific job, and my DAG reports an error which changes its status from
Running to Idle (that is to say, the jobs are running fine but
condor_dag.exe produces an error).

The error log file the DAG produces contains this:
12/09/11 06:22:48 Can't open "ExtSimVal_DAG.dag.dagman.out"
dprintf() had a fatal error in pid 9684
Can't open "ExtSimVal_DAG.dag.dagman.out"
errno: 24 (Too many open files)

I'm not an expert on the Windows side of things, but one thing you *can* do is reduce the number of file descriptors DAGMan uses by having all of your node jobs use the same log file. The easiest way to do this (assuming you're running a fairly recent DAGMan -- 7.5 or later should be good, I think) is to just not specify a log file in your submit files, and let DAGMan generate a default log for you.
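
To see how many separate log files a DAG's node jobs currently name, something like the following Python sketch might help. It is only an illustration, not part of Condor: the function names are made up here, and it assumes each node is declared with a plain "JOB <name> <submit file>" line, that submit file paths are relative to the DAG file's directory, and that each submit file sets its log with a literal "log = <path>" line (macros in the path are not expanded).

```python
import re
from pathlib import Path

# "JOB <node name> <submit file> ..." lines in a DAG input file.
JOB_RE = re.compile(r"^\s*JOB\s+(\S+)\s+(\S+)", re.IGNORECASE)
# "log = <path>" lines in a submit file (does not match e.g. "log_xml = ...").
LOG_RE = re.compile(r"^\s*log\s*=\s*(.+?)\s*$", re.IGNORECASE)


def submit_files_in_dag(dag_text):
    """Return the submit file named on each JOB line, in order."""
    files = []
    for line in dag_text.splitlines():
        match = JOB_RE.match(line)
        if match:
            files.append(match.group(2))
    return files


def log_setting(submit_text):
    """Return the value of the last 'log = ...' line, or None if there is none."""
    value = None
    for line in submit_text.splitlines():
        match = LOG_RE.match(line)
        if match:
            value = match.group(1)
    return value


def distinct_logs(dag_path):
    """Collect the distinct log paths named by a DAG's submit files.

    Assumption: submit file paths are relative to the DAG file's directory.
    """
    dag_path = Path(dag_path)
    logs = set()
    for name in submit_files_in_dag(dag_path.read_text()):
        value = log_setting((dag_path.parent / name).read_text())
        if value is not None:
            logs.add(value)
    return logs
```

Calling distinct_logs("ExtSimVal_DAG.dag") would return the set of log paths found. A large set means many node jobs name their own log file; an empty set means no submit file specifies a log, which is the situation in which DAGMan generates a default log.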

(One note: we are working on changes to DAGMan that will limit the number of file descriptors it uses at any given time, no matter how many log files are used by the DAG node jobs. But those changes are not ready yet...)

Kent Wenger
Condor Team