
Re: [HTCondor-users] DAGMan write jobs log into *.nodes.log even if log file specified for all jobs



On Tue, 12 Nov 2013, Tom Downes wrote:

I expressed some concern a while back about these logs. I can't remember the
details, but the underlying issue was that large DAGs can generate tens
of thousands of separate open/close operations on these logs over the course
of a day. The way things ran here, the default location of these logs ended
up on NFS.

Okay, I guess one thing is that we'd recommend not having the logs on NFS, especially the default/workflow log. (We already have this ticket:
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3930,4
I guess that should be a pretty high priority, so that it's easy for you guys to *not* have the default/workflow log on NFS.)
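
As a rough way to check whether a given log path actually lands on NFS, something like the following Python sketch might help. It is only an illustration, not part of HTCondor: the function names are made up here, it assumes a Linux-style mount table (the format of /proc/mounts), and it ignores escaped characters in mount points.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is assumed to be in /proc/mounts format: one mount per
    line, whitespace-separated fields (device, mount point, fs type, ...).
    Returns None if no mount matches.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent
        # directory of it; the trailing "/" avoids matching /home to /homework.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if `path` sits on a mount whose type starts with "nfs"."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")
```

On a Linux submit machine you could pass the contents of /proc/mounts as `mounts_text`, along with the path where the *.nodes.log file is expected to be written.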

There are two places that would be doing opens/closes on the log files:
1) The schedd
2) DAGMan

The default/workflow log should really cut down on the opens/closes that DAGMan does -- in most cases it will probably just open/close the log file once or a handful of times.

Obviously the schedd will be doing more opens/closes if it's writing to two log files instead of one for each job (although not all events are written to the workflow log, so it's not twice as many).

The way I read the manual at the time, I wasn't expecting the log files to
show up at all given the Log-ging settings in the submit file. I am busy
with other matters right now, but you might take a 2nd look at Condor 8.x
era e-mails from me (perhaps on the LIGO list).

Hmm -- I guess one option would be to have the schedd *not* write to the log file specified in the job's submit file when a default/workflow log file is specified by DAGMan. (I'm not sure I really like that idea, though.)

Would this be significantly less of a problem if it were easier to at least move the default/workflow log off of NFS?

Kent