Re: [Condor-users] Too many open files

Hey Ian,

Yeah, we've switched over to logging to the local disk. We'd run into so much trouble with file locks and the like that it seemed the only real way to go. I have set up a Samba mount, but that is purely for viewing the logs from a Windows machine after the jobs have completed.

On 7 November 2011 16:15, Ian Chesal wrote:
Hi Chris,

On Thursday, 3 November, 2011 at 4:30 PM, Christopher Martin wrote:


We're getting errors in the job log files indicating that there are too many files open:
007 (196430.005.000) 11/03 08:13:00 Shadow exception!
Error from slot12@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed to open '/mnt/render/jobs/job_141798_rndrgatebegin_yko_120_0400_syanye/chr_all_rp_tcrender-196430-5-stdout.txt' as standard output: Too many open files (errno 24)
0  -  Run Bytes Sent By Job
0  -  Run Bytes Received By Job

The file it's complaining about is the stdout from the job's executable. I've taken a look at the submit/scheduler machine and we're nowhere near the file limit. Same thing on the execution machine. We are, however, logging to a Windows share mounted on the submit/scheduler machine over CIFS. We've been experiencing extremely heavy load on the Windows filer that we're logging to, so I'm guessing it's a result of that, but I wanted to throw this out there in case anyone else has run into similar issues before.
Samba mount? I'm not particularly fond of Samba in large deployments -- it doesn't scale up well. Windows file access semantics use locks overzealously and SMB is an aging protocol; Samba can't really keep up. It usually adds up to disaster above 200 concurrent handles or so, no matter how powerful the underlying hardware.

Your best bet is to move logging to local disk. You could try an NFS-mounted remote directory, but there are file lock issues on NFS to contend with as well.
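
For anyone checking the "nowhere near the file limit" part on their own machines: on Linux, errno 24 is EMFILE, which is the per-process descriptor limit rather than the system-wide one (that would be ENFILE, errno 23). So the number to compare is how many descriptors one process holds against that process's own soft limit. Below is a rough sketch of such a check, assuming a Linux host with /proc mounted; fd_usage is a made-up helper name, not part of Condor. With a CIFS mount the error could also be the filer's own limit surfacing through the mount, in which case the local counts would look fine.

```python
import errno
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a Linux process, read from /proc.

    pid is a process ID, or "self" for the calling process. Reading
    another user's process normally requires root. soft_limit is None
    if the limit is reported as unlimited or cannot be found.
    """
    # Each entry in /proc/<pid>/fd is one open descriptor. For "self"
    # the count includes the descriptor used to list the directory.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            # Line format: "Max open files  <soft>  <hard>  files"
            if line.startswith("Max open files"):
                value = line.split()[3]
                if value != "unlimited":
                    soft_limit = int(value)
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"errno {errno.EMFILE}: {os.strerror(errno.EMFILE)}")
    print(f"open descriptors: {used}, soft limit: {limit}")
```

To look at a running daemon instead of the script itself, pass its process ID, for example fd_usage(12345) for a shadow process (the PID here is just a placeholder).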

