
Re: [Condor-users] Too many open files



Hey Ian,

Yeah, we've made the switch over to logging to the local disk. We'd encountered so much trouble with file locks and such that it seemed the only real way to go. I have set up a Samba mount, but that is purely for viewing the logs from a Windows machine after the jobs have completed.

Thanks for your help,
Chris.

On 7 November 2011 16:15, Ian Chesal <ichesal@xxxxxxxxxxxxxxxxxx> wrote:
Hi Chris,

On Thursday, 3 November, 2011 at 4:30 PM, Christopher Martin wrote:

Hi,

We're getting errors in the job log files indicating that there are too many files open:
...
007 (196430.005.000) 11/03 08:13:00 Shadow exception!
Error from slot12@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed to open '/mnt/render/jobs/job_141798_rndrgatebegin_yko_120_0400_syanye/chr_all_rp_tcrender-196430-5-stdout.txt' as standard output: Too many open files (errno 24)
0  -  Run Bytes Sent By Job
0  -  Run Bytes Received By Job

The file it's complaining about is the stdout from the job's executable. I've taken a look at the submit/scheduler machine and we're nowhere near the file limit. Same thing on the execution machine. We are, however, logging to a Windows share mounted on the submit/scheduler machine over CIFS. We've been experiencing extremely heavy load on the Windows filer that we're logging to, so I'm guessing it's a result of that, but I wanted to throw this out there in case anyone else has run into similar issues before.

Samba mount? I'm not particularly fond of Samba in large deployments -- it doesn't scale up well. Windows file access semantics use locks overzealously, SMB is an aging protocol, and Samba can't really keep up. It usually adds up to disaster above 200 concurrent handles or so, no matter how powerful the underlying hardware.

Your best bet is to move logging to local disk. You could try an NFS-mounted remote disk, but there are file lock issues on NFS to contend with as well.
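
For anyone wanting to repeat the file-limit check mentioned above, here is a minimal sketch in Python. It assumes a Linux machine with /proc available; the function name fd_usage is made up for illustration and is not part of Condor. It counts a process's open descriptors and reads its soft "Max open files" limit, so the two can be compared. If the count is well under the limit, the errno 24 is probably not a local descriptor shortage.

```python
import errno
import os


def fd_usage(pid="self"):
    """Return (open_count, soft_limit) for a Linux process, read from /proc.

    Hypothetical helper for illustration; assumes /proc is mounted.
    """
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open_count = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: name (three words), soft limit, hard limit, units.
                soft_limit = line.split()[3]
                break
    return open_count, soft_limit


if __name__ == "__main__":
    count, limit = fd_usage()
    # errno 24 is the code reported in the shadow exception.
    print(f"errno 24 is {errno.errorcode[24]}")
    print(f"open descriptors: {count}, soft limit: {limit}")
```

Pass a process ID (for example that of a running condor_shadow) instead of the default to inspect another process; reading another user's /proc entries generally requires root.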

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing 


_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/