
[Condor-users] "file descriptors" problem again



Hi,

I recently upgraded to Condor 6.8.0 on our central manager in order to
fix a problem with Condor. See:

https://lists.cs.wisc.edu/archive/condor-users/2006-August/msg00039.shtml

This solved the problem, but instead I started to see exactly the
same "out of file descriptors" errors as reported
in

https://lists.cs.wisc.edu/archive/condor-users/2006-April/msg00191.shtml

The symptoms are the same - after the daily reboot of the Windows
execution hosts, a large number of them sit idle even though there is a big
queue (20,000 jobs) waiting to run. When I went back to 6.6.9 the problem
disappeared.

I'm wondering whether, as has been suggested, the "out of file descriptors"
message is a red herring - the OS is the same (Solaris 8) and none of the limits
have been changed. At most there are around 100 jobs running concurrently
in the vanilla universe. The default limit (ulimit -n) is 256 (although I
understand that this is per process).
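
For anyone wanting to check the per-process limit against actual usage,
here is a minimal Python sketch. It is only an illustration: the function
names fd_limits and count_open_fds are made up for this example, it
inspects the process that runs it (not a Condor daemon), and the 4096
probe cap is an arbitrary assumption to keep the loop short.

```python
import os
import resource

# Arbitrary cap on how many descriptor numbers to probe (an assumption).
PROBE_CAP = 4096


def fd_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(max_fd=None):
    """Count descriptors open in *this* process by probing each number."""
    if max_fd is None:
        soft, _ = fd_limits()
        # The soft limit may be "unlimited" or very large; cap the probe range.
        if soft == resource.RLIM_INFINITY or soft > PROBE_CAP:
            max_fd = PROBE_CAP
        else:
            max_fd = soft
    open_count = 0
    for fd in range(max_fd):
        try:
            os.fstat(fd)  # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        open_count += 1
    return open_count


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("open in this process:", count_open_fds())
```

The soft limit reported here is the same value that "ulimit -n" shows in
the shell that launched the script. Since the limit is per process, each
daemon would have to be checked separately.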

Any ideas about this? Would a diff(1) of the two codes show up anything?
I could move Condor-G to another host to get around the first problem,
but I'm more concerned that the Windows central manager is going to get
stuck with an out-of-date version of Condor.

cheers,

-ian.

-----------------------------------
Dr Ian C. Smith,
e-Science team,
University of Liverpool
Computing Services Department