
[HTCondor-users] no ccbid currently registered.


I'm seeing the following message a significant number of times in some of the larger runs we've started to do:

02/08/13 12:22:26 CCB: rejecting request from SCHEDD <www.xxx.yyy.zzz:50190> on <www.xxx.yyy.zzz:40460> for ccbid 6987 because no daemon is currently registered with that id (perhaps it recently disconnected).

Eventually, we get:

**** PANIC -- OUT OF FILE DESCRIPTORS at line 175 in /slots/01/dir_65060/userdir/src/condor_io/reli_sock.cpp

And in /var/log/messages, I'm seeing:

Feb  8 10:59:59 lsst-launch kernel: possible SYN flooding on port 9618. Sending cookies.

We had been running jobs of about 500 slots, and have started trying to run 1000+ slots simultaneously.  On both the Collector machine and the submit machine, the file descriptor limit has been raised to over 400,000 per process.
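
For anyone who wants to double-check limits like these, here is a minimal sketch (assuming Linux and Python 3; the function names are made up for illustration and are not part of HTCondor). It reports the limit of the current process, and can also read the "Max open files" line from /proc/<pid>/limits for an already-running daemon, since a daemon may not have inherited the limit that the login shell shows.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the text of a /proc/<pid>/limits file.

    Values are returned as strings because the kernel may print 'unlimited'.
    Returns None if the 'Max open files' line is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            parts = line.split()
            # parts: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return parts[3], parts[4]
    return None


def daemon_fd_limits(pid):
    """Read the descriptor limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


def open_fd_count(pid="self"):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"this process: soft={soft} hard={hard}")
    print(f"open descriptors: {open_fd_count()}")
```

Calling daemon_fd_limits() with the pid of the collector or schedd (reading another user's process may require root) shows the limits that daemon is actually running with, which can then be compared against the 400,000 figure above.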

Any ideas?