Re: [HTCondor-users] no ccbid currently registered.
- Date: Fri, 08 Feb 2013 14:14:02 -0600
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] no ccbid currently registered.
To answer the question of why the schedd cannot connect to a target
daemon that is no longer registered with CCB, it may help to look in the
target daemon's log file, if you can locate it. If the daemon is still
running at the time when it is not registered with CCB, you should see a
log message that says it became disconnected from CCB and you should
also see periodic attempts to reconnect to CCB. The log message showing
the disconnect from CCB may help understand why this is happening. If,
on the other hand, the daemon is not alive, then we need to understand
why. The log file may help with that too.
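To decide which target daemons' logs to look at first, it can help to tally which ccbids are being rejected most often. The following is only a rough sketch: it assumes the rejection lines look like the one quoted in the original message below, and the function name and regular expression are my own, not part of HTCondor.

```python
import re
from collections import Counter

# Pattern based on the rejection message quoted in this thread; the exact
# wording in other HTCondor versions may differ (assumption).
REJECT_RE = re.compile(
    r"for ccbid (\d+) because no daemon is currently registered"
)

def count_rejected_ccbids(log_lines):
    """Count how many times each ccbid appears in CCB rejection messages."""
    counts = Counter()
    for line in log_lines:
        match = REJECT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python count_ccbids.py <path to the log with the CCB messages>
    with open(sys.argv[1], errors="replace") as log:
        for ccbid, n in count_rejected_ccbids(log).most_common(10):
            print(ccbid, n)
```

A ccbid that shows up many times points at one daemon that dropped its CCB registration; many distinct ccbids suggest a broader problem on the CCB server side.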
Regarding the exhaustion of file descriptors: if condor is started as
root (the default for an rpm installation), the best way to configure
the maximum number of file descriptors available to the collector is to
use something like the following setting in the HTCondor configuration:
COLLECTOR_MAX_FILE_DESCRIPTORS = 10000
When the collector starts up, you will see a line in the log file that
looks like this:
"Setting maximum file descriptors to 10000."
If condor is started as root, it can set its limit higher than the
default hard limit. If it is not started as root, then it can only
decrease the limit. I recommend using this configuration setting,
rather than trying to set the per-process default, because some
mechanisms for setting the per-process default (e.g. PAM settings) are
not necessarily applied to condor processes, and, anyway, the
consequences of having a huge file descriptor limit for all processes
may not be good. For example, many processes use more memory when the
file descriptor limit is high. For a process such as the condor_shadow,
this may add up to a lot of memory, since there may be many instances of
the shadow process.
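The root versus non-root behaviour described above is ordinary Unix resource-limit behaviour, and can be observed outside of condor. Here is a minimal sketch using Python's standard resource module (the helper names are mine, and this is not how condor itself is implemented): it prints the current soft and hard limits and tries to raise both to 10000, which is expected to fail for a non-root process whose hard limit is lower than that.

```python
import os
import resource

def nofile_limits():
    """Return the (soft, hard) file descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def try_set_nofile(soft, hard):
    """Try to set the file descriptor limits; return True on success."""
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        return True
    except (ValueError, OSError):
        # Raised, for example, when an unprivileged process asks for a
        # hard limit above its current one.
        return False

if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("running as root:", os.geteuid() == 0)
    print("soft limit:", soft, "hard limit:", hard)
    # 10000 matches the example value used in the configuration setting.
    print("raised to 10000:", try_set_nofile(10000, 10000))
    print("limits now:", nofile_limits())
```

Running it once as an ordinary user and once as root shows the difference. It only changes the limits of the Python process itself, so it is safe to experiment with.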
On 2/8/13 1:46 PM, Stephen Pietrowicz wrote:
I'm seeing the following message a significant number of times in some of the larger runs we've started to do:
02/08/13 12:22:26 CCB: rejecting request from SCHEDD <www.xxx.yyy.zzz:50190> on <www.xxx.yyy.zzz:40460> for ccbid 6987 because no daemon is currently registered with that id (perhaps it recently disconnected).
Eventually, we get:
**** PANIC -- OUT OF FILE DESCRIPTORS at line 175 in /slots/01/dir_65060/userdir/src/condor_io/reli_sock.cpp
And in /var/log/messages, I'm seeing:
Feb 8 10:59:59 lsst-launch kernel: possible SYN flooding on port 9618. Sending cookies.
We had been running jobs of about 500 slots or so, and have started to try and run at 1000+ slots simultaneously. The Collector machine and the submit machine both have up-ed the number of file descriptors to over 400,000 per process.