Re: [Condor-users] CCB Server - Client Communication (Condor 7.3.1)

Herzfeld, David wrote:
07/20 11:36:55 CCBClient:received failure message from CCB server in response to (non-blocking) request for reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1197 because no daemon is currently registered with that id (perhaps it recently disconnected).
The problem is that from the CCB server's point of view, the startd registered with CCB id 1197 has disconnected its CCB listener socket. Please send me the full CollectorLog and StartLog (off list), or see if you can follow the history of id 1197 in the CollectorLog and confirm that it got disconnected, and then see in the StartLog whether there is any record of it being disconnected from the CCB server.
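Following the history of one id through a large log can be sketched as a simple line filter. This is only a rough illustration: the function name `lines_mentioning_id` is hypothetical, and it assumes the id appears in the log as a standalone number, which may also match unrelated numbers such as PIDs.

```python
import re


def lines_mentioning_id(lines, ccb_id):
    """Return the log lines that contain ccb_id as a standalone number.

    Assumption: the id is written as a plain decimal number that is not
    part of a longer number (so 1197 does not match 11970).
    """
    pattern = re.compile(rf"(?<!\d){int(ccb_id)}(?!\d)")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

To use it, open the CollectorLog (and then the StartLog) from wherever your configuration writes them, pass the file object as `lines`, and read the matches in timestamp order to see when the id was registered and when it was dropped.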

Do you have a large number of execute nodes? I have observed cases where the iptables connection-tracking limit on the CCB server machine (ip_conntrack_max) caused TCP connections to be abruptly closed. If the value in /proc/sys/net/ipv4/netfilter/ip_conntrack_count is close to the value in /proc/sys/net/ipv4/ip_conntrack_max, then this could be your problem.
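A quick way to compare the two values on the CCB server machine is sketched below. The two paths are the ones named above; the function names and the idea of printing a percentage are my own illustration, and other kernel versions may expose these counters under different paths.

```python
from pathlib import Path

# Paths as named in this message; they may differ on other kernels.
COUNT_PATH = "/proc/sys/net/ipv4/netfilter/ip_conntrack_count"
MAX_PATH = "/proc/sys/net/ipv4/ip_conntrack_max"


def conntrack_usage(count_text, max_text):
    """Return tracked connections as a fraction of the configured maximum."""
    count = int(count_text.strip())
    limit = int(max_text.strip())
    if limit <= 0:
        raise ValueError("conntrack maximum must be positive")
    return count / limit


def read_usage(count_path=COUNT_PATH, max_path=MAX_PATH):
    """Read both /proc entries and compute the usage fraction."""
    return conntrack_usage(Path(count_path).read_text(),
                           Path(max_path).read_text())


if __name__ == "__main__":
    try:
        print(f"conntrack table is {read_usage():.0%} full")
    except (OSError, ValueError) as exc:
        # The files are absent when connection tracking is not loaded.
        print(f"could not read conntrack counters: {exc}")
```

A single reading only shows the state at that moment, so it is worth running this while the pool is busy, ideally around the time the failures are logged.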