
Re: [Condor-users] flocking / CCB

Hi Dan,


The error message has disappeared.  I did two things: I restarted condor on the processing nodes, and I changed PRIVATE_NETWORK_NAME to local, since our internal domain is local:

[root@condor-36 condor]# host condor-36

condor-36.local has address


I’m not sure which of those two changes fixed it, but it is fixed.  I previously had a unique identifier in PRIVATE_NETWORK_NAME (fsu-hpc-condor) that did not reflect our internal domain.


I’m sending this so my solution is stuffed into the archives :)


The full message is below -


StartLog:06/09/12 16:33:04 CCBListener: registered with CCB server as ccbid

StartLog:06/09/12 16:39:05 CCBListener: failed to receive message from CCB server

StartLog:06/09/12 16:39:05 CCBListener: connection to CCB server failed; will try to reconnect in 60 seconds.


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: Tuesday, June 12, 2012 11:48 AM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] flocking / CCB


Hi Don,

06/09/12 16:39:05 CCBListener: failed to receive message from CCB server


Could you provide more logs?  I'm specifically interested in any log message containing CCB.

It also may be helpful to add D_FULLDEBUG and D_COMMAND to COLLECTOR_DEBUG on the machine serving as your CCB server.  This will give you messages when daemons try to register themselves for CCB access.
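
As a minimal sketch of that suggestion, the following could go in condor_config on the CCB server machine. It assumes the collector on that machine is acting as the CCB server; the comment lines are illustrative.

```
# condor_config on the machine serving as the CCB server (sketch)
# Adds verbose and per-command logging to the collector's log.
COLLECTOR_DEBUG = D_FULLDEBUG D_COMMAND
```

Running condor_reconfig on that machine afterwards should cause the collector to pick up the new debug level.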


On 6/9/12 4:16 PM, Shrum, Donald C wrote:

I'm trying to get a test job to flock between FSU and USF here in Florida.


As our cluster is on a private network and we have a public IP only on the central manager, I added the following to condor_config on the central manager:


PRIVATE_NETWORK_NAME = fsu-hpc-condor-private




I added CCB_ADDRESS and the same PRIVATE_NETWORK_NAME to the processing nodes' condor_config.
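
For readers of the archive, a minimal sketch of what the processing-node settings described above might look like. The CCB_ADDRESS value shown is an assumption (the original message does not give it); pointing it at the pool's collector is a common setup.

```
# condor_config on each processing node (sketch, not the poster's exact file)
# Assumption: the central manager's collector doubles as the CCB server.
CCB_ADDRESS = $(COLLECTOR_HOST)
# Same value as on the central manager, as described above.
PRIVATE_NETWORK_NAME = fsu-hpc-condor-private
```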


As far as I can tell, the CCB server runs inside the collector, so I don't need to explicitly set it to run.




I must be missing something simple in the setup.  I see errors that read - 

06/09/12 16:39:05 CCBListener: failed to receive message from CCB server


I ran condor_reconfig on the processing nodes.  Do I need to restart condor on all the nodes as a result of the change?  The error message makes me think not.


Any pointers to debug this would be appreciated.


Thanks for the help.




