Re: [Condor-users] flocking / CCB

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Hi Dan,

The error message has disappeared. I did two things: I restarted Condor on the processing nodes, and I changed PRIVATE_NETWORK_NAME to local, which is our internal domain:

[root@condor-36 condor]# host condor-36

condor-36.local has address 10.178.6.36

I’m not sure which of those two changes fixed it, but it is fixed. Previously, PRIVATE_NETWORK_NAME held a unique identifier (fsu-hpc-condor-private) that did not reflect our internal domain.
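For reference, here is a minimal sketch of the resulting setting. It assumes the same value was applied on the central manager and on every processing node, because the thread does not show the final files. As I understand the HTCondor manual, the important point is that every machine on the same private network uses an identical PRIVATE_NETWORK_NAME. Daemons that see a matching name treat each other as directly reachable on that network.

```
# Hypothetical final condor_config fragment, applied on the central manager
# and on all processing nodes. The value must be identical on every machine
# that shares the private network.
PRIVATE_NETWORK_NAME = local
```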

I’m sending this so my solution is stuffed into the archives :)

The full log messages are below:

StartLog:06/09/12 16:33:04 CCBListener: registered with CCB server 10.178.6.5 as ccbid 144.174.50.29:9618?PrivNet=fsu-hpc-condor-private#124

StartLog:06/09/12 16:39:05 CCBListener: failed to receive message from CCB server 10.178.6.5

StartLog:06/09/12 16:39:05 CCBListener: connection to CCB server 10.178.6.5 failed; will try to reconnect in 60 seconds.

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: Tuesday, June 12, 2012 11:48 AM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] flocking / CCB

Hi Don,

06/09/12 16:39:05 CCBListener: failed to receive message from CCB server 10.178.6.5

Could you provide more logs? I'm specifically interested in any log message containing CCB.

It also may be helpful to add D_FULLDEBUG and D_COMMAND to COLLECTOR_DEBUG on the machine serving as your CCB server. This will give you messages when daemons try to register themselves for CCB access.
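As a sketch, the suggestion might look like this in the condor_config (or a local config file) on the CCB server machine. The debug flags are the ones named above. The file location and the exact placement are assumptions.

```
# On the machine running the collector / CCB server (10.178.6.5 in this thread).
# D_FULLDEBUG adds verbose detail; D_COMMAND logs incoming commands,
# which should include daemons registering with CCB.
COLLECTOR_DEBUG = D_FULLDEBUG D_COMMAND
```

Running condor_reconfig on that machine afterwards should be enough for the collector to pick up the new debug level.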

--Dan

On 6/9/12 4:16 PM, Shrum, Donald C wrote:

I'm trying to get a test job to flock between FSU and USF here in Florida.

Because our cluster is on a private network and only the central manager has a public IP, I added the following to condor_config on the central manager:

PRIVATE_NETWORK_NAME = fsu-hpc-condor-private

PRIVATE_NETWORK_INTERFACE = 10.178.6.5

I added CCB_ADDRESS and the same PRIVATE_NETWORK_NAME to the processing nodes' condor_config.

As far as I can tell, the CCB server runs inside the collector, so I don't need to start it explicitly.
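Taken together, the setup described above might look roughly like the following sketch. The CCB_ADDRESS value is an assumption. The message does not say what it was set to. $(COLLECTOR_HOST) is the form the HTCondor manual commonly suggests, and the log lines show the nodes registering with 10.178.6.5.

```
# --- Central manager (public IP plus private interface) ---
PRIVATE_NETWORK_NAME      = fsu-hpc-condor-private
PRIVATE_NETWORK_INTERFACE = 10.178.6.5
# The collector acts as the CCB server by default (ENABLE_CCB_SERVER
# defaults to True, as far as I know), so nothing extra is set here.

# --- Processing nodes (private network only) ---
# Assumption: point CCB at the central manager's collector.
CCB_ADDRESS          = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = fsu-hpc-condor-private
```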

I must be missing something simple in the setup. I see errors like this:

06/09/12 16:39:05 CCBListener: failed to receive message from CCB server 10.178.6.5

I ran condor_reconfig on the processing nodes. Do I need to restart condor on all the nodes as a result of the change? The error message makes me think not.

Any pointers to debug this would be appreciated.

Thanks for the help.

Don

FSU HPC

_______________________________________________

Condor-users mailing list

To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a subject: Unsubscribe

You can also unsubscribe by visiting

https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:

https://lists.cs.wisc.edu/archive/condor-users/
