
[HTCondor-users] Condor not spawning secondary collectors on specified ports



Hi all,

I've been looking into secondary collectors, as we'll need them soon.

I followed the wiki guide for the required config. For reference, we're running the 8.3.4 release of HTCondor.

## Configure the sub-collectors for tiered collecting.
## Reduces load on the central collector
COLLECTOR2 = $(COLLECTOR)
COLLECTOR2_ARGS = -f -p 10002
COLLECTOR2_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector2Log"
COLLECTOR3 = $(COLLECTOR)
COLLECTOR3_ARGS = -f -p 10003
COLLECTOR3_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector3Log"
COLLECTOR4 = $(COLLECTOR)
COLLECTOR4_ARGS = -f -p 10004
COLLECTOR4_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector4Log"
CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
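
(Not shown above: for condor_master to start these daemons, they also have to be listed in DAEMON_LIST. A sketch of that line, assuming the usual append pattern rather than quoting our exact config:)

## Assumed, not copied from our config: have the master start the sub-collectors
DAEMON_LIST = $(DAEMON_LIST), COLLECTOR2, COLLECTOR3, COLLECTOR4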

As you can see, the sub-collectors are given ports 10002-10004. The new processes start up fine and everything looks okay, so I went ahead and set our worker nodes to pick one of these ports at random.
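
For reference, the worker-node side is along these lines. This is a sketch of the pattern rather than a verbatim copy of our config, and it assumes CONDOR_HOST points at the central manager:

## Pick one of the sub-collector ports at random on each worker node
COLLECTOR_HOST = $(CONDOR_HOST):$RANDOM_CHOICE(10002,10003,10004)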

However this is the result on the worker nodes:

04/22/15 09:09:53 attempt to connect to <(cm_ip):10002> failed: Connection refused (connect errno = 111).
04/22/15 09:09:53 ERROR: SECMAN:2004:Failed to create security session to <128.142.152.233:10002> with TCP.|SECMAN:2003:TCP connection to <128.142.152.233:10002> failed.
04/22/15 09:09:53 Failed to start non-blocking update to <128.142.152.233:10002>.

I initially thought the firewall might be the problem, so I temporarily disabled it, but that made no difference.

The collector that is supposed to be listening on port 10002 has PID 1980256. However, when I check netstat I get the following:

~]# netstat -tulpn | grep 1980256
tcp        0      0 0.0.0.0:41384               0.0.0.0:*                   LISTEN      1980256/condor_coll
udp        0      0 0.0.0.0:41384               0.0.0.0:*                               1980256/condor_coll

Sure enough, changing the randomly chosen port on the worker node to 49225 results in the collector receiving the payload and registering the worker.

Does anyone have any suggestions? Have I perhaps got a typo in the collector config that you can spot?

Thanks,

Iain