[HTCondor-users] what is connect errno=111



Hi all,

I bumped our central manager from CentOS 6 to 7 and condor from 8.2.10
to 8.6.1. All configs are the same. condor_status now gives

> # condor_status
> Error: communication error
> CEDAR:6001:Failed to connect to <144.92.167.251:9618>

Master log says

> 03/14/17 12:50:38 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 03/14/17 12:50:38 ERROR: SECMAN:2003:TCP connection to collector exocet.bmrb.wisc.edu failed.
> 03/14/17 12:50:38 Failed to start non-blocking update to <144.92.167.251:9618>.
> 03/14/17 12:55:38 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 03/14/17 12:55:38 ERROR: SECMAN:2003:TCP connection to collector exocet.bmrb.wisc.edu failed.
> 03/14/17 12:55:38 Failed to start non-blocking update to <144.92.167.251:9618>.

Collector log complains about old history files then says

> 03/14/17 13:06:00 CollectorAd  : Inserting ** "< BioMagResBank, UW-Madison@xxxxxxxxxxxxxxxxxxxx >"
> 03/14/17 13:06:00 attempt to connect to <144.92.167.251:9618> failed: Connection refused (connect errno = 111).
> 03/14/17 13:06:00 Failed to send update to collector exocet.bmrb.wisc.edu.
> 03/14/17 13:06:00 Unable to send UPDATE_COLLECTOR_AD to all configured collectors

Start, sched, and negotiator logs end with the same "Connection refused
(connect errno = 111)". There's nothing else in any of the
/var/log/condor/* logs that indicates a problem.
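
For reference, errno 111 on Linux is ECONNREFUSED: the TCP connection attempt was actively rejected, which usually means nothing is accepting TCP connections on that address and port (a firewall REJECT rule can look the same). Below is a minimal Python sketch that repeats the connect outside of condor. The function name tcp_connect_error is made up for illustration and is not part of HTCondor.

```python
import errno
import socket


def tcp_connect_error(host, port, timeout=2.0):
    """Try a plain TCP connect.

    Returns None if the connection succeeds, otherwise the symbolic
    errno name (e.g. "ECONNREFUSED") or the exception class name when
    the OS did not supply an errno (e.g. on a timeout).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        if exc.errno is None:
            return type(exc).__name__
        # Map the numeric errno to its name; fall back to the number.
        return errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    # Address and port taken from the log lines quoted in this message.
    print(tcp_connect_error("144.92.167.251", 9618))
```

If this reports the same refusal when run on the central manager itself, the problem is at the TCP level and not specific to the condor tools.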

The port is open according to lsof, and iptables has a blanket ACCEPT
for loopback and the local subnet.

> # lsof -i -P | grep 9618
> condor_co 4071  condor   12u  IPv4  59575      0t0  UDP *:9618
> condor_co 4071  condor   14u  IPv6  59577      0t0  UDP *:9618
> # iptables -nvL
> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>  pkts bytes target     prot opt in     out     source               destination         
>  1074 77368 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
> ...
> 56917 5941K ACCEPT     all  --  *      *       144.92.167.128/25    0.0.0.0/0       

Yet

> # telnet localhost 9618
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused

If I stop condor and run netcat on port 9618 I get a whole lot of stuff,
coming from other nodes presumably. So it looks like the port's fine and
it's the collector that's refusing to talk to itself.
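
One thing that may be worth double-checking: both lsof entries for port 9618 quoted earlier are UDP sockets, while the failures in the logs and in telnet are TCP connects. The rough sketch below pulls the protocol out of lsof output. It assumes the usual nine-column "lsof -i -P" layout with the protocol in the NODE column, and protocols_on_port is a hypothetical helper name.

```python
def protocols_on_port(lsof_output, port):
    """Return the set of protocols (e.g. {"TCP", "UDP"}) bound to `port`.

    Expects `lsof -i -P` style lines:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME [(STATE)]
    """
    found = set()
    for line in lsof_output.splitlines():
        # Drop mail-style "> " quoting if the text was pasted from a message.
        fields = line.lstrip("> ").split()
        if len(fields) < 9:
            continue
        node, name = fields[7], fields[8]
        # NAME is "addr:port" or "addr:port->peer:port"; keep the local side.
        local = name.split("->")[0]
        if local.rsplit(":", 1)[-1] == str(port):
            found.add(node)
    return found


# The two lsof lines quoted in this message.
sample = """\
condor_co 4071  condor   12u  IPv4  59575      0t0  UDP *:9618
condor_co 4071  condor   14u  IPv6  59577      0t0  UDP *:9618
"""

if __name__ == "__main__":
    print(protocols_on_port(sample, 9618))
```

If TCP is missing from the result, a "Connection refused" on a TCP connect to that port would be the expected outcome, whatever iptables allows.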

Any suggestions as to where to look next?

TIA
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
