
[HTCondor-users] Problem with HTCondor on a multi-NIC machine



Hello there. I used the Ubuntu 14.04 deb file to install HTCondor on two machines in my cluster. One sits inside the cluster network and has no NIC on an external network; the other has an additional NIC connected to an external network. HTCondor runs without problems on the first machine but not on the second.

Running condor_status on the second machine gives me this:
Error: communication error
CEDAR:6001:Failed to connect to <192.168.1.7:9618>

No machine on the external network has that IP. I read section 3.7.3 of the documentation, which covers multi-NIC environments, and modified the condor_config file, setting BIND_ALL_INTERFACES to false and NETWORK_INTERFACE to 192.168.0.*. After restarting HTCondor, I still get the same output from condor_status. I don't know why it keeps trying to connect to that IP.
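For reference, here is a sketch of the two settings described above as they would appear in condor_config. It is reconstructed from the description, not copied from the actual file, and the rest of the configuration is not shown:

```
# Do not bind the daemons to every interface on the machine
BIND_ALL_INTERFACES = False

# Restrict HTCondor to the interface whose address matches this pattern
# (value as described above; wildcards are accepted here)
NETWORK_INTERFACE = 192.168.0.*
```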

StartLog:
10/22/15 10:13:56 ERROR: SECMAN:2003:TCP connection to collector ---- failed.
10/22/15 10:13:56 Failed to start non-blocking update to <192.168.1.7:9618>.
10/22/15 10:13:59 attempt to connect to <192.168.1.7:9618> failed: No route to host (connect errno = 113).

CollectorLog:
10/22/15 10:08:47 stats: Inserting new hashent for 'Collector':'My Pool - ----@----':'192.168.1.135'
10/22/15 10:08:50 attempt to connect to <192.168.1.7:9618> failed: No route to host (connect errno = 113).
10/22/15 10:08:50 Failed to send update to collector godzilla.ica.luz.edu.ve.
10/22/15 10:08:50 Unable to send UPDATE_COLLECTOR_AD to all configured collectors


The condor service also doesn't show up when I run nmap localhost, although I do see it on the other machine. That other machine runs the quickstart tutorial with no problems.