
[HTCondor-users] Job does not start: Failed to reverse connect to daemon via CCB



Hello,

I have two machines on the local network:
  1) Central Manager (172.17.0.7)
  2) Submit node (172.17.0.8)
The configuration of both contains the following:
  USE_SHARED_PORT = TRUE
  SHARED_PORT_PORT = 9618
  UPDATE_COLLECTOR_WITH_TCP = TRUE
  BIND_ALL_INTERFACES = TRUE
  PRIVATE_NETWORK_NAME = pseven-htcondor
  CCB_ADDRESS = $(COLLECTOR_HOST)

And I configured port forwarding from my "public" IP (let it be 1.1.1.99):
  A. Forwarding from 1.1.1.99:9618 -> 172.17.0.7:9618
  B. Forwarding from 1.1.1.99:9619 -> 172.17.0.8:9618

And I added a Windows execute node (IP 1.1.1.51) with the following configuration settings:
  CONDOR_HOST = 1.1.1.99
  COLLECTOR_HOST = $(CONDOR_HOST):9618
  HOST_ALIAS = pseven-htcondor-remote
  PRIVATE_NETWORK_NAME = pseven-htcondor-remote
  DAEMON_LIST = STARTD
  CONDOR_ADMIN =
  use POLICY : ALWAYS_RUN_JOBS
  WANT_VACATE = FALSE
  WANT_SUSPEND = TRUE

When I submit a job, its status changes to RUN (and it never reaches DONE). I see a repeated error in the `StarterLog.slot1` log:
  01/13/20 18:59:58 (fd:8) (pid:4380) (D_ALWAYS) Failed to reverse connect to daemon at <172.17.0.8:9618> via CCB.
  01/13/20 18:59:58 (fd:8) (pid:4380) (D_ALWAYS) FileTransfer: Unable to connect to server <172.17.0.8:9618?CCBID=172.17.0.7:9618%3faddrs%3d172.17.0.7-9618%26alias%3dpseven-htcondormanager-7948d7cc4d-rq7pt.pseven-htcondor%26noUDP%26sock%3dcollector#1316&PrivNet=pseven-htcondor&addrs=172.17.0.8-9618&alias=submit.pseven-htcondor&noUDP&sock=shadow_20567_0fdd_89>

How can I get around this problem?

The output of the `condor_status -master -long | grep MyAddress` command is:
  MyAddress = "<1.1.1.51:9618?addrs=1.1.1.51-9618&alias=pseven-htcondor-remote&noUDP&sock=master_5108_7c8e>"
  MyAddress = "<172.17.0.7:9618?addrs=172.17.0.7-9618&alias=pseven-htcondormanager-7948d7cc4d-rq7pt.pseven-htcondor&noUDP&sock=master_11_1fa1>"
  MyAddress = "<172.17.0.8:9618?CCBID=172.17.0.7:9618%3faddrs%3d172.17.0.7-9618%26alias%3dpseven-htcondormanager-7948d7cc4d-rq7pt.pseven-htcondor%26noUDP%26sock%3dcollector#1316&PrivNet=pseven-htcondor&addrs=172.17.0.8-9618&alias=submit.pseven-htcondor&noUDP&sock=master_11_1fa1>"
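
The CCBID value in the last address is percent-encoded, which makes it hard to read. Below is a minimal Python sketch that decodes it, assuming the address can be treated as a URL-style query string wrapped in angle brackets. The helper name `parse_sinful` is hypothetical and is not part of HTCondor:

```python
from urllib.parse import parse_qs

def parse_sinful(sinful):
    """Split an address of the form <host:port?key=value&...> into its parts.

    Hypothetical helper for illustration only; not an HTCondor API.
    """
    body = sinful.strip().strip('"').lstrip("<").rstrip(">")
    host_port, _, query = body.partition("?")
    # keep_blank_values=True keeps flag-style parameters such as noUDP.
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return host_port, params

# MyAddress of the submit node, copied from the output above.
addr = ("<172.17.0.8:9618?CCBID=172.17.0.7:9618%3faddrs%3d172.17.0.7-9618"
        "%26alias%3dpseven-htcondormanager-7948d7cc4d-rq7pt.pseven-htcondor"
        "%26noUDP%26sock%3dcollector#1316&PrivNet=pseven-htcondor"
        "&addrs=172.17.0.8-9618&alias=submit.pseven-htcondor&noUDP"
        "&sock=master_11_1fa1>")

host_port, params = parse_sinful(addr)
# The part of the decoded CCBID before "?" is the address of the CCB broker.
ccb_broker = params["CCBID"].partition("?")[0]
print("daemon address:", host_port)
print("CCB broker:", ccb_broker)
print("private network:", params["PrivNet"])
```

Decoded this way, the CCB broker advertised for the submit node is the private address 172.17.0.7:9618, not the forwarded 1.1.1.99:9618.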

Is this the correct output for a CCB configuration with the startd behind a firewall?
Maybe I need to set CCBID to 1.1.1.99:9618 instead of 172.17.0.7:9618? Is that possible? In other words: how can I configure the execute node to make TCP connections through the 1.1.1.99 IP (as set in the CONDOR_HOST and COLLECTOR_HOST settings) rather than through the local addresses (172.17.0.7/172.17.0.8)?

Thanks in advance for any help!

__
Sincerely yours,
Ivan Ergunov                         mailto:hozblok@xxxxxxxxx