
[HTCondor-users] Understand the working of condor_shared_port



Hello Condor Experts,

I have a general question that may be more of a networking matter, but I could not find information on this topic, so I am reaching out to a wider audience. After reading the documentation [1], my understanding is that we use the condor_shared_port daemon to optimize the TCP connections established for condor jobs. I started a batch of 20 jobs, but I still see condor_shadow processes establishing TCP connections with remote nodes.

# lsof -ni:9618
COMMAND       PID      USER  FD TYPE     DEVICE SIZE/OFF NODE NAME
condor_ma 1089274    condor  3u IPv4 1729915543      0t0  TCP xx.xx.244.10:25603->xx.xx.242.11:condor (ESTABLISHED)
condor_sh 1089276    condor 10u IPv4  456896756      0t0  TCP *:condor (LISTEN)
condor_sc 1089304    condor  3u IPv4 1729926410      0t0  TCP xx.xx.244.10:21783->xx.xx.242.11:condor (ESTABLISHED)
condor_sc 1089304    condor 13u IPv4 1729926411      0t0  TCP xx.xx.244.10:9453->xx.xx.250.11:condor (ESTABLISHED)
condor_sc 1089304    condor 15u IPv4 1729926412      0t0  TCP xx.xx.244.10:11651->xx.xx.55.254:condor (ESTABLISHED)
condor_sc 1089304    condor 19u IPv4  683305201      0t0  TCP xx.xx.244.10:condor->xx.xx.242.11:22131 (ESTABLISHED)
condor_sc 1089304    condor 21u IPv4  722285952      0t0  TCP xx.xx.244.10:condor->xx.xx.55.254:33489 (ESTABLISHED)
condor_sc 1089304    condor 23u IPv4 1717552965      0t0  TCP xx.xx.244.10:condor->xx.xx.250.11:12815 (ESTABLISHED)
condor_sh 1724703 testuser1  4u IPv4 1729893884      0t0  TCP xx.xx.244.10:28873->xx.xx.250.25:condor (ESTABLISHED)
condor_sh 1724704 testuser1  4u IPv4 1729871588      0t0  TCP xx.xx.244.10:28079->xx.xx.250.52:condor (ESTABLISHED)
condor_sh 1724705 testuser1  4u IPv4 1729915552      0t0  TCP xx.xx.244.10:29329->xx.xx.250.25:condor (ESTABLISHED)
condor_sh 1724706 testuser1  4u IPv4 1729921016      0t0  TCP xx.xx.244.10:17811->xx.xx.250.24:condor (ESTABLISHED)
condor_sh 1724707 testuser1  4u IPv4 1729915562      0t0  TCP xx.xx.244.10:22091->xx.xx.250.52:condor (ESTABLISHED)
condor_sh 1724708 testuser1  4u IPv4 1729915567      0t0  TCP xx.xx.244.10:25819->xx.xx.250.25:condor (ESTABLISHED)
condor_sh 1724709 testuser1  4u IPv4 1729915572      0t0  TCP xx.xx.244.10:20045->xx.xx.250.32:condor (ESTABLISHED)
condor_sh 1724710 testuser1  4u IPv4 1729893890      0t0  TCP xx.xx.244.10:12681->xx.xx.250.28:condor (ESTABLISHED)
condor_sh 1724711 testuser1  4u IPv4 1729935716      0t0  TCP xx.xx.244.10:16195->xx.xx.250.35:condor (ESTABLISHED)
condor_sh 1724712 testuser1  4u IPv4 1729915578      0t0  TCP xx.xx.244.10:21043->xx.xx.250.31:condor (ESTABLISHED)
condor_sh 1724713 testuser1  4u IPv4 1729893895      0t0  TCP xx.xx.244.10:29975->xx.xx.250.29:condor (ESTABLISHED)
condor_sh 1724714 testuser1  4u IPv4 1729915584      0t0  TCP xx.xx.244.10:remctl->xx.xx.250.33:condor (ESTABLISHED)
condor_sh 1724715 testuser1  4u IPv4 1729915589      0t0  TCP xx.xx.244.10:20425->xx.xx.250.26:condor (ESTABLISHED)
condor_sh 1724716 testuser1  4u IPv4 1729893901      0t0  TCP xx.xx.244.10:22133->xx.xx.250.27:condor (ESTABLISHED)
condor_sh 1724717 testuser1  4u IPv4 1729915595      0t0  TCP xx.xx.244.10:26021->xx.xx.250.47:condor (ESTABLISHED)
condor_sh 1724718 testuser1  4u IPv4 1729915600      0t0  TCP xx.xx.244.10:15531->xx.xx.250.40:condor (ESTABLISHED)
condor_sh 1724719 testuser1  4u IPv4 1729893907      0t0  TCP xx.xx.244.10:19013->xx.xx.250.52:condor (ESTABLISHED)
condor_sh 1724720 testuser1  4u IPv4 1729893909      0t0  TCP xx.xx.244.10:9169->xx.xx.250.24:condor (ESTABLISHED)
condor_sh 1724722 testuser1  4u IPv4 1729915615      0t0  TCP xx.xx.244.10:22215->xx.xx.250.25:condor (ESTABLISHED)
condor_sh 1724898 testuser1  4u IPv4 1729915646      0t0  TCP xx.xx.244.10:12917->xx.xx.250.24:condor (ESTABLISHED)
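
To make the listing above easier to read, here is a rough Python sketch that tallies the TCP sockets per command and direction. It is only an illustration: the function name summarize_lsof is made up for this post, and it assumes the default lsof column layout shown above, where the COMMAND column is truncated to nine characters (so condor_shared_port and condor_shadow both appear as condor_sh) and port 9618 is printed as "condor".

```python
from collections import Counter


def summarize_lsof(text):
    """Tally TCP sockets per (command, kind) from `lsof -ni:9618` output.

    Hypothetical helper, not part of HTCondor. Assumes the default lsof
    layout: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE).
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, unix-socket lines, and anything else that is
        # not a TCP line with a trailing state such as (ESTABLISHED).
        if len(fields) < 10 or fields[7] != "TCP":
            continue
        command, name, state = fields[0], fields[8], fields[9].strip("()")
        local_end = name.split("->")[0]
        if state == "LISTEN":
            kind = "listening"
        elif local_end.endswith((":condor", ":9618")):
            kind = "inbound"   # local end is the shared port
        else:
            kind = "outbound"  # local end is an ephemeral port
        counts[(command, kind)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (as root): lsof -ni:9618 | python3 summarize_lsof.py
    for (command, kind), n in sorted(summarize_lsof(sys.stdin.read()).items()):
        print(f"{command:<10} {kind:<10} {n}")
```

If I tally the listing by hand in the same way, the only listening socket is the one owned by user condor (PID 1089276), and every socket owned by testuser1 is an outbound connection to port 9618 on a worker node.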

Each condor_shadow process has two sockets open: a TCP (stream) socket and a Unix domain socket.

condor_sh 1724703 testuser1  4u IPv4         1729893884 0t0        TCP submit.com:28873->worker.com:condor (ESTABLISHED)
condor_sh 1724703 testuser1 38u unix 0xffff95de014ac800 0t0 1729931330 @5deeac046c251886e10930cdc29b3a91f776a3804faf07d6421097b93f34c968/1089304_f601_619628

I was expecting condor_shared_port to be the single point of communication between the submit node and the worker nodes, with the condor_shadow processes holding only Unix domain sockets.

Can anyone please help me understand how condor_shared_port helps to bring down port utilization?

[1] https://htcondor.readthedocs.io/en/latest/admin-manual/networking.html

Thanks & Regards,
Vikrant Aggarwal