
Re: [HTCondor-users] Edgar, your HTCondor shared port question



Hi Todd,

Thanks for following up on this. This is a vanilla OSG RPM install, although it is OSG 3.3, so HTCondor 8.4:

condor_version 
$CondorVersion: 8.4.11 Feb 24 2017 $
$CondorPlatform: X86_64-CentOS_6.8 $

rpm -q condor
condor-8.4.11-1.1.osg33.el6.x86_64

So yes, the condor daemons are running as root.

condor_config_val DAEMON_SOCKET_DIR
auto

Well, LOCK is in the non-standard location because that is the standard gwms configuration:

/etc/condor/config.d/00_gwms_general.config

https://github.com/holzman/glideinWMS/blob/master/install/templates/00_gwms_general.config#L13

So that is the case on every single glideinwms submit host on the planet.



I tried again after creating the daemon_sock directory, but I still have the same problem, even when running condor_tail as root:

[1012] root@uaf-10 /var/log/condor# condor_tail -debug 7952406.0 
03/02/18 10:12:54 Result of reading /etc/issue:  CentOS release 6.9 (Final)
 
03/02/18 10:12:54 Growing processor array to 64
03/02/18 10:12:54 Using IDs: 48 processors, 24 CPUs, 24 HTs
03/02/18 10:12:54 Reading condor configuration from '/etc/condor/condor_config'
03/02/18 10:12:54 Enumerating interfaces: lo 127.0.0.1 up
03/02/18 10:12:54 Enumerating interfaces: eth0 169.228.130.74 up
03/02/18 10:12:54 Enumerating interfaces: eth0:0 169.254.100.2 up
03/02/18 10:12:54 Enumerating interfaces: eth0:1 169.228.130.39 up
03/02/18 10:12:54 Initializing Directory: curr_dir = /etc/condor/config.d
03/02/18 10:12:54 WARNING: Config source is empty: /etc/condor/config.d/00personal_condor.config
03/02/18 10:12:54 Locating daemon process
03/02/18 10:12:54 IPVERIFY: checking uaf-10.t2.ucsd.edu against 169.228.130.74
03/02/18 10:12:54 IPVERIFY: matched 169.228.130.74 to 169.228.130.74
03/02/18 10:12:54 IPVERIFY: ip found is 1
03/02/18 10:12:54 Response for GET_JOB_CONNECT_INFO:
StarterIpAddr = "<169.228.132.103:5993?CCBID=169.228.130.23:9633%3faddrs%3d169.228.130.23-9633#330705&PrivNet=sdsc-4.t2.ucsd.edu&addrs=169.228.132.103-5993&noUDP>"
Result = true
ServerTime = 1520014374
CondorVersion = "$CondorVersion: 8.4.8 Jun 30 2016 BuildID: 373513 $"

03/02/18 10:12:54 Got connect info for starter <169.228.132.103:5993?CCBID=169.228.130.23:9633%3faddrs%3d169.228.130.23-9633#330705&PrivNet=sdsc-4.t2.ucsd.edu&addrs=169.228.132.103-5993&noUDP>
03/02/18 10:12:54 Requesting GoAhead from the transfer queue manager.
03/02/18 10:12:54 Received GoAhead from the transfer queue manager.
03/02/18 10:12:54 IPVERIFY: checking glidein-collector.t2.ucsd.edu against 169.228.130.23
03/02/18 10:12:54 IPVERIFY: matched 169.228.130.23 to 169.228.130.23
03/02/18 10:12:54 IPVERIFY: ip found is 1
03/02/18 10:12:54 CCBClient: received failure message from CCB server collector 169.228.130.23:9633?addrs=169.228.130.23-9633 in response to request for reversed connection to starter at <169.228.132.103:5993>: failed to connect
03/02/18 10:12:54 Failed to reverse connect to starter at <169.228.132.103:5993> via CCB.
Failed to peek at file from starter: Failed to connect to starter
[1012] root@uaf-10 /var/log/condor# condor_tail -debug 7952406.0 
03/02/18 10:13:26 Result of reading /etc/issue:  CentOS release 6.9 (Final)
 
03/02/18 10:13:26 Growing processor array to 64
03/02/18 10:13:26 Using IDs: 48 processors, 24 CPUs, 24 HTs
03/02/18 10:13:26 Reading condor configuration from '/etc/condor/condor_config'
03/02/18 10:13:26 Enumerating interfaces: lo 127.0.0.1 up
03/02/18 10:13:26 Enumerating interfaces: eth0 169.228.130.74 up
03/02/18 10:13:26 Enumerating interfaces: eth0:0 169.254.100.2 up
03/02/18 10:13:26 Enumerating interfaces: eth0:1 169.228.130.39 up
03/02/18 10:13:26 Initializing Directory: curr_dir = /etc/condor/config.d
03/02/18 10:13:26 WARNING: Config source is empty: /etc/condor/config.d/00personal_condor.config
03/02/18 10:13:26 Locating daemon process
03/02/18 10:13:26 IPVERIFY: checking uaf-10.t2.ucsd.edu against 169.228.130.74
03/02/18 10:13:26 IPVERIFY: matched 169.228.130.74 to 169.228.130.74
03/02/18 10:13:26 IPVERIFY: ip found is 1
03/02/18 10:13:26 Response for GET_JOB_CONNECT_INFO:
StarterIpAddr = "<169.228.132.103:5993?CCBID=169.228.130.23:9633%3faddrs%3d169.228.130.23-9633#330705&PrivNet=sdsc-4.t2.ucsd.edu&addrs=169.228.132.103-5993&noUDP>"
Result = true
ServerTime = 1520014406
CondorVersion = "$CondorVersion: 8.4.8 Jun 30 2016 BuildID: 373513 $"

03/02/18 10:13:26 Got connect info for starter <169.228.132.103:5993?CCBID=169.228.130.23:9633%3faddrs%3d169.228.130.23-9633#330705&PrivNet=sdsc-4.t2.ucsd.edu&addrs=169.228.132.103-5993&noUDP>
03/02/18 10:13:26 Requesting GoAhead from the transfer queue manager.
03/02/18 10:13:26 Received GoAhead from the transfer queue manager.
03/02/18 10:13:26 IPVERIFY: checking glidein-collector.t2.ucsd.edu against 169.228.130.23
03/02/18 10:13:26 IPVERIFY: matched 169.228.130.23 to 169.228.130.23
03/02/18 10:13:26 IPVERIFY: ip found is 1
03/02/18 10:13:26 CCBClient: received failure message from CCB server collector 169.228.130.23:9633?addrs=169.228.130.23-9633 in response to request for reversed connection to starter at <169.228.132.103:5993>: failed to connect
03/02/18 10:13:26 Failed to reverse connect to starter at <169.228.132.103:5993> via CCB.
Failed to peek at file from starter: Failed to connect to starter
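
To make it easier to see which hosts are involved in the reversed connection, here is a minimal sketch that pulls the starter address and the CCB broker address out of a StarterIpAddr string like the one in the output above. The function names starter_addr and ccb_broker are hypothetical helpers written for illustration, not HTCondor tools, and the parsing only assumes the address layout visible in this log.

```shell
# Hypothetical helpers: split a StarterIpAddr string as printed by
# condor_tail -debug into the pieces that matter for CCB debugging.

# Print the starter's own host:port (the part between '<' and '?').
starter_addr() {
    s=${1#<}                 # drop the leading '<'
    printf '%s\n' "${s%%\?*}"  # keep everything before the first '?'
}

# Print the CCB broker host:port found in the CCBID= parameter.
ccb_broker() {
    case $1 in
        *CCBID=*) ;;
        *) return 1 ;;       # no CCBID parameter in this address
    esac
    rest=${1#*CCBID=}        # text following 'CCBID='
    # The broker address ends at the first '%', '#' or '&'.
    printf '%s\n' "${rest%%[%#&]*}"
}

addr='<169.228.132.103:5993?CCBID=169.228.130.23:9633%3faddrs%3d169.228.130.23-9633#330705&PrivNet=sdsc-4.t2.ucsd.edu&addrs=169.228.132.103-5993&noUDP>'
echo "starter: $(starter_addr "$addr")"
echo "broker:  $(ccb_broker "$addr")"
```

In this log the broker is the glidein collector and the starter is on sdsc-4.t2.ucsd.edu, which matches the IPVERIFY lines.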

But as soon as I turn the firewall off it works:

condor_tail -debug 7952406.0 
03/02/18 10:14:29 Result of reading /etc/issue:  CentOS release 6.9 (Final)
 
03/02/18 10:14:29 Growing processor array to 64
03/02/18 10:14:29 Using IDs: 48 processors, 24 CPUs, 24 HTs
03/02/18 10:14:29 Reading condor configuration from '/etc/condor/condor_config'
03/02/18 10:14:29 Enumerating interfaces: lo 127.0.0.1 up
03/02/18 10:14:29 Enumerating interfaces: eth0 169.228.130.74 up
03/02/18 10:14:29 Enumerating interfaces: eth0:0 169.254.100.2 up
03/02/18 10:14:29 Enumerating interfaces: eth0:1 169.228.130.39 up
03/02/18 10:14:29 Initializing Directory: curr_dir = /etc/condor/config.d
03/02/18 10:14:29 WARNING: Config source is empty: /etc/condor/config.d/00personal_condor.config
03/02/18 10:14:29 Locating daemon process
03/02/18 10:14:29 IPVERIFY: checking uaf-10.t2.ucsd.edu against 169.228.130.74
03/02/18 10:14:29 IPVERIFY: matched 169.228.130.74 to 169.228.130.74
03/02/18 10:14:29 IPVERIFY: ip found is 1
03/02/18 10:14:29 Response for GET_JOB_CONNECT_INFO:
StarterIpAddr = "<169.228.132.103:5993?CCBID=169.228.130.23:9633%3faddrs%3d169.228.130.23-9633#330705&PrivNet=sdsc-4.t2.ucsd.edu&addrs=169.228.132.103-5993&noUDP>"
Result = true
ServerTime = 1520014469
CondorVersion = "$CondorVersion: 8.4.8 Jun 30 2016 BuildID: 373513 $"

03/02/18 10:14:29 Got connect info for starter <169.228.132.103:5993?CCBID=169.228.130.23:9633%3faddrs%3d169.228.130.23-9633#330705&PrivNet=sdsc-4.t2.ucsd.edu&addrs=169.228.132.103-5993&noUDP>
03/02/18 10:14:29 Requesting GoAhead from the transfer queue manager.
03/02/18 10:14:29 Received GoAhead from the transfer queue manager.
03/02/18 10:14:29 IPVERIFY: checking glidein-collector.t2.ucsd.edu against 169.228.130.23
03/02/18 10:14:29 IPVERIFY: matched 169.228.130.23 to 169.228.130.23
03/02/18 10:14:29 IPVERIFY: ip found is 1
03/02/18 10:14:29 IPVERIFY: checking sdsc-4.t2.ucsd.edu against 169.228.132.103
03/02/18 10:14:29 IPVERIFY: matched 169.228.132.103 to 169.228.132.103
03/02/18 10:14:29 IPVERIFY: ip found is 1
TransferOffsets = { 18554 }
Result = true
TransferFiles = { 0 }



Edgar M Fajardo Hernandez



On Mar 2, 2018, at 4:13 AM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

On 2/23/2018 3:07 PM, Edgar M Fajardo Hernandez wrote:
Hi Todd,
Thank you for following up with this.
I tried doing it as root:
[1258] root@uaf-10 /etc/condor/config.d# condor_tail -debug 7900841.0
[snip]
And still have same error.

Hi Edgar,

Strange....  looks like we have a bug hunt here.

On your submit machine, are the HTCondor daemons running as root? Did you install them from the RPM or the tarball? What is the value of the knob DAEMON_SOCKET_DIR?

Could you try making the directory yourself and let us know if that helps, i.e. by running

  sudo mkdir /var/log/condor/daemon_sock
  sudo chown condor.condor /var/log/condor/daemon_sock
  sudo chmod 1777 /var/log/condor/daemon_sock
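
A quick way to confirm the result afterwards is sketched below. This is only an illustration: sock_dir_ok is a hypothetical helper, not part of HTCondor, and stat -c assumes GNU coreutils (as on CentOS).

```shell
# Hypothetical helper: report owner and mode of a socket directory and
# succeed only if it exists with mode 1777 (sticky, world-writable).
sock_dir_ok() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    mode=$(stat -c '%a' "$dir")        # octal mode, e.g. 1777
    owner=$(stat -c '%U:%G' "$dir")    # owner and group names
    echo "$dir owner=$owner mode=$mode"
    [ "$mode" = "1777" ]
}
```

For example, sock_dir_ok /var/log/condor/daemon_sock prints the owner and mode, and returns non-zero if the directory is missing or the mode differs.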


PS: I am also wondering how/why LOCK on your machine ended up in the non-default /var/log/condor instead of /var/lock/condor....

thanks
Todd