
Re: [HTCondor-users] SharedPointEndpoint fails to accept connection



Many thanks, Brian and Todd. I set SELinux to permissive mode for about an hour of normal operation, then used audit2allow to amend the policy. Working well so far.
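For anyone landing here with the same symptom, here is a minimal Python sketch of the diagnostic Brian suggests below. It checks whether SELinux is enforcing and whether the SharedPortLog reports permission-denied errors under /var/lock. The paths are assumptions. The /var/log/condor directory comes from the SchedLog path quoted in this thread, and /sys/fs/selinux/enforce is the usual RHEL/CentOS 7 location (older systems used /selinux/enforce). The match is a plain case-insensitive substring test, so it does not assume any particular HTCondor error wording.

```python
"""Sketch: report the SELinux mode and scan an HTCondor SharedPortLog for
permission-denied errors under /var/lock. Paths are assumptions; adjust locally."""
from pathlib import Path
from typing import Iterable, List, Optional

# Assumption: standard SELinux kernel interface ("1" = enforcing, "0" = permissive).
SELINUX_ENFORCE = Path("/sys/fs/selinux/enforce")
# Assumption: log directory taken from the SchedLog path quoted in this thread.
SHARED_PORT_LOG = Path("/var/log/condor/SharedPortLog")


def selinux_mode(enforce_file: Path = SELINUX_ENFORCE) -> Optional[str]:
    """Return 'enforcing', 'permissive', or None if the mode cannot be read."""
    try:
        value = enforce_file.read_text().strip()
    except OSError:
        # Missing file (SELinux disabled or absent) or no permission to read it.
        return None
    return {"1": "enforcing", "0": "permissive"}.get(value)


def lock_denials(lines: Iterable[str], lock_dir: str = "/var/lock") -> List[str]:
    """Return log lines that mention lock_dir and report 'permission denied'."""
    return [
        line.rstrip("\n")
        for line in lines
        if "permission denied" in line.lower() and lock_dir in line
    ]


if __name__ == "__main__":
    print("SELinux mode:", selinux_mode() or "unavailable")
    if SHARED_PORT_LOG.exists():
        with SHARED_PORT_LOG.open(errors="replace") as log:
            for hit in lock_denials(log):
                print(hit)
    else:
        print(f"{SHARED_PORT_LOG} not found; check where SharedPortLog is written")
```

If denials do show up, the approach that worked here was a temporary switch to permissive mode followed by audit2allow; the ticket Todd links has other workaround ideas.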

Michael McInerny Murphy
Engineer
IERUS Technologies, Inc.
2904 Westcorp Blvd., Ste. 210
(256) 319-2026 x 107

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Tuesday, August 9, 2016 4:16 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] SharedPointEndpoint fails to accept connection

Adding SELinux rules for condor_shared_port is on the short-term to-do list (i.e., hopefully in the next version). The ticket about this has some workaround ideas; see
   https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5449

regards
Todd

On 8/9/2016 3:59 PM, Michael Murphy wrote:
> Hi Brian,
>
> SELinux is in enforcing mode (I'm required to run it that way). However,
> when I put it in permissive mode temporarily, the problem cleared. Is
> there a workaround for this problem for those of us required to run
> SELinux in enforcing mode?
>
> On 08/09/2016 03:50 PM, Brian Lin wrote:
>> Does your SharedPortLog have Permission Denied errors when trying to 
>> write files in /var/lock? If so, try setting SELinux to permissive 
>> mode to see if that helps.
>>
>> Cheers,
>> Brian
>>
>> On 08/09/2016 03:50 PM, Michael Murphy wrote:
>>> Hello,
>>>
>>> I am unable to get Condor's shared port to function properly on a
>>> CentOS 7 client machine (the MASTER, START, SCHEDD, SHARED_PORT, and
>>> KBDD daemons are active). My shared port configuration is the following:
>>>
>>> DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
>>> USE_SHARED_PORT = TRUE
>>> SHARED_PORT_PORT = 9618
>>> COLLECTOR_HOST = $(CONDOR_HOST)
>>> UPDATE_COLLECTOR_WITH_TCP = TRUE
>>>
>>> The SCHEDD daemon logs are full of the following:
>>>
>>> <Omitted for brevity>
>>>
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) MaxLog = 10485760 bytes, length = 10485845
>>> 08/09/16 15:38:29 (pid:227709) Saving log file to "/var/log/condor/SchedLog.old"
>>>
>>> The MasterLog doesn't show the result of the Schedd connectivity issue:
>>>
>>> 08/09/16 15:38:24 ******************************************************
>>> 08/09/16 15:38:24 ** condor_master (CONDOR_MASTER) STARTING UP
>>> 08/09/16 15:38:24 ** /usr/sbin/condor_master
>>> 08/09/16 15:38:24 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
>>> 08/09/16 15:38:24 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
>>> 08/09/16 15:38:24 ** $CondorVersion: 8.3.8 Jan 14 2016 BuildID: RH-8.3.8-1.el7 $
>>> 08/09/16 15:38:24 ** $CondorPlatform: X86_64-RedHat_7.2 $
>>> 08/09/16 15:38:24 ** PID = 227681
>>> 08/09/16 15:38:24 ** Log last touched time unavailable (No such file or directory)
>>> 08/09/16 15:38:24 ******************************************************
>>> 08/09/16 15:38:24 Using config source: /etc/condor/condor_config
>>> 08/09/16 15:38:24 Using local config sources:
>>> 08/09/16 15:38:24    /etc/condor/config.d/00-IERUS_WorkstationNode.conf
>>> 08/09/16 15:38:24    /etc/condor/config.d/41-sharedport.conf
>>> 08/09/16 15:38:24 config Macros = 114, Sorted = 114, StringBytes = 4901, TablesBytes = 4160
>>> 08/09/16 15:38:24 CLASSAD_CACHING is OFF
>>> 08/09/16 15:38:24 Daemon Log is logging: D_ALWAYS D_ERROR
>>> 08/09/16 15:38:25 SharedPortEndpoint: waiting for connections to named socket 227681_ce89
>>> 08/09/16 15:38:25 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
>>> 08/09/16 15:38:25 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
>>> 08/09/16 15:38:25 DaemonCore: private command socket at <192.168.6.135:0?sock=227681_ce89>
>>> 08/09/16 15:38:25 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1452815958)
>>> 08/09/16 15:38:25 Collector port not defined, will use default: 9618
>>> 08/09/16 15:38:25 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 227708
>>> 08/09/16 15:38:25 Waiting for /var/lock/condor/shared_port_ad to appear.
>>> 08/09/16 15:38:26 Found /var/lock/condor/shared_port_ad.
>>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 227709
>>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 227710
>>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_kbdd", pid and pgroup = 227711
>>> 08/09/16 15:43:30 condor_write(): Socket closed when trying to write 1421 bytes to collector boss.hq.ierustech.com, fd is 12
>>> 08/09/16 15:43:30 Buf::write(): condor_write() failed
>>>
>>> Where should I start looking to fix this? I am by no means a Condor
>>> pro. I just enjoy it when it works.
>>> --
>>>
>>> Michael McInerny Murphy
>>> Engineer & Physicist
>>> IERUS Technologies, Inc.
>>> 2904 Westcorp Blvd. Ste 210
>>> Huntsville, AL  35805
>>> (256) 319-2026 ext 107
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
>>> You can also unsubscribe by visiting 
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>
>
