
Re: [HTCondor-users] SharedPointEndpoint fails to accept connection



You should be able to customize the site SELinux policy using the standard audit2allow tools. Searching Google turned up good tutorials when I hit this issue myself. I'm hoping the HTCondor team will fix the policies in the next release.
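Before feeding denials to audit2allow, it can help to confirm which Condor processes are actually being blocked. Below is a minimal Python sketch, assuming the denials are in the usual audit.log record format (`avc:  denied  { perms } for ... comm="..." ... tclass=...`). The function name `condor_denials` and the sample record are hypothetical, made up for illustration and not taken from this thread.

```python
import re
from typing import Iterable, List, Tuple

# Captures the denied permission set, the process name, and the target
# object class from a typical SELinux AVC record. The field layout is an
# assumption based on the common audit.log format.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?'
    r'comm="(?P<comm>[^"]*)".*?'
    r'tclass=(?P<tclass>\S+)'
)


def condor_denials(lines: Iterable[str]) -> List[Tuple[str, List[str], str]]:
    """Return (comm, permissions, tclass) for AVC denials from condor* processes."""
    found = []
    for line in lines:
        m = AVC_RE.search(line)
        # The kernel truncates comm to 15 characters, so condor_shared_port
        # appears as "condor_shared_p"; a prefix match covers all daemons.
        if m and m.group("comm").startswith("condor"):
            found.append((m.group("comm"), m.group("perms").split(), m.group("tclass")))
    return found


if __name__ == "__main__":
    # Hypothetical sample record for illustration only.
    sample = ('type=AVC msg=audit(1470775105.123:456): avc:  denied  { write add_name } '
              'for  pid=227708 comm="condor_shared_p" name="condor" dev="tmpfs" '
              'scontext=system_u:system_r:condor_master_t:s0 '
              'tcontext=system_u:object_r:var_lock_t:s0 tclass=dir')
    for comm, perms, tclass in condor_denials([sample]):
        print(comm, perms, tclass)
```

If the hits look Condor-related, the usual workflow is along the lines of `ausearch -m avc -ts recent | audit2allow -M condor_local` followed by `semodule -i condor_local.pp` (the module name `condor_local` is arbitrary). Review the generated `.te` file before loading it, since audit2allow allows everything it saw.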

Other than that, you're looking at poking holes in the firewall for each daemon.
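To tell a firewall problem apart from an SELinux one, a plain TCP check from the execute node to the collector can help. Below is a minimal Python sketch, assuming the collector listens on 9618 (the SHARED_PORT_PORT value in the configuration quoted below). The function name `port_open` is hypothetical.

```python
import socket


def port_open(host: str, port: int = 9618, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts,
        # and name-resolution failures.
        return False


if __name__ == "__main__":
    import sys
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(f"{host}:9618 reachable: {port_open(host)}")
```

With shared port enabled, inbound connections to the daemons on a machine normally go through that one port, so a passing check on 9618 covers most of the firewall side. Without shared port, each daemon needs its own opening.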

Brian


> On Aug 9, 2016, at 3:59 PM, Michael Murphy <Michael.Murphy@xxxxxxxxxxxxx> wrote:
> 
> Hi Brian,
> 
> SELinux is in enforcing mode (I'm required to run it that way). However,
> when I put it in permissive mode temporarily, the problem cleared. Is
> there a workaround for this problem for those of us required to run
> SELinux in enforcing mode?
> 
>> On 08/09/2016 03:50 PM, Brian Lin wrote:
>> Does your SharedPortLog have Permission Denied errors when trying to
>> write files in /var/lock? If so, try setting SELinux to permissive
>> mode to see if that helps.
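The log check suggested above can be scripted. Below is a minimal Python sketch, assuming the log is at /var/log/condor/SharedPortLog (the RPM packages put daemon logs in /var/log/condor, as the SchedLog path later in this thread shows; `condor_config_val SHARED_PORT_LOG` should print the real location). The helper name `lock_permission_errors` is hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def lock_permission_errors(lines: Iterable[str], lock_dir: str = "/var/lock") -> List[str]:
    """Return log lines that report a permission error involving lock_dir."""
    return [
        line.rstrip("\n")
        for line in lines
        # Compare case-insensitively, since capitalisation may vary.
        if "permission denied" in line.lower() and lock_dir in line
    ]


if __name__ == "__main__":
    # Assumed default location; pass another path as the first argument.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "/var/log/condor/SharedPortLog")
    with path.open(errors="replace") as log:
        for hit in lock_permission_errors(log):
            print(hit)
```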
>> 
>> Cheers,
>> Brian
>> 
>>> On 08/09/2016 03:50 PM, Michael Murphy wrote:
>>> Hello,
>>> 
>>> I am unable to get Condor's shared port to function properly on a
>>> CentOS 7 client machine (MASTER, STARTD, SCHEDD, SHARED_PORT, KBDD daemons
>>> are active). My shared port configuration is the following:
>>> 
>>> DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
>>> USE_SHARED_PORT = TRUE
>>> SHARED_PORT_PORT = 9618
>>> COLLECTOR_HOST = $(CONDOR_HOST)
>>> UPDATE_COLLECTOR_WITH_TCP = TRUE
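The first line of that configuration relies on HTCondor's self-referencing macros: `$(DAEMON_LIST)` on the right-hand side uses the value DAEMON_LIST had before that line, so SHARED_PORT is appended rather than replacing the list. Other `$(NAME)` references are, as I understand the manual, resolved when the value is looked up, so they see the last definition. Below is a toy Python sketch of those two rules, assuming a simplified `NAME = value` syntax. It ignores metaknobs, `if` blocks, and `$(NAME:default)` forms. The names `parse_config` and `lookup` and the starting DAEMON_LIST value are hypothetical.

```python
import re
from typing import Dict, Optional

MACRO_RE = re.compile(r"\$\((\w+)\)")


def parse_config(text: str, defaults: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Parse simplified NAME = value lines (names are case-insensitive).

    A reference to the macro being defined is expanded immediately using its
    previous value; any other $(NAME) reference is kept for lookup time.
    """
    table = {k.upper(): v for k, v in (defaults or {}).items()}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        key = name.upper()
        value = MACRO_RE.sub(
            lambda m: table.get(key, "") if m.group(1).upper() == key else m.group(0),
            value,
        )
        table[key] = value
    return table


def lookup(table: Dict[str, str], name: str, depth: int = 0) -> str:
    """Expand the remaining $(NAME) references against the final table."""
    if depth > 20:
        raise ValueError("macro nesting too deep")
    value = table.get(name.upper(), "")
    return MACRO_RE.sub(lambda m: lookup(table, m.group(1), depth + 1), value)


if __name__ == "__main__":
    # Hypothetical value of DAEMON_LIST before the shared-port file is read.
    cfg = parse_config("DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT\n",
                       {"DAEMON_LIST": "MASTER, SCHEDD, STARTD, KBDD"})
    print(lookup(cfg, "DAEMON_LIST"))
```

On a real node, `condor_config_val DAEMON_LIST` shows the value HTCondor actually computed.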
>>> 
>>> The SCHEDD daemon logs are full of the following:
>>> 
>>> <Omitted for brevity>
>>> 
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept connection on 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>> 08/09/16 15:38:29 (pid:227709) MaxLog = 10485760 bytes, length = 10485845
>>> 08/09/16 15:38:29 (pid:227709) Saving log file to "/var/log/condor/SchedLog.old"
>>> 
>>> The MasterLog doesn't show the result of the Schedd connectivity issue:
>>> 
>>> 08/09/16 15:38:24 ******************************************************
>>> 08/09/16 15:38:24 ** condor_master (CONDOR_MASTER) STARTING UP
>>> 08/09/16 15:38:24 ** /usr/sbin/condor_master
>>> 08/09/16 15:38:24 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
>>> 08/09/16 15:38:24 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
>>> 08/09/16 15:38:24 ** $CondorVersion: 8.3.8 Jan 14 2016 BuildID: RH-8.3.8-1.el7 $
>>> 08/09/16 15:38:24 ** $CondorPlatform: X86_64-RedHat_7.2 $
>>> 08/09/16 15:38:24 ** PID = 227681
>>> 08/09/16 15:38:24 ** Log last touched time unavailable (No such file or directory)
>>> 08/09/16 15:38:24 ******************************************************
>>> 08/09/16 15:38:24 Using config source: /etc/condor/condor_config
>>> 08/09/16 15:38:24 Using local config sources:
>>> 08/09/16 15:38:24    /etc/condor/config.d/00-IERUS_WorkstationNode.conf
>>> 08/09/16 15:38:24    /etc/condor/config.d/41-sharedport.conf
>>> 08/09/16 15:38:24 config Macros = 114, Sorted = 114, StringBytes = 4901, TablesBytes = 4160
>>> 08/09/16 15:38:24 CLASSAD_CACHING is OFF
>>> 08/09/16 15:38:24 Daemon Log is logging: D_ALWAYS D_ERROR
>>> 08/09/16 15:38:25 SharedPortEndpoint: waiting for connections to named socket 227681_ce89
>>> 08/09/16 15:38:25 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
>>> 08/09/16 15:38:25 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
>>> 08/09/16 15:38:25 DaemonCore: private command socket at <192.168.6.135:0?sock=227681_ce89>
>>> 08/09/16 15:38:25 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1452815958)
>>> 08/09/16 15:38:25 Collector port not defined, will use default: 9618
>>> 08/09/16 15:38:25 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 227708
>>> 08/09/16 15:38:25 Waiting for /var/lock/condor/shared_port_ad to appear.
>>> 08/09/16 15:38:26 Found /var/lock/condor/shared_port_ad.
>>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 227709
>>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 227710
>>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_kbdd", pid and pgroup = 227711
>>> 08/09/16 15:43:30 condor_write(): Socket closed when trying to write 1421 bytes to collector boss.hq.ierustech.com, fd is 12
>>> 08/09/16 15:43:30 Buf::write(): condor_write() failed
>>> 
>>> Where should I start looking to fix this? I am by no means a Condor pro;
>>> I just enjoy it when it works.
>>> -- 
>>> 
>>> Michael McInerny Murphy
>>> Engineer & Physicist
>>> IERUS Technologies, Inc.
>>> 2904 Westcorp Blvd. Ste 210
>>> Huntsville, AL  35805
>>> (256) 319-2026 ext 107
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>> 
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>> 
> 