Re: [HTCondor-users] SharedPortEndpoint fails to accept connection



I mean temporarily in 'permissive mode'. Sorry.
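
For anyone following along, here is a minimal sketch of the log check Brian suggests below, written as a small Python helper. The log path (/var/log/condor/SharedPortLog), the exact "Permission denied" wording, and the function names are assumptions for illustration and are not taken from HTCondor itself; adjust them to match your installation.

```python
import re

# Assumed wording: a case-insensitive "permission denied" on a line that
# also mentions a path under the lock directory. Real log text may differ.
DENIED = re.compile(r"permission denied", re.IGNORECASE)


def find_lock_permission_errors(lines, lock_dir="/var/lock"):
    """Return the log lines that mention both a permission error and lock_dir."""
    return [
        line.rstrip("\n")
        for line in lines
        if DENIED.search(line) and lock_dir in line
    ]


def scan_log(path="/var/log/condor/SharedPortLog"):
    """Read a log file and return matching lines (the default path is an assumption)."""
    with open(path, errors="replace") as handle:
        return find_lock_permission_errors(handle)
```

Calling scan_log() on the affected machine and printing the result would list any matching lines. If there are hits, `getenforce` reports the current SELinux mode and `setenforce 0` (as root) switches to permissive mode until the next reboot, which is the temporary change meant here.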


On 08/09/2016 03:50 PM, Brian Lin wrote:
> Does your SharedPortLog have Permission Denied errors when trying to
> write files in /var/lock? If so, try setting SELinux to permissive
> mode to see if that helps.
>
> Cheers,
> Brian
>
> On 08/09/2016 03:50 PM, Michael Murphy wrote:
>> Hello,
>>
>> I am unable to get Condor's shared port to function properly on a
>> CentOS 7 client machine (MASTER, STARTD, SCHEDD, SHARED_PORT, KBDD daemons
>> are active). My shared port configuration is the following:
>>
>> DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
>> USE_SHARED_PORT = TRUE
>> SHARED_PORT_PORT = 9618
>> COLLECTOR_HOST = $(CONDOR_HOST)
>> UPDATE_COLLECTOR_WITH_TCP = TRUE
>>
>> The SCHEDD daemon logs are full of the following:
>>
>> <Omitted for brevity>
>>
>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept
>> connection on
>> 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>
>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept
>> connection on
>> 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>
>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept
>> connection on
>> 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>
>> 08/09/16 15:38:29 (pid:227709) SharedPortEndpoint: failed to accept
>> connection on
>> 637e42b175e7a0a6281c03f433343d5911f698ca2da42a477b3b6a4e58d2f771/227681_ce89_3
>>
>> 08/09/16 15:38:29 (pid:227709) MaxLog = 10485760 bytes, length =
>> 10485845
>> 08/09/16 15:38:29 (pid:227709) Saving log file to
>> "/var/log/condor/SchedLog.old"
>>
>> The MasterLog doesn't show the result of the Schedd connectivity issue:
>>
>> 08/09/16 15:38:24 ******************************************************
>> 08/09/16 15:38:24 ** condor_master (CONDOR_MASTER) STARTING UP
>> 08/09/16 15:38:24 ** /usr/sbin/condor_master
>> 08/09/16 15:38:24 ** SubsystemInfo: name=MASTER type=MASTER(2)
>> class=DAEMON(1)
>> 08/09/16 15:38:24 ** Configuration: subsystem:MASTER local:<NONE>
>> class:DAEMON
>> 08/09/16 15:38:24 ** $CondorVersion: 8.3.8 Jan 14 2016 BuildID:
>> RH-8.3.8-1.el7 $
>> 08/09/16 15:38:24 ** $CondorPlatform: X86_64-RedHat_7.2 $
>> 08/09/16 15:38:24 ** PID = 227681
>> 08/09/16 15:38:24 ** Log last touched time unavailable (No such file or
>> directory)
>> 08/09/16 15:38:24 ******************************************************
>> 08/09/16 15:38:24 Using config source: /etc/condor/condor_config
>> 08/09/16 15:38:24 Using local config sources:
>> 08/09/16 15:38:24    /etc/condor/config.d/00-IERUS_WorkstationNode.conf
>> 08/09/16 15:38:24    /etc/condor/config.d/41-sharedport.conf
>> 08/09/16 15:38:24 config Macros = 114, Sorted = 114, StringBytes = 4901,
>> TablesBytes = 4160
>> 08/09/16 15:38:24 CLASSAD_CACHING is OFF
>> 08/09/16 15:38:24 Daemon Log is logging: D_ALWAYS D_ERROR
>> 08/09/16 15:38:25 SharedPortEndpoint: waiting for connections to named
>> socket 227681_ce89
>> 08/09/16 15:38:25 SharedPortEndpoint: failed to open
>> /var/lock/condor/shared_port_ad: No such file or directory
>> 08/09/16 15:38:25 SharedPortEndpoint: did not successfully find
>> SharedPortServer address. Will retry in 60s.
>> 08/09/16 15:38:25 DaemonCore: private command socket at
>> <192.168.6.135:0?sock=227681_ce89>
>> 08/09/16 15:38:25 Master restart (GRACEFUL) is watching
>> /usr/sbin/condor_master (mtime:1452815958)
>> 08/09/16 15:38:25 Collector port not defined, will use default: 9618
>> 08/09/16 15:38:25 Started DaemonCore process
>> "/usr/libexec/condor/condor_shared_port", pid and pgroup = 227708
>> 08/09/16 15:38:25 Waiting for /var/lock/condor/shared_port_ad to appear.
>> 08/09/16 15:38:26 Found /var/lock/condor/shared_port_ad.
>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_schedd",
>> pid and pgroup = 227709
>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_startd",
>> pid and pgroup = 227710
>> 08/09/16 15:38:26 Started DaemonCore process "/usr/sbin/condor_kbdd",
>> pid and pgroup = 227711
>> 08/09/16 15:43:30 condor_write(): Socket closed when trying to write
>> 1421 bytes to collector boss.hq.ierustech.com, fd is 12
>> 08/09/16 15:43:30 Buf::write(): condor_write() failed
>>
>> Where should I start looking to fix this? I am by no means a Condor pro.
>> I just enjoy it when it works.
>>
>> --
>> Michael McInerny Murphy
>> Engineer & Physicist
>> IERUS Technologies, Inc.
>> 2904 Westcorp Blvd. Ste 210
>> Huntsville, AL  35805
>> (256) 319-2026 ext 107
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>> with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
>

-- 
Michael McInerny Murphy
Engineer & Physicist
IERUS Technologies, Inc.
2904 Westcorp Blvd. Ste 210
Huntsville, AL  35805
(256) 319-2026 ext 107