
[HTCondor-users] Failed to locate startd - Can't find address for startd



Hi.

I'm wondering if anyone can help me, please? My condor_status comes up empty. Here is some relevant information:

--
Kind regards,

Justin Fisher

CentOS 7 on all machines
SELinux is disabled on all machines

condor_version
$CondorVersion: 8.8.5 Sep 04 2019 BuildID: 480168 PackageID: 8.8.5-1 $
$CondorPlatform: x86_64_RedHat7 $


Worker machine:

DAEMON_LIST = MASTER, COLLECTOR, STARTD, SCHEDD, SHARED_PORT

sudo systemctl status condor.service
● condor.service - Condor Distributed High-Throughput-Computing
   Loaded: loaded (/usr/lib/systemd/system/condor.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-09-09 13:23:42 CEST; 1h 19min ago
 Main PID: 5976 (condor_master)
   Status: "All daemons are responding"
   Memory: 26.1M
   CGroup: /system.slice/condor.service
           ├─5976 /usr/sbin/condor_master -f
           ├─6206 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 983
           ├─6207 condor_shared_port -f -p 9618
           ├─6216 condor_collector -f
           ├─6217 condor_startd -f
           └─6218 condor_schedd -f


ps -ef | grep condor
condor   15912     1  0 15:30 ?        00:00:00 /usr/sbin/condor_master -f
root     16032 15912  0 15:30 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 983
condor   16033 15912  0 15:30 ?        00:00:00 condor_shared_port -f -p 9618
condor   16034 15912  0 15:30 ?        00:00:00 condor_collector -f
condor   16035 15912  0 15:30 ?        00:00:00 condor_startd -f
condor   16036 15912  0 15:30 ?        00:00:00 condor_schedd -f
jfisher  19755  8348  0 16:34 pts/0    00:00:00 grep --color=auto condor


condor_status
Error: communication error
CONDOR_STATUS:1:Unable to resolve COLLECTOR_HOST (:9618).
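One way to read the `(:9618)` in that message is as a host:port pair whose host part is empty, which would suggest that COLLECTOR_HOST expands to nothing on this machine. The sketch below only illustrates that reading; `split_host_port` is a hypothetical helper and not part of HTCondor.

```python
def split_host_port(addr):
    """Split 'host:port' into (host, port); the host part may be empty."""
    host, _, port = addr.rpartition(":")
    return host, int(port)


# Address text copied from the error message, without the parentheses.
host, port = split_host_port(":9618")
print(repr(host), port)
```

An empty string for `host` is what one would expect if nothing was configured before the colon.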

condor_status -direct 192.168.1.206
Error: Failed to locate startd 192.168.1.206





Master Machine

DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, SHARED_PORT

condor_status -direct 192.168.1.206
Error: Failed to locate startd 192.168.1.206
Can't find address for startd 192.168.1.206


ps -ef | grep condor
condor   42828     1  0 16:19 ?        00:00:00 /usr/sbin/condor_master -f
root     42881 42828  0 16:19 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 979
condor   42882 42828  0 16:19 ?        00:00:00 condor_shared_port -f -p 9618
condor   42885 42828  0 16:19 ?        00:00:00 condor_negotiator -f
condor   42886 42828  0 16:19 ?        00:00:00 condor_schedd -f
condor   43387 42828  0 16:27 ?        00:00:00 condor_collector -f
jfisher  43637 43334  0 16:30 pts/0    00:00:00 grep --color=auto condor

CollectorLog

09/09/19 16:44:11 Now in new log file /var/log/condor/CollectorLog
09/09/19 16:44:11 Enabling CCB Server.
09/09/19 16:44:11 m_reconnect_fname = /var/lib/condor/spool/192.168.1.206-9618.ccb_reconnect
09/09/19 16:44:11 Configuration: SAMPLING_INTERVAL=60, MAX_STORAGE=10000000, MaxFileSize=333333, POOL_HISTORY_DIR=/var/ViewHist
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist0.0.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist0.0.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist0.1.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist0.1.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist0.2.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist0.2.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist1.0.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist1.0.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist1.1.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist1.1.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist1.2.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist1.2.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist2.0.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist2.0.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist2.1.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist2.1.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist2.2.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist2.2.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist3.0.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist3.0.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist3.1.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist3.1.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist3.2.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist3.2.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist4.0.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist4.0.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist4.1.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist4.1.new , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist4.2.old , StartTime=-1
09/09/19 16:44:11 FileName=/var/ViewHist/viewhist4.2.new , StartTime=-1
09/09/19 16:44:11 DC_AUTHENTICATE: attempt to open invalid session jfisher:43847:1568040061:3, failing; this session was requested by <192.168.1.206:11115> with return address <192.168.1.206:9618?addrs=192.168.1.206-9618&noUDP&sock=42828_0631>
09/09/19 16:44:12 CollectorAd  : Inserting ** "< My Pool - jfisher.ingenazure.com@xxxxxxxxxxxxxxxxxxxxxx >"
09/09/19 16:44:42 DC_AUTHENTICATE: attempt to open invalid session jfisher:43847:1568039862:1, failing; this session was requested by <192.168.1.206:7312> with return address <192.168.1.206:9618?addrs=192.168.1.206-9618&noUDP&sock=42828_0631_3>
09/09/19 16:44:43 Got QUERY_STARTD_PVT_ADS
09/09/19 16:44:43 QueryWorker: forked new high priority worker with id 44616 ( max 4 active 1 pending 0 )
09/09/19 16:44:43 (Sending 0 ads in response to query)
09/09/19 16:44:43 Query info: matched=0; skipped=0; query_time=0.000164; send_time=0.000062; type=MachinePrivate; requirements={true}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:27041>; projection={}
09/09/19 16:44:43 QueryWorker: forked new high priority worker with id 44617 ( max 4 active 1 pending 0 )
09/09/19 16:44:43 (Sending 0 ads in response to query)
09/09/19 16:44:43 Query info: matched=0; skipped=1; query_time=0.000231; send_time=0.000055; type=Any; requirements={(((MyType == "Scheduler") || (MyType == "Submitter")) || ((MyType == "Machine")))}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:12622>; projection={}
09/09/19 16:44:43 AccountingAd  : Inserting ** "< <none>jfisher.ingenazure.com >"
09/09/19 16:44:51 ScheddAd   : Inserting ** "< jfisher.ingenazure.com , 192.168.1.206 >"
09/09/19 16:45:43 Got QUERY_STARTD_PVT_ADS
09/09/19 16:45:43 QueryWorker: forked new high priority worker with id 44656 ( max 4 active 1 pending 0 )
09/09/19 16:45:43 (Sending 0 ads in response to query)
09/09/19 16:45:43 Query info: matched=0; skipped=0; query_time=0.000159; send_time=0.000055; type=MachinePrivate; requirements={true}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:17677>; projection={}
09/09/19 16:45:43 QueryWorker: forked new high priority worker with id 44657 ( max 4 active 1 pending 0 )
09/09/19 16:45:43 (Sending 1 ads in response to query)
09/09/19 16:45:43 Query info: matched=1; skipped=2; query_time=0.000274; send_time=0.000407; type=Any; requirements={(((MyType == "Scheduler") || (MyType == "Submitter")) || ((MyType == "Machine")))}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:31536>; projection={}
09/09/19 16:46:43 Got QUERY_STARTD_PVT_ADS
09/09/19 16:46:43 QueryWorker: forked new high priority worker with id 44675 ( max 4 active 1 pending 0 )
09/09/19 16:46:43 (Sending 0 ads in response to query)
09/09/19 16:46:43 Query info: matched=0; skipped=0; query_time=0.000190; send_time=0.000059; type=MachinePrivate; requirements={true}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:1767>; projection={}
09/09/19 16:46:43 QueryWorker: forked new high priority worker with id 44676 ( max 4 active 1 pending 0 )
09/09/19 16:46:43 (Sending 1 ads in response to query)
09/09/19 16:46:43 Query info: matched=1; skipped=2; query_time=0.000244; send_time=0.000417; type=Any; requirements={(((MyType == "Scheduler") || (MyType == "Submitter")) || ((MyType == "Machine")))}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:31281>; projection={}
09/09/19 16:47:43 Got QUERY_STARTD_PVT_ADS
09/09/19 16:47:43 QueryWorker: forked new high priority worker with id 44696 ( max 4 active 1 pending 0 )
09/09/19 16:47:43 (Sending 0 ads in response to query)
09/09/19 16:47:43 Query info: matched=0; skipped=0; query_time=0.000212; send_time=0.000055; type=MachinePrivate; requirements={true}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:27982>; projection={}
09/09/19 16:47:43 QueryWorker: forked new high priority worker with id 44697 ( max 4 active 1 pending 0 )
09/09/19 16:47:43 (Sending 1 ads in response to query)
09/09/19 16:47:43 Query info: matched=1; skipped=2; query_time=0.000464; send_time=0.000845; type=Any; requirements={(((MyType == "Scheduler") || (MyType == "Submitter")) || ((MyType == "Machine")))}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:9918>; projection={}
09/09/19 16:48:43 Got QUERY_STARTD_PVT_ADS
09/09/19 16:48:43 QueryWorker: forked new high priority worker with id 44724 ( max 4 active 1 pending 0 )
09/09/19 16:48:43 (Sending 0 ads in response to query)
09/09/19 16:48:43 Query info: matched=0; skipped=0; query_time=0.000227; send_time=0.000070; type=MachinePrivate; requirements={true}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:24285>; projection={}
09/09/19 16:48:43 QueryWorker: forked new high priority worker with id 44725 ( max 4 active 1 pending 0 )
09/09/19 16:48:43 (Sending 1 ads in response to query)
09/09/19 16:48:43 Query info: matched=1; skipped=2; query_time=0.000307; send_time=0.000718; type=Any; requirements={(((MyType == "Scheduler") || (MyType == "Submitter")) || ((MyType == "Machine")))}; locate=0; limit=0; from=NEGOTIATOR; peer=<192.168.1.206:22064>; projection={}
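To make the repeated "Query info" lines easier to scan, here is a minimal Python sketch that pulls out the ad type and the matched/skipped counts. The regular expression is an assumption based only on the line layout shown in this log, and `summarize` is a hypothetical helper name.

```python
import re

# Pattern assumed from the "Query info" lines in this CollectorLog excerpt.
QUERY_RE = re.compile(r"matched=(\d+); skipped=(\d+);.*?type=(\w+);")


def summarize(line):
    """Return (ad_type, matched, skipped) for a 'Query info' line, else None."""
    m = QUERY_RE.search(line)
    if m is None:
        return None
    matched, skipped, ad_type = m.groups()
    return ad_type, int(matched), int(skipped)


sample = (
    "09/09/19 16:44:43 Query info: matched=0; skipped=0; query_time=0.000164; "
    "send_time=0.000062; type=MachinePrivate; requirements={true}; locate=0; "
    "limit=0; from=NEGOTIATOR; peer=<192.168.1.206:27041>; projection={}"
)
print(summarize(sample))  # ('MachinePrivate', 0, 0)
```

Applied to the excerpt, every MachinePrivate query reports matched=0, which fits the empty condor_status output.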


MasterLog
09/09/19 16:17:55 ******************************************************
09/09/19 16:17:55 ** condor_master (CONDOR_MASTER) STARTING UP
09/09/19 16:17:55 ** /usr/sbin/condor_master
09/09/19 16:17:55 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
09/09/19 16:17:55 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
09/09/19 16:17:55 ** $CondorVersion: 8.8.5 Sep 04 2019 BuildID: 480168 PackageID: 8.8.5-1 $
09/09/19 16:17:55 ** $CondorPlatform: x86_64_RedHat7 $
09/09/19 16:17:55 ** PID = 42671
09/09/19 16:17:55 ** Log last touched 9/9 16:17:50
09/09/19 16:17:55 ******************************************************
09/09/19 16:17:55 Using config source: /etc/condor/condor_config
09/09/19 16:17:55 Using local config sources:
09/09/19 16:17:55    /etc/condor/config.d/00master.config
09/09/19 16:17:55    /etc/condor/condor_config.local
09/09/19 16:17:55 config Macros = 77, Sorted = 77, StringBytes = 1998, TablesBytes = 2820
09/09/19 16:17:55 CLASSAD_CACHING is OFF
09/09/19 16:17:55 Daemon Log is logging: D_ALWAYS D_ERROR
09/09/19 16:17:56 Removed /var/lock/condor/shared_port_ad (assuming it is left over from previous run)
09/09/19 16:17:56 SharedPortEndpoint: waiting for connections to named socket 42671_55d3
09/09/19 16:17:56 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
09/09/19 16:17:56 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
09/09/19 16:17:56 DaemonCore: private command socket at <192.168.1.206:0?sock=42671_55d3>
09/09/19 16:17:56 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1567657410)
09/09/19 16:17:56 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 42721
09/09/19 16:17:56 Waiting for /var/lock/condor/shared_port_ad to appear.
09/09/19 16:17:57 Found /var/lock/condor/shared_port_ad.
09/09/19 16:17:57 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 42722
09/09/19 16:17:57 Waiting for /var/log/condor/.collector_address to appear.
09/09/19 16:17:58 Found /var/log/condor/.collector_address.
09/09/19 16:17:58 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 42723
09/09/19 16:17:58 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 42724
09/09/19 16:19:39 Got SIGQUIT. Performing fast shutdown.
09/09/19 16:19:39 Sent SIGQUIT to COLLECTOR (pid 42722)
09/09/19 16:19:39 Sent SIGQUIT to NEGOTIATOR (pid 42723)
09/09/19 16:19:39 Sent SIGQUIT to SCHEDD (pid 42724)
09/09/19 16:19:39 AllReaper unexpectedly called on pid 42723, status 0.
09/09/19 16:19:39 The NEGOTIATOR (pid 42723) exited with status 0
09/09/19 16:19:39 AllReaper unexpectedly called on pid 42722, status 0.
09/09/19 16:19:39 The COLLECTOR (pid 42722) exited with status 0
09/09/19 16:19:39 AllReaper unexpectedly called on pid 42724, status 0.
09/09/19 16:19:39 The SCHEDD (pid 42724) exited with status 0
09/09/19 16:19:39 Sent SIGTERM to SHARED_PORT (pid 42721)
09/09/19 16:19:39 AllReaper unexpectedly called on pid 42721, status 0.
09/09/19 16:19:39 The SHARED_PORT (pid 42721) exited with status 0
09/09/19 16:19:39 About to tell the ProcD to exit
09/09/19 16:19:39 procd (pid = 42720) exited with status 0
09/09/19 16:19:39 All daemons are gone. Exiting.
09/09/19 16:19:39 **** condor_master (condor_MASTER) pid 42671 EXITING WITH STATUS 0
09/09/19 16:19:39 ******************************************************
09/09/19 16:19:39 ** condor_master (CONDOR_MASTER) STARTING UP
09/09/19 16:19:39 ** /usr/sbin/condor_master
09/09/19 16:19:39 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
09/09/19 16:19:39 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
09/09/19 16:19:39 ** $CondorVersion: 8.8.5 Sep 04 2019 BuildID: 480168 PackageID: 8.8.5-1 $
09/09/19 16:19:39 ** $CondorPlatform: x86_64_RedHat7 $
09/09/19 16:19:39 ** PID = 42828
09/09/19 16:19:39 ** Log last touched 9/9 16:19:39
09/09/19 16:19:39 ******************************************************
09/09/19 16:19:39 Using config source: /etc/condor/condor_config
09/09/19 16:19:39 Using local config sources:
09/09/19 16:19:39    /etc/condor/config.d/00master.config
09/09/19 16:19:39    /etc/condor/condor_config.local
09/09/19 16:19:39 config Macros = 77, Sorted = 77, StringBytes = 2000, TablesBytes = 2820
09/09/19 16:19:39 CLASSAD_CACHING is OFF
09/09/19 16:19:39 Daemon Log is logging: D_ALWAYS D_ERROR
09/09/19 16:19:40 SharedPortEndpoint: waiting for connections to named socket 42828_0631
09/09/19 16:19:40 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
09/09/19 16:19:40 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
09/09/19 16:19:40 DaemonCore: private command socket at <192.168.1.206:0?sock=42828_0631>
09/09/19 16:19:40 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1567657410)
09/09/19 16:19:40 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 42882
09/09/19 16:19:40 Waiting for /var/lock/condor/shared_port_ad to appear.
09/09/19 16:19:41 Found /var/lock/condor/shared_port_ad.
09/09/19 16:19:41 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 42883
09/09/19 16:19:41 Waiting for /var/log/condor/.collector_address to appear.
09/09/19 16:19:42 Found /var/log/condor/.collector_address.
09/09/19 16:19:42 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 42885
09/09/19 16:19:42 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 42886
09/09/19 16:27:41 DefaultReaper unexpectedly called on pid 42883, status 1024.
09/09/19 16:27:41 The COLLECTOR (pid 42883) exited with status 4
09/09/19 16:27:41 Sending obituary for "/usr/sbin/condor_collector"
09/09/19 16:27:41 restarting /usr/sbin/condor_collector in 10 seconds
09/09/19 16:27:41 condor_write(): Socket closed when trying to write 1439 bytes to collector jfisher.ingenazure.com:9618, fd is 10
09/09/19 16:27:41 Buf::write(): condor_write() failed
09/09/19 16:27:51 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 43387
09/09/19 16:27:51 condor_write(): Socket closed when trying to write 1448 bytes to collector jfisher.ingenazure.com:9618, fd is 10, errno=104 Connection reset by peer
09/09/19 16:27:51 Buf::write(): condor_write() failed
09/09/19 16:32:51 condor_write(): Socket closed when trying to write 1465 bytes to collector jfisher.ingenazure.com:9618, fd is 10, errno=104 Connection reset by peer
09/09/19 16:32:51 Buf::write(): condor_write() failed
09/09/19 16:35:51 DefaultReaper unexpectedly called on pid 43387, status 1024.
09/09/19 16:35:51 The COLLECTOR (pid 43387) exited with status 4
09/09/19 16:35:51 Sending obituary for "/usr/sbin/condor_collector"
09/09/19 16:35:51 restarting /usr/sbin/condor_collector in 10 seconds
09/09/19 16:35:51 condor_write(): Socket closed when trying to write 1439 bytes to collector jfisher.ingenazure.com:9618, fd is 10
09/09/19 16:35:51 Buf::write(): condor_write() failed
09/09/19 16:36:01 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 43847
09/09/19 16:36:01 condor_write(): Socket closed when trying to write 1448 bytes to collector jfisher.ingenazure.com:9618, fd is 10, errno=104 Connection reset by peer
09/09/19 16:36:01 Buf::write(): condor_write() failed
09/09/19 16:41:01 condor_write(): Socket closed when trying to write 1465 bytes to collector jfisher.ingenazure.com:9618, fd is 10, errno=104 Connection reset by peer
09/09/19 16:41:01 Buf::write(): condor_write() failed
09/09/19 16:44:01 DefaultReaper unexpectedly called on pid 43847, status 1024.
09/09/19 16:44:01 The COLLECTOR (pid 43847) exited with status 4
09/09/19 16:44:01 Sending obituary for "/usr/sbin/condor_collector"
09/09/19 16:44:01 restarting /usr/sbin/condor_collector in 10 seconds
09/09/19 16:44:01 condor_write(): Socket closed when trying to write 1456 bytes to collector jfisher.ingenazure.com:9618, fd is 10
09/09/19 16:44:01 Buf::write(): condor_write() failed
09/09/19 16:44:11 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 44606
09/09/19 16:44:11 condor_write(): Socket closed when trying to write 1466 bytes to collector jfisher.ingenazure.com:9618, fd is 10, errno=104 Connection reset by peer
09/09/19 16:44:11 Buf::write(): condor_write() failed
09/09/19 16:49:11 condor_write(): Socket closed when trying to write 1467 bytes to collector jfisher.ingenazure.com:9618, fd is 10, errno=104 Connection reset by peer
09/09/19 16:49:11 Buf::write(): condor_write() failed
09/09/19 16:52:11 DefaultReaper unexpectedly called on pid 44606, status 1024.
09/09/19 16:52:11 The COLLECTOR (pid 44606) exited with status 4
09/09/19 16:52:11 Sending obituary for "/usr/sbin/condor_collector"
09/09/19 16:52:11 restarting /usr/sbin/condor_collector in 10 seconds
09/09/19 16:52:11 condor_write(): Socket closed when trying to write 1458 bytes to collector jfisher.ingenazure.com:9618, fd is 10
09/09/19 16:52:11 Buf::write(): condor_write() failed
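Two details in this MasterLog can be checked with a few lines of Python: the reaper's "status 1024" is a POSIX wait status whose high byte is the exit code 4 reported on the next line, and the collector exits at a fixed interval. The timestamps below are copied from the "DefaultReaper" lines; `exit_code` and `gaps_seconds` are hypothetical helper names, and the date format string is an assumption (month/day/year).

```python
from datetime import datetime

# Timestamps of the "DefaultReaper unexpectedly called" lines in this log.
reaper_times = [
    "09/09/19 16:27:41",
    "09/09/19 16:35:51",
    "09/09/19 16:44:01",
    "09/09/19 16:52:11",
]


def exit_code(wait_status):
    """Extract the exit code (high byte) from a POSIX wait status."""
    return (wait_status >> 8) & 0xFF


def gaps_seconds(stamps):
    """Return the gaps in seconds between consecutive timestamps."""
    parsed = [datetime.strptime(s, "%m/%d/%y %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


print(exit_code(1024))             # 4
print(gaps_seconds(reaper_times))  # [490, 490, 490]
```

So the collector exits with code 4 every 490 seconds (8 min 10 s) in this excerpt, which looks like a timer rather than a random crash.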

NegotiatorLog
09/09/19 16:23:42 ---------- Started Negotiation Cycle ----------
09/09/19 16:23:42 Phase 1:  Obtaining ads from collector ...
09/09/19 16:23:42   Getting startd private ads ...
09/09/19 16:23:42   Getting Scheduler, Submitter and Machine ads ...
09/09/19 16:23:42   Sorting 1 ads ...
09/09/19 16:23:42 Got ads: 1 public and 0 private
09/09/19 16:23:42 Public ads include 0 submitter, 0 startd
09/09/19 16:23:42 Phase 2:  Performing accounting ...
09/09/19 16:23:42 Phase 3:  Sorting submitter ads by priority ...
09/09/19 16:23:42 Starting prefetch round; 0 potential prefetches to do.
09/09/19 16:23:42 Prefetch summary: 0 attempted, 0 successful.
09/09/19 16:23:42 Phase 4.1:  Negotiating with schedds ...
09/09/19 16:23:42  negotiateWithGroup resources used submitterAds length 0
09/09/19 16:23:42 ---------- Finished Negotiation Cycle ----------

ProcLog
09/09/19 16:22:40 : taking a snapshot...
09/09/19 16:22:40 : ProcAPI: new boottime = 1568017676; old_boottime = 1568017676; /proc/stat boottime = 1568017676; /proc/uptime boottime = 1568017676
09/09/19 16:22:40 : process 42958 (not in monitored family) has exited
09/09/19 16:22:40 : process 42809 (not in monitored family) has exited
09/09/19 16:22:40 : process 42808 (not in monitored family) has exited
09/09/19 16:22:40 : process 40467 (not in monitored family) has exited
09/09/19 16:22:40 : no methods have determined process 42997 to be in a monitored family
09/09/19 16:22:40 : no methods have determined process 43007 to be in a monitored family
09/09/19 16:22:40 : ...snapshot complete
09/09/19 16:23:40 : taking a snapshot...
09/09/19 16:23:40 : ProcAPI: new boottime = 1568017676; old_boottime = 1568017676; /proc/stat boottime = 1568017676; /proc/uptime boottime = 1568017676
09/09/19 16:23:40 : process 43007 (not in monitored family) has exited
09/09/19 16:23:40 : no methods have determined process 43055 to be in a monitored family
09/09/19 16:23:40 : ...snapshot complete

SchedLog
09/09/19 16:19:42 (pid:42886) ******************************************************
09/09/19 16:19:42 (pid:42886) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
09/09/19 16:19:42 (pid:42886) ** /usr/sbin/condor_schedd
09/09/19 16:19:42 (pid:42886) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
09/09/19 16:19:42 (pid:42886) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
09/09/19 16:19:42 (pid:42886) ** $CondorVersion: 8.8.5 Sep 04 2019 BuildID: 480168 PackageID: 8.8.5-1 $
09/09/19 16:19:42 (pid:42886) ** $CondorPlatform: x86_64_RedHat7 $
09/09/19 16:19:42 (pid:42886) ** PID = 42886
09/09/19 16:19:42 (pid:42886) ** Log last touched 9/9 16:19:39
09/09/19 16:19:42 (pid:42886) ******************************************************
09/09/19 16:19:42 (pid:42886) Using config source: /etc/condor/condor_config
09/09/19 16:19:42 (pid:42886) Using local config sources:
09/09/19 16:19:42 (pid:42886)    /etc/condor/config.d/00master.config
09/09/19 16:19:42 (pid:42886)    /etc/condor/condor_config.local
09/09/19 16:19:42 (pid:42886) config Macros = 78, Sorted = 78, StringBytes = 2046, TablesBytes = 2856
09/09/19 16:19:42 (pid:42886) CLASSAD_CACHING is ENABLED
09/09/19 16:19:42 (pid:42886) Daemon Log is logging: D_ALWAYS D_ERROR
09/09/19 16:19:42 (pid:42886) SharedPortEndpoint: waiting for connections to named socket 42828_0631_4
09/09/19 16:19:42 (pid:42886) DaemonCore: command socket at <192.168.1.206:9618?addrs=192.168.1.206-9618&noUDP&sock=42828_0631_4>
09/09/19 16:19:42 (pid:42886) DaemonCore: private command socket at <192.168.1.206:9618?addrs=192.168.1.206-9618&noUDP&sock=42828_0631_4>
09/09/19 16:19:42 (pid:42886) History file rotation is enabled.
09/09/19 16:19:42 (pid:42886)   Maximum history file size is: 20971520 bytes
09/09/19 16:19:42 (pid:42886)   Number of rotated history files is: 2
09/09/19 16:19:42 (pid:42886) Reloading job factories
09/09/19 16:19:42 (pid:42886) Loaded 0 job factories, 0 were paused, 0 failed to load
09/09/19 16:19:48 (pid:42886) TransferQueueManager stats: active up=0/100 down=0/100; waiting up=0 down=0; wait time up=0s down=0s
09/09/19 16:19:48 (pid:42886) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
09/09/19 16:19:48 (pid:42886) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
09/09/19 16:29:50 (pid:42886) condor_write(): Socket closed when trying to write 4096 bytes to collector jfisher.ingenazure.com:9618, fd is 14
09/09/19 16:29:50 (pid:42886) Buf::write(): condor_write() failed
09/09/19 16:34:50 (pid:42886) condor_write(): Socket closed when trying to write 4096 bytes to collector jfisher.ingenazure.com:9618, fd is 14, errno=104 Connection reset by peer
09/09/19 16:34:50 (pid:42886) Buf::write(): condor_write() failed
09/09/19 16:39:50 (pid:42886) condor_write(): Socket closed when trying to write 4096 bytes to collector jfisher.ingenazure.com:9618, fd is 14
09/09/19 16:39:50 (pid:42886) Buf::write(): condor_write() failed
09/09/19 16:44:51 (pid:42886) condor_write(): Socket closed when trying to write 4096 bytes to collector jfisher.ingenazure.com:9618, fd is 14, errno=104 Connection reset by peer
09/09/19 16:44:51 (pid:42886) Buf::write(): condor_write() failed
09/09/19 16:54:52 (pid:42886) condor_write(): Socket closed when trying to write 4096 bytes to collector jfisher.ingenazure.com:9618, fd is 14
09/09/19 16:54:52 (pid:42886) Buf::write(): condor_write() failed

SharedPortLog
09/09/19 16:44:01 SharedPortServer: server was busy, failed to connect collector as requested by <192.168.1.206:10396>: primary (f18b97e1577f56b8a325498acd2dde3025439e453b9c2b7f81246ac7f1a621bf/collector): Connection refused (111); alt (/var/lock/condor/daemon_sock/collector): Connection refused (111)
09/09/19 16:44:40 About to update statistics in shared_port daemon ad file at /var/lock/condor/shared_port_ad :
ForkedChildrenPeak = 0
RequestsBlocked = 4
ForkedChildrenCurrent = 0
RequestsSucceeded = 80
RequestsPendingPeak = 2
RequestsPendingCurrent = 0
RequestsFailed = 4
SharedPortCommandSinfuls = "<192.168.1.206:9618>"
MyAddress = "<192.168.1.206:9618?addrs=192.168.1.206-9618&noUDP>"
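The addresses in these logs all follow the shape `<host:port?key=value&flag>`. Assuming that layout holds, a short Python sketch can split one into its parts, which makes it easier to see that every daemon here advertises 192.168.1.206:9618 and differs only in its `sock` name. `parse_sinful` is a hypothetical helper, not an HTCondor API.

```python
from urllib.parse import parse_qs


def parse_sinful(sinful):
    """Split '<host:port?key=value&flag>' into (host, port, params)."""
    body = sinful.strip("<>")
    addr, _, query = body.partition("?")
    host, _, port = addr.rpartition(":")
    # keep_blank_values keeps bare flags such as 'noUDP' with an empty value.
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return host, int(port), params


# Address copied from the SchedLog "command socket" line.
host, port, params = parse_sinful(
    "<192.168.1.206:9618?addrs=192.168.1.206-9618&noUDP&sock=42828_0631_4>"
)
print(host, port, params)
```

For the schedd address this yields host `192.168.1.206`, port `9618`, and the parameters `addrs`, `noUDP` (empty) and `sock`.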