[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Condor-users] CCB Server - Client Communication (Condor 7.3.1)



Hello All,

Dan: I have looked in the StartLog on the workers. There is no evidence that it is either failing to connect or losing connection with the schedd. In fact, the workers are sending UPD updates to the collector at regular intervals without a problem. By the way, thanks for all of your help thus far.

Below is a much longer excerpt from the schedd log in case it helps:

Enact = "NO"
Subsystem = "SCHEDD"
ParentUniqueID = "sunpassage:29474:1247691823"
ServerPid = 29477
SessionDuration = "86400"
NewSession = "YES"
07/20 11:36:55 SECMAN: not negotiating, just sending command (68)
07/20 11:36:55 Authorizing server '*/134.48.90.158'.
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback CCB_REQUEST> (24)
07/20 11:36:55 SECMAN: resuming command 68 CCB_REQUEST to collector 134.48.90.158:9618 from TCP port 54968 (non-blocking).
07/20 11:36:55 SECMAN: no cached key for {<134.48.90.158:9618>,<68>}.
07/20 11:36:55 SECMAN: Security Policy:
MyType = ""
TargetType = ""
AuthMethods = "CLAIMTOBE, FS, SSL, PASSWORD"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "OPTIONAL"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "SCHEDD"
ParentUniqueID = "sunpassage:29474:1247691823"
ServerPid = 29477
SessionDuration = "86400"
NewSession = "YES"
07/20 11:36:55 SECMAN: not negotiating, just sending command (68)
07/20 11:36:55 Authorizing server '*/134.48.90.158'.
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback CCB_REQUEST> (26)
07/20 11:36:55 SECMAN: resuming command 68 CCB_REQUEST to collector 134.48.90.158:9618 from TCP port 51578 (non-blocking).
07/20 11:36:55 SECMAN: no cached key for {<134.48.90.158:9618>,<68>}.
07/20 11:36:55 SECMAN: Security Policy:
MyType = ""
TargetType = ""
AuthMethods = "CLAIMTOBE, FS, SSL, PASSWORD"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "OPTIONAL"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "SCHEDD"
ParentUniqueID = "sunpassage:29474:1247691823"
ServerPid = 29477
SessionDuration = "86400"
NewSession = "YES"
07/20 11:36:55 SECMAN: not negotiating, just sending command (68)
07/20 11:36:55 Authorizing server '*/134.48.90.158'.
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (5)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:53981>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1197 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:53981>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (6)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:53981>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:53981>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:53981>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:53981>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:53981>#1248077180#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 850.3) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (7)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1202 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (8)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 850.4) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (9)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1202 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (10)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52535>#1248077384#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 851.0) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (11)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1203 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (12)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 851.1) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (13)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1198 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (14)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 851.2) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (15)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1203 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (16)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:52869>#1248077397#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 851.3) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (17)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1198 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (18)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:54713>#1248077200#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 851.4) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback REQUEST_CLAIM> (19)
07/20 11:36:55 Completed REQUEST_CLAIM to startd worker-134-48-158-179.marqnet.mu.edu <134.48.158.179:52822>#1248096634#2#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx
07/20 11:36:55 IpVerify::PunchHole: opened DAEMON level to 134.48.158.179
07/20 11:36:55 IpVerify::PunchHole: opened WRITE level to 134.48.158.179
07/20 11:36:55 IpVerify::PunchHole: opened READ level to 134.48.158.179
07/20 11:36:55 IpVerify::PunchHole: open count at level READ for 134.48.158.179 now 2
07/20 11:36:55 Job 852.0: is runnable
07/20 11:36:55 Scheduler::start_std - job=852.0 on <134.48.158.179:52822>
07/20 11:36:55 Queueing job 852.0 in runnable job queue
07/20 11:36:55 start next job after 0 sec, JobsThisBurst 0
07/20 11:36:55 Match (worker-134-48-158-179.marqnet.mu.edu <134.48.158.179:52822>#1248096634#2#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx) - running 852.0
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback REQUEST_CLAIM>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (20)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1199 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (21)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#5#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 852.1) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (22)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1199 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (23)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51373>#1248077257#6#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 852.2) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (24)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1200 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (25)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#3#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 852.3) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 Calling Timer handler 8439 (StartJobHandler)
07/20 11:36:55 Job prep for 852.0 will not block, calling aboutToSpawnJobHandler() directly
07/20 11:36:55 aboutToSpawnJobHandler() completed for job 852.0, attempting to spawn job handler
07/20 11:36:55 Starting add_shadow_birthdate(852.0)
07/20 11:36:55 Create_Process: using fast clone() to create child process.
07/20 11:36:55 Added shadow record for PID 14875, job (852.0)
07/20 11:36:55
07/20 11:36:55 ..................
07/20 11:36:55 .. Shadow Recs (1/2)
07/20 11:36:55 .. 14875, 852.0, F, <134.48.158.179:52822>, cur_hosts=1, status=2
07/20 11:36:55 ..................

07/20 11:36:55 Started shadow for job 852.0 on worker-134-48-158-179.marqnet.mu.edu <134.48.158.179:52822>#1248096634#2#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, (shadow pid = 14875)
07/20 11:36:55 -------- Begin starting jobs --------
07/20 11:36:55 match (worker-134-48-158-179.marqnet.mu.edu <134.48.158.179:52822>#1248096634#2#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx) already running a job
07/20 11:36:55 match (slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx) waiting for startd contact
07/20 11:36:55 -------- Done starting jobs --------
07/20 11:36:55 Return from Timer handler 8439 (StartJobHandler)
07/20 11:36:55 Calling Handler <DCMessenger::receiveMsgCallback CCB_REQUEST> (26)
07/20 11:36:55 Completed CCB_REQUEST to collector 134.48.90.158:9618
07/20 11:36:55 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: CCB server rejecting request for ccbid 1200 because no daemon is currently registered with that id (perhaps it recently disconnected).
07/20 11:36:55 CCBClient: no more CCB servers to try for requesting reversed connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx; giving up.
07/20 11:36:55 Calling Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM> (27)
07/20 11:36:55 SECMAN: resuming command 442 REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx from TCP port -1 (non-blocking).
07/20 11:36:55 SECMAN: TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Failed to send REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx failed.
07/20 11:36:55 Match record (slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <10.0.2.15:51379>#1248077274#4#... for herzfeldd@xxxxxxxxxxxxxxxxxxxxx, 852.4) deleted
07/20 11:36:55 Return from Handler <SecManStartCommand::WaitForSocketCallback REQUEST_CLAIM>
07/20 11:36:55 Return from Handler <DCMessenger::receiveMsgCallback CCB_REQUEST>
07/20 11:36:55 DC_AUTHENTICATE: received UDP packet from <134.48.90.158:53408>.
07/20 11:36:55 DaemonCore received UNAUTHENTICATED command 60008 DC_CHILDALIVE.
07/20 11:36:55 IPVERIFY: matched user * from * to allow list
07/20 11:36:55 Adding to resolved authorization table: */134.48.90.158: DAEMON
07/20 11:36:55 PERMISSION GRANTED to unauthenticated user from host 134.48.90.158 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy allows IP address 134.48.90.158; identifiers used for this remote host: 134.48.90.158,sunpassage.eng.mu.edu,sunpassage
07/20 11:36:55 Received UDP command 60008 (DC_CHILDALIVE) from  <134.48.90.158:51234>, access level DAEMON
07/20 11:36:55 Calling HandleReq <HandleChildAliveCommand> (0)
07/20 11:36:55 Return from HandleReq <HandleChildAliveCommand> (handler: 0.000s, sec: 0.001s)
07/20 11:36:56 DaemonCore received UNAUTHENTICATED command 507 TRANSFER_QUEUE_REQUEST.
07/20 11:36:56 IPVERIFY: matched user * from * to allow list
07/20 11:36:56 Adding to resolved authorization table: */134.48.90.158: WRITE
07/20 11:36:56 PERMISSION GRANTED to unauthenticated user from host 134.48.90.158 for command 507 (TRANSFER_QUEUE_REQUEST), access level WRITE: reason: WRITE authorization policy allows IP address 134.48.90.158; identifiers used for this remote host: 134.48.90.158,sunpassage.eng.mu.edu,sunpassage
07/20 11:36:56 Received TCP command 507 (TRANSFER_QUEUE_REQUEST) from  <134.48.90.158:54605>, access level WRITE
07/20 11:36:56 Calling HandleReq <TransferQueueManager::HandleRequest> (0)
07/20 11:36:56 TransferQueueManager: enqueueing <134.48.90.158:54605> uploading job 852.0 (initial file /home/herzfeldd/calculate_pi.sh).
07/20 11:36:56 Return from HandleReq <TransferQueueManager::HandleRequest> (handler: 0.000s, sec: 0.001s)
07/20 11:36:56 Calling Timer handler 8441 (CheckTransferQueue)
07/20 11:36:56 TransferQueueManager: sending GoAhead to <134.48.90.158:54605> uploading job 852.0 (initial file /home/herzfeldd/calculate_pi.sh).
07/20 11:36:56 Return from Timer handler 8441 (CheckTransferQueue)
07/20 11:36:56 Calling Handler <HandleDisconnect()> (5)
07/20 11:36:56 TransferQueueManager: dequeueing <134.48.90.158:54605> uploading job 852.0 (initial file /home/herzfeldd/calculate_pi.sh).
07/20 11:36:56 Return from Handler <HandleDisconnect()>
07/20 11:36:56 Calling Timer handler 8442 (CheckTransferQueue)
07/20 11:36:56 Return from Timer handler 8442 (CheckTransferQueue)
________________________________________
From: condor-users-bounces@xxxxxxxxxxx [condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley [dan@xxxxxxxxxxxx]
Sent: Monday, July 20, 2009 10:12 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] CCB Server - Client Communication (Condor 7.3.1)

Herzfeld, David wrote:
> During this time, the Collector shows no errors - all authorizations are granted. In the SchedLogs:
> 07/18 01:48:45 CCBClient:received failure message from CCB server 134.48.90.158:9618 in response to (non-blocking) request for reversed connection to startd slot1@xxxxxxxxxxxxxxxxxxxxx
>
The above log message doesn't look complete.  After "connection to
startd blah" there should be a ':' followed by a description of the error.

Anyway, I recommend looking in the StartLog on the execute node
mentioned in the above log message.  Perhaps it is failing to connect to
the schedd for some reason.

--Dan

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/