
[HTCondor-users] condor_write() problems



I'm having issues where some machines are not taking jobs. On those machines, the StartLog shows many condor_write() failures. However, these machines are configured the same (I thought!) as others in the pool that are happily accepting jobs and running fine.

Do these errors point toward something I should investigate?

Many Thanks!
Mike Fienen
US Geological Survey

01/09/18 17:26:59 slot1_3: New machine resource of type -1 allocated
01/09/18 17:26:59 Setting up slot pairings
01/09/18 17:26:59 slot1_3: Request accepted.
01/09/18 17:27:19 slot1_3: Remote owner is user@xxxxxxxxxxx
01/09/18 17:27:19 slot1_3: State change: claiming protocol successful
01/09/18 17:27:19 slot1_3: Changing state: Owner -> Claimed
01/09/18 17:27:19 slot1_5: State change: received RELEASE_CLAIM command
01/09/18 17:27:19 slot1_5: Changing state and activity: Claimed/Idle -> Preempting/Vacating
01/09/18 17:27:19 slot1_5: State change: No preempting claim, returning to owner
01/09/18 17:27:19 slot1_5: Changing state and activity: Preempting/Vacating -> Owner/Idle
01/09/18 17:27:19 slot1_5: State change: IS_OWNER is false
01/09/18 17:27:19 slot1_5: Changing state: Owner -> Unclaimed
01/09/18 17:27:19 slot1_5: Changing state: Unclaimed -> Delete
01/09/18 17:27:19 slot1_5: Resource no longer needed, deleting
01/09/18 17:27:19 condor_write(): Socket closed when trying to write 13 bytes to , fd is 8
01/09/18 17:27:19 Buf::write(): condor_write() failed
01/09/18 17:27:19 SharedPortEndpoint: failed to send final status (success) for SHARED_PORT_PASS_SOCK
01/09/18 17:27:19 slot1_1: Got activate_claim request from shadow (xxx.xx.xx.xx)
01/09/18 17:27:19 condor_write(): Socket closed when trying to write 13 bytes to <xxx.xx.xx.xx:25208>, fd is 9
01/09/18 17:27:19 Buf::write(): condor_write() failed
01/09/18 17:27:19 slot1_1: Can't send eom to shadow.
01/09/18 17:27:19 condor_write(): Socket closed when trying to write 13 bytes to , fd is 8
01/09/18 17:27:19 Buf::write(): condor_write() failed
01/09/18 17:27:19 SharedPortEndpoint: failed to send final status (success) for SHARED_PORT_PASS_SOCK
01/09/18 17:27:19 condor_write(): Socket closed when trying to write 28 bytes to <xxx.xx.xx.xx:15197>, fd is 9
01/09/18 17:27:19 Buf::write(): condor_write() failed
01/09/18 17:27:19 slot1_4: Called deactivate_claim()
01/09/18 17:27:19 slot1_5: New machine resource of type -1 allocated
01/09/18 17:27:19 Setting up slot pairings
01/09/18 17:27:19 slot1_5: Request accepted.
01/09/18 17:27:39 slot1_5: Remote owner is user@xxxxxxxxxxx
01/09/18 17:27:39 slot1_5: State change: claiming protocol successful
01/09/18 17:27:39 slot1_5: Changing state: Owner -> Claimed
01/09/18 17:27:39 slot1_4: State change: received RELEASE_CLAIM command
01/09/18 17:27:39 slot1_4: Changing state and activity: Claimed/Idle -> Preempting/Vacating
01/09/18 17:27:39 slot1_4: State change: No preempting claim, returning to owner
01/09/18 17:27:39 slot1_4: Changing state and activity: Preempting/Vacating -> Owner/Idle
01/09/18 17:27:39 slot1_4: State change: IS_OWNER is false
01/09/18 17:27:39 slot1_4: Changing state: Owner -> Unclaimed
01/09/18 17:27:39 slot1_4: Changing state: Unclaimed -> Delete
01/09/18 17:27:39 slot1_4: Resource no longer needed, deleting
01/09/18 17:27:39 condor_write(): Socket closed when trying to write 13 bytes to , fd is 8
01/09/18 17:27:39 Buf::write(): condor_write() failed
01/09/18 17:27:39 SharedPortEndpoint: failed to send final status (success) for SHARED_PORT_PASS_SOCK
01/09/18 17:27:39 condor_write(): Socket closed when trying to write 28 bytes to <xxx.xx.xx.xx:7501>, fd is 9
01/09/18 17:27:39 Buf::write(): condor_write() failed
01/09/18 17:27:39 slot1_1: Called deactivate_claim()
01/09/18 17:27:39 condor_write(): Socket closed when trying to write 13 bytes to , fd is 8
01/09/18 17:27:39 Buf::write(): condor_write() failed
01/09/18 17:27:39 SharedPortEndpoint: failed to send final status (success) for SHARED_PORT_PASS_SOCK
01/09/18 17:27:39 slot1_3: Got activate_claim request from shadow (xxx.xx.xx.xx)
01/09/18 17:27:39 condor_write(): Socket closed when trying to write 13 bytes to <xxx.xx.xx.xx:10752>, fd is 9
01/09/18 17:27:39 Buf::write(): condor_write() failed
01/09/18 17:27:39 slot1_3: Can't send eom to shadow.
01/09/18 17:27:39 slot1_4: New machine resource of type -1 allocated
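
To compare how often these failures occur on a problem machine versus a healthy one, the condor_write() lines can be tallied from each StartLog. The following is only a minimal sketch: the function name summarize_write_failures and the regular expression are illustrative assumptions based solely on the line format in the excerpt above, not part of HTCondor, and other HTCondor versions may word the message differently.

```python
import re
from collections import Counter

# Assumption: failure lines look exactly like those in the excerpt above, e.g.
# "01/09/18 17:27:19 condor_write(): Socket closed when trying to write 13 bytes to , fd is 8"
# The peer address (in angle brackets) may be missing, as it is for fd 8 in the excerpt.
WRITE_FAIL = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) condor_write\(\): Socket closed "
    r"when trying to write (?P<nbytes>\d+) bytes to (?P<peer><[^>]*>)?, fd is (?P<fd>\d+)"
)


def summarize_write_failures(lines):
    """Count condor_write() 'Socket closed' failures by timestamp and by fd.

    Hypothetical helper for illustration; returns two Counters.
    """
    by_time = Counter()
    by_fd = Counter()
    for line in lines:
        match = WRITE_FAIL.match(line.strip())
        if match is None:
            continue  # not a condor_write() failure line
        by_time[match.group("ts")] += 1
        by_fd[int(match.group("fd"))] += 1
    return by_time, by_fd


if __name__ == "__main__":
    import sys

    # Usage: python tally_write_failures.py /path/to/StartLog
    with open(sys.argv[1], errors="replace") as log:
        times, fds = summarize_write_failures(log)
    for ts, count in sorted(times.items()):
        print(ts, count)
    print("by fd:", dict(fds))
```

Running this against the StartLog on a failing machine and on a working one would show whether the errors are specific to the machines that refuse jobs or also appear, perhaps less often, on the healthy ones.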