
[Condor-users] error delegating credential to startd: delegateX509Proxy



Hi,

Almost all sections running on our Condor pool (6.9.1, gLExec activated) suffer from frequent restarts (10-15/hour) of the startd on the worker node. The error below appears in the ShadowLog. Do you have any idea what is causing this problem?

1/12 17:32:40 (167275.0) (18120): UserLog = /export/CafCondor/cafIn/submit_fabhap_medium_218532_8401/job.log
1/12 17:32:40 (167275.0) (18120): *** Reserved Swap = 5120
1/12 17:32:40 (167275.0) (18120): *** Free Swap = 2070936
1/12 17:32:40 (167275.0) (18120): in RemoteResource::initStartdInfo()
1/12 17:32:40 (167275.0) (18120): Granting remote host "131.225.212.184" (<131.225.212.184:33443>) WRITE and DAEMON permission.
1/12 17:32:40 (167275.0) (18120): trying early delegation (for glexec) of proxy: /export/CafCondor/tickets/x509cc_fabhap
1/12 17:32:40 (167275.0) (18120): Entering DCStartd::delegateX509Proxy()
1/12 17:32:40 (167673.0) (1479): Proxy timestamps: remote estimated 1168634306, local 1168558121 (-76185 difference)
1/12 17:32:40 (167275.0) (18120): attempt to connect to <131.225.212.184:33443> failed: Connection refused (connect errno = 111).
1/12 17:32:40 (167275.0) (18120): error delegating credential to startd: DCStartd::delegateX509Proxy: Failed to send command DELEGATE_GSI_CRED_STARTD to the startd
1/12 17:32:40 (167275.0) (18120): Entering DCStartd::activateClaim()
1/12 17:32:40 (167275.0) (18120): attempt to connect to <131.225.212.184:33443> failed: Connection refused (connect errno = 111).
1/12 17:32:40 (167275.0) (18120): DCStartd::activateClaim: Failed to send command ACTIVATE_CLAIM to the startd
1/12 17:32:40 (167275.0) (18120): setting exit reason on vm2@8302@fcdfcaf1035.fnal.gov to 108
1/12 17:32:40 (167275.0) (18120): Resource vm2@8302@fcdfcaf1035.fnal.gov changing state from PRE to FINISHED
1/12 17:32:40 (167275.0) (18120): Job 167275.0 is being evicted
1/12 17:32:40 (167275.0) (18120): Entering DCStartd::deactivateClaim (forceful)
1/12 17:32:40 (167275.0) (18120): attempt to connect to <131.225.212.184:33443> failed: Connection refused (connect errno = 111).
1/12 17:32:40 (167275.0) (18120): RemoteResource::killStarter(): Could not send command to startd
1/12 17:32:40 (167275.0) (18120): logEvictEvent with unknown reason (108), aborting
1/12 17:32:40 (167275.0) (18120): STARTCOMMAND: starting 1111 to <131.225.240.106:32903> on TCP port 45499.
1/12 17:32:40 (167275.0) (18120): SECMAN: command 1111 to <131.225.240.106:32903> on TCP port 45499 (blocking).
1/12 17:32:40 (167275.0) (18120): SECMAN: no cached key for {<131.225.240.106:32903>,<1111>}.
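
In case it helps to see which worker nodes are affected, here is a minimal Python sketch that tallies the "Connection refused" entries in a ShadowLog by startd address. It assumes only the line format shown in the excerpt above; the function name count_refused and the regular expression are my own, not part of Condor.

```python
import re
from collections import Counter

# Matches the shadow log entries seen above, e.g.
#   "(167275.0) (18120): attempt to connect to <131.225.212.184:33443>
#    failed: Connection refused (connect errno = 111)."
# The pattern is an assumption based on this excerpt only.
REFUSED = re.compile(
    r"\(\d+\.\d+\) \(\d+\): attempt to connect to "
    r"<(?P<addr>[\d.]+:\d+)> failed: Connection refused"
)


def count_refused(log_text):
    """Return a Counter mapping startd address -> number of refused connects."""
    return Counter(m.group("addr") for m in REFUSED.finditer(log_text))


if __name__ == "__main__":
    import sys

    # Usage: python count_refused.py /path/to/ShadowLog
    with open(sys.argv[1], errors="replace") as fh:
        for addr, n in count_refused(fh.read()).most_common():
            print(f"{addr}\t{n}")
```

Because it scans the whole text rather than splitting on newlines, it also counts entries when several log records end up on one line.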

Many thanks,
Renzo