
[Condor-users] Rejected jobs for unknown reasons and SECMAN:2003 error in ScheddLog



Dear all,

I have been using Condor in our pool with no problems. However, I recently added 25 computers located in another building (they are behind different routers).
The jobs I sent to those computers were rejected by Condor. I ran condor_q -ana -l <clusterid> and checked the schedd and negotiator log files.

Here are the errors I got:
>>>>>>>>>>>Negotiator Log
05/21/12 11:20:04 ---------- Started Negotiation Cycle ----------
05/21/12 11:20:04 Phase 1:  Obtaining ads from collector ...
05/21/12 11:20:04   Getting all public ads ...
05/21/12 11:20:04   Sorting 110 ads ...
05/21/12 11:20:04   Getting startd private ads ...
05/21/12 11:20:04 condor_read() failed: recv() returned -1, errno = 10054 , reading 5 bytes from collector at <10.1.144.12:9618>.
05/21/12 11:20:04 IO: Failed to read packet header
05/21/12 11:20:04 Couldn't fetch ads: communication error
05/21/12 11:20:04 Aborting negotiation cycle
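
Since the negotiator reports a read failure against the collector, a basic reachability check from the affected machine may help narrow things down. The following is only a minimal sketch in Python: the helper name can_connect is hypothetical and not part of Condor, and a successful TCP connect does not prove that the Condor protocol exchange itself would succeed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened.

    This only tests that the port is reachable; it does not speak the
    Condor wire protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, calling can_connect("10.1.144.12", 9618) on the negotiator host would test the collector address that appears in the log above.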

>>>>>>>>>>>>Schedd Log file (in short, I am getting TCP connection errors and socket read errors)
IO: Failed to read packet header
05/21/12 11:20:06 (pid:4592) Response problem from startd when requesting claim slot1@PCLAB-PC <10.1.115.250:49170> for jlab 4183.0.
05/21/12 11:20:06 (pid:4592) Failed to send REQUEST_CLAIM to startd slot1@PCLAB-PC <10.1.115.250:49170> for jlab: CEDAR:6004:failed reading from socket
05/21/12 11:20:06 (pid:4592) Match record (slot1@PCLAB-PC <10.1.115.250:49170> for jlab, 4183.0) deleted
05/21/12 11:20:06 (pid:4592) Finished negotiating for jlab in local pool: 21 matched, 1 rejected
05/21/12 11:20:06 (pid:4592) condor_read() failed: recv() returned -1, errno = 10053 , reading 5 bytes from startd slot1@PCLAB-PC <10.1.119.206:49173> for jlab.
05/21/12 11:22:08 (pid:4592) IO: Failed to read packet header
05/21/12 11:22:08 (pid:4592) Response problem from startd when requesting claim slot2@PCLAB-PC <10.1.119.206:49173> for jlab 4175.0.
05/21/12 11:22:08 (pid:4592) Failed to send REQUEST_CLAIM to startd slot2@PCLAB-PC <10.1.119.206:49173> for jlab: CEDAR:6004:failed reading from socket
05/21/12 11:22:08 (pid:4592) Match record (slot2@PCLAB-PC <10.1.119.206:49173> for jlab, 4175.0) deleted
05/21/12 11:22:08 (pid:4592) attempt to connect to <127.0.0.1:49167> failed: connect errno = 10061 connection refused.
05/21/12 11:22:08 (pid:4592) Failed to send REQUEST_CLAIM to startd slot2@PCLAB-PC <127.0.0.1:49167> for jlab: SECMAN:2003:TCP connection to startd slot2@PCLAB-PC <127.0.0.1:49167> for jlab failed.
05/21/12 11:22:08 (pid:4592) Match record (slot2@PCLAB-PC <127.0.0.1:49167> for jlab, 4167.0) deleted
05/21/12 11:22:08 (pid:4592) attempt to connect to <127.0.0.1:49167> failed: connect errno = 10061 connection refused.
05/21/12 11:22:08 (pid:4592) Failed to send REQUEST_CLAIM to startd slot3@PCLAB-PC <127.0.0.1:49167> for jlab: SECMAN:2003:TCP connection to startd slot3@PCLAB-PC <127.0.0.1:49167> for jlab failed.
05/21/12 11:22:08 (pid:4592) Match record (slot3@PCLAB-PC <127.0.0.1:49167> for jlab, 4168.0) deleted
05/21/12 11:22:08 (pid:4592) attempt to connect to <127.0.0.1:49167> failed: connect errno = 10061 connection refused.
05/21/12 11:22:08 (pid:4592) Failed to send REQUEST_CLAIM to startd slot4@PCLAB-PC <127.0.0.1:49167> for jlab: SECMAN:2003:TCP connection to startd slot4@PCLAB-PC <127.0.0.1:49167> for jlab failed.
05/21/12 11:22:08 (pid:4592) Match record (slot4@PCLAB-PC <127.0.0.1:49167> for jlab, 4169.0) deleted
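
To see at a glance which slots are failing and at which addresses, the REQUEST_CLAIM failures can be tallied with a short script. This is a rough sketch, assuming the log lines keep the format shown above; the function name summarize_claim_failures and the regular expression are my own and not part of Condor.

```python
import re
from collections import Counter
from ipaddress import ip_address

# Matches lines such as:
#   Failed to send REQUEST_CLAIM to startd slot2@PCLAB-PC <127.0.0.1:49167> for jlab: ...
CLAIM_FAILURE = re.compile(
    r"Failed to send REQUEST_CLAIM to startd (\S+) <([\d.]+):(\d+)>"
)


def summarize_claim_failures(lines):
    """Count claim failures per (slot, ip) and collect slots seen at a loopback address."""
    counts = Counter()
    loopback_slots = set()
    for line in lines:
        match = CLAIM_FAILURE.search(line)
        if match is None:
            continue
        slot, ip, _port = match.groups()
        counts[(slot, ip)] += 1
        # An address like 127.0.0.1 can only be reached from the machine itself.
        if ip_address(ip).is_loopback:
            loopback_slots.add(slot)
    return counts, loopback_slots
```

Passing the lines of the schedd log to this function returns the failure counts and the set of slots that were contacted at a loopback address such as 127.0.0.1, which is what happens for slot2 to slot4 in the excerpt above.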

The NO_DNS option is set to true and DEFAULT_DOMAIN_NAME is disabled.

I would appreciate any help.
Regards,
Canan