
[Condor-users] Can't submit jobs using 7.4.1 (windows)



We have been testing job submission with the Windows version of Condor 7.4.1.
 
Previously we had no problems with Linux central managers running v7.2.3 and Windows clients running v7.2.4.
 
Testing between PCs shows:
submit from 7.2.4 to 7.2.4 OK
submit from 7.2.4 to 7.4.1 OK
submit from 7.4.1 to 7.2.4 NOT OK
submit from 7.4.1 to 7.4.1 NOT OK
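
The four results above can be checked mechanically to see which side's version the failure follows. This is a minimal sketch: the table is transcribed from the list, and `outcome_depends_only_on` is a hypothetical helper written for illustration, not part of Condor.

```python
# Outcomes transcribed from the test matrix: (submitting version, target version) -> submit OK?
RESULTS = {
    ("7.2.4", "7.2.4"): True,
    ("7.2.4", "7.4.1"): True,
    ("7.4.1", "7.2.4"): False,
    ("7.4.1", "7.4.1"): False,
}

def outcome_depends_only_on(results, index):
    """Return True if the outcome is fully determined by element `index` of each key."""
    seen = {}
    for key, ok in results.items():
        side = key[index]
        # Two rows with the same value on this side but different outcomes
        # mean this side alone does not explain the result.
        if side in seen and seen[side] != ok:
            return False
        seen[side] = ok
    return True

if __name__ == "__main__":
    print("follows submitting version:", outcome_depends_only_on(RESULTS, 0))
    print("follows target version:    ", outcome_depends_only_on(RESULTS, 1))
```

In all four cases the outcome tracks the submitting machine's version, which is consistent with the problem being on the 7.4.1 schedd side.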
 
There appears to be a DNS hostname lookup failure with the 7.4.1 schedd (see log below).
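
The 7.4.1 schedd logs "Negotiator hostname lookup failed!" right after receiving NEGOTIATE_WITH_SIGATTRS from 130.116.24.145. One hypothesis (an assumption, not something the log confirms) is that the submit machine cannot reverse-resolve the negotiator's IP address, or that the name it gets back does not resolve forward to the same address. The sketch below checks both directions with Python's standard `socket` resolvers; `check_reverse_dns` is a hypothetical helper, and the resolvers are parameters only so it can be tested without network access.

```python
import socket

def check_reverse_dns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve `ip`, then forward-resolve the resulting name.

    Returns (hostname_or_None, message). `reverse` and `forward` default to the
    standard-library resolvers; they are parameters only so the check can be
    exercised with fake resolvers.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError as exc:  # socket.herror / socket.gaierror are OSError subclasses
        return None, f"reverse lookup of {ip} failed: {exc}"
    try:
        _name, _aliases, addrs = forward(hostname)
    except OSError as exc:
        return hostname, f"{ip} -> {hostname}, but forward lookup of {hostname} failed: {exc}"
    if ip not in addrs:
        return hostname, f"{ip} -> {hostname}, but {hostname} resolves to {addrs}"
    return hostname, f"{ip} <-> {hostname} (forward and reverse agree)"
```

For example, calling `check_reverse_dns("130.116.24.145")` on the 7.4.1 submit machine and printing the message shows roughly what the OS resolver returns for the negotiator's address.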
 
We tried upgrading the Linux central manager (CM) to 7.4.1, but it made no difference.
See the three log extracts below. The config files on the Windows machines are identical.
 
Thanks for any insights/help.
 
Cheers
 
Greg
 
 
Excerpt from the 7.4.1 schedd log (job submission does NOT work):
 
01/18 14:36:09 Locale: English_United States.1252
01/18 14:36:09 ******************************************************
01/18 14:36:09 ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
01/18 14:36:09 ** C:\PROGRA~1\condor\bin\condor_schedd.exe
01/18 14:36:09 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
01/18 14:36:09 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
01/18 14:36:09 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/18 14:36:09 ** $CondorPlatform: INTEL-WINNT50 $
01/18 14:36:09 ** PID = 8660
01/18 14:36:09 ** Log last touched 1/18 11:37:29
01/18 14:36:09 ******************************************************
01/18 14:36:09 Using config source: c:\PROGRA~1\condor\condor_config
01/18 14:36:09 Using local config sources:
01/18 14:36:09    C:\PROGRA~1\condor/condor_config.local
01/18 14:36:09 DaemonCore: Command Socket at <130.116.146.156:9175>
01/18 14:36:10 History file rotation is enabled.
01/18 14:36:10   Maximum history file size is: 100000000 bytes
01/18 14:36:10   Number of rotated history files is: 5
01/18 14:36:10 my_popen: CreateProcess failed
01/18 14:36:10 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.pvm, ignoring
01/18 14:36:10 my_popen: CreateProcess failed
01/18 14:36:10 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.std, ignoring
01/18 14:36:12 Calling Handler <DaemonCore::HandleReqSocketHandler> (4)
01/18 14:36:12 Received TCP command 479 (STORE_CRED) from  <130.116.146.156:9137>, access level WRITE
01/18 14:36:12 Calling HandleReq <cred_access_handler> (0)
01/18 14:36:12 Return from HandleReq <cred_access_handler> (handler: 0.016s, sec: 0.000s)
01/18 14:36:12 Return from Handler <DaemonCore::HandleReqSocketHandler>
01/18 14:36:12 Calling Handler <DaemonCore::HandleReqSocketHandler> (4)
01/18 14:36:12 Received TCP command 1111 (QMGMT_CMD) from  <130.116.146.156:9304>, access level READ
01/18 14:36:12 Calling HandleReq <handle_q> (0)
01/18 14:36:12 Return from HandleReq <handle_q> (handler: 0.094s, sec: 0.000s)
01/18 14:36:12 Return from Handler <DaemonCore::HandleReqSocketHandler>
01/18 14:36:12 Received UDP command 421 (RESCHEDULE) from  <130.116.146.156:9630>, access level WRITE
01/18 14:36:12 Calling HandleReq <reschedule_negotiator> (0)
01/18 14:36:12 Return from HandleReq <reschedule_negotiator> (handler: 0.000s, sec: 0.000s)
01/18 14:36:15 Sent ad to central manager for hit023@xxxxxxxx
01/18 14:36:15 Sent ad to 1 collectors for hit023@xxxxxxxx
01/18 14:36:15 Failed to send RESCHEDULE to local negotiator:
01/18 14:36:46 Sent ad to central manager for hit023@xxxxxxxx
01/18 14:36:46 Sent ad to 1 collectors for hit023@xxxxxxxx
01/18 14:36:46 Failed to send RESCHEDULE to local negotiator:
01/18 14:37:10 Calling Handler <DaemonCore::HandleReqSocketHandler> (4)
01/18 14:37:10 Received TCP command 493 (NEGOTIATE_WITH_SIGATTRS) from  <130.116.24.145:9926>, access level NEGOTIATOR
01/18 14:37:10 Calling HandleReq <doNegotiate> (0)
01/18 14:37:10 Negotiator hostname lookup failed!
01/18 14:37:10 Return from HandleReq <doNegotiate> (handler: 0.000s, sec: 0.000s)
01/18 14:37:10 Return from Handler <DaemonCore::HandleReqSocketHandler>
01/18 14:37:17 Increasing flock level for hit023 to 1.
01/18 14:37:17 Sent ad to central manager for hit023@xxxxxxxx
01/18 14:37:17 Sent ad to 1 collectors for hit023@xxxxxxxx
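
To compare the two schedd logs, it can help to pull out only the lines that report a failure. Both excerpts show the same `condor_shadow.pvm`/`condor_shadow.std` CreateProcess failures, including the 7.2.4 log that works, so those are probably not the cause; the 7.4.1 log additionally shows "Failed to send RESCHEDULE" and "Negotiator hostname lookup failed!". This is a minimal sketch; `find_failures` is a hypothetical helper, and it assumes the "date time message" line layout seen in these excerpts.

```python
import re

# Lines look like "01/18 14:37:10 Negotiator hostname lookup failed!".
# The 7.4.1 log zero-pads the month ("01/18") and the 7.2.4 log does not ("1/18").
LOG_LINE = re.compile(r"^(\d{1,2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$")

def find_failures(lines):
    """Yield (date, time, message) for log lines whose message mentions 'failed'."""
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and "failed" in m.group(3).lower():
            yield m.groups()
```

Usage, assuming the schedd log file is named `SchedLog`: `with open("SchedLog") as f: for hit in find_failures(f): print(*hit)`.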

Excerpt from the 7.2.4 schedd log (job submission works):
 
1/18 11:39:04 ******************************************************
1/18 11:39:04 ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
1/18 11:39:04 ** C:\PROGRA~1\condor\bin\condor_schedd.exe
1/18 11:39:04 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
1/18 11:39:04 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
1/18 11:39:04 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
1/18 11:39:04 ** $CondorPlatform: INTEL-WINNT50 $
1/18 11:39:04 ** PID = 9256
1/18 11:39:04 ** Log last touched 1/15 15:20:45
1/18 11:39:04 ******************************************************
1/18 11:39:04 Using config source: c:\PROGRA~1\condor\condor_config
1/18 11:39:04 Using local config sources:
1/18 11:39:04    C:\PROGRA~1\condor/condor_config.local
1/18 11:39:04 DaemonCore: Command Socket at <130.116.146.156:9675>
1/18 11:39:04 History file rotation is enabled.
1/18 11:39:04   Maximum history file size is: 100000000 bytes
1/18 11:39:04   Number of rotated history files is: 5
1/18 11:39:05 my_popen: CreateProcess failed
1/18 11:39:05 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.pvm, ignoring
1/18 11:39:05 my_popen: CreateProcess failed
1/18 11:39:05 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.std, ignoring
1/18 11:39:29 Calling Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:29 Received TCP command 479 (STORE_CRED) from  <130.116.146.156:9315>, access level WRITE
1/18 11:39:29 Calling HandleReq <cred_access_handler> (0)
1/18 11:39:29 Return from HandleReq <cred_access_handler> (handler: 0.109s, sec: 0.000s)
1/18 11:39:29 Return from Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:29 Calling Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:29 Received TCP command 1111 (QMGMT_CMD) from  <130.116.146.156:9264>, access level READ
1/18 11:39:29 Calling HandleReq <handle_q> (0)
1/18 11:39:29 Return from HandleReq <handle_q> (handler: 0.078s, sec: 0.016s)
1/18 11:39:29 Return from Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:30 Received UDP command 421 (RESCHEDULE) from  <130.116.146.156:9090>, access level WRITE
1/18 11:39:30 Calling HandleReq <reschedule_negotiator> (0)
1/18 11:39:30 Sent ad to central manager for hit023@xxxxxxxx
1/18 11:39:30 Sent ad to 1 collectors for hit023@xxxxxxxx
1/18 11:39:30 Called reschedule_negotiator()
1/18 11:39:30 Return from HandleReq <reschedule_negotiator> (handler: 0.016s, sec: 0.000s)
1/18 11:40:00 Sent ad to central manager for hit023@xxxxxxxx
1/18 11:40:00 Sent ad to 1 collectors for hit023@xxxxxxxx
1/18 11:40:00 Calling Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:40:00 Received TCP command 493 (NEGOTIATE_WITH_SIGATTRS) from  <130.116.24.145:9581>, access level NEGOTIATOR
1/18 11:40:00 Calling HandleReq <doNegotiate> (0)
1/18 11:40:00 Negotiating for owner: hit023@xxxxxxxx
 
Excerpt from the NegotiatorLog on the central manager (the same errors appear with Linux 7.2.4 and 7.4.1):
 
1/18 11:36:42 ---------- Started Negotiation Cycle ----------
1/18 11:36:42 Phase 1:  Obtaining ads from collector ...
1/18 11:36:42   Getting all public ads ...
1/18 11:36:43   Sorting 951 ads ...
1/18 11:36:43   Getting startd private ads ...
1/18 11:36:43 Got ads: 951 public and 466 private
1/18 11:36:43 Public ads include 2 submitter, 466 startd
1/18 11:36:43 Phase 2:  Performing accounting ...
1/18 11:36:43 Phase 3:  Sorting submitter ads by priority ...
1/18 11:36:43 Phase 4.1:  Negotiating with schedds ...
1/18 11:36:43   Negotiating with hit023@xxxxxxxx at <130.116.146.156:9007>
1/18 11:36:43 0 seconds so far
1/18 11:36:43 attempt to connect to <130.116.146.156:9007> failed: Connection refused (connect errno = 111).
1/18 11:36:43     Failed to connect to hit023@xxxxxxxx (<130.116.146.156:9007>)
1/18 11:36:43   Error: Ignoring schedd for this cycle
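
"Connection refused (connect errno = 111)" means the negotiator reached 130.116.146.156 but nothing accepted a connection on port 9007 at that moment (errno 111 is ECONNREFUSED on Linux). A quick way to test reachability from the central manager is a plain TCP connect to the address in the schedd's current "Command Socket at" line. This is a minimal sketch; `probe_tcp` is a hypothetical helper, and a successful connect only shows the port is open, not that Condor will accept the command.

```python
import socket

def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP connection and report "open", "refused", or the other error."""
    try:
        # create_connection raises ConnectionRefusedError when the peer answers
        # with a reset, i.e. nothing is listening (ECONNREFUSED, errno 111 on Linux).
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # timeouts, unreachable networks, etc.
        return f"error: {exc}"
```

For example, `probe_tcp("130.116.146.156", 9175)` run on the central manager would check the command socket reported by the 7.4.1 schedd at startup.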