
[Condor-users] Job not running on windows when submitted from Linux Machine



Hi All,

  I am submitting a "Hello World" program from a Linux machine to run on a WinXP machine, but it fails to execute. When I checked the StartLog on the Windows machine, it shows socket-related errors, and the shadow on the Linux side exits with status 108. I am passing the required DLLs in the submit file. Please help me overcome this issue. BTW, I am able to run a job on a Linux machine when it is submitted from a Windows one, but not the other way round.

I am attaching the submit file, the ShadowLog of the submitter (Linux machine), and the StartLog of the execute machine (WinXP).

--------------------------------------------------------------------
              submit file
--------------------------------------------------------------------
universe =  vanilla
environment = path=c:\winnt\system32
executable = test.exe
output = test.out
error = test.err
log = test.log
Requirements =  (OpSys == "WINNT51")
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/condor/loop/cygwin1.dll
should_transfer_files = YES
TRANSFER_FILES = ALWAYS
queue


---------------------------------------------------------------------
                  ShadowLog
-----------------------------------------------------------------------
9/23 16:47:02 PASSWD_CACHE_REFRESH is undefined, using default value of 300

9/23 16:47:02 ******************************************************
9/23 16:47:02 ** condor_shadow (CONDOR_SHADOW) STARTING UP
9/23 16:47:02 ** /usr/local/condor/sbin/condor_shadow
9/23 16:47:02 ** $CondorVersion: 6.6.10 Jun 13 2005 $
9/23 16:47:02 ** $CondorPlatform: I386-LINUX_RH9 $
9/23 16:47:02 ** PID = 4532
9/23 16:47:02 ******************************************************
9/23 16:47:02 Using config file: /home/condor/condor_config
9/23 16:47:02 Using local config files: /home/condor/hosts/neo4/condor_config.local
9/23 16:47:02 DaemonCore: Command Socket at <192.168.5.24:34280>
9/23 16:47:02 SHADOW_TIMEOUT_MULTIPLIER is undefined, using default value of 0
9/23 16:47:03 Getting JobAd from schedd manually.
9/23 16:47:03 SHADOW_TIMEOUT_MULTIPLIER is undefined, using default value of 0
9/23 16:47:03 Success in retreving JobAd.
9/23 16:47:03 Initializing a VANILLA shadow
9/23 16:47:03 SHADOW_TIMEOUT_MULTIPLIER is undefined, using default value of 0
9/23 16:47:03 (140.0) (4532): ENABLE_USERLOG_LOCKING is undefined, using default value of True
9/23 16:47:03 (140.0) (4532): UserLog = /home/condor/test.log
9/23 16:47:03 (140.0) (4532): *** Reserved Swap = 5120
9/23 16:47:03 (140.0) (4532): *** Free Swap = 522104
9/23 16:47:03 (140.0) (4532): *** ClassAd Dump: BaseShadow::baseInit() ***
MyType = "Job"
TargetType = "Machine"
ClusterId = 140
QDate = 1127474204
CompletionDate = 0
Owner = "condor"
RemoteWallClockTime = 0.000000
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
LastSuspensionTime = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 6.6.10 Jun 13 2005 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
RootDir = "/"
Iwd = "/home/condor"
JobUniverse = 5
Cmd = "/home/condor/test.exe"
MinHosts = 1
MaxHosts = 1
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
JobPrio = 0
User = "condor@xxxxxxxxxxxxxxxxxx"
NiceUser = FALSE
Env = ""
JobNotification = 2
UserLog = "/home/condor/test.log"
CoreSize = 0
KillSig = "SIGTERM"
Rank = 0.000000
In = "/dev/null"
TransferIn = FALSE
Out = "test.out"
Err = "test.err"
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "YES"
WhenToTransferOutput = "ON_EXIT"
TransferFiles = "ONEXIT"
TransferInput = "/home/condor/loop/cygwin1.dll"
ImageSize = 9
ExecutableSize = 9
DiskUsage = 1275
Requirements = ((OpSys == "WINNT51")) && (Arch == "INTEL") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer)
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
LeaveJobInQueue = FALSE
Args = ""
ProcId = 0
WantMatchDiagnostics = TRUE
LastMatchTime = 1127474205
NumJobMatches = 1
OrigMaxHosts = 1
JobStatus = 2
EnteredCurrentStatus = 1127474222
CurrentHosts = 1
RemoteHost = "neo7.neo.gridlogics.com"
RemoteVirtualMachineID = 1
ShadowBday = 1127474222
JobStartDate = 1127474222
JobCurrentStartDate = 1127474222
JobRunCount = 1
ServerTime = 1127474223
MyAddress = "<192.168.5.24:34280>"
9/23 16:47:03 (140.0) (4532): --- End of ClassAd ---
9/23 16:47:03 (140.0) (4532): Entering DCStartd::activateClaim()
9/23 16:47:23 (140.0) (4532): condor_read(): timeout reading buffer.
9/23 16:47:23 (140.0) (4532): DCStartd::activateClaim: failed to receive reply from <192.168.5.27:1028>
9/23 16:47:23 (140.0) (4532): Request to run on <192.168.5.27:1028> was REFUSED
9/23 16:47:23 (140.0) (4532): setting exit reason on <192.168.5.27:1028> to 108
9/23 16:47:23 (140.0) (4532): Resource <192.168.5.27:1028> changing state from PRE to FINISHED
9/23 16:47:23 (140.0) (4532): Job 140.0 is being evicted
9/23 16:47:23 (140.0) (4532): Entering DCStartd::deactivateClaim(forceful)
9/23 16:47:23 (140.0) (4532): DCStartd::deactivateClaim: successfully sent command
9/23 16:47:23 (140.0) (4532): Killed starter (fast) at <192.168.5.27:1028>
9/23 16:47:23 (140.0) (4532): logEvictEvent with unknown reason (108), aborting
9/23 16:47:23 (140.0) (4532): Entering BaseShadow::updateJobInQueue
9/23 16:47:23 (140.0) (4532): SHADOW_TIMEOUT_MULTIPLIER is undefined, using default value of 0
9/23 16:47:23 (140.0) (4532): AUTHENTICATE_FS: used file /tmp/qmgr_hRy4uH, status: 1
9/23 16:47:23 (140.0) (4532): Updating Job Queue: SetAttribute(LastVacateTime, 1127474243)
9/23 16:47:23 (140.0) (4532): Updating Job Queue: SetAttribute(BytesSent, 0.000000)
9/23 16:47:23 (140.0) (4532): Updating Job Queue: SetAttribute(BytesRecvd, 0.000000)
9/23 16:47:23 (140.0) (4532): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 108
9/23 16:57:02 PASSWD_CACHE_REFRESH is undefined, using default value of 300
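
To make the timing in the ShadowLog easier to see, here is a minimal sketch that measures how long the shadow waited between entering activateClaim() and reporting the read timeout. The helper names (parse_time, seconds_between) are made up for illustration and are not part of Condor. The log lines carry no year, so 2005 is an assumption based on the QDate value in the job ad.

```python
from datetime import datetime

# Lines copied verbatim from the ShadowLog above.
SHADOW_LOG = """\
9/23 16:47:03 (140.0) (4532): Entering DCStartd::activateClaim()
9/23 16:47:23 (140.0) (4532): condor_read(): timeout reading buffer.
9/23 16:47:23 (140.0) (4532): DCStartd::activateClaim: failed to receive reply from <192.168.5.27:1028>
"""


def parse_time(line, year=2005):
    """Parse the leading 'M/D HH:MM:SS' stamp of a log line.

    The year is an assumption: the log format omits it.
    """
    date, clock = line.split()[:2]
    return datetime.strptime(f"{year}/{date} {clock}", "%Y/%m/%d %H:%M:%S")


def seconds_between(log_text, start_marker, end_marker):
    """Return seconds from the first start_marker line to the next end_marker line."""
    start = None
    for line in log_text.splitlines():
        if start is None:
            if start_marker in line:
                start = parse_time(line)
        elif end_marker in line:
            return (parse_time(line) - start).total_seconds()
    return None  # one of the markers was not found


wait = seconds_between(
    SHADOW_LOG,
    "Entering DCStartd::activateClaim()",
    "timeout reading buffer",
)
print(wait)  # 20.0
```

For the excerpt above this gives 20 seconds, i.e. the shadow gave up 20 seconds after sending the activate request.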


--------------------------------------------------------------------
              StartLog on WinXP
-----------------------------------------------------------------------
9/22 16:48:51 DaemonCore: in SendAliveToParent()
9/22 16:48:51 DaemonCore: attempting to connect to '<192.168.5.27:1026>'
9/22 16:48:51 STARTD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
9/22 16:49:14 DaemonCore: Command received via UDP from host <192.168.5.24:32812>
9/22 16:49:14 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/22 16:49:14 match_info called
9/22 16:49:14 Received match <192.168.5.27:1028>#1383344982
9/22 16:49:14 Started match timer (23) for 120 seconds.
9/22 16:49:14 State change: match notification protocol successful
9/22 16:49:14 Changing state: Unclaimed -> Matched
9/22 16:49:29 DaemonCore: Command received via TCP from host <192.168.5.24:34276>
9/22 16:49:29 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
9/22 16:49:29 Canceled match timer (23)
9/22 16:49:29 Schedd addr = <192.168.5.24:32772>
9/22 16:49:29 Alive interval = 300
9/22 16:49:29 Received capability from schedd (<192.168.5.27:1028>#1383344982)
9/22 16:49:29 Rank of this claim is: 0.000000
9/22 16:49:29 Request accepted.
9/22 16:49:44 Remote owner is condor@xxxxxxxxxxxxxxxxxx
9/22 16:49:44 State change: claiming protocol successful
9/22 16:49:44 Changing state: Matched -> Claimed
9/22 16:49:44 Started claim timer (25) w/ 300 second alive interval.
9/22 16:49:44 Attempting to send update via UDP to collector neo4.neo.gridlogics.com <192.168.5.24:9618>
9/22 16:49:44 Sent update to 1 collector(s)
9/22 16:49:44 Swap space: 485004
9/22 16:49:44 Looking up RESERVED_DISK parameter
9/22 16:49:44 Reserving 5120 kbytes for file system
9/22 16:49:44 Disk space: 17674780
9/22 16:49:44 Started polling timer.
9/22 16:49:59 DaemonCore: Command received via TCP from host <192.168.5.24:34282>
9/22 16:49:59 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
9/22 16:49:59 Got activate_claim request from shadow (<192.168.5.24:34282>)
9/22 16:49:59 Read request ad and starter from shadow.
9/22 16:49:59 Swap space: 485040
9/22 16:49:59 Looking up RESERVED_DISK parameter
9/22 16:49:59 Reserving 5120 kbytes for file system
9/22 16:49:59 Disk space: 17674780
9/22 16:49:59 condor_write(): Socket closed when trying to write buffer
9/22 16:49:59 Buf::write(): condor_write() failed
9/22 16:49:59 Can't send eom to shadow.
9/22 16:49:59 Attempting to send update via UDP to collector neo4.neo.gridlogics.com <192.168.5.24:9618>
9/22 16:49:59 Sent update to 1 collector(s)
9/22 16:50:14 DaemonCore: Command received via TCP from host <192.168.5.24:34283>
9/22 16:50:14 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
9/22 16:50:14 Called deactivate_claim_forcibly()
9/22 16:50:29 DaemonCore: Command received via UDP from host <192.168.5.24:32812>
9/22 16:50:29 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
9/22 16:50:29 State change: received RELEASE_CLAIM command
9/22 16:50:29 Canceled claim timer (25)
9/22 16:50:29 Changing state and activity: Claimed/Idle -> Preempting/Vacating
9/22 16:50:29 Entered vacate_client <192.168.5.24:32772> NEO4...
9/22 16:50:29 STARTD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
9/22 16:50:29 State change: No preempting claim, returning to owner
9/22 16:50:29 Changing state and activity: Preempting/Vacating -> Owner/Idle
9/22 16:50:29 State change: IS_OWNER is false
9/22 16:50:29 Changing state: Owner -> Unclaimed
9/22 16:50:29 Canceled polling timer (27)
9/22 16:50:44 DaemonCore: Command received via UDP from host <192.168.5.24:32812>
9/22 16:50:44 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
9/22 16:50:44 Error: can't find resource with capability (<192.168.5.27:1028>#1383344982)
9/22 16:50:44 Attempting to send update via UDP to collector neo4.neo.gridlogics.com <192.168.5.24:9618>
9/22 16:50:44 Sent update to 1 collector(s)
9/22 16:54:44 Swap space: 484236
9/22 16:54:44 Looking up RESERVED_DISK parameter
9/22 16:54:44 Reserving 5120 kbytes for file system
9/22 16:54:44 Disk space: 17674864
9/22 16:54:48 Attempting to send update via UDP to collector neo4.neo.gridlogics.com <192.168.5.24:9618>
9/22 16:54:48 Sent update to 1 collector(s)
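
As a reading aid for the StartLog, the following sketch pulls out only the "Changing state" lines so the sequence of transitions on the WinXP machine can be seen at a glance. It is an illustration rather than a Condor tool; the regular expression and the function name transitions are assumptions that match only the line format shown in this excerpt.

```python
import re

# Lines copied verbatim from the StartLog above.
STARTLOG = """\
9/22 16:49:14 Changing state: Unclaimed -> Matched
9/22 16:49:44 Changing state: Matched -> Claimed
9/22 16:49:59 condor_write(): Socket closed when trying to write buffer
9/22 16:50:29 Changing state and activity: Claimed/Idle -> Preempting/Vacating
9/22 16:50:29 Changing state and activity: Preempting/Vacating -> Owner/Idle
9/22 16:50:29 Changing state: Owner -> Unclaimed
"""

# Matches both "Changing state:" and "Changing state and activity:" lines.
TRANSITION = re.compile(
    r"^(\S+ \S+) Changing state(?: and activity)?: (\S+) -> (\S+)$"
)


def transitions(log_text):
    """Return (timestamp, old_state, new_state) for every state-change line."""
    found = []
    for line in log_text.splitlines():
        match = TRANSITION.match(line)
        if match:
            found.append(match.groups())
    return found


for stamp, old, new in transitions(STARTLOG):
    print(stamp, old, "->", new)
```

In this excerpt the machine goes from Claimed/Idle straight to Preempting/Vacating, with the "Socket closed" write error logged in between.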

------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX----------------------