
[condor-users] failed to transfer executable file



Hi,

I'm trying to transfer "large" (280MB) files from my XP submit machine to
the XP worker machines, and condor_submit intermittently returns the
following error (intermittent errors are always the hardest to figure out!):

ERROR: failed to transfer executable file .\evaluate_condor_int.exe

The condor_submit command works about 80% of the time.
Note that the executable file evaluate_condor_int.exe is 71,780 bytes. The
large data file m.mat is 246,650,384 bytes.

The juicy bits from the SchedLog include:

===================


4/24 12:04:38 Negotiating for owner: cqhoward@xxxxxxxxxxxxxxxxxxxxxxx
4/24 12:04:38 Checking consistency running and runnable jobs
4/24 12:04:38 Tables are consistent
4/24 12:04:38 condor_write(): Socket closed when trying to write buffer
4/24 12:04:38 Buf::write(): condor_write() failed
4/24 12:04:38 Can't send job eom to mgr
4/24 12:04:40 Started shadow for job 311217.0 on "<129.127.197.31:1028>", (shadow pid = 2140)
4/24 12:05:00 condor_read(): timeout reading buffer.
4/24 12:05:00 ReliSock::get_bytes_nobuffer: Failed to receive file.
4/24 12:05:00 get_file(): ERROR: received 0 bytes, expected 71780!
4/24 12:05:00 Failed to receive file from client in SendSpoolFile.
4/24 12:05:12 IO: Incoming packet is too big
4/24 12:05:12 Started shadow for job 311218.0 on "<129.127.197.29:1028>", (shadow pid = 1256)
4/24 12:05:33 condor_read(): timeout reading buffer.
4/24 12:05:33 ReliSock::get_bytes_nobuffer: Failed to receive file.
4/24 12:05:33 get_file(): ERROR: received 0 bytes, expected 71780!
4/24 12:05:33 Failed to receive file from client in SendSpoolFile.
4/24 12:05:53 condor_read(): timeout reading buffer.
4/24 12:05:53 Started shadow for job 311324.0 on "<129.127.236.88:1028>", (shadow pid = 2744)
4/24 12:05:53 DaemonCore: Command received via TCP from host <129.127.197.31:3963>
4/24 12:05:53 DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
4/24 12:05:53 Got VACATE_SERVICE from <129.127.197.31:3963>
4/24 12:05:53 Sent RELEASE_CLAIM to startd on <129.127.197.31:1028>
4/24 12:05:53 Match record (<129.127.197.31:1028>, 311217, 0) deleted
4/24 12:05:53 DaemonCore: Command received via UDP from host <129.127.197.39:3664>
4/24 12:05:53 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
4/24 12:05:53 DaemonCore: Command received via UDP from host <129.127.236.95:3800>
4/24 12:05:53 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
4/24 12:05:53 DaemonCore: Command received via UDP from host <129.127.236.91:2560>
4/24 12:05:53 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
4/24 12:05:53 DaemonCore: Command received via UDP from host <129.127.236.97:2217>
4/24 12:05:53 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
4/24 12:05:53 DaemonCore: Command received via UDP from host <129.127.14.42:3100>
4/24 12:05:53 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
4/24 12:05:53 Scheduler::Relinquish - mrec is NULL, can't relinquish
4/24 12:05:53 Null parameter --- match not deleted
4/24 12:05:55 Started shadow for job 311219.0 on "<129.127.197.39:1028>", (shadow pid = 440)
4/24 12:06:07 Started shadow for job 311339.0 on "<129.127.236.172:1028>", (shadow pid = 2216)

========================
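To see how often the spool transfer fails, and whether each failure follows a condor_read() timeout, the SchedLog can be scanned for the two messages quoted above. This is a minimal Python sketch, not part of Condor: the helper name scan_schedlog and both regular expressions are hypothetical and assume the log lines look exactly like the excerpt (date, time, message on one line).

```python
import re
from collections import namedtuple

# Hypothetical pattern for lines such as:
# 4/24 12:05:00 get_file(): ERROR: received 0 bytes, expected 71780!
GET_FILE_ERR = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d\d:\d\d:\d\d) get_file\(\): ERROR: "
    r"received (?P<got>\d+) bytes, expected (?P<want>\d+)!"
)

# Hypothetical pattern for: 4/24 12:05:00 condor_read(): timeout reading buffer.
READ_TIMEOUT = re.compile(
    r"^\d+/\d+ \d\d:\d\d:\d\d condor_read\(\): timeout reading buffer\."
)

Failure = namedtuple("Failure", "stamp received expected")


def scan_schedlog(lines):
    """Return (list of short get_file() transfers, number of condor_read timeouts)."""
    failures = []
    timeouts = 0
    for line in lines:
        m = GET_FILE_ERR.match(line)
        if m:
            # Record when the transfer failed and how many bytes actually arrived.
            failures.append(
                Failure(f"{m['date']} {m['time']}", int(m["got"]), int(m["want"]))
            )
        elif READ_TIMEOUT.match(line):
            timeouts += 1
    return failures, timeouts
```

Pass it the lines of the SchedLog file (e.g. an open file object); on the excerpt above it would report two transfers that received 0 of 71780 bytes and three read timeouts.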

Oddities in the ShadowLog include:


4/24 12:11:04 ******************************************************
4/24 12:11:04 Using config file: C:\Condor\condor_config
4/24 12:11:04 Using local config files: C:\Condor/condor_config.local
4/24 12:11:04 DaemonCore: Command Socket at <129.127.14.42:3505>
4/24 12:11:05 Initializing a VANILLA shadow
4/24 12:11:05 (311238.0) (3500): Request to run on <129.127.239.135:1028> was ACCEPTED
4/24 12:11:06 (311245.0) (628): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100
4/24 12:11:06 ******************************************************



Has anyone seen this before?


Carl Howard 
Dept Mechanical Engineering
University of Adelaide
S.A. 5005
Australia
