Re: [HTCondor-users] Downloading big files is interrupted
- Date: Thu, 09 Jan 2014 10:46:06 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Downloading big files is interrupted
On 1/9/2014 2:42 AM, Leon Thielen wrote:
We are running HTCondor version 8.1.2.
The Condor master and client are Windows 7 machines.
The submit host is the master.
All the files reside on a Linux machine. Linux and Windows are connected via Samba.
It may help if you post the corresponding snippet from the ShadowLog on
the submit machine (from the same timeframe as the snippet below). The
file transfer happens between the condor_shadow (on the submit machine) and
the condor_starter (on the execute machine), and the starter log snippet
below implies that the shadow stopped sending...
Also, if there is a shared filesystem between the submit and execute
nodes (via Samba or whatever), I am kinda wondering if/why you want
HTCondor to transfer the files in the first place. I.e., you could just
leverage the shared filesystem...
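
A rough sketch of what leaving the transfer to the shared filesystem could look like, written as a small Python helper that renders a submit description. The helper name `shared_fs_submit`, the UNC paths and the log file name are all made up for illustration; only the submit commands themselves (`executable`, `arguments`, `should_transfer_files`, `log`, `queue`) are real. It assumes the executable and the big input are both reachable from the execute machine at the same path, since with `should_transfer_files = NO` HTCondor does not copy anything.

```python
def shared_fs_submit(executable, shared_input, log_name="job.log"):
    """Render a submit description that relies on a shared filesystem.

    All arguments are hypothetical example values supplied by the caller.
    The big input is passed to the job as a path instead of being listed
    in transfer_input_files, so the shadow never has to send it.
    """
    lines = [
        f"executable = {executable}",
        # The job opens the file itself from the share.
        f"arguments = {shared_input}",
        # Turn off HTCondor's file transfer mechanism for this job.
        "should_transfer_files = NO",
        f"log = {log_name}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical share and file names, for illustration only.
    print(shared_fs_submit(r"\\fileserver\jobs\process.bat",
                           r"\\fileserver\data\reference-big.zip"))
```

Whether the job can actually open the share on Windows depends on which account it runs under, so treat this as a starting point rather than a tested configuration.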
Transfers of big files via transfer_input_files are interrupted after only a small part of the file has been read. A job with a small input file (15,294,016 bytes) works.
Running with a bigger file (9,195,290,624 bytes) we get:
get_file(): ERROR: received 605356032 bytes, expected 9195290624!
Running a job with an even bigger file we get:
get_file(): ERROR: received 902758400 bytes, expected 30967531520!
01/08/14 09:59:26 setting the orig job name in starter
01/08/14 09:59:26 setting the orig job iwd in starter
01/08/14 09:59:26 Chirp config summary: IO false, Updates false, Delayed updates true.
01/08/14 09:59:26 Initialized IO Proxy.
01/08/14 09:59:26 Setting resource limits not implemented!
01/08/14 10:00:13 condor_read(): timeout reading 65536 bytes from <10.10.20.209:55472>.
01/08/14 10:00:13 ReliSock::get_bytes_nobuffer: Failed to receive file.
01/08/14 10:00:13 get_file(): ERROR: received 902758400 bytes, expected 30967531520!
01/08/14 10:00:14 DoDownload: STARTER at 10.10.20.65 failed to receive file C:\condor\execute\dir_2636\reference-big.zip
01/08/14 10:00:14 File transfer failed (status=0).
01/08/14 10:00:14 ERROR "Failed to transfer files" at line 2120 in file c:\condor\execute\dir_27920\userdir\src\condor_starter.v6.1\jic_shadow.cpp
01/08/14 10:00:14 ShutdownFast all jobs.
01/08/14 10:00:14 condor_read() failed: recv(fd=1064) returned -1, errno = 10054 , reading 5 bytes from <10.10.20.209:55479>.
01/08/14 10:00:14 IO: Failed to read packet header
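
To put the two `get_file()` errors above in proportion, here is a minimal Python sketch that pulls the byte counts out of such a line and reports how much of the file arrived. The function names and the `read_size` default are assumptions for illustration; 65536 is simply the read size that appears in the `condor_read()` timeout line of this log.

```python
import re

# Matches starter-log lines such as:
#   get_file(): ERROR: received 902758400 bytes, expected 30967531520!
GET_FILE_ERROR = re.compile(
    r"get_file\(\): ERROR: received (\d+) bytes, expected (\d+)!"
)


def parse_short_transfer(line):
    """Return (received, expected) byte counts, or None if the line does not match."""
    m = GET_FILE_ERROR.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def summarize(line, read_size=65536):
    """Summarize a short transfer: fraction received and whether the
    received count is a whole number of read_size-byte reads."""
    parsed = parse_short_transfer(line)
    if parsed is None:
        return None
    received, expected = parsed
    return {
        "received": received,
        "expected": expected,
        "fraction": received / expected,
        "whole_reads": received % read_size == 0,
    }


if __name__ == "__main__":
    for log_line in (
        "get_file(): ERROR: received 605356032 bytes, expected 9195290624!",
        "01/08/14 10:00:13 get_file(): ERROR: received 902758400 bytes, expected 30967531520!",
    ):
        print(summarize(log_line))
```

Both failures land on a whole multiple of 65536 bytes, which fits the picture of the receiver timing out while waiting for the next 65536-byte read rather than receiving corrupted data.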
Can somebody help me to solve this issue?
Todd Tannenbaum <tannenba@xxxxxxxxxxx>   University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
HTCondor Technical Lead                 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685