
Re: [HTCondor-users] Downloading big files is interrupted



On Thu, Jan 09, 2014 at 08:42:00AM +0000, Leon Thielen wrote:
> Hi,
> We are running HTCondor version 8.1.2.
> The Condor master and client are Windows 7 machines.
> The submit host is the master.
> All the files reside on a Linux machine; Linux and Windows are connected via Samba.
> 
> transfer_input_files with big files is interrupted after only part of the file has been read. With a small input file (15,294,016 bytes) the job works.
> Running with a bigger file (9,195,290,624 bytes) we get:
> get_file(): ERROR: received 605356032 bytes, expected 9195290624!
> Running the job with an even bigger file (30,967,531,520 bytes) we get:
> get_file(): ERROR: received 902758400 bytes, expected 30967531520!
> 
> StarterLog.slot1_1 :
> 
> 01/08/14 09:59:26 setting the orig job name in starter
> 01/08/14 09:59:26 setting the orig job iwd in starter
> 01/08/14 09:59:26 Chirp config summary: IO false, Updates false, Delayed updates true.
> 01/08/14 09:59:26 Initialized IO Proxy.
> 01/08/14 09:59:26 Setting resource limits not implemented!
> 01/08/14 10:00:13 condor_read(): timeout reading 65536 bytes from <10.10.20.209:55472>.
> 01/08/14 10:00:13 ReliSock::get_bytes_nobuffer: Failed to receive file.
> 01/08/14 10:00:13 get_file(): ERROR: received 902758400 bytes, expected 30967531520!
> 01/08/14 10:00:14 DoDownload: STARTER at 10.10.20.65 failed to receive file C:\condor\execute\dir_2636\reference-big.zip
> 01/08/14 10:00:14 File transfer failed (status=0).
> 01/08/14 10:00:14 ERROR "Failed to transfer files" at line 2120 in file c:\condor\execute\dir_27920\userdir\src\condor_starter.v6.1\jic_shadow.cpp
> 01/08/14 10:00:14 ShutdownFast all jobs.
> 01/08/14 10:00:14 condor_read() failed: recv(fd=1064) returned -1, errno = 10054 , reading 5 bytes from <10.10.20.209:55479>.
> 01/08/14 10:00:14 IO: Failed to read packet header
> 
> Can somebody help me to solve this issue?

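It looks like the file size is limited to a 32-bit number. In hex, the byte count received agrees with the low 32 bits (the last eight digits) of the expected size, exactly in the first case and to within 0x800 bytes in the second; what is lost is the part above 32 bits (marked ^):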

$ printf "%09x\n" 605356032 9195290624 
024150000
224150000
^

$ printf "%09x\n" 902758400 30967531520
035cf0000
735cf0800
^
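The same comparison can be done with bash arithmetic by masking each expected size to its low 32 bits. This is only a sketch for checking the arithmetic, assuming a bash built with 64-bit integers (as on common platforms); the `check` function is a hypothetical helper and says nothing about where inside HTCondor the truncation happens.

```shell
#!/usr/bin/env bash
# Reduce each expected size to its low 32 bits (what survives if the size
# is held in a 32-bit unsigned field) and compare with the bytes received.
# Assumes bash with 64-bit integer arithmetic.
check() {
    local received=$1 expected=$2
    local low32=$(( expected & 0xFFFFFFFF ))
    echo "expected=$expected low32=$low32 received=$received short_by=$(( low32 - received ))"
}

check 605356032 9195290624
check 902758400 30967531520
```

For the two failures reported above this prints `short_by=0` for the 9 GB file and `short_by=2048` for the 30 GB file (low32 = 902760448), matching the hex comparison.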

This would appear to be an internal HTCondor limitation.
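One way to test the 32-bit theory would be to transfer files just below, at and just above 4 GiB. The sketch below is hypothetical: it assumes GNU coreutils (`truncate`, `stat -c`) on the Linux file server, the directory is whatever you pass in, and the "predicted" counts only hold if the size really wraps at 2^32.

```shell
#!/usr/bin/env bash
# Hypothetical boundary test for the 32-bit theory.
# Assumes GNU coreutils (truncate, stat -c) on the Linux side.
set -euo pipefail

predict() {           # bytes expected to arrive if the size wraps at 2^32
    echo $(( $1 & 0xFFFFFFFF ))
}

make_test_files() {   # $1 = target directory (choose your own)
    local dir=$1 size
    for size in 4294967295 4294967296 4294967297; do
        # sparse file: allocates (almost) no blocks where sparse files are supported
        truncate -s "$size" "$dir/test-$size.bin"
        echo "$dir/test-$size.bin: $(stat -c %s "$dir/test-$size.bin") bytes," \
             "predicted to arrive: $(predict "$size")"
    done
}
```

Listing each file as a `transfer_input_files` entry in a test job would then show whether the failure starts exactly at 4 GiB.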

-- 
Dan