
Re: [HTCondor-users] Downloading big files is interrupted




It indeed turned out to be a bug.  Thanks to both Leon T and Dan F.

The bug is now fixed in the source code; all future releases of HTCondor will include the bug fix. Details can be found at
  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=4150

Leon, HTCondor v8.1.4 is scheduled for public release on 2/20/14. If you want a preview release ahead of that time (i.e. if you cannot wait until then to fix the below issue), you could email
  htcondor-admin@xxxxxxxxxxx
and ask for pre-release binaries, or you could build from source yourself.

Thank you for taking the time to report the below and for your interest in HTCondor,

best
Todd

On 1/9/2014 2:42 AM, Leon Thielen wrote:
Hi,
we are running HTCondor version 8.1.2.
The Condor master and client are Windows 7 machines.
The submit host is the master.
All the files reside on a Linux machine. Linux and Windows are connected via Samba.

Transfers of big files via transfer_input_files are interrupted after only part of the file has been read. Running a job with a small input file (15,294,016 bytes) works.
Running with a bigger file (9,195,290,624 bytes) we get
get_file(): ERROR: received 605356032 bytes, expected 9195290624!
Running a job with an even bigger file we get
get_file(): ERROR: received 902758400 bytes, expected 30967531520!

StarterLog.slot1_1 :

01/08/14 09:59:26 setting the orig job name in starter
01/08/14 09:59:26 setting the orig job iwd in starter
01/08/14 09:59:26 Chirp config summary: IO false, Updates false, Delayed updates true.
01/08/14 09:59:26 Initialized IO Proxy.
01/08/14 09:59:26 Setting resource limits not implemented!
01/08/14 10:00:13 condor_read(): timeout reading 65536 bytes from <10.10.20.209:55472>.
01/08/14 10:00:13 ReliSock::get_bytes_nobuffer: Failed to receive file.
01/08/14 10:00:13 get_file(): ERROR: received 902758400 bytes, expected 30967531520!
01/08/14 10:00:14 DoDownload: STARTER at 10.10.20.65 failed to receive file C:\condor\execute\dir_2636\reference-big.zip
01/08/14 10:00:14 File transfer failed (status=0).
01/08/14 10:00:14 ERROR "Failed to transfer files" at line 2120 in file c:\condor\execute\dir_27920\userdir\src\condor_starter.v6.1\jic_shadow.cpp
01/08/14 10:00:14 ShutdownFast all jobs.
01/08/14 10:00:14 condor_read() failed: recv(fd=1064) returned -1, errno = 10054 , reading 5 bytes from <10.10.20.209:55479>.
01/08/14 10:00:14 IO: Failed to read packet header

Can somebody help me to solve this issue?
Thanks, Leon
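
A quick arithmetic check on the byte counts in the report above, offered as a sketch rather than a diagnosis: it parses the two get_file() error lines and compares the received count with the expected size reduced modulo 2**32. The helper name wrap_check and the choice of 2**32 are assumptions made for illustration; they are not taken from the HTCondor source or from the ticket.

```python
import re

# Matches the get_file() error line as it appears in the report above.
GET_FILE_ERROR = re.compile(
    r"get_file\(\): ERROR: received (\d+) bytes, expected (\d+)!"
)

def wrap_check(line):
    """Hypothetical helper: return (received, expected, expected mod 2**32),
    or None if the line is not a get_file() error line."""
    match = GET_FILE_ERROR.search(line)
    if match is None:
        return None
    received = int(match.group(1))
    expected = int(match.group(2))
    return received, expected, expected % 2**32

lines = [
    "get_file(): ERROR: received 605356032 bytes, expected 9195290624!",
    "get_file(): ERROR: received 902758400 bytes, expected 30967531520!",
]

for line in lines:
    received, expected, wrapped = wrap_check(line)
    # Print the received count, the wrapped expected size, and their difference.
    print(received, wrapped, wrapped - received)
```

For the 9,195,290,624-byte file the received count equals the expected size modulo 2**32 exactly; for the larger file the two differ by 2048 bytes. The small file that transferred correctly is below 2**32 bytes. This is only a pattern in the numbers; the actual cause is whatever the ticket linked in the reply describes.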






--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685