
Re: [HTCondor-users] Downloading big files is interrupted



On 1/9/2014 2:42 AM, Leon Thielen wrote:
Hi,
we are running HTCondor version 8.1.2.
The Condor master and client are Windows 7 machines.
The submit host is the master.
All the files reside on a Linux machine. Linux and Windows are connected via Samba.


It may help if you post the corresponding snippet from the ShadowLog on the submit machine (from the same timeframe as the below snippet). The file transfer happens between the condor_shadow (on the submit machine) and the condor_starter (on the execute machine), and the below starter log snippet implies that the shadow stopped sending...

Also, if there is a shared filesystem between the submit and execute node (via samba or whatever), I am kinda wondering if/why you desire HTCondor to transfer the files in the first place. I.e., you could just leverage the shared filesystem...

regards,
Todd



Transfers of big files via transfer_input_files are interrupted after only part of the file has been received. Running a job with a small input file (15,294,016 bytes) works.
Running with a bigger file (9,195,290,624 bytes) we get
get_file(): ERROR: received 605356032 bytes, expected 9195290624!
Running job with an even bigger file we get
get_file(): ERROR: received 902758400 bytes, expected 30967531520!

StarterLog.slot1_1 :

01/08/14 09:59:26 setting the orig job name in starter
01/08/14 09:59:26 setting the orig job iwd in starter
01/08/14 09:59:26 Chirp config summary: IO false, Updates false, Delayed updates true.
01/08/14 09:59:26 Initialized IO Proxy.
01/08/14 09:59:26 Setting resource limits not implemented!
01/08/14 10:00:13 condor_read(): timeout reading 65536 bytes from <10.10.20.209:55472>.
01/08/14 10:00:13 ReliSock::get_bytes_nobuffer: Failed to receive file.
01/08/14 10:00:13 get_file(): ERROR: received 902758400 bytes, expected 30967531520!
01/08/14 10:00:14 DoDownload: STARTER at 10.10.20.65 failed to receive file C:\condor\execute\dir_2636\reference-big.zip
01/08/14 10:00:14 File transfer failed (status=0).
01/08/14 10:00:14 ERROR "Failed to transfer files" at line 2120 in file c:\condor\execute\dir_27920\userdir\src\condor_starter.v6.1\jic_shadow.cpp
01/08/14 10:00:14 ShutdownFast all jobs.
01/08/14 10:00:14 condor_read() failed: recv(fd=1064) returned -1, errno = 10054 , reading 5 bytes from <10.10.20.209:55479>.
01/08/14 10:00:14 IO: Failed to read packet header

Can somebody help me to solve this issue?
Thanks Leon
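
To compare several failed transfers, the byte counts can be pulled out of the starter log with a short script. This is only a minimal sketch and not part of HTCondor: the function name `summarize_short_transfers` and the regular expression are assumptions based solely on the `get_file(): ERROR` line format shown in the log above.

```python
import re

# Matches the starter's error line exactly as it appears in the log above
# (assumed format; other HTCondor versions may word it differently).
GET_FILE_ERROR = re.compile(
    r"get_file\(\): ERROR: received (\d+) bytes, expected (\d+)!"
)


def summarize_short_transfers(log_text):
    """Return (received, expected, percent_received) for each get_file() error."""
    results = []
    for match in GET_FILE_ERROR.finditer(log_text):
        received = int(match.group(1))
        expected = int(match.group(2))
        results.append((received, expected, 100.0 * received / expected))
    return results


# Sample lines copied from the StarterLog excerpt in this message.
sample = """\
01/08/14 10:00:13 condor_read(): timeout reading 65536 bytes from <10.10.20.209:55472>.
01/08/14 10:00:13 ReliSock::get_bytes_nobuffer: Failed to receive file.
01/08/14 10:00:13 get_file(): ERROR: received 902758400 bytes, expected 30967531520!
"""

for received, expected, pct in summarize_short_transfers(sample):
    print(f"received {received} of {expected} bytes ({pct:.1f}%)")
```

Running the same function over the matching ShadowLog timeframe would not find these lines, since they are written by the starter; it is only meant to show how far each transfer got before the read timeout.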






--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685