
Re: [Condor-users] Error in large file transfer : Condor or Winsock Bug ?



On 9/1/06, Christopher Mellen <Chris.Mellen@xxxxxxxxxx> wrote:
I'm trying to upload a very large input file, size in excess of 4Gb, using
Condor 6.6.11 on a cluster of XP machines.

The file continually fails with a 'failed to do upload message' (or similar)
reported in the job logs.

Looking at the ShadowLog on the submit machine I see the following errors
reported :

9/1 13:40:38 Initializing a VANILLA shadow
9/1 13:40:38 (30740.0) (19384): Request to run on <xxx.xxx.xxx.176:4284> was
ACCEPTED
9/1 13:41:35 (30742.0) (13248): ReliSock: put_file: TransmitFile() failed,
errno=10022
9/1 13:41:35 (30742.0) (13248): ERROR "DoUpload: Failed to send file
E:\Temp_2\\XXX_depthprocessed_ts.txt , exiting at 1399
" at line 1398 in file ..\src\condor_c++_util\file_transfer.C

In the above it is the XXX_depthprocessed_ts.txt file that is > 4Gb.

Hence the problem seems to be in src\condor_c++_util\file_transfer.C. Is
this a Condor related fault or a fault in the underlying winsock file
transfer mechanism ?

Any ideas much appreciated ....

There is a bug in file transfer on Windows in the 6.6 series when
transferring files larger than 2 GB. In 6.6.11 this was mitigated by
allowing the sum total of the files transferred to exceed 2 GB, but
(IIRC) no individual file can be over 2 GB. (This is all fixed in the
6.8 series.)

I am not 100% sure about this, though, because the bug I found would
actually *work* on files from 4 to 6 GB, since it was an int overflow
bug.
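
To illustrate how an int overflow could let a 4 to 6 GB file through
while a 2 to 4 GB file fails, here is a small sketch of what a byte
count looks like once it is stored in a signed 32-bit int. This only
shows the wraparound arithmetic; it is not Condor's actual code, and the
helper name as_int32 is made up for the example.

```python
GB = 1024 ** 3


def as_int32(n: int) -> int:
    """Return n as it would read back from a signed 32-bit integer."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set are interpreted as negative.
    return n - 0x100000000 if n >= 0x80000000 else n


for size_gb in (1, 3, 5):
    print(f"{size_gb} GB -> {as_int32(size_gb * GB)} bytes as int32")
```

A 3 GB size reads back as a negative number, while a 5 GB size wraps
around to what looks like a valid 1 GB.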

I suggest you split the input file into 4 files of about 1 GB each and
try transferring it that way. If this works, you
* know it's a bug in 6.6.11 and can move to 6.8, where this should be fixed
* have a workaround :) in the meantime
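
One way to do the split, as a rough sketch: the script below cuts a file
into fixed-size parts and can join them again on the execute side. The
function names, the ".partNNN" naming and the default sizes are all my
own assumptions for illustration, nothing Condor provides.

```python
import os


def split_file(path, chunk_bytes=1024 ** 3, buf_bytes=1024 * 1024):
    """Split path into path.part000, path.part001, ...; return part names."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            part_name = f"{path}.part{index:03d}"
            written = 0
            with open(part_name, "wb") as dst:
                # Copy in small buffers so a whole part is never held in memory.
                while written < chunk_bytes:
                    buf = src.read(min(buf_bytes, chunk_bytes - written))
                    if not buf:
                        break
                    dst.write(buf)
                    written += len(buf)
            if written == 0:
                os.remove(part_name)  # input exhausted; drop the empty part
                break
            parts.append(part_name)
            index += 1
    return parts


def join_files(parts, out_path, buf_bytes=1024 * 1024):
    """Concatenate the parts, in order, back into a single file."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                while True:
                    buf = src.read(buf_bytes)
                    if not buf:
                        break
                    dst.write(buf)
```

Note that this splits on byte counts, so a line of text may be cut
across two parts; the job would need to join the parts (or otherwise
cope with that) before reading the data.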

If you are submitting a text file this big, have you considered
compressing it before transfer and reading it through an automatic
decompression stream? You may not be able to change your program on
this front, but if you can, many decent libraries exist to do this.
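
As an example of the compression idea, here is a minimal sketch using
Python's standard gzip module, assuming the job can be changed to read
its input this way. The function names and the UTF-8 encoding are
assumptions for the example; other languages have comparable libraries.

```python
import gzip
import shutil


def compress_file(src_path, dst_path):
    """Write a gzip-compressed copy of src_path to dst_path."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in blocks, not all at once


def read_lines(gz_path):
    """Yield text lines from a gzip file, decompressing as it goes."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as stream:
        for line in stream:
            yield line.rstrip("\n")
```

You would run compress_file on the submit machine, transfer the .gz
file as the job input, and have the job iterate over read_lines instead
of opening the plain text file.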

Matt