
Re: [HTCondor-users] Corrupt files on HTCondor transfer to node



To narrow down the source of the problem, can you have your job print the md5sum of the file before unpacking it? And perhaps the file size too, to see whether it's being corrupted or truncated somehow.
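If it helps, here is a minimal sketch of that check in Python, assuming Python 3 is available on the execute node. The script name check_archive.py and its output format are my own illustration, not anything HTCondor provides. It prints the size and MD5 of each file, then tries to decompress the whole gzip stream. Run it on the submit node and again at the start of the job, then compare. A smaller size together with "truncated" suggests an incomplete transfer, while the same size with a different MD5 suggests the bytes are being altered.

```python
import gzip
import hashlib
import os
import sys
import zlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_archive(path):
    """Return (size, md5, status), where status tells whether the gzip
    stream decompresses to the end ("ok"), stops early ("truncated"),
    or contains invalid data ("corrupt (...)")."""
    size = os.path.getsize(path)
    digest = md5sum(path)
    try:
        with gzip.open(path, "rb") as f:
            # Read and discard the decompressed data; we only care
            # whether the stream can be read all the way through.
            while f.read(1 << 20):
                pass
        status = "ok"
    except EOFError:
        # The file ended before the gzip end-of-stream marker, which is
        # the same situation gzip reports as "unexpected end of file".
        status = "truncated"
    except (OSError, zlib.error) as exc:
        # Bad header, CRC/length mismatch, or invalid deflate data.
        status = f"corrupt ({exc})"
    return size, digest, status


if __name__ == "__main__":
    for path in sys.argv[1:]:
        size, digest, status = check_archive(path)
        print(f"{path}: size={size} md5={digest} gzip={status}")
```

Usage, e.g. as the first step of the job: python3 check_archive.py pack.tar.gz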


Cheers,
-zach


> -----Original Message-----
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
> Roberto Tavares
> Sent: Wednesday, April 04, 2018 3:20 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] Corrupt files on HTCondor transfer to node
> 
> Hello,
> 
> I'm having some trouble when running multiple jobs on HTCondor. My only
> guess is that, at some random moment, a transferred file gets corrupted
> during the transfer.
> 
> What I got:
> 
> Several .tar.gz data files. Let's say one of those files is
> pack.tar.gz.
> 
> Several tests (1, 2, 3, ..., 12) that use pack.tar.gz.
> 
> pack.tar.gz is a valid file (it can be uncompressed at submission node).
> 
> Of the 12 tests, 11 work. In one (random) test, I get the following error:
> 
> gzip: stdin: unexpected end of file
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
> 
> The test processing is the same for all of them (only some parameters
> change in the following steps).
> 
> The only thing I can imagine is that the file transfer fails at some
> point (maybe a network issue?).
> 
> Is there a way to solve this problem?
> 
> Thanks
> 
> Roberto