
Re: [HTCondor-users] "File Transfer Using a URL" seems to be useless on Windows!



Hi Alex

 

You can effectively offload the file transfer from the submit node to the execute node(s) by using
a batch file wrapper script as your Condor “executable”, e.g.

 

%windir%\system32\net use \\fileserver\path\condor_stuff
copy \\fileserver\path\condor_stuff\real.exe .
copy \\fileserver\path\condor_stuff\input_data.dat .
real.exe
copy output_data.dat \\fileserver\path\condor_stuff
del /q *.*

 

You can use xcopy to transfer whole folders if necessary.

You can also add some error checking, e.g. after each copy statement:

 

IF %ERRORLEVEL% NEQ 0 EXIT 1

 

and then use the following in the submit file to rerun the job if the copy fails

 

on_exit_remove = (ExitCode == 0)
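Putting the pieces together, a fuller wrapper might look like the sketch below. This is only illustrative: the server, share, and file names are the placeholder examples from above, and the submit-file fragment assumes the standard on_exit_remove idiom for retrying jobs that exit non-zero.

```bat
:: wrapper.bat -- submitted as the Condor "executable"
:: Connect to the file share; bail out (non-zero exit) on any failure
%windir%\system32\net use \\fileserver\path\condor_stuff
IF %ERRORLEVEL% NEQ 0 EXIT 1

:: Stage in the real executable and its input data
copy \\fileserver\path\condor_stuff\real.exe .
IF %ERRORLEVEL% NEQ 0 EXIT 1
copy \\fileserver\path\condor_stuff\input_data.dat .
IF %ERRORLEVEL% NEQ 0 EXIT 1

:: Run the actual job
real.exe

:: Stage out the results, then clean up the scratch directory
copy output_data.dat \\fileserver\path\condor_stuff
IF %ERRORLEVEL% NEQ 0 EXIT 1
del /q *.*
EXIT 0
```

And the corresponding submit-file fragment, so a failed copy (non-zero exit code) puts the job back in the queue instead of removing it:

```
executable = wrapper.bat
on_exit_remove = (ExitCode == 0)
```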

 

Cheers

 

Greg

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Alexey Smirnov
Sent: Wednesday, 2 April 2014 2:25 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] "File Transfer Using a URL" seems to be useless on Windows!

 

Thanks a lot for your replies!

 

The normal file transfer with "transfer_input_files = \\fileservername\path\file" works perfectly, so there is no access problem for an authorized user. The CREDD server is also configured properly, and the "whoami" test confirms that.

 

The reason I started to play with the URL file transfer mode is the hope of distributing the network load and avoiding having all traffic channeled through the submit node.

 

I can explain the problem I'm trying to solve.

 

Recently, when submitting a batch of 50 jobs with large (~1.5 GB) input files, we got errors en masse:

====

007 (027.000.000) 03/19 16:27:24 Shadow exception!

            Error from slot1_15@[...]: Failed to transfer files

====

A smaller batch (say, 5 jobs) worked fine. Delays (NextJobStartDelay) helped a bit, but many jobs still failed. I think we definitely hit a network bandwidth bottleneck, resulting in file-transfer timeouts:
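For reference, the delay mentioned above can be set per cluster in the submit file. A minimal sketch (the 60-second value is an arbitrary example, not what we actually used):

```
# Stagger successive job starts by 60 seconds to spread transfer load
next_job_start_delay = 60
queue 50
```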

 

================= StarterLog.slot1_15 ================

03/19/14 16:26:23 Received GoAhead from peer to receive C:\condor\execute\dir_8344\file.

03/19/14 16:26:23 get_file(): going to write to filename C:\condor\execute\dir_8344\file

03/19/14 16:26:25 get_file: Receiving 1320771848 bytes

03/19/14 16:26:55 condor_read(): timeout reading 65536 bytes from <[...]>.

03/19/14 16:26:55 ReliSock::get_bytes_nobuffer: Failed to receive file.

03/19/14 16:26:55 get_file: wrote 0 bytes to file

03/19/14 16:26:55 get_file(): ERROR: received 0 bytes, expected 1320771848!

03/19/14 16:26:55 DoDownload: STARTER at [...] failed to receive file C:\condor\execute\dir_8344\file

03/19/14 16:26:55 DoDownload: exiting at 2215

03/19/14 16:26:55 FileTransfer: created download transfer process with id 6

03/19/14 16:26:55 DaemonCore: in SendAliveToParent()

03/19/14 16:26:55 Completed DC_CHILDALIVE to daemon at <[...]>

03/19/14 16:26:55 DaemonCore: Leaving SendAliveToParent() - success

03/19/14 16:26:55 File transfer failed (status=0).

03/19/14 16:26:55 Calling client FileTransfer handler function.

03/19/14 16:26:55 ERROR "Failed to transfer files" at line 2050 in file c:\condor\execute\dir_29540\userdir\src\condor_starter.v6.1\jic_shadow.cpp

03/19/14 16:26:55 ShutdownFast all jobs.

03/19/14 16:26:55 Got ShutdownFast when no jobs running.

================================================

 

All input files are located on really good, fast network storage, so there should be no issue on that side. The only weak link I see is the submit node, which first needs to download ~75GB from the network storage and then upload it to the execute nodes.

 

Alexey

 

 

 

On Tue, Apr 1, 2014 at 8:05 PM, Zachary Miller <zmiller@xxxxxxxxxxx> wrote:

On Tue, Apr 01, 2014 at 10:14:55AM -0500, Todd Tannenbaum wrote:
>
> Meanwhile, someone more familiar with the file transfer via URL plugin
> can chime in....

I would agree with all that Todd said.

Although curl supports it, I haven't really found a convincing case for using
the "file://" type of URL, as HTCondor has first-class file transfer.

In some future development series, I plan to re-work the file transfer plugin
architecture significantly.  At the moment they are provided by the local admin
and I would like to see the model change to them being supplied and run by the
user, as you were expecting.


Cheers,
-zach


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/