
Re: [HTCondor-users] "File Transfer Using a URL" seems to be useless on Windows!



Thanks a lot for your replies!

The normal file transfer with "transfer_input_files = \\fileservername\path\file" works perfectly, so there are no access problems for an authorized user. The CREDD server is also configured properly, and the "whoami test" confirms that.

The reason I started experimenting with the URL file transfer mode was the hope of distributing the network load, so that not all traffic is channeled through the submitting node.

I can explain the problem I'm trying to solve.

Recently, when we tried to submit a bundle of 50 jobs with big (~1.5 GB) input files, we got a large number of errors like this:
====
007 (027.000.000) 03/19 16:27:24 Shadow exception!
Error from slot1_15@[...]: Failed to transfer files
====
A smaller bundle (say, 5 jobs) worked fine. Delays (NextJobStartDelay) helped a bit, but many jobs still failed. I think we definitely hit a network bandwidth bottleneck that resulted in file access timeouts:

================= StarterLog.slot1_15 ================
03/19/14 16:26:23 Received GoAhead from peer to receive C:\condor\execute\dir_8344\file.
03/19/14 16:26:23 get_file(): going to write to filename C:\condor\execute\dir_8344\file
03/19/14 16:26:25 get_file: Receiving 1320771848 bytes
03/19/14 16:26:55 condor_read(): timeout reading 65536 bytes from <[...]>.
03/19/14 16:26:55 ReliSock::get_bytes_nobuffer: Failed to receive file.
03/19/14 16:26:55 get_file: wrote 0 bytes to file
03/19/14 16:26:55 get_file(): ERROR: received 0 bytes, expected 1320771848!
03/19/14 16:26:55 DoDownload: STARTER at [...] failed to receive file C:\condor\execute\dir_8344\file
03/19/14 16:26:55 DoDownload: exiting at 2215
03/19/14 16:26:55 FileTransfer: created download transfer process with id 6
03/19/14 16:26:55 DaemonCore: in SendAliveToParent()
03/19/14 16:26:55 Completed DC_CHILDALIVE to daemon at <[...]>
03/19/14 16:26:55 DaemonCore: Leaving SendAliveToParent() - success
03/19/14 16:26:55 File transfer failed (status=0).
03/19/14 16:26:55 Calling client FileTransfer handler function.
03/19/14 16:26:55 ERROR "Failed to transfer files" at line 2050 in file c:\condor\execute\dir_29540\userdir\src\condor_starter.v6.1\jic_shadow.cpp
03/19/14 16:26:55 ShutdownFast all jobs.
03/19/14 16:26:55 Got ShutdownFast when no jobs running.
================================================
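To make the timing in the log excerpt explicit, here is a minimal sketch that compares the two relevant timestamps. It is not part of any HTCondor tool, and parse_timestamp is a hypothetical helper written only for this illustration; the two log lines are copied from the excerpt above.

```python
from datetime import datetime

# Two lines copied verbatim from the StarterLog excerpt.
LOG_LINES = [
    "03/19/14 16:26:25 get_file: Receiving 1320771848 bytes",
    "03/19/14 16:26:55 condor_read(): timeout reading 65536 bytes from <[...]>.",
]

def parse_timestamp(line):
    """Parse the leading 'MM/DD/YY HH:MM:SS' stamp of a log line (hypothetical helper)."""
    # The stamp is always the first 17 characters: 8 for the date, 1 space, 8 for the time.
    return datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S")

start, timeout = (parse_timestamp(line) for line in LOG_LINES)
elapsed = (timeout - start).total_seconds()
print(f"Seconds between start of receive and read timeout: {elapsed:.0f}")
```

In other words, the starter reported the read timeout half a minute after it started receiving, and the log shows that 0 bytes had been written by then.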

All input files are located on a really good, fast network storage system, so there should be no issues on that side. The only weak link I see is the submitting node, which first has to download ~75 GB from the network storage and then upload it to the executing nodes.
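As a rough back-of-the-envelope check of that figure, the sketch below works out the total volume and a best-case upload time. The job count and input size come from this message; the 1 Gbit/s link speed of the submitting node is purely an assumption for illustration and was not measured.

```python
JOBS = 50                   # jobs in the bundle (from the message)
INPUT_GB_PER_JOB = 1.5      # approximate input size per job (from the message)
ASSUMED_LINK_GBIT_S = 1.0   # ASSUMPTION: submit node link speed, not stated anywhere

# Total data the submitting node has to push out to the execute nodes.
total_gb = JOBS * INPUT_GB_PER_JOB

# Best case: the link is fully saturated and carries nothing else.
link_gb_per_s = ASSUMED_LINK_GBIT_S / 8   # gigabits/s -> gigabytes/s
min_seconds = total_gb / link_gb_per_s

print(f"Total input volume: {total_gb:.0f} GB")
print(f"Best-case upload time: {min_seconds / 60:.0f} minutes")
```

Under that assumption, the uploads alone would take at least about 10 minutes even with a perfectly used link, and the same data also has to be read from the network storage first.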

Alexey




On Tue, Apr 1, 2014 at 8:05 PM, Zachary Miller <zmiller@xxxxxxxxxxx> wrote:
On Tue, Apr 01, 2014 at 10:14:55AM -0500, Todd Tannenbaum wrote:
>
> Meanwhile, someone more familiar with the file transfer via URL plugin
> can chime in....

I would agree with all that Todd said.

Although curl supports it, I haven't really found a convincing case for using
the "file://" type of URL, as HTCondor has first-class file transfer.

In some future development series, I plan to re-work the file transfer plugin
architecture significantly.  At the moment they are provided by the local admin
and I would like to see the model change to them being supplied and run by the
user, as you were expecting.


Cheers,
-zach
