
Re: [HTCondor-users] Issues with transferring files from URLs



On Mon, Nov 03, 2014 at 02:51:13PM +0000, Brian Candler wrote:
> The documentation says:
> 
> "For vanilla and vm universe jobs only, a file may be specified by 
> giving a URL, instead of a file name. The implementation for URL 
> transfers requires both configuration and available plug-in."
> 
> but these are indeed present (/etc/condor/condor_config has 
> FILETRANSFER_PLUGINS which includes /usr/lib/condor/libexec/curl_plugin)
> 
> WORKAROUND: I was able to make it work by setting "should_transfer_files 
> = yes".
> 
> However, is this right? Surely a URL should always be fetched, 
> regardless of whether or not you are in the same filesystem domain, 
> since URLs don't appear in the filesystem anyway?

What you did is correct.  I agree this is confusing and needs better
documentation.  The slightly more technical answer is that without
"should_transfer_files", nothing is fetched (everything is assumed to be
accessible via a shared filesystem), and that includes URLs.  I think
a good argument can be made that such behavior violates the principle
of least surprise.
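For anyone else hitting this, a minimal submit description along the lines
of Brian's workaround would look something like the following (the
executable name and URL are placeholders):

    executable              = my_job.sh
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = http://example.com/input.dat
    queue

Forcing should_transfer_files to YES enables the file transfer mechanism
unconditionally, so the configured plugin (e.g. curl_plugin) is invoked
for the URL.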


> (2) Given an unimplemented URL scheme (like "https"), I found a 
> difference between my test personal condor node and my production condor 
> node. The former would leave the job idle because of a classAd matching 
> condition which was never true:
> 
> 1   ( TARGET.HasFileTransfer && 
> stringListMember("https",HasFileTransferPluginMethods) )
> 
> but the latter puts the job into a "held" (H) state, saying
> 
> Hold reason: Error from slot1@xxxxxxxxxxxxxxxx: STARTER at 192.168.6.42 
> failed to receive file /var/lib/condor/execute/dir_24716/xxxx.xxxx: 
> FILETRANSFER:1:FILETRANSFER: plugin for type https not found!
> 
> (Aside: if a plugin for https is not present, wouldn't it be better to 
> abort the job rather than put it into a 'held' state indefinitely, as 
> this isn't a condition which is likely to fix itself?)

Possibly.  We generally like jobs to go on hold when there is a problem with
the input, so that the failure is obvious to the user and they have a chance
to do something about it.  And in this case, the job should never have
matched in the first place.
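In the meantime, if you would rather have such jobs leave the queue instead
of sitting on hold indefinitely, a periodic_remove expression in the submit
file can do that.  A sketch (the one-hour threshold is arbitrary):

    # JobStatus == 5 means "held"; EnteredCurrentStatus is the time the
    # job entered its current state.  Remove any job held for over an hour.
    periodic_remove = (JobStatus == 5) && ((time() - EnteredCurrentStatus) > 3600)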


> Anyway, I managed to drill down to find the difference, and it turns out 
> to be different behaviour depending on whether you set
> 
>     should_transfer_files = if_needed
> 
> or
> 
>     should_transfer_files = yes
> 
> I cannot find this behaviour documented anywhere. Looking at
> 
> http://research.cs.wisc.edu/htcondor/manual/current/2_5Submitting_Job.html#SECTION00354000000000000000
> 
> it says the default value is "should_transfer_files = if_needed" and 
> this will enable the file transfer mechanism if the machines are in 
> different filesystem domains. This implies to me that if the machines 
> are in different filesystem domains this should behave the same as 
> "should_transfer_files = yes", but actually the generated requirements 
> expressions are different in these two cases.

Again, I agree with you here... URLs should receive special treatment when
considering whether file transfer is needed.
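Until the semantics are unified, the safest pattern when URLs appear in
transfer_input_files is to be explicit rather than relying on the if_needed
default:

    # IF_NEEDED decides based on FileSystemDomain and, as observed above,
    # may generate different requirements than YES:
    # should_transfer_files = IF_NEEDED

    # Forcing YES generates the plugin-method requirement, e.g.
    #   TARGET.HasFileTransfer && stringListMember("https",HasFileTransferPluginMethods)
    should_transfer_files = YES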


Thanks for your report and thoughtful analysis.  I created a defect ticket
for these issues:  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=4692

While we may fix these issues in the current development series, the work may
be postponed: in the longer term, I plan to redo the transfer plugin
architecture so that plugins are user-supplied rather than admin-supplied,
along with a number of other changes (e.g. status/keepalives and batching).


Hopefully your workarounds will suffice for now; at the very least, we
need to document the current behavior more clearly.  Thanks again for your
report.


Cheers,
-zach