
[HTCondor-users] Issues with transferring files from URLs



A couple of questions, maybe bugs.


(1) Running a personal condor (8.0.7) with the following submit file (foo.sub), which takes its input file from a URL:

universe = vanilla
executable = /bin/sh
arguments = "'-c' 'cat condor_submit.html'"
transfer_executable = no
transfer_input_files = http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html
output = foo.out
error = foo.err
queue

I find that the file is not transferred at all. foo.err says:

    cat: condor_submit.html: No such file or directory

The documentation says:

"For vanilla and vm universe jobs only, a file may be specified by giving a URL, instead of a file name. The implementation for URL transfers requires both configuration and available plug-in."

but these are indeed present: /etc/condor/condor_config has FILETRANSFER_PLUGINS, which includes /usr/lib/condor/libexec/curl_plugin.
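
(For reference, this is how I checked that the plugin really is configured and advertised; the direct "-classad" invocation is my understanding of the file transfer plugin protocol rather than anything I found documented, so treat that part as an assumption.)

    # What FILETRANSFER_PLUGINS actually resolves to on this machine:
    condor_config_val FILETRANSFER_PLUGINS

    # The startd advertises the URL schemes it can fetch; "http" shows up here:
    condor_status -long | grep HasFileTransferPluginMethods

    # Ask the plugin directly which schemes it supports (assumes it honours
    # the -classad argument, which I believe is part of the plugin protocol):
    /usr/lib/condor/libexec/curl_plugin -classad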

WORKAROUND: I was able to make it work by setting "should_transfer_files = yes".
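
i.e. the only change to foo.sub is the extra line:

    # workaround: force the file transfer mechanism on, so that the URL is fetched
    should_transfer_files = yes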

However, is this right? Surely a URL should always be fetched, regardless of whether or not you are in the same filesystem domain, since URLs don't appear in the filesystem anyway?



(2) Given an unimplemented URL scheme (such as "https"), I found a difference between my test personal condor node and my production condor node. The former leaves the job idle because of a ClassAd matching condition which is never true:

1 ( TARGET.HasFileTransfer && stringListMember("https",HasFileTransferPluginMethods) )

but the latter puts the job into a "held" (H) state, saying

Hold reason: Error from slot1@xxxxxxxxxxxxxxxx: STARTER at 192.168.6.42 failed to receive file /var/lib/condor/execute/dir_24716/xxxx.xxxx: FILETRANSFER:1:FILETRANSFER: plugin for type https not found!

(Aside: if a plugin for https is not present, wouldn't it be better to abort the job rather than put it into a 'held' state indefinitely, as this isn't a condition which is likely to fix itself?)
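
(As a stopgap on my side, rather than a fix for the underlying behaviour, something like the following in the submit file stops such jobs sitting on hold forever; this is a sketch using the standard JobStatus/EnteredCurrentStatus attributes, with the one-hour timeout chosen arbitrarily.)

    # Remove the job if it has been held (JobStatus == 5) for more than an hour.
    periodic_remove = (JobStatus == 5) && ((time() - EnteredCurrentStatus) > 3600)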

Anyway, I managed to drill down to find the difference, and it turns out that the behaviour differs depending on whether you set

    should_transfer_files = if_needed

or

    should_transfer_files = yes

I cannot find this behaviour documented anywhere. Looking at

http://research.cs.wisc.edu/htcondor/manual/current/2_5Submitting_Job.html#SECTION00354000000000000000

it says the default value is "should_transfer_files = if_needed", and that this enables the file transfer mechanism if the machines are in different filesystem domains. That implies to me that, when the machines are in different filesystem domains, the behaviour should be the same as with "should_transfer_files = yes"; but in fact the generated requirements expressions differ between the two cases.
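
(For anyone comparing their own setup, the filesystem domains on the two sides can be checked with something like the following; both commands are standard in 8.0.)

    # Filesystem domain the submit machine believes it is in:
    condor_config_val FILESYSTEM_DOMAIN

    # Filesystem domain(s) advertised by the execute slots:
    condor_status -long | grep '^FileSystemDomain'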

You can reproduce it with the following test case:

---- bar.sub ----
universe = vanilla
executable = /bin/sh
arguments = "-c true"
transfer_executable = no
transfer_input_files = https://example.net/
should_transfer_files = if_needed
#should_transfer_files = yes
error = bar.err
requirements = ( TARGET.Machine == "nonexistent" )
queue
--------

Submit with "condor_submit bar.sub", then use "condor_q -analyze" on the resulting job.
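
In full, the reproduction looks like this (the <cluster> placeholder is the cluster ID that condor_submit prints):

    condor_submit bar.sub
    condor_q -analyze <cluster>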

[Results with should_transfer_files = if_needed]

The Requirements expression for your job is:

( ( TARGET.Machine == "nonexistent" ) ) && ( TARGET.Arch == "X86_64" ) &&
    ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&
    ( TARGET.Memory >= RequestMemory ) && ( ( TARGET.HasFileTransfer ) ||
      ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

[Results with should_transfer_files = yes]

The Requirements expression for your job is:

( ( TARGET.Machine == "nonexistent" ) ) && ( TARGET.Arch == "X86_64" ) &&
    ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&
    ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer &&
      stringListMember("https",HasFileTransferPluginMethods) )

This also seems to me like a bug, as I was expecting should_transfer_files = (yes|if_needed) to behave the same when the nodes are in different filesystem domains. But if this is intended behaviour, I think it should be documented accordingly.

To me the correct behaviour would be something like this (a sketch of the resulting expression follows the list):

- if any input or output file is a URL, then add stringListMember("<scheme>",HasFileTransferPluginMethods) to the requirements

- if should_transfer_files = yes, then add ( TARGET.HasFileTransfer )

- if should_transfer_files = if_needed, then add
( ( TARGET.HasFileTransfer ) ||
      ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )
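
For the bar.sub example above under if_needed, that would generate something like the following (my sketch of what I think should be produced, not what condor_submit currently generates):

    ( stringListMember("https",HasFileTransferPluginMethods) ) &&
    ( ( TARGET.HasFileTransfer ) ||
      ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )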

Regards,

Brian Candler.