
Re: [HTCondor-users] HTCondor file-transfer vs networked storage



On 8/22/22 15:35, Matthew T West via HTCondor-users wrote:
When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage?


Hi Matthew:

There are several advantages to using explicit file transfer. Perhaps the biggest is error handling. If a file cannot be transferred, or there is a typo, or a disk error, HTCondor will notice and your job won't start. Should such an error happen with a shared filesystem, it probably won't surface until after your job starts, and it becomes the job's responsibility not just to detect the error, but to propagate it properly up and out to HTCondor so that the job can be re-run. This is often hard to do, especially if you use third-party software. What usually ends up happening is that the error is not propagated correctly, and the job leaves the queue without correct or complete output, leading to problems that are very hard to debug (or worse, to quietly missing data).
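To illustrate what "propagating the error up and out" means for a job that reads from a shared filesystem, here is a minimal sketch of a job wrapper. The names `run_job`, `check_output`, and the `work` callable are hypothetical, and the exit codes 1 and 2 are arbitrary choices, not anything HTCondor defines.

```python
import os
import sys


def check_output(path, min_bytes=1):
    """Return True if the output file exists and holds at least min_bytes."""
    try:
        return os.path.getsize(path) >= min_bytes
    except OSError:
        # Missing file, stale mount, permission problem, I/O error, ...
        return False


def run_job(input_path, output_path, work):
    """Run work(input_path, output_path) and return a process exit code.

    `work` stands in for the real payload (hypothetical). Any OSError it
    raises, or a missing/empty output file, becomes a non-zero exit code
    instead of a silent "success".
    """
    try:
        work(input_path, output_path)
    except OSError as err:
        print(f"I/O failure: {err}", file=sys.stderr)
        return 1
    if not check_output(output_path):
        print(f"output {output_path} missing or empty", file=sys.stderr)
        return 2
    return 0
```

A real job would end with something like `sys.exit(run_job(...))`. Note that a non-zero exit code only tells HTCondor the job failed; whether the job is then re-run depends on the retry policy configured for it.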

If you are using the native file transfer mechanism (i.e. not a URL), then file transfer is throttled by the access point. With shared network filesystems, it is often possible for a lot of concurrent access to crash the file server or otherwise cause severe performance problems.

HTCondor records in the job ad the number of input and output bytes transferred, which can be useful when sizing and provisioning network bandwidth and disk capacity. This is harder to measure when using a shared filesystem.
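As a rough sketch of how those per-job counts could feed a sizing estimate, the function below sums them over a set of job ads represented as plain dictionaries. The attribute names `BytesRecvd` and `BytesSent` are an assumption about which job ad attributes carry the input and output counts; check them against the job ads in your own pool. The sample values are made up.

```python
def transfer_totals(job_ads, in_attr="BytesRecvd", out_attr="BytesSent"):
    """Sum input and output bytes over an iterable of job-ad dicts.

    in_attr / out_attr are assumed attribute names; ads that lack an
    attribute contribute zero.
    """
    total_in = 0
    total_out = 0
    for ad in job_ads:
        # Values may be stored as floats, so normalise to int.
        total_in += int(ad.get(in_attr, 0))
        total_out += int(ad.get(out_attr, 0))
    return total_in, total_out


# Made-up example ads, not real measurements.
sample_ads = [
    {"BytesRecvd": 1000, "BytesSent": 200},
    {"BytesRecvd": 500.0},
]
print(transfer_totals(sample_ads))  # (1500, 200)
```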

Now, there's no such thing as a free lunch. It is often difficult to know a priori what the input file set is, in which case a shared filesystem might make more sense. Also, when a job needs only a very small subset of a very large file, there may be performance benefits to reading that small chunk from a shared filesystem instead of asking HTCondor to copy the whole file over just to access a small part of it.
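The "small subset of a very large file" case looks roughly like this: seek to the region of interest and read only those bytes, so that only that part of the file has to be read. This is a minimal sketch; `read_chunk` is a hypothetical helper and the path in the usage note is a placeholder.

```python
def read_chunk(path, offset, length):
    """Read `length` bytes starting at byte `offset` from the file at `path`.

    Returns fewer bytes (possibly none) if the file ends first.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

For example, `read_chunk("/shared/data/big.bin", 10 * 1024**3, 4096)` would fetch 4 KiB from 10 GiB into the file without copying the rest.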


-greg