Re: [HTCondor-users] HTCondor file-transfer vs networked storage

Hi Matt,

in our case, we have an inhomogeneous cluster with a shared filesystem (inhomogeneous, for example, in the sense that not all nodes have equal access to the shared filesystem).
Apart from this use case, however, we also recommend using local scratch even on the nodes with ideal connectivity to the shared filesystem (in some cases), and I guess this is the case you are asking about.

A shared filesystem usually degrades if many single-core jobs (on the order of thousands of jobs or more) access many tiny files. Common examples of this are Python virtualenvs,
Julia package repositories, C++ code with interpreter usage on top (parsing the headers), many tiny input files etc.
So the short answer is: while cluster filesystems excel at high-throughput sequential file access, they often degrade if you access many small files, or lock many files.
Another case is writing output in small chunks into many tiny files in one common folder, which requires folder locks from various nodes. This can also cause degradation.

Note that read-only access of large input datasets is usually not a problem at all for cluster filesystems, especially if it is sequential.

We recommend various solutions for our users:

- Put a tarball on the shared filesystem, and extract this to scratch at the start of their job. This is essentially like the sandboxing you get with file transfer, but leveraging the fast shared filesystem
  for staging the input (which in many cases is useful for large software packages / virtualenvs).

- Write many tiny output files locally first, and move them (or maybe even tarball them) to the shared filesystem at the end of the job.

- Use HTCondor file transfer, which also allows them to use nodes without shared filesystem access (in an inhomogeneous cluster),
  but which (depending on your setup and on the size of the input / output) may of course hit a network bottleneck earlier than the shared filesystem would.

- Use CernVM-FS for software / virtualenvs. For example, we provide a full Anaconda installation this way, and any large dependencies custom user software may rely on. While CernVM-FS is also a shared filesystem,
  it performs quite well with many small files, since it is read-only by design and caches locally behind the scenes (it also deduplicates and does other nice things for this use case).
  It's basically a filesystem made to solve the problem of distributing software (and nowadays, containers) efficiently to many nodes.
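
The first two suggestions can be sketched roughly as follows. This is only a minimal illustration in Python, not our actual setup: the function names (scratch_dir, stage_in, stage_out) are made up for this example, and it assumes the input tarball is trusted and that the job runs under HTCondor, which sets _CONDOR_SCRATCH_DIR in the job environment (outside of a job the sketch falls back to a temporary directory).

```python
import os
import shutil
import tarfile
import tempfile
from pathlib import Path


def scratch_dir() -> Path:
    """Return the job's local scratch directory.

    HTCondor exports _CONDOR_SCRATCH_DIR inside a running job. Outside of a
    job we fall back to a fresh temporary directory so the sketch still runs.
    """
    path = os.environ.get("_CONDOR_SCRATCH_DIR")
    return Path(path) if path else Path(tempfile.mkdtemp(prefix="scratch-"))


def stage_in(tarball: Path, scratch: Path) -> Path:
    """Read one large tarball from shared storage and unpack it locally.

    The shared filesystem only sees a single sequential read; all the
    small-file accesses afterwards hit the local disk.
    """
    target = scratch / "input"
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as tar:
        tar.extractall(target)  # assumes a trusted tarball
    return target


def stage_out(output_dir: Path, shared_dir: Path, name: str) -> Path:
    """Bundle many small local output files and move them in one go.

    The tarball is built on local scratch first, so the shared filesystem
    only sees one file being written at the end of the job.
    """
    local_tar = output_dir.parent / f"{name}.tar.gz"
    with tarfile.open(local_tar, "w:gz") as tar:
        tar.add(output_dir, arcname=name)
    shared_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(local_tar), str(shared_dir / local_tar.name)))
```

A job wrapper would call stage_in() once at the start, run the payload against the returned local directory, let it write into a local output directory, and call stage_out() once at the end. The paths on the shared filesystem are of course site-specific.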


Am 22.08.22 um 22:35 schrieb Matthew T West via HTCondor-users:
Good evening Nick,

I am readily aware of the value of HTCondor's file-transfer mechanism and associated sandboxing. But that wasn't my issue.

My question was:

When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage?

So not a grid or distributed campus pool or pulling from remote storage, but a single homogeneous compute cluster in one location that includes a networked file-system. I do apologize if everything after the question in my original email confused matters. Here might be a better way to put it:

Under what conditions does a shared file-server degrade such that it would be better to work from local scratch, performance and throughput wise?


On 22/08/2022 20:37, Nick LeRoy wrote:
On Sat, Aug 20, 2022 at 8:44 AM Matthew T West via HTCondor-users
<htcondor-users@xxxxxxxxxxx> wrote:
Hi All,

When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage? I guess it would depend on the network and storage speeds.

It's just interesting that the "always work in local scratch" mindset I am used to is seen as a serious backward step performance wise:

Scratch therefore only useful if your network storage or interconnects are slow or saturated ... copying bulk data to local storage / getting all users to copy to local/scratch storage is a quick way to saturate your storage infrastructure.

I can find other instances of this HPC conventional wisdom and it intuitively makes sense. But I don't understand networked storage well, so I am asking the HTCondor hivemind for their thoughts.

You need to remember that HTCondor can work in many different
environments, among these being WANs, campus-type structures, and
grids. For these types of scenarios, file transfer is preferable, if
not required.

Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869