
[HTCondor-users] Caching large executable on worker nodes

Hi all,

We are currently in a situation where transferring the executable to the
execute machine for each job is starting to become a limiting factor. Our
case is the following:

- large executable (~500 MB), which is the same for a large number of jobs
within one cluster (the jobs differ only in their input arguments)

- few execute machines, i.e. each execute machine will run many such
jobs (so the executable is transferred each time, even though this would
not be necessary)

- we are using the file transfer mechanism (a minimal submit description
is sketched after this list), but I believe the problem would be similar
with a shared file system

- we would like to keep the current job structure for various reasons,
i.e. we would rather not combine multiple jobs into one longer-running
one (I can provide the arguments for this if needed)
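For concreteness, here is a minimal sketch of the kind of submit
description we use; my_analysis and the input file names are
placeholders, not our real names:

    # one cluster, many jobs, identical ~500 MB executable
    executable              = my_analysis
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    arguments               = --input sample_$(Process).dat
    queue 1000

With this, every one of the 1000 jobs transfers the same my_analysis
binary to whichever machine happens to run it.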


My goal is to reduce the time and network traffic spent transferring the
executable thousands of times.

A very natural idea would be to cache the executable on each execute
machine, hoping that it can be reused when another job of the same
cluster lands there. I could probably hack something together that does
the trick, although doing it properly might take quite some effort (when
and how to clean up the cache?, ...); a rough sketch of what I have in
mind follows.
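To make the idea concrete, here is a rough, untested sketch of such a
wrapper. It assumes a per-machine cache directory and an HTTP server on
the submit side from which the binary can be fetched on a cache miss;
the URL, paths, and names are all made up:

    #!/bin/sh
    # Hypothetical wrapper, submitted as the (tiny) job executable.
    # The job carries only the expected checksum of the real binary;
    # the ~500 MB binary itself is fetched at most once per machine.
    # CACHE_DIR and the download URL are made-up examples.
    set -e
    EXPECTED_SUM=$1; shift        # first job argument: sha256 of the binary
    CACHE_DIR=/var/cache/myjobs
    CACHED="$CACHE_DIR/$EXPECTED_SUM"

    mkdir -p "$CACHE_DIR"
    if [ ! -x "$CACHED" ]; then
        # cache miss: download, verify, then publish atomically so that
        # concurrent jobs on the same machine cannot see a partial file
        curl -sf -o "$CACHED.$$" "http://submit.example.org/binaries/$EXPECTED_SUM"
        echo "$EXPECTED_SUM  $CACHED.$$" | sha256sum -c - >/dev/null
        chmod +x "$CACHED.$$"
        mv -n "$CACHED.$$" "$CACHED"
        rm -f "$CACHED.$$"        # no-op if the mv above succeeded
    fi
    exec "$CACHED" "$@"           # run the cached binary with the job's real args

The open questions are exactly the ones above: with this sketch the
cache only ever grows, and nothing removes stale binaries once a
cluster has finished.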

On the other hand, this seems like a very common problem, so I was
wondering whether Condor offers some built-in magic to cope with it.
Or maybe I am missing something obvious?

Are there any recommended best practices for my case?

Thank you very much in advance,

Jens