
[HTCondor-users] Cannot use singularity cache directory



I set up a local singularity registry for our small cluster and wanted to benefit from the image caching built into singularity. We often use images larger than 1 GB, and transferring the image to the executor for each job seems wasteful (and, in our case, slow). Apptainer/singularity has a nice feature: when you load an image from a registry, it checks the hash to see whether the image already exists in the cache folder, and only downloads it if it is not already available locally.
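As a rough illustration of the hash-based caching described above, here is a simplified content-addressed cache in Python. It is not Apptainer's actual on-disk layout or code; `cached_fetch` and its `download` callback are hypothetical names, and SHA-256 is assumed as the digest.

```python
import hashlib
import os
import tempfile


def cached_fetch(digest: str, cache_dir: str, download) -> str:
    """Return a local path for the blob identified by `digest`.

    Hypothetical model of a content-addressed cache: blobs are stored
    under their SHA-256 digest, and `download` is only called on a miss.
    """
    path = os.path.join(cache_dir, digest)
    if os.path.exists(path):
        return path  # cache hit: no network transfer at all
    data = download()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("downloaded data does not match expected digest")
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temp file first so a crash never leaves a partial blob
    # under the final (digest) name.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
    return path
```

The point of keying on the digest is that the cache directory must outlive a single job for the second lookup to hit; a directory that is wiped after every job turns every lookup into a miss.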

The documentation [1] suggests that the singularity cache directory is configurable, with the sentence "There you will find parameters to customize things such as [...], cache directory", but in practice it is not. The environment variable that configures it, APPTAINER_CACHEDIR, is overwritten by HTCondor and set to the job's "execute_dir", which is temporary and therefore unsuitable for caching. I checked the code, and this happens in `src/condor_starter.V6.1/singularity.cpp`, lines 403-411 [2]. The comments suggest this setting exists for images rebuilt from Docker images (which might not be cached; I don't have much experience with the "docker://" handle). It would be nice if APPTAINER_CACHEDIR were only set to "execute_dir" when it is not already defined by the user or the HTCondor configuration.
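The behavior proposed above can be sketched as a small precedence rule. This is a Python model, not HTCondor's C++ starter code or its environment API; the function name is hypothetical, and the order "user's job environment first, then admin configuration" is an assumption about what a reasonable precedence would be.

```python
from typing import Mapping

CACHE_VAR = "APPTAINER_CACHEDIR"


def effective_cache_dir(
    job_env: Mapping[str, str],
    config_env: Mapping[str, str],
    execute_dir: str,
) -> str:
    """Pick the cache directory the container runtime should use.

    Assumed (hypothetical) precedence: the user's job environment, then
    the admin's configuration, and only as a fallback the per-job
    execute directory, which is removed when the job ends.
    """
    for source in (job_env, config_env):
        value = source.get(CACHE_VAR)
        if value:  # treat an unset and an empty variable the same way
            return value
    return execute_dir
```

With this rule, sites that set nothing keep today's behavior (cache in the execute directory), while a site that sets the variable to a persistent path on the executor gets cache hits across jobs.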

For now, I am thinking of "solving" this problem with a DAG: a first job that pulls the image only if necessary, and a second job that uses the local image. But this complicates the workflow unnecessarily. Does anyone see a better way to reuse a singularity image by storing it on an executor's local storage?
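The first DAG node of this workaround could run something like the following Python sketch. It assumes `apptainer` is on the executor's PATH and uses its `apptainer pull <output> <uri>` form; `ensure_local_image`, the image path, and the registry URI are hypothetical. The `run` parameter only exists so the function can be exercised without Apptainer installed.

```python
import os
import subprocess


def ensure_local_image(uri: str, local_path: str, run=subprocess.run) -> bool:
    """Pull `uri` to `local_path` unless the file already exists.

    Returns True if a pull was performed, False on a local hit.
    """
    if os.path.exists(local_path):
        return False  # image already on this executor's local storage
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    tmp_path = local_path + ".partial"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)  # leftover from an interrupted pull
    # Pull to a temporary name, then rename, so a later job never sees
    # a half-written image under the final name.
    run(["apptainer", "pull", tmp_path, uri], check=True)
    os.replace(tmp_path, local_path)
    return True
```

For example, `ensure_local_image("oras://registry.example/myimage:latest", "/scratch/apptainer-images/myimage.sif")` (both values hypothetical). Note that the second DAG node would also have to be steered to the same execute machine for the local file to be useful, which this sketch does not handle.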

[1]: https://htcondor.readthedocs.io/en/latest/admin-manual/singularity-support.html?highlight=cache#
[2]: https://github.com/htcondor/htcondor/blob/V10_0_2/src/condor_starter.V6.1/singularity.cpp#L403