
Re: [HTCondor-users] "incremental" (singularity) jobs



Hi

I do not have an all-encompassing solution, but I have already given this type of problem some thought.

The whole point of a container solution (Singularity included, but not only) is to isolate the processes in the container from the rest of the world (the host and other containers), so from the container host's point of view (in this case the HTCondor execute node) this is an inter-process communication problem. The options then are:

- pipe from data management to singularity

The submit file could look like:

executable = /bin/sh
arguments = "-c 'process_on_the_execute_node fetch data | singularity exec $MY_SINGULARITY_EXEC_OPTIONS my_script_inside_the_container | post_processing_on_the_execute_node'"
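
For context, a more complete submit file around such a pipeline could look roughly like this (a sketch only; run_pipeline.sh is a hypothetical wrapper script holding the pipe above, and the resource requests are placeholders to adapt to your data):

universe                = vanilla
executable              = run_pipeline.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
request_cpus            = 1
request_memory          = 4GB
request_disk            = 200GB
output                  = pipeline.$(Cluster).$(Process).out
error                   = pipeline.$(Cluster).$(Process).err
log                     = pipeline.$(Cluster).log
queue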

- use a socket or a fifo declared on the host and bound into the singularity image: data management does its thing and writes to the socket or fifo, while the processes inside the container just read from there, oblivious to the fact that the other end is handled from outside the container

Since data_management and the container processes run in parallel here, this could probably be organized as a (dynamic?) DAG.
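
A rough sketch of the fifo variant inside a single job's wrapper script (data_management_fetch and my_script_inside_the_container stand in for your actual tools; paths and image names are placeholders):

#!/bin/sh
# create the fifo in the job's scratch directory on the execute node
mkfifo "$_CONDOR_SCRATCH_DIR/datapipe"
# the data management client runs on the host and writes into the fifo
data_management_fetch > "$_CONDOR_SCRATCH_DIR/datapipe" &
# inside the container the same fifo is visible as /work/datapipe
singularity exec --bind "$_CONDOR_SCRATCH_DIR:/work" \
    my_3rd_party_image.simg \
    my_script_inside_the_container /work/datapipe
wait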


- bind a common directory from the host into the container and read and write files there (this will raise concurrency concerns between data_management and the container)
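
For example (again a sketch, names are placeholders), binding the job's scratch directory into the container and letting both sides read and write files there:

singularity exec --bind "$_CONDOR_SCRATCH_DIR:/data" \
    my_3rd_party_image.simg my_script_inside_the_container /data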


- shared memory: I believe it is possible, but I think the configuration on the host would be far too convoluted to be useful at the scale of an HTCondor cluster (and using /dev/shm is, functionally, just the previous scenario again)


That being said, do not forget that it is possible to subclass Singularity images for your own benefit by using recipes:

http://singularity.lbl.gov/docs-recipes

https://www.sylabs.io/guides/2.5.1/user-guide/container_recipes.html

If your data management client is not too convoluted, that is the route I would personally investigate, with a series of recipes looking like:

Bootstrap: shub
From: my_3rd_party_image


%help
    adding "data management" to my_3rd_party_image

%post
    # install the data management client inside the container
    add_my_data_management_repository
    apt-get update
    apt-get install -y my_data_management

%files
    # for legacy Singularity < 2.3, copy files in %setup instead
    copy_configuration_to_make_my_data_management_client_useful


Depending on the number of 3rd-party images and the complexity of the installation of the data_management client, this may or may not be a viable option.
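
If it looks viable, a derived image can then be built from such a recipe with something like (file names are placeholders):

sudo singularity build my_3rd_party_image_with_dm.simg my_recipe.def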

HTH
Philippe



On 08/19/2018 12:18 AM, Michael Hanke wrote:
Hi,

I cannot find a straightforward solution for the following problem, and
I would be glad if someone could put me on the right track on how to do
it, or how to reframe the problem.

We have jobs to process that cover a wide range of data processing. They
all have in common that specific code/applications come in singularity
images that are provided by 3rd-parties. To perform the computations,
data need to be pulled from a data management system at the beginning
and results need to be put back into it at the end. The execute nodes do
not have the required data management software, though. Given that the
core processing is done via singularity, it would be easy to provide the
data management software via such an image as well. However, it would be
very difficult to fold it into all the individual singularity images
provided by 3rd-parties.

Q: Is it possible to bind three singularity jobs (each with its own
singularity image) together, such that they run on any machine, but all
on the exact same one, and such that they all share a common temporary
work dir (the execute nodes have no shared filesystem)? The shared work
dir is important, as the size of the dataset is substantial (>x*100GB)
and moving the job results between the prep, computation, and finalize
stages would put substantial stress on the network, while the final
results tend to be rather small.

I'd be happy for any suggestions. Thanks!

Michael

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/