
Re: [HTCondor-users] Parallel Universe on Kubernetes




On 6/19/23 5:45 PM, Sochat, Vanessa via HTCondor-users wrote:

Hi Folks!

 

I'm the HPC monkey of which he speaks! I've been creating a Kubernetes Operator with HTCondor as the scheduler, and doing fairly well up until I needed to use the parallel universe. To not clutter your inboxes, here is a summary of where I am currently at!

 

https://gist.github.com/vsoch/2073136f0833983efc92b4eeb52d49dd

 

TLDR: if we could easily adapt the current setup with the docker images here https://github.com/htcondor/htcondor/tree/main/build/docker/services to allow for this parallel universe, that would likely be the example that I need to get it working in Kubernetes. The current working (for basic jobs) setup is here: https://github.com/converged-computing/htcondor-operator and my (so far) failed attempts are under the single opened PR to add LAMMPS. I'm happy to show you / debug anything you might be interested in. Thanks again for your help, and apologies for my noob-level expertise - I'm only about a day into using this beastie!

 

That's very impressive for a day's work.  If you can stand up HTCondor on k8s, configuring it to run parallel jobs with a dedicated scheduler is pretty straightforward -- just point the startds at the one schedd that can run parallel jobs, as described here: https://htcondor.readthedocs.io/en/latest/admin-manual/setting-up-special-environments.html?highlight=dedicated%20scheduler#selecting-and-setting-up-a-dedicated-scheduler
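
As a rough sketch of what that manual section describes, the execute-node configuration might look something like the fragment below. The host name schedd.example.internal is a placeholder assumption (in a Kubernetes setup it would be whatever name the schedd advertises, which may not match the pod or service name), and the always-run policy shown is only one of the options the manual discusses, so treat this as a starting point rather than a tested setup.

```
# Execute-node (startd) configuration sketch.
# ASSUMPTION: "schedd.example.internal" is a placeholder; use the full
# name of the one schedd that will run parallel universe jobs.
DedicatedScheduler = "DedicatedScheduler@schedd.example.internal"

# Advertise the attribute in the machine ClassAd so the schedd can find
# the machines that are dedicated to it.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

# One possible policy: always run jobs, never suspend or preempt them.
START        = True
SUSPEND      = False
CONTINUE     = True
PREEMPT      = False
KILL         = False
WANT_SUSPEND = False
WANT_VACATE  = False

# Prefer jobs that come from the dedicated scheduler.
RANK = Scheduler =?= $(DedicatedScheduler)
```

With that in place, a minimal parallel universe submit description for a smoke test could look like this (the executable and machine count are illustrative only, not taken from the operator repository):

```
universe      = parallel
executable    = /bin/sleep
arguments     = 30
machine_count = 2
queue
```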


You may need to be careful to pick an implementation of MPI that works well with your application; we can't help you there.


-greg