Re: [HTCondor-users] Parallel Universe on Kubernetes



Hi All,

I know a number of institutions within the OSG, most notably the PRP, leverage Kubernetes as their resource(?) manager, with HTCondor running as a job scheduler in pods on top of it. I am not well versed enough in the complexity of the *dedicated scheduler* mechanism for the parallel universe, so I couldn't help her myself. I cannot intrinsically see why what she is asking isn't possible, but will happily defer to the developers if I am wrong.

Not that she needs me to speak for her, but I also want to put in a good word for Vanessa. Her energy is boundless, she is mind-bogglingly productive, and she has an intense desire to make the tools and infrastructure not just better but friendlier to use. Having her as part of our community, with her wealth of experience with containers and k8s, would be both a boon for the dHTC ecosystem and a general delight.

Cheers,
Matt

P.S. - As one of the foremost askers of noob questions on this list, I can say there is no reason to apologize.

Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
www.exeter.ac.uk/research/researchcomputing/support/researchit
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 19/06/2023 23:45, Sochat, Vanessa wrote:

Hi Folks!

I'm the HPC monkey of which he speaks! I've been creating a Kubernetes Operator with HTCondor as the scheduler, and doing fairly well up until I needed to use the parallel universe. To not clutter your inboxes, here is a summary of where I am currently at!

https://gist.github.com/vsoch/2073136f0833983efc92b4eeb52d49dd

TLDR: if we could easily adopt the current setup with the docker images here https://github.com/htcondor/htcondor/tree/main/build/docker/services to allow for this parallel universe, that would likely be the example that I need to get it working in Kubernetes. The current working (for basic jobs) setup is here: https://github.com/converged-computing/htcondor-operator and my (so far) failed attempts are under the single opened PR to add LAMMPS. I'm happy to show you / debug anything you might be interested in. Thanks again for your help, and apologies for my noob-level expertise - I'm only about a day into using this beastie!

Best,

Vanessa

 

From: Matthew T West <m.t.west@xxxxxxxxxxxx>
Date: Monday, June 19, 2023 at 2:01 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Sochat, Vanessa <sochat1@xxxxxxxx>
Subject: Parallel Universe on Kubernetes

Good evening all,

I have someone from the cloud HPC community who is curious about running
multi-node MPI jobs with HTCondor with a pool on a Kubernetes cluster. Is it
possible with just the grid universe, or does one need to set up the parallel
universe? This work leverages the existing container images for the
central manager, access point, and worker nodes.

I grant this isn't a common use-case for this community, but I feel it's
worth asking.

Cheers,
Matthew West