Re: [HTCondor-users] HTCondor within Slurm?
- Date: Mon, 15 Jul 2019 07:53:22 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] HTCondor within Slurm?
On 7/13/19 10:03 AM, Steffen Grunewald wrote:
I've been asked to install HTCondor on a HPC cluster running Slurm.
While this sounds crazy to me, I might just be ignorant, so I'd like
to ask here before denying the request - has it been done somewhere
else, for whichever reason, and if you did it, would you like to
share your insights?
We don't think this is crazy at all. The fundamental idea of High
Throughput Computing is to be able to use as many machines as possible,
whether they are dedicated to the purpose, sometimes-idle machines you
can "borrow" from someone else, cloud machines you can rent for money,
or others. Several sites, including here at the UW, backfill Slurm
clusters with jobs from HTCondor systems.
There are two ways to do this. The first involves running an HTCondor
worker node setup on the Slurm cluster's worker nodes, but only
activating it when Slurm tells us the node is idle. The Slurm prolog and
epilog hooks are helpful here. Example scripts for PBS, which work
pretty much the same way with Slurm, are available on our wiki site here:
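To give a rough feel for the prolog/epilog approach: a sketch like the following could toggle the local startd around Slurm jobs. These script names, and the choice of condor_drain specifically, are my assumptions for illustration, not the actual wiki scripts.

```shell
#!/bin/sh
# slurm-prolog.sh -- run by Slurm on a node just before a Slurm job starts.
# Drain the local HTCondor startd so backfill HTCondor jobs vacate the node.
# (Hypothetical sketch; the real scripts on the wiki may differ.)
condor_drain -graceful "$(hostname)"
```

```shell
#!/bin/sh
# slurm-epilog.sh -- run by Slurm after the Slurm job finishes on the node.
# Cancel the drain so the startd can match HTCondor backfill jobs again.
condor_drain -cancel "$(hostname)"
```

Whether to drain gracefully or evict quickly is a site policy choice; a quick eviction wastes backfill work but frees the node for Slurm sooner.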
The advantage of this approach is that it is easy to set up and easy to
debug from the Condor side. The disadvantage is that Slurm doesn't know
about these jobs, so it cannot account for them or make scheduling
decisions about them. Like any federated system, the jobs need to be
prepared to run in a "foreign" environment, with perhaps a different
Linux distro, different locally installed software, etc. Generally, we
configure the START expressions on these machines so that users have to
opt in to using them, to minimize surprises.
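A minimal sketch of such an opt-in START expression follows; the attribute name WantSlurmBackfill is made up for illustration, any custom job attribute would do.

```
# condor_config on the backfill nodes (hypothetical attribute name):
# only match jobs that have explicitly opted in.
START = (TARGET.WantSlurmBackfill =?= True)
```

Users would then opt in by adding `+WantSlurmBackfill = True` to their submit files; jobs without the attribute never match these machines.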
A second way is more complicated to set up, but gives Slurm more
visibility into the jobs. This method relies on the job router to
convert vanilla jobs in HTCondor's schedd into grid-universe jobs that
are submitted to Slurm; the Slurm scheduler then sees these as ordinary
Slurm jobs, can schedule them as it sees fit, and accounts for them in
the usual way.
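As a hedged sketch of that second method, a job-router route in the ClassAd-list configuration syntax might look something like this (the route name and the absence of any requirements are illustrative, not a recommended site policy):

```
# On the schedd host: route vanilla-universe jobs to the local Slurm
# scheduler via the "batch" grid type. Hypothetical minimal route;
# real sites will add requirements, limits, and transforms.
JOB_ROUTER_ENTRIES = \
  [ \
    Name = "To_Slurm"; \
    GridResource = "batch slurm"; \
    TargetUniverse = 9; \
  ]
```

Here universe 9 is the grid universe, and "batch slurm" tells the grid machinery to hand the routed job to the local Slurm installation.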
We'd be happy to give you a hand setting up either of these methods.