Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] HTCondor within Slurm?

Date: Mon, 15 Jul 2019 07:53:22 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor within Slurm?

On 7/13/19 10:03 AM, Steffen Grunewald wrote:

Hello all,

I've been asked to install HTCondor on a HPC cluster running Slurm.
While this sounds crazy to me, I might just be ignorant, so I'd like
to ask here before denying the request - has it been done somewhere
else, for whichever reason, and if you did it, would you like to
share your insights?


Steffen:

We don't think this is crazy at all. The fundamental idea of High Throughput Computing is to be able to use as many machines as possible, whether they are dedicated to the purpose, sometimes-idle machines you can "borrow" from someone else, cloud machines you can rent for money, or others. Several sites, including here at the UW, backfill Slurm clusters with jobs from HTCondor systems.

There are two ways to do this. The first involves running an HTCondor worker node setup on the Slurm cluster's worker nodes, but only activating it when Slurm tells us the node is idle. The Slurm prologue and epilogue hooks are helpful here. Example scripts for PBS, which work much the same way with Slurm, are available on our wiki:
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToScavengeCycles

The advantage of this approach is that it is easy to set up and easy to debug from the HTCondor side. The disadvantage is that Slurm doesn't know about these jobs, so it cannot account for them or make scheduling decisions about them. As in any federated system, the jobs need to be prepared to run in a "foreign" environment, with perhaps a different Linux distro, different locally installed software, etc. Generally, we configure the START expressions on these machines so that users have to opt in to using them, to minimize surprises.
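To make the prologue/epilogue idea concrete, here is a minimal sketch of a hook helper in Python. It is not taken from the wiki scripts: the choice of condor_off/condor_on on the startd, the function names, and the error handling are all assumptions for illustration. The idea is that the prologue stops the startd when a Slurm job arrives on the node, and the epilogue turns it back on afterwards.

```python
import subprocess

# Assumed mapping from Slurm hook to HTCondor command (illustrative only,
# not the contents of the wiki scripts).
HOOK_COMMANDS = {
    # A Slurm job is about to start here: stop the startd quickly,
    # which evicts any backfill job running on the node.
    "prolog": ["condor_off", "-fast", "-daemon", "startd"],
    # The Slurm job has finished: let the node take HTCondor jobs again.
    "epilog": ["condor_on", "-daemon", "startd"],
}


def command_for(hook):
    """Return the HTCondor command line for a given Slurm hook name."""
    if hook not in HOOK_COMMANDS:
        raise ValueError(f"unknown hook {hook!r}; expected 'prolog' or 'epilog'")
    return list(HOOK_COMMANDS[hook])


def run_hook(hook, runner=subprocess.run):
    """Run the command for `hook` and always report success to Slurm.

    Errors are swallowed on purpose: a failing prologue can take the node
    out of service, so a problem on the HTCondor side should not be
    allowed to do that.
    """
    try:
        runner(command_for(hook), check=False)
    except OSError:
        pass  # e.g. HTCondor tools not installed on this node
    return 0
```

A site would call run_hook("prolog") and run_hook("epilog") from small wrapper scripts named in its Slurm configuration. Whether to evict quickly or let backfill jobs drain is a local policy decision.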

A second way is more complicated to set up, but gives Slurm more visibility into the jobs. This method relies on the job router to convert vanilla HTCondor jobs in the HTCondor schedd into grid jobs that go to Slurm. The Slurm scheduler then sees these as ordinary jobs, can schedule them as it sees fit, and accounts for them in the usual way.
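As a rough illustration of what that conversion amounts to, the sketch below applies it to a job ad represented as a plain Python dict. The real job router is driven by HTCondor configuration, not Python, and the function name here is hypothetical. The attribute values (universe 5 for vanilla, 9 for grid, and a grid resource of "batch slurm") follow standard HTCondor conventions.

```python
VANILLA_UNIVERSE = 5  # HTCondor's numeric code for the vanilla universe
GRID_UNIVERSE = 9     # HTCondor's numeric code for the grid universe


def route_to_slurm(job_ad):
    """Return a routed copy of a vanilla job ad, or None if it is not vanilla.

    `job_ad` is a plain dict standing in for a ClassAd; the original
    job is left untouched, much as the job router keeps the source job
    in the queue while the routed copy runs.
    """
    if job_ad.get("JobUniverse") != VANILLA_UNIVERSE:
        return None
    routed = dict(job_ad)
    routed["JobUniverse"] = GRID_UNIVERSE
    # "batch slurm" tells the grid universe to submit via the local Slurm.
    routed["GridResource"] = "batch slurm"
    return routed
```

For example, route_to_slurm({"JobUniverse": 5, "Cmd": "/bin/sleep"}) yields a copy with JobUniverse set to 9 and GridResource set to "batch slurm", while a job that is already in another universe is skipped.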


We'd be happy to help you set up either of these methods.

-greg

Follow-Ups:
- Re: [HTCondor-users] HTCondor within Slurm?
  - From: Michael Di Domenico

References:
- [HTCondor-users] HTCondor within Slurm?
  - From: Steffen Grunewald
