
Re: [HTCondor-users] Using condor to "bulk-submit" jobs to SLURM

HEPCloud has a system in which we use glideinWMS and bosco/blahp to submit glideins to a number of large SLURM systems, including NERSC, TACC Stampede2, TACC Frontera, PSC Bridges2, and SDSC Expanse. We have found that scheduling works best (as long as we have load) when each glidein we submit actually spawns a multi-node job, i.e. one SLURM job -> 100 full nodes of stuff calling back to HTCondor. Of course, we have the whole CMS global pool production to feed it.
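
As a rough illustration of the "one SLURM job, many nodes" shape described above (this is not HEPCloud's actual glidein script), here is a minimal Python sketch that renders a batch script requesting whole nodes and starting one pilot per node with srun. The function name, the pilot path ./pilot_startup.sh, and the default walltime are hypothetical placeholders; the only things taken as given are the standard SLURM options --nodes, --exclusive, --time, --ntasks and --ntasks-per-node.

```python
def render_multinode_pilot(nodes=100, walltime="48:00:00",
                           pilot_cmd="./pilot_startup.sh"):
    """Return the text of a SLURM batch script for one multi-node pilot job.

    pilot_cmd is a hypothetical placeholder for whatever starts the
    HTCondor worker on a node and has it call back to the pool.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --exclusive",  # ask for whole nodes
        f"#SBATCH --time={walltime}",
        "# One pilot per allocated node; each calls back to HTCondor.",
        f"srun --ntasks={nodes} --ntasks-per-node=1 {pilot_cmd}",
    ]
    return "\n".join(lines) + "\n"
```

Calling render_multinode_pilot(nodes=100) yields a single script, so the SLURM scheduler sees one job to place rather than 100 separate ones.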

We're told another feature is coming, possibly as soon as HTCondor 8.9.12, in which a separate schedd is launched on an edge service node within the HPC cluster. It is going to be called "Lumberjack"; whole workflows can be delegated to it in bulk, and it will be key to working at sites that allow no outbound network access whatsoever from their worker nodes.


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Andrew Melo <andrew.malone.melo@xxxxxxx>
Sent: Friday, February 12, 2021 2:45 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Using condor to "bulk-submit" jobs to SLURM

We run SLURM as our local batch system, but have many users who are
experienced with Condor due to its usage at other institutions. We
would like to configure/deploy something to allow these users to
submit Condor jobs which will run on our SLURM nodes.

We run CMS and LIGO jobs through the OSG, so we're familiar with
HTCondor-CE, but we're wary of allowing users to directly submit this
way, since there is a 1:1 mapping of Condor jobs to SLURM jobs, and
particularly with SLURM's backfill scheduler, many and/or short jobs
showing up at once can severely hamper the responsiveness of the
scheduler. Additionally, for a reason I think we're close to
diagnosing, blahp submits/generates enough RPCs to account for >90% of
the entire scheduler load, even though CMS and LIGO account for less
than a third of the total job count on the cluster.

Ideally, we could have a system like glideinWMS where the local
scheduler receives a long-running pilot that then executes multiple
user jobs inside, but it's my understanding that deploying such a
service is extremely non-trivial. I've also thought about possibly
merging several condor jobs into a single SLURM "job array" (which
somewhat behaves like a Condor cluster from the scheduling side), but
it appears the job router operates on a job-per-job basis and there's
not a good place to "coalesce" individual jobs into larger arrays.
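
One way to picture the "coalescing" step is a small batching helper that sits outside the job router: collect the command lines of pending jobs, write them to a task file, and submit a single SLURM job array whose tasks each pick their own line via SLURM_ARRAY_TASK_ID. The Python sketch below only builds the two pieces of text. The function name, the file name tasks.txt, and the throttle value are assumptions for illustration, and this is not an existing HTCondor or blahp feature.

```python
def build_job_array(commands, task_file="tasks.txt", max_running=50):
    """Coalesce many command lines into one SLURM job array.

    Returns (task_file_text, batch_script_text). Each array task runs the
    line of task_file whose 0-based index equals SLURM_ARRAY_TASK_ID.
    """
    if not commands:
        raise ValueError("need at least one command")
    if any("\n" in c for c in commands):
        raise ValueError("each command must be a single line")
    last = len(commands) - 1
    script = "\n".join([
        "#!/bin/bash",
        # The %N suffix caps how many array tasks run at once.
        f"#SBATCH --array=0-{last}%{max_running}",
        # sed line numbers are 1-based; the array indices here are 0-based.
        f'cmd=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" {task_file})',
        'bash -c "$cmd"',
    ]) + "\n"
    return "\n".join(commands) + "\n", script
```

With this shape the scheduler handles one array submission per batch instead of one submission per job, subject to whatever array size limit the site configures.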

Does anyone have good ideas on how to approach this?

Andrew Melo