
Re: [HTCondor-users] [External] Task scheduling with wall-clock-time of SLURM nodes



Hello,

 

Assuming that you're sbatching a starter for the SLURM node, which then joins the pool and gets matched to the pending job, you can set up a starter attribute such as "ExpirationTime" indicating the UNIX timestamp at which the worker will terminate.

 

So say you sbatch an HTCondor starter SLURM job with --time=6:00:00 for a six-hour lifetime. You'd then calculate what now plus six hours would be via $(($(date +%s) + (6 * 3600))) in bash, for example, which for me right now is 1694457722, and set that as the ExpirationTime for the SLURM-launched starter.

 

ExpirationTime = 1694457722

 

You could also set a RuntimeRemaining expression like so based on the ExpirationTime value:

 

RuntimeRemaining = ExpirationTime - time()
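As a rough sketch of how the SLURM job script might generate these two settings before starting the worker: the function name below is hypothetical, and it assumes the attributes are published in the machine ad by listing them in STARTD_ATTRS. Where the fragment gets written (for example, a local config file read by the SLURM-launched daemons) depends on your setup and is not shown.

```shell
#!/bin/bash
# Hypothetical helper: print an HTCondor config fragment for a worker
# whose SLURM allocation lasts the given number of hours.
# $1 = wall-clock limit in hours, $2 = start time as a UNIX timestamp
# (defaults to the current time).
make_expiration_config() {
    local walltime_hours=$1
    local now=${2:-$(date +%s)}
    local expiration=$(( now + walltime_hours * 3600 ))
    # \$ keeps $(STARTD_ATTRS) literal so HTCondor, not bash, expands it.
    cat <<EOF
ExpirationTime = ${expiration}
RuntimeRemaining = ExpirationTime - time()
STARTD_ATTRS = \$(STARTD_ATTRS) ExpirationTime RuntimeRemaining
EOF
}

# Example: a six-hour allocation, matching sbatch --time=6:00:00.
make_expiration_config 6
```

The hour count has to be kept in sync with the --time value passed to sbatch by hand; this sketch does not read it from SLURM.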

 

Then your requirements expression could easily match to machines with enough RuntimeRemaining to satisfy the job's EstimatedRuntime:

 

EstimatedRuntime = 4 * 3600

Requirements = TARGET.RuntimeRemaining > EstimatedRuntime

 

If a machine doesn't have a RuntimeRemaining attribute, such as a dedicated node, the comparison above won't match it. To match both kinds of machine, you'd want to check whether the attribute is defined first:

 

Requirements = isUndefined(TARGET.RuntimeRemaining) \
    ? TRUE \
    : TARGET.RuntimeRemaining > MY.EstimatedRuntime
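For illustration, here is a sketch of how the watcher service might generate a submit description carrying these settings. The function name and the "my_task.sh" executable are placeholders, and it assumes the `+Attribute` submit syntax is used to add EstimatedRuntime to the job ad as a custom attribute.

```shell
#!/bin/bash
# Hypothetical helper: print a submit description for a task expected
# to run the given number of hours. "my_task.sh" is a placeholder.
# $1 = estimated runtime in hours
make_submit_description() {
    local estimated_hours=$1
    local estimated_seconds=$(( estimated_hours * 3600 ))
    cat <<EOF
executable = my_task.sh
+EstimatedRuntime = ${estimated_seconds}
requirements = isUndefined(TARGET.RuntimeRemaining) ? true : (TARGET.RuntimeRemaining > MY.EstimatedRuntime)
queue
EOF
}

# Example: a task estimated at four hours.
make_submit_description 4
```

The output could be redirected to a file and passed to condor_submit. Any requirements the job already needs would have to be combined with this expression using &&.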

 

Hopefully this proves helpful.

 

Michael Pelletier

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Seung-Jin Sul
Sent: Friday, September 8, 2023 1:21 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [External] [HTCondor-users] Task scheduling with wall-clock-time of SLURM nodes

 

Hi, 

 

I am using SLURM nodes to create pools of HTCondor workers and I am running a separate service that watches `condor_q` and executes `sbatch` or `scancel` on demand. 

What I am trying to do is pass a runtime constraint for a task to HTCondor so that it can schedule the task to the SLURM node that has enough life left (enough wallclock time left). 

For example, if a task needs more than 1hr estimated runtime, I want to let HTCondor schedule the task to any SLURM nodes that have more than 1hr life time. 

 

Has anyone done this? Any ideas would be appreciated.

 

Thank you!

 

Best regards, 

Seung