
Re: [HTCondor-users] Immediate Jobs



Hello,

 

On our side, we use HTCondor for both distributed and parallel jobs on the same compute systems.

 

Distributed jobs are more of an HTC requirement, although it would be nice to be able to set priorities within the submit file (e.g., Project B, priority #7). A single user can work on multiple projects and have various tasks within those projects.

 

As for parallel jobs, I'm well aware that HTCondor is not fully featured for HPC usage, but we try to make it work nonetheless. A few tweaks here and there would probably make things a lot easier for both sysadmins and users. Again, a priority system would be ideal. If the cluster(s) are fully loaded or the specific resources required by the more urgent job are already taken, then the scheduler would need to put the less urgent jobs on hold (or relocate them) until the more urgent ones are completed. Being able to change the priority of running jobs would also be needed (e.g., there's a new priority #1, and this is the THIRD new priority #1...).

 

Maybe there's already a way to make all that happen, but I haven't had the time to investigate if and how this could be configured within HTCondor.

In my opinion, if HTCondor had a little more to give regarding HPC requirements, it would probably become the go-to scheduler for most, if not all, computing systems.

Containers are also getting more and more popular. I think some are moving to Nomad because of this. My users do not require containers (yet), so I haven't taken the time to compare the two products.

 

Martin

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Collin Mehring
Sent: May 24, 2022 8:40 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Immediate Jobs

 

Hello Condor Users,

 

I've been enjoying HTCondor Week 2022 so far and wanted to continue the conversation started by Peter Couvares' "Future of Computation" talk, specifically about "Immediate" or "Online" jobs that need to run right away. Since I'm attending virtually and can't do so at the Terrace social tonight I'm turning to the users group.

 

To summarize the need/ask: There are occasionally jobs where the output is needed right away in order to be useful, and HTCondor doesn't currently have a great way of handling this. This is opposed to typical batch jobs where the time spent in queue is usually irrelevant (to a certain point). I understand that the desire for immediate turn-around time is at odds with the high-throughput nature of HTCondor, but the scheduling system needs to be aware of these exceptions if the resources/EPs are to be shared with normal batch jobs.

 

The solution presented in the talk was to have dedicated slots set up for these jobs. The downside is either low utilization of those resources when there are no jobs of this type or, if the cores are oversubscribed, an impact on other jobs currently running on the machine. The second downside didn't seem to matter much in their case, so oversubscription was what they went with.

 

It was also mentioned during the post-talk discussion that the condor_now tool might help with this problem. I haven't used this tool myself, but from the documentation it looks like it replaces one currently running job from a scheduler with another idle job from the same scheduler, essentially just reassigning the existing claim. This isn't ideal for our particular use-case because:

a) It requires the scheduler to have an existing claim to reuse, which may not be the case.

b) It requires the new job to have already been submitted and exist on the schedd. This adds some overhead time, but more importantly forces the submitting program to wait until the job is created before it can continue. (In testing for DAG jobs it took 30-40 seconds from submit time until the condor_dagman was running and had submitted the first job in the graph, and then an additional 30-60 seconds for that job to match and start running if resources were available, so roughly 60-100 seconds in total.)

 

Our current solution to this uses the now-removed Compute On Demand (COD) functionality of HTCondor. I don't recommend doing this because it won't work in the current version (9.0+), but I think it demonstrates what we're trying to do. We're also "submitting" from our own Python API for creating submissions, which then sends the info to a service stack that eventually ends up doing the actual condor_submit_dag using the Python bindings. The specifics of how this part works aren't important, but if you want more details I did a presentation on it at HTCondor Week 2019 and the slides should be out there. The relevant part is that the user can pass an argument at submit time in the Python API which tells the submissions service to use COD instead of condor_submit_dag. When this happens the submissions service does roughly these steps:

  1. Build a jobAd from the submission graph
  2. Get a list of startdAds for eligible slots from the Collector
  3. Iterate through the startdAds finding the best match
  4. Create a claim on the matching slot and activate the jobAd there

This glosses over some important details, like user permissions and tracking the claims, but for this request I want to focus on step 4. The core of this feature request is to provide a way of doing that step. Given a jobAd and a matching startdAd, run the job on the startd. This bypasses the scheduling system to start the job as soon as possible while still informing the Negotiator that the slot can't accept other work. The whole process typically takes less than a second from user submit to the job actually running. That said there are many downsides to this approach that limit its use to when the start time is absolutely critical:

  • Since you're skipping the Negotiator you lose the benefits that are probably why you're using HTCondor to begin with:
    • No fair-share or group accounting
    • Reimplementing sorting matches (luckily easy with the included Python bindings)
  • You're also skipping the Schedd which means:
    • No way of querying job info out-of-the-box
    • No re-queuing of the process; once it exits, it's gone
    • No handling of file transfer
  • Can only claim entire slots, even if they're partitionable

So we use this sparingly, but it is important for the times it's needed.

 

My ideal solution would be something like adding a startJob(jobAd) function to the htcondor.Startd class. This would probably require the Startd object to have been initialized with a StartdPrivate ad, and the calling user/host to have permissions in the matching ALLOW config entries. COD suspends currently running jobs on the slot when it's claimed, but I wouldn't mind if they were just vacated to make room.

 

With all of that out of the way, is anyone else out there running into a similar requirement? What was your solution? If you decided to use an entirely different system that doesn't interface with HTCondor how did that impact your batch pool? What would your dream solution be? Is there something else important I'm completely missing?

 

I'd love to hear from people.

 

Best,

Collin

--

Collin Mehring | Software Engineer - Distributed Computing