Re: [HTCondor-users] Is it possible to immediately suspend jobs of a DAGman job?



Michael, thanks for your explanation. My intention in holding a DAGMan job is to release its resources. So what I need to do is issue the condor_rm command, is that correct?

hufh

On Fri, Jan 4, 2019 at 12:21 AM Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:

A suspended job retains the claim. It uses the standard Linux process-suspend mechanism via signals. Just as a process suspended with "kill -STOP" stays in the process table and keeps its memory allocation, a suspended HTCondor job keeps its claim.
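
To make the analogy concrete, here is a small sketch of plain Linux process suspension, outside HTCondor. It assumes SIGSTOP and SIGCONT as the signal pair (the thread does not say exactly which signals HTCondor sends) and a procps-style `ps`; the `proc_state` helper is a hypothetical name introduced for illustration.

```shell
#!/bin/sh
# Sketch: a stopped process stays in the process table, much as a
# suspended HTCondor job keeps its claim. Assumes SIGSTOP/SIGCONT.

# Hypothetical helper: print the first letter of a process's state.
proc_state() {
    ps -o stat= -p "$1" | tr -d ' ' | cut -c1
}

sleep 60 &                 # stand-in for a running job
pid=$!

kill -STOP "$pid"          # suspend: the process is frozen, not gone
sleep 1                    # give the kernel a moment to apply the stop
state_stopped=$(proc_state "$pid")    # "T" means stopped

kill -CONT "$pid"          # resume where it left off
sleep 1
state_resumed=$(proc_state "$pid")    # no longer "T" once resumed

kill "$pid"                # clean up the stand-in job
wait "$pid" 2>/dev/null || true
echo "while stopped: $state_stopped, after resume: $state_resumed"
```

While stopped, the process still has a PID and still holds its memory; nothing is handed back to the system until it exits.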

In the past I've used a suspend policy on desktop workstations which run jobs when idle. Certain jobs didn't like to be simply killed, so I tried to make sure that they were evicted as infrequently as possible. I would have the job immediately suspend when the user returned so as to avoid any impact on their use of the machine, and then unsuspend when the machine went idle again, or it would vacate after a certain amount of time spent suspended if there were other matching machines available.

In order to release a claim, the job has to be vacated or removed.
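
As a rough sketch of those two options, assuming the jobs in question are the node jobs of a single DAG: the helper below only prints the commands so they can be reviewed before running them. `release_claim_cmds` and the job ID are hypothetical; `condor_vacate_job` and `condor_rm` are the standard tools, and `DAGManJobId` is the job attribute used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: print the command that would give up the claims used by a
# DAG's node jobs. Nothing is executed against the pool.
release_claim_cmds() {
    mode="$1"      # "vacate" or "remove"
    dag_id="$2"    # cluster ID of the DAGMan job (placeholder)
    case "$mode" in
        vacate)
            # Evict the node jobs from their slots; they go back to idle
            # in the queue and may be matched again later.
            printf 'condor_vacate_job -constraint "DAGManJobId == %s"\n' "$dag_id"
            ;;
        remove)
            # Remove the DAGMan job itself; DAGMan is expected to remove
            # its node jobs as part of shutting down.
            printf 'condor_rm %s\n' "$dag_id"
            ;;
        *)
            echo "usage: release_claim_cmds vacate|remove <DAGManJobId>" >&2
            return 1
            ;;
    esac
}
```

For example, `release_claim_cmds vacate 123` prints the vacate command for DAG 123. Note that vacated jobs stay in the queue, so they would need to be held as well if they should not start again.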

The "keep_claim_idle" setting in a job submission has to do with avoiding negotiator overhead for matching jobs. Increasing it just means that the claim can be reused without having to go through returning it to the schedd and having it reassigned; the start daemon can just be directly asked to run another job.
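
For reference, a sketch of where this setting lives. The file name `job.sub`, the executable, and the 300-second value are placeholders chosen for illustration; `keep_claim_idle` takes a number of seconds.

```shell
#!/bin/sh
# Sketch: write a minimal submit description that asks for the claim to be
# kept for up to 300 seconds after the job exits, so a follow-up job can
# reuse it without a new negotiation cycle.
cat > job.sub <<'EOF'
executable      = /bin/sleep
arguments       = 60
# Keep the claim while idle; a larger value holds the slot longer
keep_claim_idle = 300
queue
EOF

# Submit it with: condor_submit job.sub
```

So raising this value keeps the slot claimed for longer rather than releasing it sooner.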

Michael V. Pelletier
Information Technology
Digital Transformation & Innovation
Integrated Defense Systems
Raytheon Company

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of hufh
Sent: Thursday, January 3, 2019 12:45 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] Re: [HTCondor-users] Is it possible to immediately suspend jobs of a DAGman job?

Mark,

Thanks for your reply! It works now. But it looks like the slot is still claimed, not released. Is that expected, or do we need to set a config such as "keep_claim_idle" to release it?

hufh

On Thu, Jan 3, 2019 at 2:22 AM Mark Coatsworth <coatsworth@xxxxxxxxxxx> wrote:

Hello,

The behavior you're seeing is as expected. Running condor_hold on a running DAGMan will only hold DAGMan itself, not any jobs running under it.

If you want to suspend the jobs running under DAGMan, you have to do this manually:

condor_hold <DAGManJobId>

condor_hold -constraint "DAGManJobId == <DAGManJobId>"

Later, to release them all again:

condor_release <DAGManJobId>

condor_release -constraint "DAGManJobId == <DAGManJobId>"
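
The two-step pattern above could be wrapped in a small script. This is only a sketch: `dag_ctl` is a hypothetical helper name, and by default it prints the commands rather than running them (set `RUN=1` to execute), so it can be tried without access to a pool.

```shell
#!/bin/sh
# Sketch: hold or release a DAGMan job together with its node jobs.
# Usage: dag_ctl hold|release <DAGManJobId>
dag_ctl() {
    action="$1"    # "hold" or "release"
    dag_id="$2"    # cluster ID of the DAGMan job
    case "$action" in
        hold|release) ;;
        *) echo "usage: dag_ctl hold|release <DAGManJobId>" >&2; return 1 ;;
    esac
    constraint="DAGManJobId == $dag_id"
    if [ "${RUN:-0}" = "1" ]; then
        # Act on DAGMan itself first, then on every job it submitted.
        "condor_$action" "$dag_id" &&
        "condor_$action" -constraint "$constraint"
    else
        # Dry run: show what would be executed.
        printf 'condor_%s %s\n' "$action" "$dag_id"
        printf 'condor_%s -constraint "%s"\n' "$action" "$constraint"
    fi
}
```

Holding DAGMan first keeps it from submitting new node jobs while the existing ones are being held.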

Hope this helps,

Mark

On Wed, Jan 2, 2019 at 9:35 AM hufh <hufh2004@xxxxxxxxx> wrote:

Hi all,

I am using DAGMan to run jobs and want to suspend them, but I found that condor_hold does not stop the jobs that are already running; it only prevents the next ones from starting. I have tried condor_suspend, but it looks like it doesn't work for DAGMan jobs. Could you tell me whether a DAGMan job can be suspended immediately? Thanks a lot!

hufh

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison
+1 608 206 4703
