
Re: [HTCondor-users] How to hold/Release all dag jobs when hold/release dagman job?



I found that one command can hold or release all the jobs in a DAG: just add a ClusterId clause to the constraint argument:
condor_hold -constraint "ClusterId==178||DAGManJobId==178"
178 is the ClusterId of the DAGMan job. This works for holding, releasing, and removing (condor_rm) a DAG's jobs.
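A minimal sketch of how this could be wrapped in a shell helper. The function names dag_constraint and dag_ctl are hypothetical (they are not HTCondor commands); only condor_hold, condor_release, and the -constraint option come from HTCondor itself. The sketch covers hold and release only.

```shell
#!/bin/sh
# Build the constraint that matches a DAGMan job plus its node jobs.
# $1: ClusterId of the DAGMan job (178 in the example above).
dag_constraint() {
    printf 'ClusterId==%s||DAGManJobId==%s' "$1" "$1"
}

# Hypothetical wrapper: dag_ctl hold|release <dagman_cluster_id>
dag_ctl() {
    case "$1" in
        hold)    cmd=condor_hold ;;
        release) cmd=condor_release ;;
        *)       echo "usage: dag_ctl hold|release <cluster_id>" >&2
                 return 2 ;;
    esac
    # Run the chosen tool once with the combined constraint.
    "$cmd" -constraint "$(dag_constraint "$2")"
}

# Usage (requires HTCondor on the PATH):
#   dag_ctl hold 178
#   dag_ctl release 178
```

Keeping the constraint in its own function means the same string is used for both hold and release.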
But there is a side effect with condor_rm: the log file contains two job abort events for each submitted job. This sent my program into an infinite loop. I think one event comes from my command and the other from the DAGMan job.
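One way a log-reading program might tolerate the duplicate events is to act on each aborted job only once. The sketch below is an assumption-laden illustration: it assumes abort events appear in the user log as header lines of the form "009 (cluster.proc.subproc) date time Job was aborted by the user.", and the function name aborted_jobs is hypothetical.

```shell
#!/bin/sh
# Print each aborted job ID once, however many abort events it has.
# Assumption: abort event headers start with "009 (" and the job ID is
# the second whitespace-separated field, wrapped in parentheses.
aborted_jobs() {
    awk '/^009 \(/ && !seen[$2]++ {
        id = $2
        gsub(/[()]/, "", id)   # strip the surrounding parentheses
        print id
    }' "$@"
}

# Usage (the log file name is hypothetical):
#   aborted_jobs my_dag_nodes.log
```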
Thanks Kent.

On 2013-8-22 at 12:46 AM, "R. Kent Wenger" <wenger@xxxxxxxxxxx> wrote:
On Wed, 21 Aug 2013, 钱晓明 wrote:

I know all DAG jobs are removed when I condor_rm the DAGMan job, but
that is not the case for hold/release.
How can I have all jobs held/released according to the DAGMan job's
status? I think I should add something to my job submit file.

It's not too hard (assuming you don't have nested DAGs).  You do two condor_hold commands -- one to hold the DAGMan job itself, and one to hold the node jobs.

Here's an example:

manta(222)% condor_q

-- Submitter: wenger@xxxxxxxxxxxxxxxxx : <128.105.14.228:51653> : manta.cs.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 318.0   wenger          8/21 11:42   0+00:00:29 R  0   1.7  condor_dagman
 320.0   wenger          8/21 11:43   0+00:00:03 R  10  0.0 job_dagman_node_pr

2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended
manta(223)% condor_hold 318
All jobs in cluster 318 have been held
manta(224)% condor_hold -constraint "DAGManJobId==318"
All jobs matching constraint (DAGManJobId==318) have been held
manta(225)% condor_q

-- Submitter: wenger@xxxxxxxxxxxxxxxxx : <128.105.14.228:51653> : manta.cs.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 318.0   wenger          8/21 11:42   0+00:00:42 H  0   1.7  condor_dagman
 320.0   wenger          8/21 11:43   0+00:00:30 H  10  0.0 job_dagman_node_pr

2 jobs; 0 completed, 0 removed, 0 idle, 0 running, 2 held, 0 suspended
manta(226)%


If you have sub-DAGs, you'll have to do the condor_hold with the constraint for each sub-DAG.
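As a rough sketch of the sub-DAG case, the per-DAG constraints can be OR-ed together so that one condor_hold covers every level. The function name nodes_constraint and the cluster IDs 321 and 325 are hypothetical; the sketch assumes you already know the ClusterId of each sub-DAG's DAGMan job.

```shell
#!/bin/sh
# Build one constraint matching the node jobs of every DAGMan
# cluster ID passed as an argument.
nodes_constraint() {
    out=""
    for id in "$@"; do
        if [ -n "$out" ]; then
            out="$out||"
        fi
        out="${out}DAGManJobId==$id"
    done
    printf '%s' "$out"
}

# Usage (318 is the top-level DAGMan job; 321 and 325 are assumed to be
# the DAGMan jobs of two sub-DAGs):
#   condor_hold 318
#   condor_hold -constraint "$(nodes_constraint 318 321 325)"
```

A sub-DAG's own DAGMan job is a node job of its parent, so it is already matched by the parent's DAGManJobId clause.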

I'm thinking that we should create a command that does this automatically, including handling sub-DAGs...

Kent Wenger
CHTC Team