
Re: [Condor-users] condor_hold and DAGs?



On May 3, 2007, at 3:35 PM, Armen Babikyan wrote:
> Does Condor have a command I can run that will send a "hold" message to a DAG and all its sub-DAGs?

Assuming you mean "a DAG and all its submitted jobs", then for DAGMan job id xyz, just run:

% condor_hold -constraint 'DAGManJobId == xyz || Cluster == xyz'

(DAGMan publishes its own job id into each submitted job's classad, in the DAGManJobId attribute, so you can select those jobs with a boolean constraint on it; the "Cluster == xyz" clause matches the DAGMan job itself.)
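A minimal sketch for a Bourne-style shell (sh or bash, since it uses $(...)), assuming Condor 6.8 or later. The helper name dag_constraint is hypothetical; it only builds the expression, which lets you preview the matching jobs with condor_q before holding them and then release them with the same expression.

```shell
# Hypothetical helper: print the constraint matching DAGMan cluster $1
# (Cluster == $1) and every node job it has submitted (DAGManJobId == $1).
# Assumes Condor 6.8+, where DAGManJobId is an integer cluster id.
dag_constraint() {
    printf 'DAGManJobId == %s || Cluster == %s\n' "$1" "$1"
}

# Usage, for a DAGMan job with cluster id 1234:
#   condor_q       -constraint "$(dag_constraint 1234)"   # preview the matches
#   condor_hold    -constraint "$(dag_constraint 1234)"
#   condor_release -constraint "$(dag_constraint 1234)"
```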

This will put the DAGMan job and its currently-submitted nodes on hold more or less simultaneously, but that's okay -- it doesn't matter to DAGMan whether it goes on hold right before or right after its jobs. When you release, DAGMan should recover correctly either way.

If you want to put a "tree" of DAGs and sub-DAGs on hold, it's a little more complicated unless you can submit them with a custom classad attribute in common.
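Without a shared custom attribute, one option is to follow the DAGManJobId links yourself. The sketch below assumes each sub-DAG's DAGMan job carries its parent's DAGManJobId (as any node job submitted by DAGMan does), and that you have already extracted one "Cluster DAGManJobId" pair per queued job from condor_q, writing "-" for jobs that have no DAGManJobId. That input format and both function names are hypothetical. It computes the transitive closure from the root DAGMan cluster and prints a constraint listing every cluster in the tree.

```shell
# Hypothetical helper: read "Cluster DAGManJobId" pairs on stdin and print,
# one per line in numeric order, the root cluster $1 plus every cluster
# reachable from it through DAGManJobId links.
dag_tree_clusters() {
    awk -v root="$1" '
        { parent[$1] = $2 }              # remember which DAGMan submitted each job
        END {
            inset[root] = 1
            changed = 1
            # Keep adding clusters whose DAGMan is already in the set
            # until a full pass adds nothing new.
            while (changed) {
                changed = 0
                for (c in parent)
                    if (!(c in inset) && (parent[c] in inset)) {
                        inset[c] = 1
                        changed = 1
                    }
            }
            for (c in inset) print c
        }' | sort -n
}

# Join those clusters into a single condor_hold constraint.
dag_tree_constraint() {
    dag_tree_clusters "$1" | awk '
        { out = out (NR > 1 ? " || " : "") "Cluster == " $1 }
        END { print out }'
}

# Usage (pairs.txt is a hypothetical file of "Cluster DAGManJobId" lines):
#   condor_hold -constraint "$(dag_tree_constraint 1234 < pairs.txt)"
```

This is only a snapshot: a DAGMan in the tree can submit new nodes between the query and the hold, so running the query and the hold a second time should pick up any stragglers.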

> Since I'm writing a feature request, I may as well go all the way: it would be nice to have ... another command to hold all jobs in a DAG tree so that running jobs continue running, but all Idle jobs are held, and existing dagman instances don't submit new jobs until being released.

You can do this by running condor_hold on the DAGMan job only, leaving its currently-submitted jobs untouched: running nodes keep running, and DAGMan submits no new nodes until it is released.
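The request above also asked for already-queued Idle jobs to be held. A sketch that goes one step beyond holding DAGMan alone, assuming Condor 6.8+ integer DAGManJobId values and the standard JobStatus codes (1 = Idle, 2 = Running): it matches the DAGMan job plus only those of its node jobs that are still Idle. The function name is hypothetical, and this covers a single DAG rather than a whole tree.

```shell
# Hypothetical helper: match the DAGMan job (cluster $1) and only those of
# its node jobs that are still Idle (JobStatus == 1); running nodes are
# left alone.
dag_pause_constraint() {
    printf 'Cluster == %s || (DAGManJobId == %s && JobStatus == 1)\n' "$1" "$1"
}

# Usage, for a DAGMan job with cluster id 1234:
#   condor_hold -constraint "$(dag_pause_constraint 1234)"
# Releasing later with the plain DAG constraint releases DAGMan and the
# held Idle nodes together:
#   condor_release -constraint 'DAGManJobId == 1234 || Cluster == 1234'
```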

> I'm using version 6.7.20 of Condor, and haven't had a need to upgrade ("if it ain't broke don't fix it"). I've looked through the changelogs for newer versions, but haven't seen this feature. Please let me know if I've missed it!

The same approach works with any version of DAGMan, but for versions before 6.8, DAGManJobId was a string containing the entire Condor job ID rather than an integer containing only the cluster ID, so the constraint needs a slight modification, i.e.:

% condor_hold -constraint 'DAGManJobId == "xyz.0" || Cluster == xyz'
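If you script this across pools running different Condor versions, a sketch like the following picks the right form from a version string. The function name and the way the version is passed in are assumptions; it only distinguishes releases before 6.8 (string DAGManJobId of the form "cluster.0", as in the command above) from 6.8 and later (integer cluster id).

```shell
# Hypothetical helper: $1 is a Condor version such as 6.7.20, $2 is the
# DAGMan cluster id. Prints the matching condor_hold constraint.
dag_constraint_for_version() {
    version=$1
    dag_id=$2
    major=${version%%.*}          # "6" from "6.7.20"
    rest=${version#*.}            # "7.20"
    minor=${rest%%.*}             # "7"
    if [ "$major" -lt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -lt 8 ]; }; then
        # Before 6.8: DAGManJobId holds the full job ID as a string.
        printf 'DAGManJobId == "%s.0" || Cluster == %s\n' "$dag_id" "$dag_id"
    else
        # 6.8 and later: DAGManJobId holds only the integer cluster id.
        printf 'DAGManJobId == %s || Cluster == %s\n' "$dag_id" "$dag_id"
    fi
}

# Usage:
#   condor_hold -constraint "$(dag_constraint_for_version 6.7.20 1234)"
```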

-Peter

--
Peter Couvares                        University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
pfc@xxxxxxxxxxx                       1210 W. Dayton St. Rm #4241
(608) 265-8936                        Madison, WI 53706-1685