
[Condor-users] condor_hold and DAGs?



Hi,

Does Condor have a command that sends a "hold" message to a DAG and all of its sub-DAGs, and a matching one to release them? Since condor_hold operates on cluster_id and process_id, and DAGs seem to run "in-band" with respect to the Condor daemons (as "scheduler" processes), I wouldn't expect condor_dagman instances to have a mechanism that, for example, sends hold messages to all of their children before sending one to themselves. Please let me know if I've missed something.

I know this functionality could probably be emulated with a script that reads job ClassAds and works out the DAG job tree underneath a particular condor_dagman instance, but that may be messy and error-prone. Has this strategy worked well for other people? Is there a better solution that's internal to Condor, or one coming in a future release?
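A minimal sketch of the tree-walking part of such a script, in Python. It assumes each job ad has already been read into a dict (for example by parsing condor_q -l output, which is not shown), and that every job submitted by a condor_dagman instance carries a DAGManJobId attribute holding that instance's cluster id. Whether that attribute is set in 6.7.20 is an assumption to verify. The function name and the sample data are hypothetical.

```python
from collections import defaultdict, deque


def dag_tree_clusters(jobs, root_cluster):
    """Return the cluster ids of all jobs beneath root_cluster, breadth-first.

    jobs: iterable of dicts with "ClusterId" and, for jobs submitted by a
    DAGMan instance, "DAGManJobId" (assumed to be the parent's cluster id).
    """
    # Map each DAGMan cluster id to the clusters it submitted.
    children = defaultdict(list)
    for job in jobs:
        parent = job.get("DAGManJobId")
        if parent is not None and job["ClusterId"] not in children[parent]:
            children[parent].append(job["ClusterId"])

    found = []
    seen = {root_cluster}
    queue = deque([root_cluster])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guards against duplicate or cyclic ads
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Hypothetical queue contents, for illustration only.
SAMPLE_JOBS = [
    {"ClusterId": 100},                      # top-level condor_dagman job
    {"ClusterId": 101, "DAGManJobId": 100},  # node job
    {"ClusterId": 102, "DAGManJobId": 100},  # condor_dagman for a sub-DAG
    {"ClusterId": 103, "DAGManJobId": 102},  # node job inside the sub-DAG
    {"ClusterId": 200},                      # unrelated job
]

print(dag_tree_clusters(SAMPLE_JOBS, 100))
```

The resulting cluster ids, followed by the root cluster itself, could then be passed to condor_hold. The snapshot can go stale between reading the queue and issuing the hold, which is one source of the error-proneness mentioned above.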

Since I'm writing a feature request, I may as well go all the way: it would be nice to have a pair of commands for holding DAGs:

- one command to hold all jobs in a DAG tree the traditional way (i.e. running jobs are vacated before being put back on the scheduler's queue)
- another command to hold all jobs in a DAG tree so that running jobs continue running, but all Idle jobs are held, and existing dagman instances don't submit new jobs until being released.
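As a rough illustration of the second variant, the sketch below picks out only the Idle jobs from a set of job ads, leaving running jobs alone. It assumes the ads are already available as Python dicts with ClusterId, ProcId and JobStatus, where JobStatus 1 means Idle and 2 means Running. The function name and sample data are hypothetical, and the sketch does not address stopping dagman instances from submitting new jobs.

```python
IDLE = 1     # JobStatus value for an Idle job
RUNNING = 2  # JobStatus value for a Running job


def idle_job_ids(jobs):
    """Return "cluster.proc" ids for the Idle jobs only, in input order."""
    return [
        "%d.%d" % (job["ClusterId"], job["ProcId"])
        for job in jobs
        if job["JobStatus"] == IDLE
    ]


# Hypothetical job ads from one DAG tree, for illustration only.
SAMPLE_TREE = [
    {"ClusterId": 101, "ProcId": 0, "JobStatus": IDLE},
    {"ClusterId": 101, "ProcId": 1, "JobStatus": RUNNING},
    {"ClusterId": 101, "ProcId": 2, "JobStatus": IDLE},
    {"ClusterId": 103, "ProcId": 0, "JobStatus": RUNNING},
]

print(idle_job_ids(SAMPLE_TREE))
```

Each returned id has the cluster.proc form that condor_hold accepts, so the running jobs in the tree would be left untouched.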

I'm using version 6.7.20 of Condor, and haven't had a need to upgrade ("if it ain't broke, don't fix it"). I've looked through the changelogs for newer versions, but haven't seen this feature. Please let me know if I've missed it!

Thanks,

Armen

--
Armen Babikyan
MIT Lincoln Laboratory
armenb@xxxxxxxxxx . 781-981-1796