
Re: [Condor-users] Job dependency spec for condor_submit?



Hi Michael,

On Wed, Jan 26, 2011 at 5:05 PM, Michael Hanke <michael.hanke@xxxxxxxxx> wrote:
> One common pattern in these tools is the specification of job
> dependencies via "qsub -hold_jid". Is there a way to 'emulate' this with
> Condor? Patching these tools to write a DAGMAN compatible job
> description would be a fairly big task, maybe even impossible, and
> definitely no fun. They are designed to submit one task at a time --
> clearly with SGE's workflow in mind.

So -hold_jid, as I understand it, sets up a parent-child relationship for the submitted job -- it becomes a child of the jobs listed via the -hold_jid option. The submitted job, the child, won't run until all the parent jobs have completed.

Have I got that correct?

So the short answer is: no, there's no way to build a parent-child relationship for jobs in Condor, dynamically, without using DAGMan.

You could possibly do this with an external-to-Condor process though.

If you submit all jobs held (hold=True in your submit ticket) and put a custom attribute in the ticket that is a comma-separated list of <clusterid>.<jobid> (or whatever you like) jobs that are to be the parents of the submitted job, you could write an external process that scans your scheduler queue and releases a held job once all the jobs in its parent list are done (they have either left the queue or are in the C state in the queue).
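
To make the idea concrete, here is a minimal sketch in Python of the decision such an external process would have to make on each scan. It is only an illustration: the attribute name ParentJobs and the layout of the queue snapshot are assumptions of mine, and the sketch leaves out the parts that talk to Condor (filling the snapshot from condor_q output and running condor_release on the result).

```python
# JobStatus values as Condor reports them: 4 = Completed, 5 = Held.
COMPLETED = 4
HELD = 5


def releasable_jobs(queue):
    """Return the IDs of held jobs whose parents are all done.

    `queue` maps "cluster.job" -> {"JobStatus": int, "ParentJobs": str}.
    "ParentJobs" is a hypothetical custom attribute holding a
    comma-separated list of parent job IDs; it may be absent or empty.
    A parent counts as done if it has left the queue or is Completed.
    """
    ready = []
    for job_id, ad in queue.items():
        if ad.get("JobStatus") != HELD:
            continue
        parents = [p.strip() for p in ad.get("ParentJobs", "").split(",") if p.strip()]
        if not parents:
            continue  # held for some other reason; leave it alone
        if all(p not in queue or queue[p].get("JobStatus") == COMPLETED
               for p in parents):
            ready.append(job_id)
    return sorted(ready)


# Example snapshot: 10.0 is still running, 11.0 has completed,
# and 9.0 has already left the queue.
snapshot = {
    "10.0": {"JobStatus": 2},
    "11.0": {"JobStatus": COMPLETED},
    "12.0": {"JobStatus": HELD, "ParentJobs": "10.0,11.0"},
    "13.0": {"JobStatus": HELD, "ParentJobs": "11.0, 9.0"},
}
print(releasable_jobs(snapshot))  # ['13.0']
```

Note that "not in the queue" is treated as done here, so a parent that was removed, or an ID that was mistyped, would also release the child. A real script would probably want to check job history before trusting that.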

Not easy, but not impossible.

You have to work around the fact that Condor does clustered submissions: do you support cluster IDs in the parent job list, or do you require a cluster.job for each parent in the list?
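
One way to support both forms is to treat a bare cluster ID as "every job in that cluster". A rough Python sketch of that check follows, assuming (my assumption, not anything Condor provides) that the queue has been read into a dict mapping "cluster.job" to its JobStatus number:

```python
COMPLETED = 4  # Condor JobStatus value for a completed job


def parent_done(spec, queue):
    """Decide whether one entry of a parent list is finished.

    `spec` is either "cluster.job" (one job) or "cluster" (every job in
    that cluster). `queue` maps "cluster.job" -> JobStatus for the jobs
    still in the queue; this layout is an assumption for the sketch.
    """
    if "." in spec:
        members = [spec] if spec in queue else []
    else:
        members = [j for j in queue if j.split(".")[0] == spec]
    # Jobs that have left the queue are absent from `members`, so an
    # empty list means everything named by the spec is done.
    return all(queue[j] == COMPLETED for j in members)


queue = {"7.0": COMPLETED, "7.1": 2, "8.0": COMPLETED}
print(parent_done("7", queue))    # False: 7.1 is still running
print(parent_done("7.0", queue))  # True
print(parent_done("8", queue))    # True
```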

The only other gotcha I can think of is that submitting one job at a time can be detrimental to the health and throughput of a scheduler. There are performance benefits to gathering like jobs into a single cluster submission in Condor. So if you're looking to scale this up, you'll have to address this issue.

Regards,
- Ian