
[HTCondor-users] Local universe in DAGs



Hi all,

I'm using Condor 8.3.7 with 2 worker nodes and a 'personal' (i.e. everything-on-one-machine) host, all on Ubuntu 14.04.

In submitting DAG jobs, I have noticed that if one node uses the 'local' universe, then the job completes successfully (according to the Schedd log), but the dag.nodes.log file is not written to.

This means that any child nodes of this node will never be kicked off and the DAG sits around waiting forever.

See the following example:

test.dag
JOB A local.sub
JOB B vanilla.sub
JOB C vanilla.sub

PARENT A CHILD B C

local.sub
executable = /bin/sleep
arguments=10
universe = local
queue

vanilla.sub
executable = /bin/sleep
arguments=10
universe = vanilla
queue

This DAG never kicks off jobs B and C, and the dag.nodes.log file contains only:

000 (177593.000.000) 10/08 12:26:17 Job submitted from host: <xxx.xxx.x.xxx:42260?addrs=xxx.xxx.x.xxx-42260>
  DAG Node: A
...

Is this expected behaviour? If so, then why?

I would like to run jobs in the local universe here: in production our submit host does not run a Startd (since it runs other stuff), but I still need to run a job which effectively looks at jobs on the cluster and loops until a certain subset no longer exists (since those jobs will block the dependent jobs in this DAG).
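For what it's worth, here is a minimal sketch of the kind of "wait until the blocking jobs are gone" local-universe job I have in mind. The `condor_q -constraint` invocation is standard HTCondor; the constraint value and the line-parsing heuristic are assumptions for illustration, not production code:

```python
#!/usr/bin/env python
# Hypothetical polling job for the local universe: loop until no jobs
# match a constraint. The constraint expression below is a placeholder.
import subprocess
import time

BLOCKING_CONSTRAINT = 'JobBatchName == "blocking-batch"'  # assumed name

def blocking_jobs(queue_output):
    """Count job lines in condor_q's default output: lines whose first
    field looks like a cluster.proc id (e.g. '177593.0')."""
    count = 0
    for line in queue_output.splitlines():
        fields = line.split()
        if (fields
                and fields[0].count('.') == 1
                and all(p.isdigit() for p in fields[0].split('.'))):
            count += 1
    return count

def wait_until_clear(poll_seconds=30):
    """Poll condor_q until no jobs match the constraint."""
    while True:
        out = subprocess.check_output(
            ['condor_q', '-constraint', BLOCKING_CONSTRAINT])
        if blocking_jobs(out.decode()) == 0:
            return
        time.sleep(poll_seconds)

if __name__ == '__main__':
    wait_until_clear()
```

Parsing the human-readable table is fragile; `condor_q -format` (or the Python bindings in newer releases) would be more robust, but the overall loop is the same.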

Note: I have tried this on our production setup, which uses Condor 8.2.4, and it works as expected, so I assume this is a bug introduced after that version, as opposed to new functionality? The problem is also present in git master (8.5.1 at present).

Thanks,

Matt