
Re: [HTCondor-users] condor_q and nested dags



condor_q -dag <id>

is the equivalent of 

condor_q -constraint 'ClusterId == <id> || DAGManJobId == <id>'

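This only goes 1 level deep because the DAGManJobId of a job is its parent's cluster id (that is how we know how to build the tree), so the constraint matches the DAGMan job itself and its direct children, but not the children of any sub-DAG.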
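
To illustrate why a single DAGManJobId match stops at one level, here is a minimal Python sketch that follows the parent links repeatedly. It works on plain dicts standing in for job ads; the function name `descendants` and the sample records are made up for illustration (the ids are borrowed from the question quoted further down), and nothing here calls HTCondor.

```python
from collections import defaultdict, deque


def descendants(jobs, root_cluster):
    """Return the cluster ids of every job below root_cluster, breadth first.

    `jobs` is a list of dicts with a "ClusterId" key and, for DAG node
    jobs, a "DAGManJobId" key holding the parent's cluster id.
    """
    # Index jobs by parent: parent cluster id -> list of child cluster ids.
    children = defaultdict(list)
    for job in jobs:
        parent = job.get("DAGManJobId")
        if parent is not None:
            children[parent].append(job["ClusterId"])

    found = []
    seen = {root_cluster}
    queue = deque([root_cluster])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Hypothetical sample data: a top-level DAG, one sub-DAG, two node jobs.
jobs = [
    {"ClusterId": 73483629},
    {"ClusterId": 73483631, "DAGManJobId": 73483629},
    {"ClusterId": 73483636, "DAGManJobId": 73483631},
    {"ClusterId": 73483637, "DAGManJobId": 73483631},
]

print(descendants(jobs, 73483629))
```

A one-level match on DAGManJobId == 73483629 would return only 73483631; the walk above also picks up the two jobs under the sub-DAG.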

But I think DAGMan sets the JobBatchName attribute of *all* of the jobs in the DAG to the same value, regardless of
how deep in the tree of jobs they are, so you can do this:

condor_q -dag -constraint 'JobBatchName == "run.dag+<id>"'

where <id> is the DAGMan job id of the top-level DAG.

Or, if you specify a batch name when you submit the DAG, you can do this:

condor_q -dag -constraint 'JobBatchName == "<name>"'
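
If you want to build these commands from a script, a rough Python sketch follows. It only assembles the argument list and does not run anything. The helper names `default_batch_name` and `condor_q_args` are invented here, the "<dagfile>+<id>" form of the default name is taken from the example above, and the sketch assumes the batch name contains no double quotes.

```python
def default_batch_name(dag_file, cluster_id):
    """Batch name in the "<dagfile>+<id>" form used in the example above."""
    return f"{dag_file}+{cluster_id}"


def condor_q_args(batch_name):
    """Build the condor_q argument list that selects jobs by JobBatchName."""
    if '"' in batch_name:
        # Quoting inside the constraint string is out of scope for this sketch.
        raise ValueError("batch name must not contain double quotes")
    constraint = f'JobBatchName == "{batch_name}"'
    return ["condor_q", "-dag", "-constraint", constraint]


# Hypothetical usage: top-level DAG submitted from run.dag as cluster 73483629.
args = condor_q_args(default_batch_name("run.dag", 73483629))
print(args)
```

The list can be handed to subprocess.run(args) as is, which avoids having to worry about shell quoting of the constraint.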

-tj


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stuart Anderson
Sent: Tuesday, January 14, 2020 1:55 PM
To: condor-users@xxxxxxxxxxx
Subject: [HTCondor-users] condor_q and nested dags

Is there a way to have condor_q report all the jobs in a nested DAG workflow?

By default it appears that condor_q -dag only goes 1 level deep, e.g.,

[root@ldas-pcdev2 ~]# condor_q 73483629.0 -dag -nobatch


-- Schedd: ldas-pcdev2.ligo.caltech.edu : <10.14.0.19:17000> @ 01/14/20 11:46:51
 ID          OWNER/NODENAME                        SUBMITTED     RUN_TIME ST PRI SIZE CMD
73483629.0   charlie.hoy                          1/14 06:14   0+05:32:22 R  0    0.3 condor_dagman -p 0 -f -l .
73483631.0    |-fbb3b83a55f12c0d92cc49e43ec57efb  1/14 06:14   0+05:32:17 R  0    0.3 condor_dagman -p 0 -f -l .

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 8669 jobs; 0 completed, 0 removed, 2352 idle, 3646 running, 2671 held, 0 suspended


And then I have to parse that output and query the sub-dag(s) explicitly to see the next level,

[root@ldas-pcdev2 ~]# condor_q 73483631.0 -dag -nobatch


-- Schedd: ldas-pcdev2.ligo.caltech.edu : <10.14.0.19:17000> @ 01/14/20 11:49:30
 ID          OWNER/NODENAME                        SUBMITTED     RUN_TIME ST PRI SIZE  CMD
73483631.0   |-fbb3b83a55f12c0d92cc49e43ec57efb   1/14 06:14   0+05:34:56 R  0     0.3 condor_dagman -p 0 -f -l
73483636.0    |-c8c841b84df8a7ceb514a9dde511c837  1/14 06:14   0+02:28:47 R  0   171.0 lalinference_mpi_wrapper
73483637.0    |-00cad7cf786b26e4d6460f0fec333ec0  1/14 06:14   0+01:39:30 R  0   196.0 lalinference_mpi_wrapper
73483638.0    |-fc3cb53752d42dbccbadc6a453a4e10d  1/14 06:14   0+01:36:30 R  0   171.0 lalinference_mpi_wrapper
73483639.0    |-8a0e2bac8f165e5fc8c64113070ad91c  1/14 06:14   0+00:36:40 R  0   171.0 lalinference_mpi_wrapper
...


It would be nice to have an option (if it doesn't exist already) that shows the full workflow in one go, e.g., have -dag take an optional integer value that specifies how many levels to report (or condor_q -dagfull).

Thanks.

--
Stuart Anderson
sba@xxxxxxxxxxx



