
[Condor-users] DAGMan Hangs Near End



Dear All,

I have a DAGMan pipeline that starts fine but never completes, because the last few jobs are queued but never run. A scaled-down version of the same pipeline works, so I doubt that it's a programming error. Many nodes are available (620 unclaimed in the condor_status output below), so why won't those jobs run? How can I analyze the individual job within the DAG whose node is reported as "Queued"?

Thank you so much,
Oren

-- Submitter: ibicluster.uchicago.cc : <172.16.0.149:42470> : ibicluster.uchicago.cc
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 904.0   livne           9/28 13:09   0+00:15:40 R  0   7.3  condor_dagman -f -

1 jobs; 0 idle, 1 running, 0 held
===================================================================================

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX   728   108       0       620       0          0        0

               Total   728   108       0       620       0          0        0

===================================================================================
9/28 13:23:33 Event: ULOG_EXECUTE for Condor Node D_chr10 (1009.0)
9/28 13:23:33 Number of idle job procs: 1
9/28 13:23:43 Event: ULOG_JOB_TERMINATED for Condor Node D_chr10 (1009.0)
9/28 13:23:43 Node D_chr10 job proc (1009.0) completed successfully.
9/28 13:23:43 Node D_chr10 job completed
9/28 13:23:43 Number of idle job procs: 1
9/28 13:23:43 Of 107 nodes total:
9/28 13:23:43  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
9/28 13:23:43   ===     ===      ===     ===     ===        ===      ===
9/28 13:23:43   104       0        1       0       0          2        0
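
For anyone who wants to track these counts over time, here is a minimal Python sketch that pulls the latest node-status summary out of dagman.out text. The function name parse_dagman_summary and the SAMPLE string are made up for illustration (SAMPLE just repeats the lines above). The sketch assumes the summary layout shown in this log, which may differ in other Condor versions.

```python
import re

# Column names as they appear in the summary header of the log above.
STATUS_FIELDS = ["Done", "Pre", "Queued", "Post", "Ready", "Un-Ready", "Failed"]

# Hypothetical sample: the status summary copied from this message.
SAMPLE = """\
9/28 13:23:43 Of 107 nodes total:
9/28 13:23:43  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
9/28 13:23:43   ===     ===      ===     ===     ===        ===      ===
9/28 13:23:43   104       0        1       0       0          2        0
"""


def parse_dagman_summary(log_text):
    """Return the last node-status counts found in the text, or None."""
    lines = log_text.splitlines()
    latest = None
    for i, line in enumerate(lines):
        # The header line is the one that names every status column.
        if all(name in line for name in STATUS_FIELDS):
            # The counts follow the header, after the "===" separator line.
            for candidate in lines[i + 1 : i + 3]:
                # Drop the leading "M/D HH:MM:SS" timestamp before counting.
                body = re.sub(r"^\s*\d+/\d+\s+\d+:\d+:\d+", "", candidate)
                numbers = re.findall(r"\d+", body)
                if len(numbers) == len(STATUS_FIELDS):
                    latest = dict(zip(STATUS_FIELDS, map(int, numbers)))
    return latest


counts = parse_dagman_summary(SAMPLE)
print(counts)
```

In this log the seven counts add up to the 107 nodes reported, with one node Queued and two Un-Ready, which fits the "Number of idle job procs: 1" line. The ID of the queued job is not in the excerpt above. Once it is found in the dagman.out file, running condor_q -better-analyze on that job ID is one way to ask Condor why the job is not matching any machine.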


--
A person is just about as big as the things that make him angry.