
Re: [HTCondor-users] DAGman jobs failing custom requirements



On Fri, 25 Jan 2013, Smithies, Russell wrote:

> We're just starting out getting DAGMan jobs working and have run into a small problem.
> Our normal condor_submit jobs work OK, and when I run each individual job in the DAG it works OK, but when I submit the whole DAG the job doesn't run. The Sched log says it has failed requirements, e.g. "The Requirements attribute for job 7918.0 did not evaluate. Unable to start job", and the job just sits idle in the queue.
> If I condor_qedit the requirements and remove the first clause (TARGET.Site == MY.Site), then the DAG runs to completion.
> We have this extra 'Site' attribute because we're geographically distributed and it's best to have users run their jobs locally for better file I/O. This is set in each server's condor_config.

Is 'TARGET.Site == MY.Site' set in APPEND_REQUIREMENTS in your
configuration?
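For context on the "did not evaluate" message: in ClassAd logic, referring to an attribute an ad does not have yields UNDEFINED, '==' with an UNDEFINED operand is itself UNDEFINED, and '&&' only rescues that when the other side is FALSE. So if the DAG job's ClassAd lacked a Site attribute (an assumption, not confirmed above), 'TARGET.Site == MY.Site' could leave the whole Requirements expression UNDEFINED. The sketch below is a simplified Python model of that three-valued logic, not the real ClassAd library; the helper names and the site value "Auckland" are hypothetical.

```python
# Simplified, hypothetical model of ClassAd three-valued logic.
# It ignores ERROR values and case-insensitive string comparison.
UNDEFINED = object()  # stand-in for the ClassAd UNDEFINED value


def lookup(ad, attr):
    """Return the attribute value, or UNDEFINED if the ad lacks it."""
    return ad.get(attr, UNDEFINED)


def eq(a, b):
    """Model of ClassAd '==': an UNDEFINED operand gives UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def and_(a, b):
    """Model of ClassAd '&&': FALSE dominates, otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def site_requirement(job_ad, machine_ad, rest=True):
    """Model of (TARGET.Site == MY.Site) && <rest of Requirements>.

    MY refers to the job ad and TARGET to the machine ad.
    """
    return and_(eq(lookup(machine_ad, "Site"), lookup(job_ad, "Site")), rest)


if __name__ == "__main__":
    machine = {"Site": "Auckland"}
    print(site_requirement({"Site": "Auckland"}, machine))       # True
    print(site_requirement({}, machine) is UNDEFINED)             # True
```

With a Site in the job ad the clause is an ordinary TRUE/FALSE match; without one it stays UNDEFINED, which would be consistent with the job matching once the clause is removed with condor_qedit.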

I don't know offhand why things should be any different with DAGMan -- DAGMan just runs condor_submit to actually submit the job.

A couple of things that should help diagnose it:

* Your dagman.out file -- I'm interested to see what arguments DAGMan is passing to condor_submit, and if anything strange is going on there.

* The output of 'condor_q -l' for a job inside and outside of the DAG.
(Or 'condor_history -l' if the job finished before you got a chance to
run condor_q.)
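To compare the two 'condor_q -l' listings, a small script can report which attributes differ or are missing between them. This is a minimal sketch that assumes the output consists of 'Attr = Value' lines (header lines without ' = ' are skipped); the file names and helper functions are hypothetical.

```python
def parse_classad(text):
    """Parse 'Attr = Value' lines (as printed by condor_q -l) into a dict."""
    ad = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip headers and blank lines that have no ' = '
            ad[key.strip()] = value.strip()
    return ad


def diff_ads(a, b):
    """Return {attr: (value_in_a, value_in_b)} for attributes that differ.

    None marks an attribute that is missing from one of the ads.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Hypothetical files, e.g. saved with: condor_q -l 7918.0 > dag_job.ad
    with open("dag_job.ad") as f_dag, open("plain_job.ad") as f_plain:
        differences = diff_ads(parse_classad(f_dag.read()),
                               parse_classad(f_plain.read()))
    for attr, (in_dag, outside) in differences.items():
        print(f"{attr}: DAG={in_dag!r} plain={outside!r}")
```

An attribute such as Site showing up as None on the DAG side would point at the job ad rather than the machine ads.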

Kent Wenger
CHTC Team