
Re: [HTCondor-users] condor_submit_dag doesn't work but condor_dagman does



Mark,

I think I've figured it out. One of our job transforms modified the dagman job's REQUIREMENTS in a way that prevented it from actually running.

Sorry for the trouble.


Vlad



On 10/29/21 1:18 PM, Mark Coatsworth wrote:
Hi Vlad,

I've looked into this more closely. It seems that the version mismatch
error is a red herring. If you run condor_dagman manually using the
argument list provided in the .condor.sub file, the error message
always comes up (regardless of the versions actually installed) due to
a string quoting issue.
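
One plausible mechanism for the quoting issue, consistent with the error text quoted further down the thread (where the version shows up as ": 9.2.0 ... $" with the leading "$CondorVersion" missing), is that a shell expands "$CondorVersion" as an unset variable when the argument list is pasted into a terminal. This is an assumption, not something confirmed in the thread. A minimal Python sketch of that effect, using a hypothetical helper named echo_via_shell and assuming a POSIX sh is available:

```python
import os
import subprocess

# Version banner as it appears in this thread; note the leading literal "$".
BANNER = "$CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $"


def echo_via_shell(quoted_arg: str) -> str:
    """Return what a POSIX shell hands to a program for an already-quoted argument.

    Hypothetical helper for illustration only; not part of HTCondor.
    """
    # Minimal environment so that $CondorVersion is guaranteed to be unset.
    env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    result = subprocess.run(
        ["sh", "-c", f"printf '%s' {quoted_arg}"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Double quotes: the shell treats $CondorVersion as a variable and expands it.
double_quoted = echo_via_shell(f'"{BANNER}"')
# Single quotes: the shell passes the text through untouched.
single_quoted = echo_via_shell(f"'{BANNER}'")

print(repr(double_quoted))
print(repr(single_quoted))
```

If this is what happens, single-quoting the version argument when running condor_dagman by hand should avoid the spurious error.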

Moreover, if this were indeed a version mismatch, condor_dagman would
exit immediately rather than sit idle. So I think the actual problem
is something else.

The next step is looking through your dagman output in detail. Can you
try turning up the debug level in your configuration:

DAGMAN_DEBUG = D_FULLDEBUG

Then resubmit your dag using condor_submit_dag, let it sit idle for a
few minutes without submitting jobs, then send me your .dagman.out
file? I think we'll find some clues in there.
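
With D_FULLDEBUG the output file can get long, so a small filter may help when skimming it. The sketch below is only an illustration: the function name suspicious_entries and the keyword list are hypothetical, and the timestamp format is assumed from the two log lines quoted at the bottom of this thread.

```python
import re

# Entries in the sample output start with a "MM/DD/YY HH:MM:SS" timestamp.
_TIMESTAMP_RE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def suspicious_entries(log_text, keywords=("Error", "EXITING WITH STATUS")):
    """Return log entries that contain any of the given keywords.

    Lines without a leading timestamp are treated as continuations of the
    previous entry, which undoes wrapping introduced by mail clients.
    """
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if _TIMESTAMP_RE.match(line) or not entries:
            entries.append(line.strip())
        else:
            entries[-1] += " " + line.strip()
    return [entry for entry in entries if any(k in entry for k in keywords)]


# The two entries quoted in this thread, wrapped as they were in the email.
SAMPLE = """\
10/29/21 09:35:53 Error: the version (: 9.2.0 Sep 23 2021
BuildID: 557262 PackageID: 9.2.0-1 $) of this DAG's HTCondor
submit file (created by condor_submit_dag) is invalid!
10/29/21 09:35:53 **** condor_dagman (condor_DAGMAN) pid
504426 EXITING WITH STATUS 1
"""

for entry in suspicious_entries(SAMPLE):
    print(entry)
```

To use it on a real file, read the file's contents into a string and pass that to suspicious_entries.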

Feel free to follow up with me personally, we can share our findings
with the list once we've figured this out.

Mark


On Fri, Oct 29, 2021 at 12:23 PM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:

  > condor_dagman -version
  > condor_submit_dag -version
Versions seem the same:
sub-2 ~ # condor_dagman -version
$CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $
$CondorPlatform: x86_64_CentOS8 $
sub-2 ~ # condor_submit_dag -version
$CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $
$CondorPlatform: x86_64_CentOS8 $

rpm -V says binaries haven't been modified, and I see no
aliases, wrappers, PATH issues...
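
For anyone who wants to script the comparison above instead of eyeballing it, here is a rough Python sketch. The names parse_banner and same_build are hypothetical, and the regular expression is based only on the banner format shown in this message, so it may not cover every HTCondor release.

```python
import re

# Matches banners such as:
# $CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $
_BANNER_RE = re.compile(r"\$CondorVersion:\s+(\d+\.\d+\.\d+)\s.*?BuildID:\s+(\d+)")


def parse_banner(text):
    """Return (version, build_id) from '-version' output, or None if not found."""
    # Terminals and mail clients may wrap the banner, so collapse whitespace first.
    flat = " ".join(text.split())
    match = _BANNER_RE.search(flat)
    return (match.group(1), match.group(2)) if match else None


def same_build(output_a, output_b):
    """True only if both outputs contain a banner with equal version and BuildID."""
    a, b = parse_banner(output_a), parse_banner(output_b)
    return a is not None and a == b


# Output pasted from this message, including the line wrap.
DAGMAN_OUT = """$CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID:
9.2.0-1 $
$CondorPlatform: x86_64_CentOS8 $"""
SUBMIT_DAG_OUT = DAGMAN_OUT

print(parse_banner(DAGMAN_OUT))
print(same_build(DAGMAN_OUT, SUBMIT_DAG_OUT))
```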

Running condor_submit_dag with -AllowVersionMismatch still
results in the DAG job just sitting in the queue, not
creating jobs. However, with -AllowVersionMismatch, running
condor_dagman manually works as expected.

When I run condor_q -bet on the dag job, I get:
375422.000:  This schedd's StartSchedulerUniverse evalutes to true for this job.

Regular REQUIREMENTS are ignored for universe 7 (scheduler universe)
jobs, right?

Not sure where to go from here


Vlad


On 10/29/21 11:26 AM, Mark Coatsworth wrote:
Hi Vlad,

That message indicates a version mismatch between your
condor_submit_dag and condor_dagman binaries. Is it possible one of
the binaries you're using is an older copy, or you have some other
unusual system configuration that includes different versions?

What output are you seeing from the following two commands?

condor_dagman -version
condor_submit_dag -version

Either way, I don't think we've changed the argument syntax in quite
some time, so you should be able to run condor_submit_dag with the
-AllowVersionMismatch flag and have that work correctly.

Mark


On Fri, Oct 29, 2021 at 9:45 AM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:

Hello

I am running into a problem where if I submit a dag using
condor_dagman directly everything is fine, but if I use
condor_submit_dag, the dag job just sits there not
submitting any jobs.

I ran the command of the job created by condor_submit_dag
and the .condor_dagman.out contained the following:

10/29/21 09:35:53 Error: the version (: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $) of this DAG's HTCondor submit file (created by condor_submit_dag) is invalid!
10/29/21 09:35:53 **** condor_dagman (condor_DAGMAN) pid 504426 EXITING WITH STATUS 1

Running HTCondor 9.2.0 on CentOS 8.

Can anybody help?



Thanks

Vlad
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison