
Re: [HTCondor-users] condor_submit_dag doesn't work but condor_dagman does



Hi Vlad,

I've looked into this more closely. It seems that the version mismatch
error is a red herring. If you run condor_dagman manually using the
argument list provided in the .condor.sub file, the error message
always comes up (regardless of the versions actually installed) due to
a string quoting issue.
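
As a rough illustration of what I suspect is going on (an assumption on
my part, not something I have traced through the code): if a shell sees
the version string unprotected by single quotes, it treats
$CondorVersion as an unset variable and drops it, which leaves exactly
the text shown in your error message. The helper shell_like_expand below
is hypothetical and only mimics that expansion; it is not part of
HTCondor.

```python
import re

# Version string as printed by `condor_dagman -version` earlier in this thread.
VERSION = "$CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $"


def shell_like_expand(text, env=None):
    """Mimic how a POSIX shell expands $NAME outside single quotes.

    Unset variables expand to the empty string; a lone '$' is left as-is.
    Hypothetical helper for illustration only.
    """
    env = env or {}
    return re.sub(
        r"\$([A-Za-z_][A-Za-z0-9_]*)",
        lambda m: env.get(m.group(1), ""),
        text,
    )


# With CondorVersion unset, the leading "$CondorVersion" disappears.
mangled = shell_like_expand(VERSION)
print(mangled)
```

The printed string begins with ": 9.2.0", the same truncated form that
appears in the "is invalid!" message, so the check would fail no matter
which versions are installed.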

Moreover, if this were indeed a version mismatch, condor_dagman is
supposed to exit immediately, not sit idle. So I think the actual
problem is something else.

The next step is looking through your dagman output in detail. Can you
try turning up the debug level in your configuration:

DAGMAN_DEBUG = D_FULLDEBUG

Then resubmit your DAG using condor_submit_dag, let it sit idle for a
few minutes without submitting jobs, and send me your .dagman.out
file. I think we'll find some clues in there.
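
If you want to skim the file yourself first, here is a minimal sketch
for pulling out the lines most likely to matter. It assumes the
timestamp format seen in your earlier excerpt ("10/29/21 09:35:53 ...");
the keyword list and the function name interesting_lines are my own
guesses at a useful filter, not anything defined by HTCondor.

```python
import re

# Timestamp prefix as seen in the log excerpt in this thread,
# e.g. "10/29/21 09:35:53 ".
STAMP = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")

# Assumed keywords worth a look; this is not an exhaustive list.
KEYWORDS = ("Error", "ERROR", "EXITING WITH STATUS")


def interesting_lines(lines):
    """Return timestamped log lines that mention an error or an exit status."""
    return [
        line.rstrip("\n")
        for line in lines
        if STAMP.match(line) and any(key in line for key in KEYWORDS)
    ]
```

To use it, open the .dagman.out file and pass the file object (or a list
of its lines) to interesting_lines. With D_FULLDEBUG the log gets long,
so a filter like this only narrows things down; I'd still like to see
the whole file.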

Feel free to follow up with me personally; we can share our findings
with the list once we've figured this out.

Mark


On Fri, Oct 29, 2021 at 12:23 PM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
>  > condor_dagman -version
>  > condor_submit_dag -version
> Versions seem the same:
> sub-2 ~ # condor_dagman -version
> $CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID:
> 9.2.0-1 $
> $CondorPlatform: x86_64_CentOS8 $
> sub-2 ~ # condor_submit_dag -version
> $CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID:
> 9.2.0-1 $
> $CondorPlatform: x86_64_CentOS8 $
>
> rpm -V says binaries haven't been modified, and I see no
> aliases, wrappers, PATH issues...
>
> Running condor_submit_dag with -AllowVersionMismatch still
> results in the DAG job just sitting in the queue, not
> creating jobs. However, with -AllowVersionMismatch, running
> condor_dagman manually works as expected.
>
> When I run condor_q -bet on the dag job, I get
> 375422.000:  This schedd's StartSchedulerUniverse evalutes
> to true for this job.
>
> Regular REQUIREMENTS are ignored for universe 7 jobs, right?
>
> Not sure where to go from here
>
>
> Vlad
>
>
> On 10/29/21 11:26 AM, Mark Coatsworth wrote:
> > Hi Vlad,
> >
> > That message indicates a version mismatch between your
> > condor_submit_dag and condor_dagman binaries. Is it possible one of
> > the binaries you're using is an older copy, or you have some other
> > unusual system configuration that includes different versions?
> >
> > What output are you seeing from the following two commands?
> >
> > condor_dagman -version
> > condor_submit_dag -version
> >
> > Either way I don't think we've changed the argument syntax in quite
> > some time, so you should be able to run condor_submit_dag with the
> > -AllowVersionMismatch and have that work correctly.
> >
> > Mark
> >
> >
> > On Fri, Oct 29, 2021 at 9:45 AM Vladimir Brik
> > <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> Hello
> >>
> >> I am running into a problem where if I submit a dag using
> >> condor_dagman directly everything is fine, but if I use
> >> condor_submit_dag, the dag job just sits there not
> >> submitting any jobs.
> >>
> >> I ran the command of the job created by condor_submit_dag
> >> and the .condor_dagman.out contained the following:
> >>
> >> 10/29/21 09:35:53 Error: the version (: 9.2.0 Sep 23 2021
> >> BuildID: 557262 PackageID: 9.2.0-1 $) of this DAG's HTCondor
> >> submit file (created by condor_submit_dag) is invalid!
> >> 10/29/21 09:35:53 **** condor_dagman (condor_DAGMAN) pid
> >> 504426 EXITING WITH STATUS 1
> >>
> >> Running condor 9.2.0 on Centos8
> >>
> >> Can anybody help?
> >>
> >>
> >>
> >> Thanks
> >>
> >> Vlad
> >> _______________________________________________
> >> HTCondor-users mailing list
> >> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> >> subject: Unsubscribe
> >> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >>
> >> The archives can be found at:
> >> https://lists.cs.wisc.edu/archive/htcondor-users/
> >
> >
> >
> > --
> > Mark Coatsworth
> > Systems Programmer
> > Center for High Throughput Computing
> > Department of Computer Sciences
> > University of Wisconsin-Madison



-- 
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison