
Re: [HTCondor-users] condor_submit_dag doesn't work but condor_dagman does



Thanks for letting us know Vlad, glad you've got it working again!

Mark

On Fri, Oct 29, 2021 at 1:33 PM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> Mark,
>
> I think I've figured it out. One of our job transforms
> modified the dagman job's REQUIREMENTS in a way that
> prevented it from actually running.
>
> Sorry for the trouble
>
>
> Vlad
>
>
>
> On 10/29/21 1:18 PM, Mark Coatsworth wrote:
> > Hi Vlad,
> >
> > I've looked into this more closely. It seems that the version mismatch
> > error is a red herring. If you run condor_dagman manually using the
> > argument list provided in the .condor.sub file, the error message
> > always comes up (regardless of actual versions installed) due to a
> > string quoting issue.
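
One plausible mechanism for the quoting issue described above, sketched in Python for illustration only. It assumes the version string from the .condor.sub argument list reaches a POSIX shell unquoted or inside double quotes, so the shell expands `$CondorVersion` as an (unset) variable. The helper name `through_shell` is hypothetical, and this is not taken from the HTCondor source.

```python
import os
import subprocess

# Version string as reported by condor_submit_dag -version in this thread.
VERSION = "$CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID: 9.2.0-1 $"


def through_shell(quote: str) -> str:
    """Return VERSION as a POSIX shell passes it on when wrapped in `quote`.

    Hypothetical helper for illustration; `quote` is either ' or ".
    """
    script = f"printf '%s' {quote}{VERSION}{quote}"
    # Drop CondorVersion from the environment so it is guaranteed to be unset,
    # as it would be in an ordinary login shell.
    env = {k: v for k, v in os.environ.items() if k != "CondorVersion"}
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


double_quoted = through_shell('"')  # the shell expands $CondorVersion
single_quoted = through_shell("'")  # the shell leaves the string untouched

print(repr(double_quoted))
print(repr(single_quoted))
```

Under that assumption, the double-quoted form loses its leading `$CondorVersion` token and starts with `: 9.2.0 ...`, which matches the string shown in the "is invalid!" error quoted further down, while the single-quoted form arrives intact.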
> >
> > Moreover, if this was indeed a version mismatch, condor_dagman is
> > supposed to exit immediately, not sit idle. So I think the actual
> > problem is something else.
> >
> > The next step is looking through your dagman output in detail. Can you
> > try turning up the debug level in your configuration:
> >
> > DAGMAN_DEBUG = D_FULLDEBUG
> >
> > Then resubmit your dag using condor_submit_dag, let it sit idle for a
> > few minutes without submitting jobs, then send me your .dagman.out
> > file? I think we'll find some clues in there.
> >
> > Feel free to follow up with me personally, we can share our findings
> > with the list once we've figured this out.
> >
> > Mark
> >
> >
> > On Fri, Oct 29, 2021 at 12:23 PM Vladimir Brik
> > <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
> >>
> >>   > condor_dagman -version
> >>   > condor_submit_dag -version
> >> Versions seem the same:
> >> sub-2 ~ # condor_dagman -version
> >> $CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID:
> >> 9.2.0-1 $
> >> $CondorPlatform: x86_64_CentOS8 $
> >> sub-2 ~ # condor_submit_dag -version
> >> $CondorVersion: 9.2.0 Sep 23 2021 BuildID: 557262 PackageID:
> >> 9.2.0-1 $
> >> $CondorPlatform: x86_64_CentOS8 $
> >>
> >> rpm -V says binaries haven't been modified, and I see no
> >> aliases, wrappers, PATH issues...
> >>
> >> Running condor_submit_dag with -AllowVersionMismatch still
> >> results in the DAG job just sitting in the queue, not
> >> creating jobs. However, with -AllowVersionMismatch, running
> >> condor_dagman manually works as expected.
> >>
> >> When I run condor_q -bet on the dag job, I get
> >> 375422.000:  This schedd's StartSchedulerUniverse evalutes
> >> to true for this job.
> >>
> >> Regular REQUIREMENTS are ignored for universe 7 jobs, right?
> >>
> >> Not sure where to go from here
> >>
> >>
> >> Vlad
> >>
> >>
> >> On 10/29/21 11:26 AM, Mark Coatsworth wrote:
> >>> Hi Vlad,
> >>>
> >>> That message indicates a version mismatch between your
> >>> condor_submit_dag and condor_dagman binaries. Is it possible one of
> >>> the binaries you're using is an older copy, or you have some other
> >>> unusual system configuration that includes different versions?
> >>>
> >>> What output are you seeing from the following two commands?
> >>>
> >>> condor_dagman -version
> >>> condor_submit_dag -version
> >>>
> >>> Either way, I don't think we've changed the argument syntax in quite
> >>> some time, so you should be able to run condor_submit_dag with the
> >>> -AllowVersionMismatch option and have that work correctly.
> >>>
> >>> Mark
> >>>
> >>>
> >>> On Fri, Oct 29, 2021 at 9:45 AM Vladimir Brik
> >>> <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Hello
> >>>>
> >>>> I am running into a problem where if I submit a dag using
> >>>> condor_dagman directly everything is fine, but if I use
> >>>> condor_submit_dag, the dag job just sits there not
> >>>> submitting any jobs.
> >>>>
> >>>> I ran the command of the job created by condor_submit_dag
> >>>> and the .condor_dagman.out contained the following:
> >>>>
> >>>> 10/29/21 09:35:53 Error: the version (: 9.2.0 Sep 23 2021
> >>>> BuildID: 557262 PackageID: 9.2.0-1 $) of this DAG's HTCondor
> >>>> submit file (created by condor_submit_dag) is invalid!
> >>>> 10/29/21 09:35:53 **** condor_dagman (condor_DAGMAN) pid
> >>>> 504426 EXITING WITH STATUS 1
> >>>>
> >>>> Running HTCondor 9.2.0 on CentOS 8
> >>>>
> >>>> Can anybody help?
> >>>>
> >>>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> Vlad
> >>>> _______________________________________________
> >>>> HTCondor-users mailing list
> >>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> >>>> subject: Unsubscribe
> >>>> You can also unsubscribe by visiting
> >>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >>>>
> >>>> The archives can be found at:
> >>>> https://lists.cs.wisc.edu/archive/htcondor-users/
> >>>
> >>>
> >>>
> >>> --
> >>> Mark Coatsworth
> >>> Systems Programmer
> >>> Center for High Throughput Computing
> >>> Department of Computer Sciences
> >>> University of Wisconsin-Madison
> >>>
> >
> >
> >



-- 
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison