
Re: [Condor-users] DAGMAN delay between submit of job and scheduling of that job

> > This subject has been previously visited (see subject: 'DAGMAN slow
> > startup'), but I was hoping somebody might have some more insight. I
> > submit dependent jobs via the condor DAG submit, and I'm finding that
> > there is a delay between when the condor_dagman starts running and
> > submits the first job in my DAG and when that job actually gets farmed
> > out to one of the machines in my network. The delay is actually
> > significant. Anywhere between 2 to 5 minutes. On the odd occasion, it
> > will start up almost immediately, so I'm assuming it's related to waiting
> > for a reschedule event or something and is kind of luck of the draw.
> >
> > When I submit any of these jobs with a plain ol' condor_submit, it
> > finds a dance partner pretty quickly and starts running. It seems to
> > only be when dagman submits a job. I don't know the underlying logic
> > behind these calls, so I don't know if that makes any sense to those of
> > you who are developing for Condor.
> This is kind of strange. There's really no significant difference between
> DAGMan submitting a job versus manually running condor_submit (DAGMan
> actually runs condor_submit to submit each job). Especially if you are
> actually seeing the jobs in the queue, but they are not running, it seems
> unlikely that DAGMan itself has much to do with this. I wonder if the
> problem has something to do with the *pattern* of submits when you run
> DAGMan as opposed to submitting jobs manually. I'm not a real expert on
> the negotiation cycle, but that's kind of an initial guess.
> Kent Wenger
> Condor Team

Thanks for the response, Kent. I'm hoping there's some sort of difference between running condor_submit by hand and DAGMan doing the submit :|... there's gotta be! :)

I'm curious whether other people experience this delay or if it's just me. I can take the simplest Hello World job and condor_submit it, and every time it takes only seconds to get matched, run, and finish. If I then take that same job, create a DAG file with a single line in it ("job 1 C:\condor\jobs\helloworld.cmd"), and condor_submit_dag that, it takes anywhere from 5 to 10 minutes to complete :|. If it's just me, then there must be some configuration setting that I'm not using (or am currently abusing).
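For anyone who wants to try the same comparison, here is a minimal sketch in Python of the setup described above. The helper names (write_one_job_dag, submit_commands) and the DAG file name are made up for illustration; only the DAG line itself and the condor_submit / condor_submit_dag commands come from my actual test. The sketch only writes the DAG file and builds the two command lines; it does not run them.

```python
from pathlib import Path


def write_one_job_dag(dag_path, submit_file, node_name="1"):
    """Write a DAG file with a single job line and return the text written.

    The line has the same shape as the one quoted in this message:
    job <node name> <submit description file>
    """
    text = f"job {node_name} {submit_file}\n"
    Path(dag_path).write_text(text)
    return text


def submit_commands(dag_path, submit_file):
    """Return the two command lines being compared, as argv lists.

    The first submits the job directly; the second hands the one-node DAG
    to DAGMan, which in turn submits the same job.
    """
    direct = ["condor_submit", str(submit_file)]
    via_dag = ["condor_submit_dag", str(dag_path)]
    return direct, via_dag
```

Running each returned command (for example with subprocess.run) and noting how long the job takes to leave the idle state should show whether the delay appears only on the DAG path.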

Offhand, is there a simple way to speed up negotiation? Maybe a way to force rescheduling to happen more often?

A little more information: it seems that the job sits idle, even with open machines available, under the status 'match but reject the job for unknown reasons'. Running condor_reschedule always unbungles it. I guess that status is just the catch-all bucket for when no other status fits a given job?
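As a stopgap, the manual condor_reschedule could be automated. Below is a rough sketch in Python, not a fix for the underlying delay. The function name nudge_scheduler and its parameters are hypothetical; the only real command involved is condor_reschedule, run with no arguments. The run and sleep parameters exist only so the loop can be tried without touching a real pool.

```python
import subprocess
import time


def nudge_scheduler(times, interval_seconds, run=subprocess.call, sleep=time.sleep):
    """Run condor_reschedule `times` times, pausing between calls.

    Returns the list of exit codes, one per call. `run` and `sleep` are
    injectable so the loop can be exercised without a Condor installation.
    """
    exit_codes = []
    for attempt in range(times):
        exit_codes.append(run(["condor_reschedule"]))
        # No need to wait after the final call.
        if attempt < times - 1:
            sleep(interval_seconds)
    return exit_codes
```

For example, nudge_scheduler(5, 30) would request a reschedule five times, 30 seconds apart, right after submitting the DAG. Both numbers are arbitrary guesses on my part, not recommended values.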
