
Re: [Condor-users] DAGMAN delay between submit of job and scheduling of that job



On Wed, 1 Oct 2008, Steve Shaw wrote:

> This subject has been previously visited (see subject: 'DAGMAN slow startup'), but I was hoping somebody might have some more insight. I submit dependent jobs via condor_submit_dag, and I'm finding that there is a delay between when condor_dagman starts running and submits the first job in my DAG and when that job actually gets farmed out to one of the machines in my network. The delay is significant: anywhere from 2 to 5 minutes. On the odd occasion it will start almost immediately, so I'm assuming it's related to waiting for a reschedule event or something, and is kind of the luck of the draw.
>
> When I submit any of these jobs with a plain ol' condor_submit, it finds a dance partner pretty quickly and starts running. It only seems to happen when DAGMan submits a job. I don't know the underlying logic behind these calls, so I don't know if that makes any sense to those of you who are developing for Condor.

This is kind of strange. There's really no significant difference between DAGMan submitting a job and manually running condor_submit (DAGMan actually runs condor_submit to submit each job). Especially if the jobs are showing up in the queue but not running, it seems unlikely that DAGMan itself has much to do with this. I wonder if the problem has something to do with the *pattern* of submits when you run DAGMan, as opposed to submitting jobs manually. I'm not an expert on the negotiation cycle, so that's just an initial guess.
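One way to test the submit-pattern idea is to compare how long jobs sit between their "submitted" and "executing" events in the job user log, once for DAGMan-submitted jobs and once for manually submitted ones. The Python sketch below is illustrative only: it assumes the classic user-log event header (a three-digit event code, then "(cluster.proc.subproc)", then an "MM/DD HH:MM:SS" timestamp, with 000 = submitted and 001 = executing). Newer Condor versions may write a different timestamp format, and the function name `submit_to_execute_delays` is hypothetical.

```python
import re
import sys
from datetime import datetime

# Assumed classic user-log event header, e.g.
#   "000 (123.000.000) 10/01 14:02:11 Job submitted from host: <...>"
# Event code 000 = job submitted, 001 = job executing (assumption).
EVENT_RE = re.compile(
    r"^(?P<code>\d{3}) \((?P<cluster>\d+)\.(?P<proc>\d+)\.\d+\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2})"
)


def _parse_stamp(date: str, time: str) -> datetime:
    # The assumed log format omits the year; use a leap year so 02/29 parses.
    # Delays that span New Year's Eve are not handled.
    return datetime.strptime(f"2000/{date} {time}", "%Y/%m/%d %H:%M:%S")


def submit_to_execute_delays(lines):
    """Return {(cluster, proc): seconds} from each job's submit event (000)
    to its first execute event (001). Non-header lines are ignored."""
    submitted = {}
    delays = {}
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # event bodies and "..." separators
        job = (int(m["cluster"]), int(m["proc"]))
        stamp = _parse_stamp(m["date"], m["time"])
        if m["code"] == "000":
            submitted[job] = stamp
        elif m["code"] == "001" and job in submitted and job not in delays:
            delays[job] = (stamp - submitted[job]).total_seconds()
    return delays


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python delays.py path/to/job.log
    with open(sys.argv[1]) as log:
        for (cluster, proc), secs in sorted(submit_to_execute_delays(log).items()):
            print(f"{cluster}.{proc}: {secs:.0f} s from submit to execute")
```

If the DAGMan jobs consistently wait roughly one negotiation interval while the manual submits don't, that would point at the timing of negotiation cycles rather than at DAGMan itself.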

Kent Wenger
Condor Team