
Re: [Condor-users] Queue problems



On Tue, Apr 18, 2006 at 01:12:30PM -0500, Todd Tannenbaum wrote:
> At 10:15 AM 4/18/2006, Andy Wettstein wrote:
> >Hi
> >
> >We're running condor 6.7.18 and have noticed a problem when we add a
> >machine requirement to the submit file.  We have a submit file like
> >this:
> >
> >Executable     = hello.sh
> >Universe       = vanilla
> >Output         = hello.out
> >Log            = hello.log
> >Requirements   = (machine == "xxx1")
> >Queue 10
> >
> >hello.sh just echoes hello and sleeps for 30 seconds.  If we submit this
> >job and then change to machine xxx2 and submit again, we don't get any
> >jobs run on xxx2 until all the jobs on xxx1 have completed.  From what I
> >can tell, when we submit jobs this way condor stops trying to match
> >jobs in the queue after it rejects a job.  So since xxx1 has 4 VMs,
> >condor will start 4 jobs on it, then see it can't run the next job, and
> >then just skip the rest of the queue instead of trying to match the jobs
> >that should be able to run on xxx2.  If we take out the machine
> >requirement condor does run jobs simultaneously on xxx1 and xxx2 as
> >expected.
> >
> >Could this be a configuration error of some sort or is this a bug with
> >condor?
> 
> This is an unfortunate bug that has been recently fixed for the next 
> Condor release.  So with v6.7.19+ you should not have to worry about it.
> 
> But w/ v6.7.18, there is a bug in the code that automatically sets 
> SIGNIFICANT_ATTRIBUTES.
> There are a couple of ways you can work around it.
> 
> v6.7.18 workaround idea #1
> Use a submit file that adds one level of indirection to the 
> Requirements, like so:
>     executable = hello.sh
>     requirements = wanted
>     +wanted = (machine == "xxx1")
>     queue 10
> 
> v6.7.18 workaround idea #2
> In your condor_config file, add
>     SIGNIFICANT_ATTRIBUTES = ClusterId
> and then *restart* the schedd (condor_restart -schedd).
> 
> Workaround #1 will result in better negotiation, but requires 
> changes to all submit files.
> Workaround #2 requires no changes to submit files, but will result 
> in negotiation that performs only as well (or as poorly) as in 
> Condor v6.6.x.
> 
> Again, this has already been fixed in the code for v6.7.19, which 
> would normally appear on the web within a week or so (but this may be 
> delayed by a few days because of the Condor Week conference in 
> Madison, WI next week).  Note that v6.7.19 of Condor is the *last* 
> developer release before the next v6.8.0 stable release.

OK.  I tested out workaround #1 and it worked fine.  We only have 1
user who noticed this, so I don't think it will be much of a problem
to change the submit files.
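
For anyone hitting the same thing on 6.7.18, here is a rough sketch of
the original submit file with workaround #1 applied.  It assumes the
other lines stay exactly as they were; "wanted" is just the attribute
name from the suggestion above, and any unused name should do:

    # Sketch only: original hello.sh submit file plus workaround #1
    Executable     = hello.sh
    Universe       = vanilla
    Output         = hello.out
    Log            = hello.log
    # Requirements now refers to a custom job attribute instead of
    # naming the machine directly
    Requirements   = wanted
    +wanted        = (machine == "xxx1")
    Queue 10

For the second submission, only the +wanted line changes, to
(machine == "xxx2").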

Thanks
Andy