Re: [Condor-users] condor_negotiator/condor_collector scheduling problem
- Date: Wed, 10 May 2006 18:25:21 -0400
- From: Armen Babikyan <armenb@xxxxxxxxxx>
- Subject: Re: [Condor-users] condor_negotiator/condor_collector scheduling problem
Thanks! Autoclustering was indeed the problem. Replacing
"MY_RESOURCE_1" with "MY_RESOURCE_1 == TRUE" didn't solve the problem,
but forcing the negotiator to consider all jobs with
"NEGOTIATE_ALL_JOBS_IN_CLUSTER = true" did.
Thanks again, :-)
Erik Paulson wrote:
You're getting burned by Autoclustering, which is turned on in 6.7.18 (and
in a few earlier 6.7 releases). There is no problem with the negotiator.
Autoclustering allows Condor to "combine" jobs that "look" the same when
it comes time to matchmake them. If one job gets rejected, all the other
jobs that "look the same" will also be rejected, so there is no need to
negotiate for them.
The problem is Condor believes that RESOURCE_1 jobs are the same as RESOURCE_2
jobs. When your last RESOURCE_1 job starts running, the next negotiation cycle
that looks for new resources finally sends a RESOURCE_2 job to the negotiator
and the RESOURCE_2 jobs will start.
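As an illustration of the behavior described above, here is a minimal Python sketch. It is not Condor's actual implementation: the job dictionaries, the `autocluster` and `negotiate` helpers, and the choice of which attributes count as significant are all assumptions made for the example, and machine slots are not modelled.

```python
from collections import defaultdict

def autocluster(jobs, significant_attrs):
    """Group jobs whose significant attributes all have the same values."""
    clusters = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(attr) for attr in significant_attrs)
        clusters[key].append(job)
    return dict(clusters)

def negotiate(clusters, can_match):
    """Walk each autocluster in order; the first rejection skips the rest of it."""
    started = []
    for members in clusters.values():
        for job in members:
            if not can_match(job):
                break  # jobs that "look the same" are assumed to fail too
            started.append(job["id"])
    return started

# Hypothetical queue: two jobs need RESOURCE_1, two need RESOURCE_2.
jobs = [
    {"id": 1, "Requirements": "MY_RESOURCE_1", "ImageSize": 100},
    {"id": 2, "Requirements": "MY_RESOURCE_1", "ImageSize": 100},
    {"id": 3, "Requirements": "MY_RESOURCE_2", "ImageSize": 100},
    {"id": 4, "Requirements": "MY_RESOURCE_2", "ImageSize": 100},
]

# Assumed pool state: RESOURCE_1 machines are busy, RESOURCE_2 machines are idle.
def can_match(job):
    return job["Requirements"] == "MY_RESOURCE_2"

# Requirements overlooked: all four jobs fall into one autocluster.
merged = autocluster(jobs, ["ImageSize"])
# Requirements treated as significant: two autoclusters.
split = autocluster(jobs, ["ImageSize", "Requirements"])

started_merged = negotiate(merged, can_match)
started_split = negotiate(split, can_match)
print(started_merged, started_split)
```

In the merged case the rejected RESOURCE_1 job takes the RESOURCE_2 jobs down with it, which matches the symptom of idle RESOURCE_2 machines.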
I think the problem is the syntax you're using for MY_RESOURCE_1 and _2. Try
Requirements = MY_RESOURCE_1 == TRUE
instead of just
Requirements = MY_RESOURCE_1
in your submit file. I think that will show Condor that it needs to consider
MY_RESOURCE_1 as an autocluster'able attribute. (I also think we're still
working on autoclustering to fix problems like this, but I don't know the details.)
If that doesn't work, try putting
NEGOTIATE_ALL_JOBS_IN_CLUSTER = true
in the config file for your submit machine, and retry your experiments. (I'm
not sure what that does with autoclustering, but it might fix it)
If that doesn't work, this will work for sure. Put:
START = TARGET.ClusterId > 0
into the config file of one of your execute machines, and reconfig the
execute machine. That will effectively turn off autoclustering in
your pool (because it will force Condor to autocluster on ClusterId, which
it usually can ignore)
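A rough way to picture why this works, again as a Python sketch under assumptions rather than Condor's real logic: once ClusterId is part of the grouping key, jobs from different submissions can no longer share an autocluster. The `group_jobs` helper, the attribute lists, and the ClusterId values below are hypothetical.

```python
def group_jobs(jobs, significant_attrs):
    """Return lists of job ids that share every significant attribute value."""
    groups = {}
    for job in jobs:
        key = tuple(job[attr] for attr in significant_attrs)
        groups.setdefault(key, []).append(job["id"])
    return list(groups.values())

# Hypothetical jobs from two separate submissions (ClusterId 10 and 11).
queue = [
    {"id": "10.0", "ClusterId": 10, "ImageSize": 100},
    {"id": "10.1", "ClusterId": 10, "ImageSize": 100},
    {"id": "11.0", "ClusterId": 11, "ImageSize": 100},
    {"id": "11.1", "ClusterId": 11, "ImageSize": 100},
]

# ClusterId ignored: one group containing every job.
without_cluster_id = group_jobs(queue, ["ImageSize"])
# ClusterId referenced by a START expression (assumed to make it significant):
# one group per submission.
with_cluster_id = group_jobs(queue, ["ImageSize", "ClusterId"])
print(without_cluster_id)
print(with_cluster_id)
```

Note that in this sketch jobs from the same submission still share a group, so the effect depends on the RESOURCE_1 and RESOURCE_2 jobs having been submitted separately.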
MIT Lincoln Laboratory
armenb@xxxxxxxxxx . 781-981-1796