

Steffen Grunewald wrote:
> For a homogeneous pool, and "simple" job clusters (identical specs for all
> jobs) NEGOTIATE_ALL_JOBS_IN_CLUSTER is suggested to be set to False.
> On the other hand, there may be situations where the first job of a single
> cluster continues to fail (for whatever reason: memory overcommit comes to
> mind) thus blocking all others.

Hi Steffen  -

What version of Condor are you working with?

Starting with Condor v7.0.x, the default built-in auto-clustering mechanism should prevent the situation you describe above, and it does so far more efficiently and scalably than setting NEGOTIATE_ALL_JOBS_IN_CLUSTER to True (which is the kiss of performance death if you have thousands of jobs).
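
To make the idea concrete, here is a toy Python sketch of what auto-clustering does conceptually. It is not Condor's actual implementation, and the attribute name memory_mb and the job records are made up for illustration. Jobs whose significant attributes are identical land in the same group, so a match attempt is needed once per group rather than once per job, and a job that cannot match only holds back the jobs that look exactly like it.

```python
from collections import defaultdict

def autocluster(jobs, significant_attrs):
    """Group job ids by the values of their significant attributes.

    Illustrative only: 'jobs' is a list of plain dicts and
    'significant_attrs' is a list of hypothetical attribute names.
    """
    groups = defaultdict(list)
    for job in jobs:
        # Jobs with identical values for every significant attribute
        # share a key and therefore share a group.
        key = tuple(job.get(attr) for attr in significant_attrs)
        groups[key].append(job["id"])
    return dict(groups)

# One submitted cluster (12) whose jobs differ in memory needs.
jobs = [
    {"id": "12.0", "memory_mb": 8192},
    {"id": "12.1", "memory_mb": 1024},
    {"id": "12.2", "memory_mb": 1024},
    {"id": "12.3", "memory_mb": 8192},
]

groups = autocluster(jobs, ["memory_mb"])
for key, ids in groups.items():
    print(key, ids)
```

In this sketch, if job 12.0 cannot be matched because of its memory request, only 12.3 is affected; 12.1 and 12.2 sit in a different group and are still considered.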

> Is it possible to - e.g. once per given time period (4 hours?) - "flush"
> the queue by temporarily setting the macro to True?

Maybe something else is going on? With Condor v7.0.x and above and the default auto-clustering, I assert you should never have to resort to NEGOTIATE_ALL_JOBS_IN_CLUSTER = True. Are you overriding auto-clustering by explicitly setting SIGNIFICANT_ATTRIBUTES or some such in the condor_config on your submit hosts?


Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257