
Re: [Condor-users] condor's calculated memory vs image size of jobs in queue



Paul,
	This looks like it might be a problem with the automatic job clustering.
As a test, you might try enabling NEGOTIATE_ALL_JOBS_IN_CLUSTER to see whether
that solves the problem before digging deeper.
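
For reference, a minimal sketch of what enabling it might look like, assuming
the setting goes in the local configuration file on the submit machine (the
one running condor_schedd); the file name and location vary by installation:

```
# Local Condor config on the submit host (file location is an assumption).
# The default is False: when the schedd fails to start one idle job, it
# skips the other idle jobs in the same cluster for that negotiation cycle.
NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
```

The schedd needs to re-read its configuration afterwards, for example by
running condor_reconfig on that host. Treat this as a diagnostic step
rather than a confirmed fix.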

Thanks.

On Wed, May 16, 2007 at 11:55:44AM -0500, Paul Armor wrote:
> 
> I'm noticing an interesting edge case in our pool, where a user has lots 
> of jobs queued up. Some get evicted after some amount of run time and then 
> fail to match when they try to pick up where they left off after a 
> checkpoint/eviction, because their SIZE has grown larger than the "Memory" 
> value determined at startup on the compute node.  When that job has the 
> lowest job id for that user in the queue, schedd will just spin from that 
> point on, only trying to schedule that job, and no others...
> 

-- 
Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson