Re: [Condor-users] condor's calculated memory vs image size of jobs in queue
- Date: Wed, 16 May 2007 12:13:53 -0700
- From: Stuart Anderson <anderson@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] condor's calculated memory vs image size of jobs in queue
This looks like it might be a problem with automatic job clustering.
As a test, you might try setting NEGOTIATE_ALL_JOBS_IN_CLUSTER to True to see
whether that solves the problem before digging deeper.
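
For reference, a minimal sketch of what enabling it might look like. This
assumes the setting goes in the local configuration file on the submit
machine (the one running the schedd); the file's location varies by
installation and is not given in this thread.

```
# Assumed placement: the submit machine's local condor_config file.
# With the default (False), when the schedd fails to start one idle job
# it skips the other idle jobs in that cluster for the rest of the
# negotiation cycle. True makes it try every idle job in the cluster.
NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
```

The schedd has to re-read its configuration before the change takes effect,
for example by running condor_reconfig on the submit machine.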
On Wed, May 16, 2007 at 11:55:44AM -0500, Paul Armor wrote:
> I'm noticing an interesting edge case in our pool, where a user has lots
> of jobs queued up. Some may get evicted after some amount of run time, then
> fail to match when they try to pick up where they left off after a
> checkpoint/eviction, because their SIZE has grown larger than the "Memory"
> value determined at startup on the compute node. When such a job has the
> lowest job id for that user in the queue, the schedd just spins from that
> point on, trying to schedule only that job and no others...
Stuart Anderson anderson@xxxxxxxxxxxxxxxx