Re: [Condor-users] avoiding vanilla job eviction

On Thu, Nov 08, 2007 at 03:33:58PM -0700, Pasquale Tricarico wrote:
> Hi,
> In our condor cluster, we have two classes of users (not sorted by
> importance whatsoever):
> A) users running a few jobs for a long time (weeks), sometimes using
> the vanilla universe (only option, code links to libpthread);
> B) users running many jobs (more than the available nodes) all at the
> same time, for a short period (less than one day typically).
> The problem is that the B class of users typically have a very low
> effective priority (condor_userprio...), so their jobs can easily
> cause the eviction of vanilla jobs from A class users. This is a
> problem, because this way A class users lose all the time already put
> into the job, as the vanilla jobs cannot checkpoint. Since the
> standard universe is sometimes not an option, is there a way to
> configure Condor in such a way that vanilla jobs are never (or almost
> never..) evicted but just kept in memory while other jobs are running?
> Or maybe some other trick so that vanilla jobs are not restarted from
> scratch, but just suspended while waiting for enough priority? Thanks
> for your suggestions.

On our pool, we have defined 


and job eviction has disappeared completely.
This would not suspend the long-running jobs in favour of the short ones,
though. Our policy is currently a mix of "first come, first served"
(until all resources are claimed) and "fair share" (based on cumulative
priority, with a short CLAIM_WORKLIFE).
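
For reference, a minimal sketch of a no-eviction policy of this kind, using
the standard Condor configuration knobs for disabling preemption. The values
shown (in particular the CLAIM_WORKLIFE number) are illustrative assumptions
and not necessarily the settings used on this pool:

```
# Illustrative sketch only; values are assumptions, not this pool's config.

# Startd side: never preempt a running job because of machine policy.
PREEMPT = False

# Negotiator side: never preempt a running job in favour of a user
# with a better priority.
PREEMPTION_REQUIREMENTS = False

# No machine rank, so there is no rank-based preemption either.
RANK = 0

# Stop accepting new jobs on a claim after this many seconds, so that
# fair share is re-evaluated between jobs. A job that is already running
# is not killed when the limit passes. 1200 is an assumed example value.
CLAIM_WORKLIFE = 1200
```

With preemption disabled like this, a long vanilla job keeps its slot until it
finishes; the short CLAIM_WORKLIFE only affects which user gets the slot next.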

Not sure whether this is what you (and your users) want...


Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html