
[Condor-users] avoiding vanilla job eviction


In our Condor cluster, we have two classes of users (listed in no
particular order of importance):

A) users running a few jobs for a long time (weeks), sometimes in the
vanilla universe (the only option, since their code links against
libpthread);

B) users running many jobs (more than the number of available nodes)
all at the same time, for a short period (typically less than one day).

The problem is that class B users typically have a very low effective
priority value (as reported by condor_userprio), which in Condor means
a high priority, so their jobs can easily cause the vanilla jobs of
class A users to be evicted. This is a problem because class A users
then lose all the time already invested in the job, since vanilla jobs
cannot checkpoint. Since the standard universe is sometimes not an
option, is there a way to configure Condor so that vanilla jobs are
never (or almost never) evicted, but simply kept in memory while other
jobs run? Or is there some other trick so that vanilla jobs are
suspended while waiting for enough priority, rather than restarted from
scratch? Thanks for your suggestions.