
[Condor-users] controlling job preemption by user's Unix group



I strongly suspect this isn't possible, but...

We have a lab with machines owned by one of two people: b & m.

In this lab we occasionally have a problem with people consuming a huge
chunk of available Condor slots.  Most of the time this isn't a problem
per se because the owners want the machines to be used as much as
possible. Ideally they'd like to see the entire cluster used 100% of the
time.

However, it becomes a problem when there is demand for 110% (or more) of
the available resources.  They want to use preemption to say "people who
work for b get priority on b's systems and people who work for m get
priority on m's systems" in those cases.

In other words, they want to preempt strictly based on who owns the
machine - people in Unix group B have priority on B's machines and
people in Unix group M have priority on M's machines (when the cluster
is full), but there would be no other priority considerations for job
preemption.  Long running jobs wouldn't be penalized for being
long-running just because someone who didn't have jobs going launched
something.  The only criterion for preempting a job is that a Unix-group-B
user's job is running on one of m's machines and there are no other
resources available for a Unix-group-M user's jobs (in either direction,
really).
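
To pin down the rule I'm after, here is a rough sketch in plain Python
(not Condor config).  The names Slot, owner_group, running_group and
may_preempt are made up for illustration and don't correspond to any
Condor attribute; it also assumes each user maps to exactly one group,
which is the part I don't know how to get automatically.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    owner_group: str               # group that owns the machine, e.g. "b" or "m"
    running_group: Optional[str]   # group of the user holding the slot; None if idle


def may_preempt(slot: Slot, candidate_group: str, pool: List[Slot]) -> bool:
    """Return True if a job from candidate_group may preempt the job on slot."""
    # Never preempt while any slot in the pool is still idle.
    if any(s.running_group is None for s in pool):
        return False
    # Preempt only when the candidate belongs to the machine owner's group
    # and the job currently running there does not.  Run time is ignored.
    return (candidate_group == slot.owner_group
            and slot.running_group != slot.owner_group)


# Hypothetical two-slot pool, both slots busy with group-b jobs.
pool = [Slot("b", "b"), Slot("m", "b")]
print(may_preempt(pool[1], "m", pool))
print(may_preempt(pool[0], "m", pool))
```

The first call asks whether a group-m job may take back m's machine from
a group-b job in a full pool; the second asks whether it may push a
group-b job off b's own machine.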

It's trivial for me to detect machine ownership (I've had that set in
Condor since we started running it).  However, the last time I asked
about determining a user's Unix group memberships automatically I was
told it wasn't possible with the version of Condor available
(6.something).  I'm not seeing anything in the 7.x release notes that
makes me think that might have changed so I'm not overly hopeful about
solving this problem.  Having the users set their affiliation in their
job files is just asking for problems; it really needs to be set by the
system.

I am presuming that if I could somehow automate setting their Unix
groups, I could write an expression in PREEMPTION_REQUIREMENTS that
ignores job run time and just looks at group and machine owner, but I'm
not sure how to add the "but only if no other resources are available on
any other host" part.  (Frankly, I'm horrible at writing Condor config
equations :)

Any hints or tips would be greatly appreciated.

thanks,
nomad