
[condor-users] preemption_requirements behaviour in upgrade to v6.6.2




I have already sent this message to condor_admin but, as I haven't received any answer, maybe someone on this list can give an idea...

Thanks!

Paulo Amado Mendes
---------------------------------------------------------------


For a couple of months we have been running a small Condor pool in our Department, using both dedicated and personal computers to run vanilla jobs under Windows 2000 and Windows XP.

Some of our users run rather long processes, each of which can take several
days, while others have problems that finish much sooner, in anything from
tens of minutes to a couple of hours.

To adjust Condor's PREEMPTION_REQUIREMENTS behaviour to suit our needs,
we have customized the condor_config file with the following two settings:

NEGOTIATOR_INTERVAL     = 180
PREEMPTION_REQUIREMENTS	= $(ActivationTimer) < (5 * $(MINUTE)) \
                           && RemoteUserPrio > SubmittorPrio * 1.5

Our idea is not to kill "long processes" that have already been running for some
time, but to let users with a better priority enter the pool and have their jobs
started instead of yet another long process. We don't want "long processes" to
monopolize the pool, but we also don't want to kill them once they have been
running for a while...
These policies were implemented and tested under Condor version 6.5.5 and also
worked flawlessly under Condor version 6.6.0.
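
The intent of the two-clause expression above can be illustrated with a small
Python sketch. This is only a reading of the expression, not Condor's own
evaluation code: the function name, argument names and sample values are
hypothetical, and it assumes that ActivationTimer is measured in seconds and
that a numerically larger user priority value means a worse priority.

```python
MINUTE = 60  # seconds, mirroring $(MINUTE) in condor_config


def preemption_allowed(activation_timer, remote_user_prio, submittor_prio):
    """Hypothetical mirror of the PREEMPTION_REQUIREMENTS expression.

    activation_timer: seconds the running job has been in its current activity
    remote_user_prio: priority value of the user whose job is running
    submittor_prio:   priority value of the user whose job is waiting
    """
    # Clause 1: only jobs that started less than 5 minutes ago may be preempted.
    recently_started = activation_timer < 5 * MINUTE
    # Clause 2: the running user's priority value must exceed the waiting
    # user's by more than a factor of 1.5.
    much_worse_priority = remote_user_prio > submittor_prio * 1.5
    return recently_started and much_worse_priority


# Sample values are made up for illustration.
print(preemption_allowed(2 * MINUTE, 10.0, 2.0))   # job started 2 minutes ago
print(preemption_allowed(60 * MINUTE, 10.0, 2.0))  # job has run for an hour
```

Under this reading, a job that has been running for more than five minutes is
never preempted, whatever the priorities are, which is the behaviour we want
for the long processes.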


We have recently upgraded our central manager and all computers to Condor
version 6.6.2. Since the upgrade, this policy no longer works. "Long processes"
occupy all compute nodes and, although users with a better priority are
submitting jobs, Condor keeps starting new "long processes" instead of theirs.
To sum up, "condor_q -analyze" reports that "ALL the machines are available
to run your job" but "PREEMPTION_REQUIREMENTS == false".


I have no explanation for this behaviour. The policy used to work pretty well
at balancing resources among our lab users...


Does the new 6.6.2 version treat the PREEMPTION_REQUIREMENTS expression
differently from the previous one? As we kept the config file from the
previous Condor installation, should we try running condor_reconfig on the
central manager? Would we lose some or all of the jobs that are currently
running? Is there a mistake that we can't see?


I have already received an email from Nathan Mueller regarding the release of
Condor version 6.6.4. As we are only running Condor on Windows machines, we
are strongly encouraged to upgrade to this new version as soon as possible,
is that right?


Thank you in advance for any help in sorting out this problem.
Best regards, and have a nice annual Condor meeting.

Paulo Amado Mendes
-------------------------
University of Coimbra
Department of Civil Engineering
Portugal

