
Re: [condor-users] RE: clarification required please

Matt Hope:
While attempting to debug some bizarre behaviour on our Windows farm, I found the following inconsistencies.

A) CurrentRank

3.6.1 Startd ClassAd Attributes

... first ...

: A float which represents this machine owner's affinity for running the Condor job which it is currently hosting. If not currently hosting a Condor job, CurrentRank is -1.0.

... a bit further down ...

: The value of the RANK expression when evaluated against the ClassAd of the ``current'' job using this machine. If the resource has been claimed but no job is running, the ``current'' job ClassAd is the one that was used when claiming the resource. If a job is currently running, that job's ClassAd is the ``current'' one. If the resource is between jobs, the ClassAd of the last job that was run is used for CurrentRank.

Which is true?

This is easily discovered by examining the computer's ClassAd. Notice that you can find the current rank of a machine by doing:

condor_status -l <name> | grep -i rank

Uhh... Does grep work on Windows? If not, just skim the output and look for the CurrentRank.
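If grep is not available, a small script can do the same filtering. The following is only a sketch: it assumes the output of condor_status -l has been captured as text with one "Name = Value" attribute per line, and the sample ad below is made up for illustration, not taken from a real machine. Unlike grep -i rank, it matches on attribute names only, not on values.

```python
def rank_attributes(classad_text):
    """Return {name: value} for attributes whose name contains 'rank'.

    The match is case-insensitive and looks only at the attribute name,
    i.e. the text to the left of the first '=' on each line.
    """
    found = {}
    for line in classad_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and "rank" in name.lower():
            found[name.strip()] = value.strip()
    return found


# Hypothetical sample of captured `condor_status -l <name>` output.
sample = """Name = "vm1@example"
Rank = 0.000000
CurrentRank = 0.000000
State = "Unclaimed"
"""

print(rank_attributes(sample))
```

In practice the text would come from saving the command's output to a file and reading it, rather than from an inline sample.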

It appears to me that the CurrentRank is 0 unless a job is running. It is not -1, nor the last rank that was used. You can do your own easy test to verify this.

Perhaps the real question is "what should CurrentRank be?".

B) Preemption
From the supplied config file:

## The negotiator will not preempt a job running on a given machine
## unless the PREEMPTION_REQUIREMENTS expression evaluates to true
## and the owner of the idle job has a better priority than the owner
## of the running job. This expression defaults to true.
UWCS_PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmittorPrio * 1.2

Does this mean that, in addition to PREEMPTION_REQUIREMENTS evaluating to true, the user priority must also be better, or that this particular expression is what enforces that?

The condor_negotiator checks the priority internally as well.
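To make the two conditions concrete, here is a rough sketch of how they combine. It is not the negotiator's actual code. It assumes that $(HOUR) is 3600 seconds, that StateTimer is the number of seconds spent in the current state, and that a numerically lower user priority is the better one. The function names are invented for illustration.

```python
HOUR = 3600  # seconds; assumed to match $(HOUR) in the config file


def preemption_requirements(state_timer, remote_user_prio, submittor_prio):
    """Mirror of the UWCS_PREEMPTION_REQUIREMENTS expression quoted above."""
    return state_timer > 1 * HOUR and remote_user_prio > submittor_prio * 1.2


def may_preempt(state_timer, remote_user_prio, submittor_prio):
    """Both conditions must hold before a running job is preempted."""
    # Assumption: a lower numeric priority value means a better priority,
    # so the idle job's owner (submittor) must have the smaller value.
    better_priority = submittor_prio < remote_user_prio
    return better_priority and preemption_requirements(
        state_timer, remote_user_prio, submittor_prio
    )


print(may_preempt(2 * HOUR, 10.0, 5.0))   # long-running job, much worse priority
print(may_preempt(HOUR // 2, 10.0, 5.0))  # job has run for less than an hour
print(may_preempt(2 * HOUR, 5.5, 5.0))    # priority gap is under 20%
```

Under these assumptions, and for positive priority values, the 1.2 factor in the UWCS expression already implies the better-priority condition. The separate internal check would matter mostly when PREEMPTION_REQUIREMENTS is left at its default of true.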

C) Vacation
Also I believe there is a bug on the windows port:

I doubt it's Windows specific.

Vanilla jobs do not immediately go to the killing state; they remain in the preempting state until the timeout expires. (We were using the default UWCS value for KILL, as I thought it would not matter.)

I suspect that this is a bug in the documentation, not in the code. I will ask someone else to weigh in on this, and maybe fix the documentation if necessary.

