
Re: [Condor-users] Jobs rejected by machines



Wouldn't it be nice if we had a way to -format "%e" an expression and get all MY references resolved?

Start = 1 + 1 = 2
Requirements = $(START) && ...

-> Requirements = 1 + 1 = 2 && ...
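To make the idea concrete, here is a rough sketch of the recursive substitution described above. It does not use any HTCondor tool: it only mimics the behaviour on a plain Python dict of name -> expression text. The function name `expand` is hypothetical. Each substitution is wrapped in parentheses for readability, which is a choice made here; Condor's own config macro substitution is plain text replacement. Macro names are matched case-insensitively, as Condor config names are.

```python
import re

# A config-style macro reference such as $(START) or $(StartIdleTime).
MACRO_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(name, table):
    """Return table[name] with every $(NAME) reference replaced, recursively,
    by that macro's own (parenthesized) definition.

    `table` is a hypothetical dict of macro name -> expression text.
    Names are looked up case-insensitively.
    """
    lookup = {key.lower(): value for key, value in table.items()}

    def _expand(key, ancestors):
        # A name that appears among its own ancestors would recurse forever.
        if key in ancestors:
            raise ValueError(f"circular reference through {key!r}")

        def substitute(match):
            ref = match.group(1).lower()
            if ref not in lookup:
                return match.group(0)  # unknown macro: leave it visible
            return "(" + _expand(ref, ancestors | {key}) + ")"

        return MACRO_REF.sub(substitute, lookup[key])

    return _expand(name.lower(), frozenset())
```

With the example above, `expand("Requirements", {"Start": "1 + 1", "Requirements": "$(START) && ..."})` returns `"(1 + 1) && ..."`.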

Best,


matt

On 08/23/2010 11:00 AM, Dan Bradley wrote:

To diagnose why machine requirements will not match to a job, I
recommend getting a full dump of the machine ClassAd:

condor_status -long <machine name>

Then look at the Start expression and at all the expressions that it
refers to.
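The lookup described above can be partly automated. What follows is a rough sketch, assuming the output of `condor_status -long <machine name>` has been saved as text and cut down to a single slot's ad (for example by splitting on blank lines). It prints Start and then, breadth-first, every attribute of the same ad that Start mentions, directly or indirectly. The parsing is a simplification: one `Attr = value` per line, string literals skipped, no real ClassAd parser. All function names are hypothetical.

```python
import re

# Attribute-like tokens, optionally scoped as MY.Name or TARGET.Name.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")
STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')


def parse_long(text):
    """Parse ONE ad of `condor_status -long` output ("Attr = value" per line)
    into {lower-cased name: (name, value)}; attribute names are case-insensitive."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip().lower()] = (name.strip(), value.strip())
    return ad


def referenced(expr, ad):
    """Attributes of `ad` mentioned in `expr`, in first-seen order."""
    found = []
    # Blank out string literals so words like "Unclaimed" are not treated as names.
    for token in IDENT.findall(STRING_LITERAL.sub('""', expr)):
        key = token.lower()
        if key.startswith("my."):
            key = key[len("my."):]
        if key in ad and key not in found:
            found.append(key)
    return found


def start_and_refs(text):
    """Start followed by every attribute it refers to, directly or indirectly."""
    ad = parse_long(text)
    queue, seen, lines = ["start"], set(), []
    while queue:
        key = queue.pop(0)
        if key in seen or key not in ad:
            continue
        seen.add(key)
        name, value = ad[key]
        lines.append(f"{name} = {value}")
        queue.extend(referenced(value, ad))
    return lines
```

Usage: `print("\n".join(start_and_refs(text)))`, where `text` holds one slot's ad. The current values of attributes such as KeyboardIdle and State then appear next to the Start expression that tests them.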

--Dan

On 8/22/10 7:39 PM, Jolly, Ben wrote:
Hi

I am trying to run a bunch of jobs in the 'vanilla' universe with a
'stock' Condor setup on around 15-20 dual-core machines (so 30-40
slots, depending on how many are connected). The problem is that all the
jobs sit 'Idle', and when I try a status check the vast majority of
slots are sitting 'Unclaimed' and 'Idle'. There are a few with 'Owner'
status but no others are 'Claimed' or 'Busy'. My jobs are the only
ones in the queue. We have looked through the config file on each of
the client machines and had a quick play with the 'START = ' line,
changing the value to 'true' instead of '$(UWCS_START)'. This worked
brilliantly except that Condor then ran all my jobs all the time,
regardless of whether or not a user was logged on to the machine
(about half of the machines in the pool are used by people during the
day, the other half are dedicated). When we changed the START variable
back, work ceased on all the machines and they are now all 'Unclaimed'.
A 'condor_q -analyze' command gives the result '34 reject your job because of their own
requirements' (there are only 34 slots available).

Does anyone know what could be causing this? I guess I should mention
that we are running Condor version 7.5.2 (built Apr 19 2010 - 232940)
under Windows (Platform: INTEL-WINNT50). Our UWCS_START is:

# Only start jobs if:
# 1) the keyboard has been idle long enough, AND
# 2) the load average is low enough OR the machine is currently
# running a Condor job
# (NOTE: Condor will only run 1 job at a time on a given resource.
# The reasons Condor might consider running a different job while
# already running one are machine Rank (defined above), and user
# priorities.)
UWCS_START = ( (KeyboardIdle > $(StartIdleTime)) \
               && ( $(CPUIdle) || \
                    (State != "Unclaimed" && State != "Owner")) )



Where

StartIdleTime = 15 * $(MINUTE)
CPUIdle = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
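Reading the expression above clause by clause may help narrow things down. An 'Unclaimed' slot can only start a job when the keyboard has been idle for more than 15 minutes AND the non-Condor load is at or below BackgroundLoad. The "State != ..." escape clause only helps once a slot is already running a job. The sketch below mirrors that logic in Python so that values copied from `condor_status -long` can be plugged in. It makes two assumptions. BackgroundLoad is not given in the message, so 0.3 is only a placeholder. `Slot` and `failing_clauses` are hypothetical names, not HTCondor APIs.

```python
from dataclasses import dataclass

MINUTE = 60                    # seconds, like the $(MINUTE) config macro
START_IDLE_TIME = 15 * MINUTE  # StartIdleTime as quoted in the message
BACKGROUND_LOAD = 0.3          # ASSUMPTION: placeholder; read BackgroundLoad from your config


@dataclass
class Slot:
    """Hypothetical snapshot of the machine-ad values UWCS_START depends on."""
    keyboard_idle: int          # KeyboardIdle, in seconds
    non_condor_load_avg: float  # value of $(NonCondorLoadAvg)
    state: str                  # State, e.g. "Unclaimed", "Owner", "Claimed"


def failing_clauses(slot, background_load=BACKGROUND_LOAD):
    """Return the parts of UWCS_START that are false for `slot`
    (an empty list means START evaluates to true)."""
    reasons = []
    if not slot.keyboard_idle > START_IDLE_TIME:
        reasons.append(f"KeyboardIdle ({slot.keyboard_idle}) is not > {START_IDLE_TIME}")
    cpu_idle = slot.non_condor_load_avg <= background_load
    # ClassAd != on strings is case-insensitive, so compare lower-cased names.
    running_job = slot.state.lower() not in ("unclaimed", "owner")
    if not (cpu_idle or running_job):
        reasons.append(
            f"NonCondorLoadAvg ({slot.non_condor_load_avg}) is not <= {background_load}"
            f" and State is {slot.state!r}"
        )
    return reasons


def uwcs_start(slot, background_load=BACKGROUND_LOAD):
    """Python mirror of the UWCS_START expression: A && (B || C)."""
    return not failing_clauses(slot, background_load)
```

For example, `failing_clauses(Slot(keyboard_idle=3600, non_condor_load_avg=0.5, state="Unclaimed"))` reports only the load clause. That is the pattern to look for on the dedicated machines, where nobody touches the keyboard but some background process may keep the load up.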
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
