
Re: [Condor-users] Jobs rejected by machines




To diagnose why a machine's requirements will not match a job, I recommend getting a full dump of the machine ClassAd:

condor_status -long <machine name>

Then look at the Start expression and at all the expressions that it refers to.

--Dan
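Dan's advice (dump the machine ad, then read the Start expression and every attribute it refers to) can be partly automated. The following is a minimal sketch, not an HTCondor tool. It assumes the `condor_status -long` output uses one `Attr = Value` pair per line and that configuration macros such as `$(StartIdleTime)` have already been expanded in the ad. The function names `parse_long_output` and `referenced_attributes` are hypothetical. It only lists the referenced attributes next to their current values; it does not evaluate the expression.

```python
import re


def parse_long_output(text):
    """Parse 'Attr = Value' lines (assumed format of `condor_status -long`)
    into a dict. ClassAd attribute names are case-insensitive, so the keys
    are lower-cased."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank lines and anything that is not an assignment
            ad[name.strip().lower()] = value.strip()
    return ad


def referenced_attributes(expr, ad):
    """List the attributes of `ad` that `expr` mentions, in order of first use."""
    # Drop string literals so words such as "Unclaimed" are not mistaken
    # for attribute names.
    no_strings = re.sub(r'"[^"]*"', "", expr)
    found = []
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", no_strings):
        key = word.lower()
        if key in ad and key not in found:
            found.append(key)
    return found
```

Feeding it the saved output for one slot, then printing `ad["start"]` followed by each name returned by `referenced_attributes(ad["start"], ad)` with its value, puts the policy and the numbers it depends on side by side.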

On 8/22/10 7:39 PM, Jolly, Ben wrote:
Hi

I am trying to run a bunch of jobs in the 'vanilla' universe with a 'stock' Condor setup on around 15-20 dual-core machines (so 30-40 slots, depending on how many are connected). The problem is that all the jobs sit 'Idle', and when I check the status, the vast majority of slots are sitting 'Unclaimed' and 'Idle'. A few have 'Owner' status, but none are 'Claimed' or 'Busy'. My jobs are the only ones in the queue.

We have looked through the config file on each of the client machines and had a quick play with the 'START = ' line, changing its value to 'true' instead of '$(UWCS_START)'. This worked brilliantly, except that Condor then ran my jobs all the time, regardless of whether a user was logged on to the machine (about half of the machines in the pool are used by people during the day; the other half are dedicated). When we changed START back, work ceased on all the machines and they are now all 'Unclaimed'. A 'condor_q -analyze' command gives the result '34 reject your job because of their own requirements' (there are only 34 slots available).

Does anyone know what could be causing this? I should also mention that we are running Condor version 7.5.2 (built Apr 19 2010 - 232940) under Windows (Platform: INTEL-WINNT50). Our UWCS_START is:

# Only start jobs if:
# 1) the keyboard has been idle long enough, AND
# 2) the load average is low enough OR the machine is currently
#    running a Condor job
# (NOTE: Condor will only run 1 job at a time on a given resource.
# The reasons Condor might consider running a different job while
# already running one are machine Rank (defined above), and user
# priorities.)
UWCS_START	= ( (KeyboardIdle > $(StartIdleTime)) \
                     && ( $(CPUIdle) || \
                          (State != "Unclaimed" && State != "Owner")) )



Where

StartIdleTime		= 15 * $(MINUTE)
CPUIdle			= ($(NonCondorLoadAvg) <= $(BackgroundLoad))
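To see which clause can block an 'Unclaimed' slot, the UWCS_START policy above can be mirrored in plain Python. This is a minimal sketch under stated assumptions: `MINUTE = 60`, `BackgroundLoad = 0.3` and `NonCondorLoadAvg = LoadAvg - CondorLoadAvg` are assumed stock values, not taken from this pool's config, and the function names are hypothetical. It also ignores ClassAd UNDEFINED semantics. For an 'Unclaimed' or 'Owner' slot, the State clause is false, so the job only starts if the keyboard has been idle for more than 15 minutes AND the non-Condor load is at or below BackgroundLoad.

```python
MINUTE = 60                     # assumed stock value, in seconds
START_IDLE_TIME = 15 * MINUTE   # StartIdleTime
BACKGROUND_LOAD = 0.3           # assumed stock value of BackgroundLoad


def cpu_idle(load_avg, condor_load_avg, background_load=BACKGROUND_LOAD):
    # CPUIdle = (NonCondorLoadAvg <= BackgroundLoad), with NonCondorLoadAvg
    # assumed to be LoadAvg - CondorLoadAvg.
    return (load_avg - condor_load_avg) <= background_load


def uwcs_start(keyboard_idle, load_avg, condor_load_avg, state):
    """Plain-Python mirror of the UWCS_START expression quoted above."""
    return keyboard_idle > START_IDLE_TIME and (
        cpu_idle(load_avg, condor_load_avg)
        or (state != "Unclaimed" and state != "Owner")
    )


# An unclaimed slot idle for 20 minutes with light load would accept a job;
# one minute of keyboard idle time, or a load of 0.9, makes it refuse.
print(uwcs_start(20 * MINUTE, 0.05, 0.0, "Unclaimed"))  # True
print(uwcs_start(1 * MINUTE, 0.05, 0.0, "Unclaimed"))   # False
print(uwcs_start(20 * MINUTE, 0.9, 0.0, "Unclaimed"))   # False
```

Plugging in the real KeyboardIdle, LoadAvg and CondorLoadAvg values from `condor_status -long` for a rejecting slot shows which of these terms is false on that machine.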