
Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine



There is one other trick: condor_q -ana -l,
which will tell you, for any given job, the reason
it was not matched during the last negotiation cycle.
Around here, "unknown reasons" most often means that an accounting
group is over quota, but if you don't have group quotas that wouldn't
apply to you.

Steve Timm


On Wed, 22 Aug 2007, Jones, Torrin A (US SSA) wrote:

I hate when you get an answer like that from condor_q.  Since we don't
know what the "unknown reasons" are, the best bet is probably to look at
the log files and see if you can figure it out.  You probably need to
look at the CollectorLog and NegotiatorLog on the central manager and
maybe the StartLog on the execute machine.
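
As a rough illustration of the log search suggested above, here is a minimal Python sketch that pulls out the lines of a log file that mention one job. It assumes only that the job appears somewhere on the line as cluster.proc, with or without zero padding (2.0, 002.000, 00002.00000); the actual log line formats vary by daemon and version, and the function names and the path in the closing comment are hypothetical.

```python
import re


def job_pattern(cluster, proc=0):
    """Regex for a job ID written as cluster.proc, zero-padded or not."""
    # The lookbehind and lookahead keep 2.0 from matching inside 12.0,
    # 2.05, or an IP address such as 160.98.2.0.
    return re.compile(rf"(?<![\d.])0*{cluster}\.0*{proc}(?!\d|\.\d)")


def find_job_lines(lines, cluster, proc=0):
    """Return the lines (newline stripped) that mention the given job."""
    pattern = job_pattern(cluster, proc)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_file(path, cluster, proc=0):
    """Apply find_job_lines to a log file on disk."""
    with open(path, errors="replace") as handle:
        return find_job_lines(handle, cluster, proc)


# Hypothetical usage; the real log directory is site-specific
# (condor_config_val LOG prints it):
#   for line in scan_file("/path/to/log/NegotiatorLog", 2, 0):
#       print(line)
```

Running it against the NegotiatorLog on the central manager and the StartLog on the execute machine for the same job ID makes it easier to line up what each daemon recorded.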

Just a side note, I've seen it happen to my jobs also upon submit.
However, on the next negotiation cycle (300 seconds later I think), the
job runs.  Usually I can get it to run quicker if I submit a dummy job
(A job that prints out "Hello World") to every computer in the queue.  I
haven't found a better way around this yet.


	-----Original Message-----
	From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of ye huang
	Sent: Wednesday, August 22, 2007 14:26
	To: Condor-Users Mail List
	Subject: Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine


	condor_q -analyze and -better-analyze on NODEA say:
	------
	ye@nodea:~$ condor_q -analyze


	-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
	 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
	---
	002.000:  Run analysis summary.  Of 2 machines,
	      0 are rejected by your job's requirements
	      0 reject your job because of their own requirements
	      0 match but are serving users with a better priority in the pool
	      2 match but reject the job for unknown reasons
	      0 match but will not currently preempt their existing job
	      0 are available to run your job
	1 jobs; 1 idle, 0 running, 0 held

	ye@nodea:~$ condor_q -better-analyze

	-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
	---
	002.000:  Run analysis summary.  Of 2 machines,
	      0 are rejected by your job's requirements
	      0 reject your job because of their own requirements
	      0 match but are serving users with a better priority in the pool
	      2 match but reject the job for unknown reasons
	      0 match but will not currently preempt their existing job
	      0 are available to run your job
	------

	condor_q -analyze and -better-analyze on NODEB say:

	------
	ye@nodeb:~$ condor_q -analyze


	-- Submitter: nodeb.gridgroup.eif.ch : <160.98.20.76:57419> : nodeb.gridgroup.eif.ch
	 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
	0 jobs; 0 idle, 0 running, 0 held

	ye@nodeb:~$ condor_q -better-analyze

	-- Submitter: nodeb.gridgroup.eif.ch : <160.98.20.76:57419> : nodeb.gridgroup.eif.ch

	------
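
When many jobs are idle, it can help to reduce the run analysis summaries quoted above to just the categories with a nonzero machine count. The following is a minimal Python sketch that assumes the output has exactly the layout shown in this message, with each category on a single unwrapped line; parse_analysis, blocking_reasons and SAMPLE are illustrative names, not part of Condor.

```python
import re

# Sample text copied from the NODEA output quoted in this thread.
SAMPLE = """\
002.000:  Run analysis summary.  Of 2 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      2 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
1 jobs; 1 idle, 0 running, 0 held
"""

HEADER = re.compile(r"^(\d+\.\d+):\s+Run analysis summary\.\s+Of (\d+) machines")
CATEGORY = re.compile(r"^(\d+) (are .*|reject .*|match .*)$")


def parse_analysis(text):
    """Map each job ID to its list of (machine_count, category) pairs."""
    results = {}
    job = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            job = header.group(1)
            results[job] = []
            continue
        category = CATEGORY.match(line)
        if job is not None and category:
            results[job].append((int(category.group(1)), category.group(2)))
    return results


def blocking_reasons(text):
    """Keep only the categories that at least one machine falls into."""
    return {
        job: [(count, reason) for count, reason in rows if count > 0]
        for job, rows in parse_analysis(text).items()
    }


if __name__ == "__main__":
    for job, rows in blocking_reasons(SAMPLE).items():
        for count, reason in rows:
            print(f"{job}: {count} {reason}")
```

For the sample above this leaves a single entry for job 002.000: both machines match but reject the job for unknown reasons.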




--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.