Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine
- Date: Wed, 22 Aug 2007 14:42:42 -0700
- From: "Jones, Torrin A (US SSA)" <torrin.jones@xxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Jobs blocked as Idle in Multi-CPU machine
I hate it when you get an answer like that from condor_q. Since we
don't know what the "unknown reasons" are, the best bet is probably to
look at the log files and see if you can figure it out. You probably
need to look at the CollectorLog and NegotiatorLog on the central
manager, and maybe the StartLog on the execute machine.
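
One way to go through those logs is sketched below in Python. It is only a rough helper, not part of Condor: the function name find_mentions and the idea of searching for a plain substring (for example the execute machine's hostname) are my own assumptions. The log directory is whatever `condor_config_val LOG` reports on the machine in question.

```python
from pathlib import Path

# Log file names mentioned above; which ones exist depends on the daemons
# running on the machine you are looking at.
LOG_NAMES = ["CollectorLog", "NegotiatorLog", "StartLog"]


def find_mentions(log_dir, needle, names=LOG_NAMES):
    """Return (file name, line number, line) for each log line containing needle."""
    hits = []
    for name in names:
        path = Path(log_dir) / name
        if not path.is_file():
            continue  # that daemon's log is not present on this machine
        with path.open(errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if needle in line:
                    hits.append((name, number, line.rstrip("\n")))
    return hits
```

Calling find_mentions(log_dir, "nodeb") on the central manager would list every line in the CollectorLog and NegotiatorLog that names that machine, which is a reasonable starting point for reading around the time of the failed match.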
Just a side note: I've seen this happen to my own jobs on submit too.
However, on the next negotiation cycle (300 seconds later, I think),
the job runs. Usually I can get it to run sooner if I submit a dummy
job (one that just prints "Hello World") to every computer in the pool.
I haven't found a better way around this yet.
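
A dummy job of that kind could be generated roughly as follows. This is a sketch under assumptions: the helper name hello_submit_description, the use of /bin/echo as the executable, and the output file names are illustrative choices, not what I actually run. The submit commands themselves (universe, executable, arguments, requirements, output, error, log, queue) are standard submit description commands.

```python
def hello_submit_description(hostname: str) -> str:
    """Return a Condor submit description for a trivial job pinned to one machine."""
    lines = [
        "universe     = vanilla",
        "executable   = /bin/echo",
        "arguments    = Hello World",
        # Pin the job to one execute machine via its Machine attribute.
        f'requirements = (Machine == "{hostname}")',
        f"output       = hello.{hostname}.out",
        f"error        = hello.{hostname}.err",
        "log          = hello.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hostnames taken from the condor_q output quoted in this thread.
    for host in ["nodea.gridgroup.eif.ch", "nodeb.gridgroup.eif.ch"]:
        print(hello_submit_description(host))
```

Each description would be written to its own file and handed to condor_submit, giving one trivial job per machine.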
condor_q -analyze and -better-analyze on NODEA say:
------
ye@nodea:~$ condor_q -analyze
-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
002.000:  Run analysis summary.  Of 2 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      2 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
1 jobs; 1 idle, 0 running, 0 held
ye@nodea:~$ condor_q -better-analyze
-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
---
002.000:  Run analysis summary.  Of 2 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      2 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
------
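
For anyone who wants to check this summary from a script, the counts can be pulled out of the text with a few lines of Python. This is a minimal sketch that assumes the layout shown in the output above (a "Run analysis summary" line followed by one count per reason); parse_analysis_summary and SAMPLE are names made up for the example.

```python
import re

# Summary block copied from the condor_q -analyze output quoted above.
SAMPLE = """\
002.000:  Run analysis summary.  Of 2 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      2 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
1 jobs; 1 idle, 0 running, 0 held
"""


def parse_analysis_summary(text: str) -> dict:
    """Map each reason line of a 'Run analysis summary' to its machine count."""
    counts = {}
    in_summary = False
    for raw in text.splitlines():
        line = raw.strip()
        if "Run analysis summary" in line:
            in_summary = True
            continue
        if not in_summary or not line:
            continue
        # Reason lines start with a count followed by "are", "reject" or "match".
        m = re.match(r"(\d+)\s+((?:are|reject|match)\b.*)$", line)
        if m is None:
            break  # reached the "N jobs; ..." trailer or other output
        counts[m.group(2)] = int(m.group(1))
    return counts


if __name__ == "__main__":
    for reason, count in parse_analysis_summary(SAMPLE).items():
        print(f"{count:3d}  {reason}")
```

A nonzero count under "match but reject the job for unknown reasons" is the case discussed in this thread, where the log files are the next place to look.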
condor_q -analyze and -better-analyze on NODEB say:
------
ye@nodeb:~$ condor_q -analyze
-- Submitter: nodeb.gridgroup.eif.ch : <160.98.20.76:57419> : nodeb.gridgroup.eif.ch
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
ye@nodeb:~$ condor_q -better-analyze
-- Submitter: nodeb.gridgroup.eif.ch : <160.98.20.76:57419> : nodeb.gridgroup.eif.ch
------