
[Condor-users] Lazy jobs that never really start running



Hi,

I'm having problems with jobs that stay in the queue in JobState 2 but never actually start running,
although there are lots of idle/unclaimed processors.

When I run condor_q -analyze on one of these jobs I get this:

 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
12200.000:  Request is being serviced

But the job is not running on any of the processors. I tried putting the job on hold and releasing it to restart it,
but the same thing happened: no processing, and no sign of the job ID in the log files except that
it was submitted and later held/released.


A similar problem appears for a few idle jobs too.
condor_q -analyze reports:

    56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      2 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
	No successful match recorded.
	Last failed match: Tue Jul 05 19:08:18 2005
	Reason for last match failure: no match found
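
To compare these counts across several idle jobs, the summary lines can be tallied with a small script. This is only a minimal sketch: it assumes the "N <description>" line layout shown above, and the function name parse_analyze_summary and the SAMPLE text are made up for illustration. They are not part of Condor.

```python
import re

# Sample text copied from the condor_q -analyze output quoted above.
SAMPLE = """\
    56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      2 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
"""

# A summary line is assumed to be: optional indent, an integer count,
# whitespace, then a free-text description.
COUNT_LINE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse_analyze_summary(text):
    """Return {description: machine count} for each 'N <description>' line."""
    counts = {}
    for line in text.splitlines():
        m = COUNT_LINE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


if __name__ == "__main__":
    # Print only the categories that actually have machines in them.
    for reason, n in parse_analyze_summary(SAMPLE).items():
        if n:
            print(f"{n:4d}  {reason}")
```

For the output above, only two categories are non-zero: the 56 machines rejected by the job's requirements and the 2 that match but reject the job for unknown reasons.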


This is very frustrating, since I have lots of free processors but somehow can't use them for a few jobs.
BTW, all of the jobs are identical and 95% of them run without any problems.

Cheers,
Szabolcs