
[Condor-users] job reject for Unknown reasons



Hi,

Last week I asked about getting job info. Not all of my questions were
answered, but I'd like to ask one more in a new thread.

This morning I found my Condor pool "hung". I have 100 VMs, but this
morning only 20 were showing up. After reconfiguring some nodes (on some
of them Condor said no master was found, although it really was running)
I finally got my 100 VMs back.

After that, I released all my jobs because they were held, and
checking the job status I found this:

$ condor_q -better-analyze 92200.0


-- Submitter: cdf-bcnhead.pic.org.es : <193.146.196.45:46328> : cdf-bcnhead.pic.org.es
---
92200.000:  Run analysis summary.  Of 100 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      4 match but are serving users with a better priority in the pool
     82 match but reject the job for unknown reasons
     14 match but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Mon Jul  9 11:11:31 2007
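
To double-check that the categories in a summary like the one above add up to the machine total, the counts can be tallied with a short script. This is only a minimal sketch in Python: it assumes the text layout shown above, and the names `SAMPLE` and `parse_summary` are made up for illustration; they are not part of Condor.

```python
import re

# Analysis summary copied from the condor_q output above.
SAMPLE = """\
92200.000:  Run analysis summary.  Of 100 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      4 match but are serving users with a better priority in the pool
     82 match but reject the job for unknown reasons
     14 match but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Mon Jul  9 11:11:31 2007
"""

def parse_summary(text):
    """Return (total_machines, {category: count}) for one summary block.

    Hypothetical helper: it relies only on the layout shown above,
    not on any documented, stable output format.
    """
    total = None
    counts = {}
    for line in text.splitlines():
        # Header line carries the pool size: "Of 100 machines,"
        header = re.search(r"Of (\d+) machines", line)
        if header:
            total = int(header.group(1))
            continue
        # Category lines look like "<count> <description>"
        entry = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if entry:
            counts[entry.group(2)] = int(entry.group(1))
    return total, counts

total, counts = parse_summary(SAMPLE)
for reason, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{n:4d}  {reason}")
print("categories sum to total:", sum(counts.values()) == total)
```

The largest category here is "match but reject the job for unknown reasons", which is the one I am asking about.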

After a while, most jobs went back to the Hold state, so I now have 18 VMs
claimed and 82 unclaimed, and 311 jobs: 0 idle, 20 running, 291 held.


So, how can I find out why the jobs are not running? Why do they go from
Idle to Held (I->H)?

TIA
-- 
Arnau Bria
http://blog.emergetux.net
Bombing for peace is like fucking for virginity