[Condor-users] Tracing why nodes reject jobs?
- Date: Thu, 15 Jul 2010 11:52:23 -0400
- From: "Jonathan D. Proulx" <jon@xxxxxxxxxxxxx>
- Subject: [Condor-users] Tracing why nodes reject jobs?
I have a user who queued a couple hundred identical Standard Universe
jobs (well, the parameters were a little different, but the ClassAds
were the same). Most completed, but 15 are hanging around in the idle
state after having accumulated some runtime, and will no longer match
any machine:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
78745.005:  Run analysis summary.  Of 429 machines,
     19 are rejected by your job's requirements
    410 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
Last successful match: Tue Jul 6 17:33:30 2010
Last failed match: Thu Jul 15 11:46:30 2010
Reason for last match failure: no match found
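For reference, the summary above comes from condor_q's analysis mode; the
per-machine side of the match can be dug out with condor_status. A sketch
(the job id is from this post; the node name is hypothetical):

```shell
# Re-run the match analysis for one of the stuck jobs:
condor_q -better-analyze 78745.5

# Inspect a node's side of the match: its Requirements / START expression
# is what decides "reject your job because of their own requirements".
condor_status -long node01.example.com | egrep -i '^(Requirements|Start) '

# List machines whose owner policy currently shuts out batch jobs
# (one common reason a previously matching node stops matching):
condor_status -constraint 'State == "Owner"'
```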
The 19 rejected for job requirements are clear (wrong ARCH); the 410
rejected for node requirements is odd in several ways:
1) there are 410 total systems available and 344 are currently claimed,
so I'd expect those to be either "match but are serving users with a
better priority in the pool" or "match but will not currently preempt
their existing job"
2) clearly the jobs used to match some of these machines, or they
wouldn't have accumulated any runtime
Where/how can I see why a specific node rejects a given job?
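For anyone following along, the split in the counts above reflects
two-sided matchmaking: the job's Requirements must accept the machine,
and the machine's Requirements (typically its START policy) must accept
the job. A toy illustration of that split (the ads, attribute names, and
node names are made up; this is not HTCondor's actual ClassAd engine):

```python
# Toy two-sided matchmaking: a match needs job_ok AND machine_ok.
# Ads are plain dicts with a callable "Requirements"; all values are
# hypothetical and only illustrate which side can veto a match.

def matches(job, machine):
    """Return (job_ok, machine_ok) for one job/machine pair."""
    job_ok = job["Requirements"](machine)      # job's view of the machine
    machine_ok = machine["Requirements"](job)  # machine's view of the job
    return job_ok, machine_ok

job = {
    "Owner": "jon",
    "ImageSize": 1_200_000,  # KB, hypothetical
    "Requirements": lambda m: m["Arch"] == "X86_64" and m["Memory"] >= 512,
}

machines = [
    # node01: acceptable to the job, but its own policy caps ImageSize,
    # so it "rejects the job because of its own requirements".
    {"Name": "node01", "Arch": "X86_64", "Memory": 2048,
     "Requirements": lambda j: j["ImageSize"] < 1_000_000},
    # node02: wrong architecture -- "rejected by your job's requirements".
    {"Name": "node02", "Arch": "INTEL", "Memory": 1024,
     "Requirements": lambda j: True},
    # node03: too little memory -- also rejected by the job's side.
    {"Name": "node03", "Arch": "X86_64", "Memory": 256,
     "Requirements": lambda j: True},
]

rejected_by_job = [m["Name"] for m in machines if not matches(job, m)[0]]
rejected_by_machine = [m["Name"] for m in machines
                       if matches(job, m)[0] and not matches(job, m)[1]]
print("rejected by job's requirements:", rejected_by_job)
print("rejected by machines' own requirements:", rejected_by_machine)
```

The second list is the bucket the 410 machines fell into here, which is
why the node-side Requirements/START expressions are the thing to inspect.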