
Re: [Condor-users] -better-analyze doesn't tell me details (7.0.4)



On Thu, Nov 06, 2008 at 04:13:53PM +0100, Steffen Grunewald wrote:
> > Two "unknown reasons" I've hit before are (a) the negotiation cycle
> > just hasn't happened yet since this job was submitted and (b)
> > the user in question has exceeded his group quota.
> 
> (a): I have waited for hours, and other jobs got scheduled
> (b): group quota aren't in use
> 
> Some more ideas?

When I moved the jobs to another submit machine, they started running immediately.
That sent me back to the original submit machine to check its logs.
Looking through its SchedLog, I eventually spotted an out-of-memory condition
caused by deactivated swap space:

11/6 09:40:36 (pid:5775) Negotiating for owner: $user@$domain
11/6 09:40:36 (pid:5775) Checking consistency running and runnable jobs
11/6 09:40:36 (pid:5775) Tables are consistent
11/6 09:40:36 (pid:5775) Rebuilt prioritized runnable job list in 0.002s.
11/6 09:40:36 (pid:5775) Swap space estimate reached! No more jobs can be run!
11/6 09:40:36 (pid:5775)     Solution: get more swap space, or set RESERVED_SWAP = 0
11/6 09:40:36 (pid:5775)     0 jobs matched, 30 jobs idle
11/6 09:40:36 (pid:5775) Out of servers - 0 jobs matched, 30 jobs idle, 0 jobs rejected
11/6 09:40:36 (pid:5775) Increasing flock level for $user@$domain to 1.

So even the output of condor_q -long -better-analyze can be misleading: it did
not point to this swap limit on the submit machine.
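
In case it helps anyone else, here is a minimal Python sketch for spotting this
condition in a SchedLog. It assumes only the line layout shown in the excerpt
above (timestamp, pid, message), which may differ in other Condor versions. The
function name and the command-line usage are made up for illustration and are
not part of Condor.

```python
import re
import sys

# Matches the schedd message from the log excerpt. The timestamp/pid layout
# is an assumption based on that excerpt only.
SWAP_LINE = re.compile(
    r"^(?P<stamp>\d+/\d+ \d+:\d+:\d+) \(pid:(?P<pid>\d+)\) "
    r"Swap space estimate reached!"
)


def find_swap_exhaustion(lines):
    """Return a list of (timestamp, pid) for each swap-exhaustion message."""
    hits = []
    for line in lines:
        match = SWAP_LINE.match(line)
        if match:
            hits.append((match.group("stamp"), int(match.group("pid"))))
    return hits


# Hypothetical usage: python check_swap.py /path/to/SchedLog
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for stamp, pid in find_swap_exhaustion(log):
            print(f"{stamp} schedd pid {pid}: swap space estimate reached")
```

The path to the SchedLog is site-specific; condor_config_val SCHEDD_LOG should
print it. If the script reports hits, the log message itself names the two
remedies: get more swap space, or set RESERVED_SWAP = 0.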

Thanks,
 Steffen