
Re: [Condor-users] jobs wait in idle mode unecessarily



On Wed, Jun 23, 2004 at 11:25:56AM +0800, Raymond Wong wrote:
> Hi,
> 
> Encountered a similar problem too. Noticed this especially if I am
> submitting jobs from my central manager (which is an XP PC running Condor
> 6.6.1). However, when you mention that the jobs take ages to start, do
> they start up ultimately? For my case, jobs submitted will always miss a
> negotiation cycle and get matched 5min later (the next cycle).

Condor is a distributed system, so sometimes one side isn't ready to go
when the other side is. In your case, submitting from the central manager
is probably the cause: with no network delay involved, the schedd doesn't
get the extra time it would usually have to be ready to talk to the
central manager. That the jobs were matched on the next negotiation
cycle means that everything is working as it should.

(The above is a simplification, btw)
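
To make the timing concrete, here is a toy sketch of how a job can miss one
cycle and be picked up by the next. It is not Condor's actual logic: the
function name, the `ready_delay` parameter, and the fixed 300-second interval
(chosen to match the "5min" mentioned above) are all assumptions made for
illustration.

```python
import math

def match_delay(submit_time, ready_delay, interval=300.0):
    """Seconds from submission until the first cycle that can see the job.

    Toy model (an assumption, not Condor internals): negotiation cycles
    start at 0, interval, 2*interval, ... and a job is only considered in
    a cycle if the schedd became ready at or before that cycle's start.
    """
    ready_at = submit_time + ready_delay
    # Index of the first cycle starting at or after the schedd is ready.
    cycle_index = math.ceil(ready_at / interval)
    return cycle_index * interval - submit_time

# Schedd is ready 8 seconds before the cycle at t=300: matched in 10 s.
print(match_delay(submit_time=290.0, ready_delay=2.0))
# Schedd becomes ready 1 second after that cycle starts: waits for t=600.
print(match_delay(submit_time=299.0, ready_delay=2.0))
```

In this model a job that just misses a cycle waits roughly one full interval,
which is consistent with the behaviour described above and is not a sign of
a fault.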

> 
> Anyway, noticed something really bad in your schedd log:
> 
> 6/21 12:22:09 Scheduler::Relinquish - mrec is NULL, can't relinquish
> 6/21 12:22:09 Null parameter --- match not deleted
> 

This is NOT "really bad". It's normal. 

> I think this implies that the schedd on your host has crashed! You may
> want to check if the job has been successfully submitted for negotiation
> in the first place!
> 

It implies no such thing. The state handling in the schedd is complex, and
occasionally we do "needless" things. When that happens, we probably write
messages like those out to the logfile.

-Erik