
Re: [Condor-users] ERROR: unable to update job ad!



On Tue, Mar 14, 2006 at 03:53:57PM -0800, Chris Tracy wrote:
>          I was recently tasked with setting up a Condor pool here at SCU on
> our CentOS 4 systems.  At first I did it with condor-6.6.10, ran into a
> few issues related to linux-2.6 support, but made my way around them only
> to ultimately be stymied by the following error in the StarterLog whenever
> I tried to submit the sh_loop example job:
> 
> ERROR: unable to update job ad!  Aborting OsProc::StartJob
> 
>          I could find no reference to this bug on any mailing list posting
> or any web-page about condor.  So I decided to go to 6.7.17, as it was
> supposed to have better support for the 2.6 kernel.  Indeed it does, and I
> was able to take out all my config workarounds.  However, I still get the
> same problem when I try to submit the "sh_loop" example job.
> 
>          Ultimately everything points to condor_starter dying unexpectedly.
> So I set STARTER_DEBUG = D_ALL in condor_config.local for the one execute
> node in the test cluster and resubmitted the job.  This gives the
> following output in StarterLog:
> 
<...>
>          I'm at a loss as to what to do at this point.  If I had the code
> I'd go look to see what was in OsProc::StartJob, but alas, I don't.  Has
> anyone ever encountered this issue before?
> 

Weird, I've never seen this one before. 

What's the output of 'condor_q -l' for the job that failed, and what's the
result of 'condor_config_val JOB_RENICE_INCREMENT' on the machine where the
job is executing?

-Erik