Re: [Condor-users] Weird flurry of condor_shadow problem emails this morning
- Date: Wed, 12 Mar 2008 10:15:53 -0500
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Weird flurry of condor_shadow problem emails this morning
On Mar 10, 2008, at 9:58 AM, Ian Chesal wrote:
My inbox was full of messages from my condor_schedd on the other side of
the world telling me about problems with condor_shadow. The emails all
looked like this:
Subject: [Condor] Condor job 10725.239 put on hold
This is an automated email from the Condor system
on machine "pg-schedd1.altera.com". Do not reply.
Condor job 10725.239 has been put on hold.
No condor_shadow installed that supports vanilla jobs
on resources older than V6.3.3
Please correct this problem and release the job with
My first thought was maybe the NFS file system where we host condor was
down. Nope. I got smart to this years ago and now, on my central
servers, I keep Condor on local disk. So there's a copy of
in /opt/condor/sbin. And it says it's 6.8.6 I386-LINUX_RHEL3 just like
Nothing has been changed in /opt/condor. Time stamps are fine.
Very mysterious. The emails happened around 7:00 am. I didn't see them
until 10:00 am. Looking at the queue on the scheduler now, everything is
either I or R, so it all got released automatically.
Can anyone offer some insight into what might have occurred here? We
*never* run anything older than 6.7.x at Altera. My guess is that this
message might get sent if a condor_shadow binary can't be found -- is
that possible? Somehow /opt/condor/sbin/condor_shadow couldn't be seen
by the condor_schedd process running on the machine, perhaps?
A missing condor_shadow binary shouldn't cause this error. What will
cause it is a machine ad that's missing its CondorVersion attribute
and a missing condor_shadow.std binary.
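
One way to look for the first condition is to scan the machine ads for any
that lack CondorVersion. The following is only a rough sketch, not an
official tool: it assumes the ads have been saved as text in the
"Attribute = value" layout that condor_status -long prints, with a blank
line between ads. The function names and the sample ads (including the
example.com host names and the shortened version string) are made up for
illustration.

```python
def parse_ads(text):
    """Split long-format ad text into a list of dicts, one dict per ad.

    Assumes one "Attribute = value" pair per line and a blank line
    between ads.
    """
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current ad, if one is in progress.
            if current:
                ads.append(current)
                current = {}
            continue
        # Split on the first "=" only; values may contain "==" themselves.
        name, sep, value = line.partition("=")
        if sep:
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


def ads_missing_version(ads):
    """Return the Name of every ad that has no CondorVersion attribute."""
    return [
        ad.get("Name", "<unnamed>").strip('"')
        for ad in ads
        if "CondorVersion" not in ad
    ]


# Hypothetical sample input: the second ad lacks CondorVersion.
SAMPLE = '''
Name = "vm1@node1.example.com"
Machine = "node1.example.com"
CondorVersion = "$CondorVersion: 6.8.6 $"

Name = "vm1@node2.example.com"
Machine = "node2.example.com"
'''

for name in ads_missing_version(parse_ads(SAMPLE)):
    print("machine ad without CondorVersion:", name)
```

An ad only shows up here if it was still in the collector when the output
was captured, so a transient bad ad from 7:00 am may no longer be visible
by the time you look.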
Thanks and regards,
UW-Madison Condor Team