
Re: [Condor-users] SCHEDD not running right on upgraded CE with Condor 7.6.6




# condor_status
Error: communication error
CEDAR:6001:Failed to connect to <192.84.86.98:9618>

# condor_status -schedd
Error: communication error
CEDAR:6001:Failed to connect to <192.84.86.98:9618>

# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:30:48:33:54:A3
inet addr:192.84.86.98 Bcast:192.84.86.127 Mask:255.255.255.224
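
The error line and the ifconfig output above name the same address. As a rough sketch (the helper name cedar_target is hypothetical and not part of Condor), the host and port can be pulled out of a CEDAR error line like this, so they can be compared against the interface address or handed to a port check:

```shell
# Hypothetical helper, not a Condor tool: print "host port" from a CEDAR
# error line such as
#   CEDAR:6001:Failed to connect to <192.84.86.98:9618>
cedar_target() {
  printf '%s\n' "$1" | sed -n 's/.*<\([0-9.]*\):\([0-9]*\)>.*/\1 \2/p'
}

cedar_target 'CEDAR:6001:Failed to connect to <192.84.86.98:9618>'
# prints: 192.84.86.98 9618
```

Port 9618 is the usual condor_collector port, so a failure here may simply mean that no collector is listening at that address. Something like netstat -tln on the CE would show whether anything is bound to it.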


We got the Condor RPM from the official Condor site.

# rpm -q condor
condor-7.6.6-1


Steven.....


On 04/03/2012 07:29 PM, Steven C Timm wrote:
What's the output of condor_status -any?
Does it report any schedulers?
What about condor_status -schedd?
Finally, does your system-wide condor config file actually reflect the new paths where the condor binaries are?
A lot of things changed locations with Condor 7.6.6. It gets worse if you mix RPMs from the Condor web site and the OSG
web site: some binaries and some directories can wind up in different and incompatible places.
Is the condor 7.6.6 rpm from the OSG repo or somewhere else?  Are you sure the OSG repo didn't clobber your other condor?
This all smells like something like that.

rpm -q condor

will tell the story--if it returns an "osg" in the rpm name then you could very well be dealing with a condor_config file that is expecting
binaries and/or address files in other places than where they are.
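
A minimal sketch of that check, assuming the only signal is the literal string "osg" somewhere in the package name (rpm_origin is a made-up helper name):

```shell
# Hypothetical helper: classify a package name as printed by "rpm -q condor".
# Assumption: OSG builds carry "osg" in the name or release string.
rpm_origin() {
  case "$1" in
    *osg*) echo "osg" ;;
    *)     echo "non-osg" ;;
  esac
}

# On a real host this would be: rpm_origin "$(rpm -q condor)"
rpm_origin "condor-7.6.6-1"
# prints: non-osg
```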


Steve Timm

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Steven Lo
Sent: Tuesday, April 03, 2012 9:20 PM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] SCHEDD not running right on upgraded CE with Condor 7.6.6


We use Rocks to install Condor RPM.

We have the following line in /etc/sysconfig/condor to point to the system wide configuration file:
CONDOR_CONFIG="/share/apps/condor/etc/condor_config_7.6.6"

The condor_config_7.6.6 is attached.
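
For anyone wanting to reproduce the check, here is a rough sketch that reads CONDOR_CONFIG out of the sysconfig file and tests whether the file it names is readable. It assumes the sysconfig file is a plain shell fragment and is given as an absolute path; the function name is hypothetical.

```shell
# Hypothetical helper: print the CONDOR_CONFIG value set in a sysconfig file.
# The file is sourced in a subshell so the current environment is untouched.
condor_config_from_sysconfig() {
  [ -r "$1" ] || return 1
  ( unset CONDOR_CONFIG; . "$1"; printf '%s\n' "${CONDOR_CONFIG:-}" )
}

sysconfig=/etc/sysconfig/condor
if cfg=$(condor_config_from_sysconfig "$sysconfig"); then
  if [ -r "$cfg" ]; then
    echo "config readable: $cfg"
  else
    echo "config missing or unreadable: $cfg"
  fi
fi
```

On a host with Condor installed, condor_config_val -config should also list the configuration files the tools are actually reading, which is a useful cross-check.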

We did not see any alarming errors in either the MasterLog or the SchedLog file.
Both are attached as well.

BTW, we did not see either the Schedd_Event_Log or the ShadowLog file, which leads us to believe that it's not accepting jobs.

Thanks.....

Steven.....


On 04/03/2012 06:50 PM, Alain Roy wrote:
On Apr 3, 2012, at 8:37 PM, Steven Lo wrote:
Hi,

We just upgraded Condor from 7.4.1 to 7.6.6 on one of our CEs.

When we do a condor_q, the following error pops out:

# condor_q
Error:

Extra Info: You probably saw this error because the condor_schedd is
not running on the machine you are trying to query.
We did see that both schedd and startd are running:

condor    6518  6490  0 17:42 ?        00:00:00 condor_startd -f
condor    6520  6490  0 17:42 ?        00:00:00 condor_schedd -f
That's interesting. How did you install Condor? Do you have CONDOR_CONFIG set? Are there errors in the MasterLog or the SchedLog?

-alain
------------------------------
Alain Roy
Condor Project
roy@xxxxxxxxxxx
http://www.cs.wisc.edu/condor


_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/