
[Condor-users] Problem running 7.2.0 on x86_64 RHEL5



Hi all,

I have just tried upgrading our Condor installation from 7.0.2 to 7.2.0
on a 64-bit RHEL5 system and hit a problem. I hope someone can shed some
light on it for me.

The 7.0.2 installation used the dynamically linked RPM, and everything
worked smoothly with the system running the Master, Schedd and Startd
daemons (DAEMON_LIST = MASTER, SCHEDD, STARTD in the local config file).

I removed the 7.0.2 RPM, then installed the dynamically linked RPM for
7.2.0 (rpm -i condor-7.2.0-...x86_64.rpm). It installed to a different
directory from the 7.0.2 release.

After configuring Condor 7.2.0 and starting it, only the Master daemon
stayed running, and the following error appeared in both the StartLog
and SchedLog files (this is the StartLog excerpt):

1/20 12:26:12 (fd:3) (pid:13112) ******************************************************
1/20 12:26:12 (fd:3) (pid:13112) ** condor_startd (CONDOR_STARTD) STARTING UP
1/20 12:26:12 (fd:3) (pid:13112) ** /opt/condor-7.2.0/sbin/condor_startd
1/20 12:26:12 (fd:3) (pid:13112) ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
1/20 12:26:12 (fd:3) (pid:13112) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
1/20 12:26:12 (fd:3) (pid:13112) ** $CondorVersion: 7.2.0 Dec 19 2008 BuildID: 121001 $
1/20 12:26:12 (fd:3) (pid:13112) ** $CondorPlatform: X86_64-LINUX_RHEL5 $
1/20 12:26:12 (fd:3) (pid:13112) ** PID = 13112
1/20 12:26:12 (fd:3) (pid:13112) ** Log last touched 1/20 12:26:02
1/20 12:26:12 (fd:3) (pid:13112) ** Running as root: Privilege switching in effect
1/20 12:26:12 (fd:3) (pid:13112) ******************************************************
1/20 12:26:12 (fd:3) (pid:13112) Using config source: /opt/condor-7.2.0/etc/condor_config
1/20 12:26:12 (fd:3) (pid:13112) Using local config sources:
1/20 12:26:12 (fd:3) (pid:13112) /opt/condor-7.2.0/local.rubble/condor_config.local
1/20 12:26:12 (fd:3) (pid:13112) Config 'LOG': no prefix ==> '$(LOCAL_DIR)/log'
1/20 12:26:12 (fd:3) (pid:13112) Running as root.  Enabling specialized core dump routines
1/20 12:26:12 (fd:5) (pid:13112) Setting up command socket
1/20 12:26:12 (fd:5) (pid:13112) CONDOR_INHERIT: "13094 <172.16.236.32:44418> 0 8*6*0*1*<172.16.236.32:0>*0*0*0* 9*2*0*0*<172.16.236.32:0>*0* 0"
1/20 12:26:12 (fd:5) (pid:13112) Parent PID = 13094
1/20 12:26:12 (fd:5) (pid:13112) Parent Command Sock = <172.16.236.32:44418>
1/20 12:26:12 (fd:5) (pid:13112) Inheriting Command Sockets
1/20 12:26:12 (fd:5) (pid:13112) ERROR "Failed to parse serialized socket information (4,-1): '8*6*0*1*<172.16.236.32:0>*0*0*0*'" at line 1606 in file sock.cpp
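
To make the CONDOR_INHERIT value in the log above easier to read, here is
a minimal Python sketch that splits it into its whitespace-separated
tokens and then into '*'-separated fields. It assumes the wrapped log
lines join with single spaces, and it makes no claim about what each
field means or about how Condor itself parses the string; the function
name split_inherit is hypothetical and used only for illustration.

```python
# CONDOR_INHERIT value copied from the StartLog excerpt; the wrapped
# lines are assumed to join with single spaces.
CONDOR_INHERIT = (
    "13094 <172.16.236.32:44418> 0 "
    "8*6*0*1*<172.16.236.32:0>*0*0*0* "
    "9*2*0*0*<172.16.236.32:0>*0* 0"
)


def split_inherit(value):
    """Split on whitespace, then split '*'-delimited tokens into fields.

    Hypothetical helper: plain tokens stay as strings, '*'-delimited
    tokens become lists of their fields.
    """
    result = []
    for token in value.split():
        if "*" in token:
            fields = token.split("*")
            # A trailing '*' leaves an empty final element; drop it.
            if fields and fields[-1] == "":
                fields.pop()
            result.append(fields)
        else:
            result.append(token)
    return result


if __name__ == "__main__":
    for item in split_inherit(CONDOR_INHERIT):
        if isinstance(item, list):
            print(len(item), "fields:", item)
        else:
            print("token:", item)
```

Run on the value above, this shows that the two '*'-delimited entries
do not have the same number of fields, and that the entry quoted in the
ERROR line is the first of the two.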

I rolled back to the 7.0.5 release and everything is now running
smoothly. Is there a configuration (or other) change in the 7.2.0
documentation that I may have missed?

Regards and thanks,
John

John Twyman
School of Geosciences
p: +61 2 9351 3189
m: +61 401 992 836
f: +61 2 9351 0184
c: http://www.geosci.usyd.edu.au/users/john/calendar/