
[Condor-users] condor_master dying in 7.6.9



Hi

I'm seeing my condor_master die on startup in 7.6.9 on one of my
systems. I'm using the x86_64 RPMs from the Condor RHEL6 repo on RHEL 6.3.

condor_master from the 7.6.6 and 7.6.8 rpms runs ok.

condor_master from 7.6.9 also runs fine on a number of other RHEL6,
Scientific Linux 6 and Fedora 16 systems that I administer, so it may be
a misconfiguration on this one particular host.

Any hints on how to debug this would be appreciated.

A fragment of the MasterLog file is given below. I note the WARNING
message, but I don't see a problem with the name configuration for the
host that's giving this problem, although I might have overlooked
something. It does seem strange that it should think that localhost6 is
related to an IPv4 address, though.
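
One way to look at the name configuration independently of Condor is to repeat the forward lookup by hand. The sketch below is only an illustration and is not how condor_master implements its check: the function name forward_matches, the injectable resolver parameter and the address 192.0.2.10 are all assumptions made for the example. It resolves a name with the system's IPv4 resolver and reports whether the expected address is among the results.

```python
import socket


def forward_matches(name, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if expected_ip is among the IPv4 addresses that name resolves to.

    Hypothetical helper for manual checking; the resolver argument exists
    only so the lookup can be replaced when experimenting.
    """
    try:
        # gethostbyname_ex returns (canonical_name, alias_list, ipv4_address_list)
        _canonical, _aliases, addresses = resolver(name)
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError
        print(f"{name}: lookup failed ({exc})")
        return False
    print(f"{name} -> {addresses}")
    return expected_ip in addresses


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute the host's real address.
    forward_matches("localhost6", "192.0.2.10")
```

If the name resolves to an unexpected address, or not at all, the entries for that name in /etc/hosts and the resolver order in /etc/nsswitch.conf are the usual places to look.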

Thanks.

Roderick Johnstone

Extract from MasterLog (note that the IP address I replaced with
xxx.xxx.xxx.xxx is the same address that the hex number in the WARNING
line decodes to).
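
For anyone wanting to repeat that decoding, here is a minimal sketch. It assumes the hex number is a 32-bit IPv4 address printed as a single word; since the byte order the log uses is not stated here, it shows both readings. The function name hex_to_ipv4_candidates and the sample value are made up for the example and are not taken from the log.

```python
import socket
import struct


def hex_to_ipv4_candidates(hex_word):
    """Return the dotted-quad reading of a 32-bit hex word in both byte orders."""
    value = int(hex_word, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    return {
        # Most significant byte is the first octet
        "big_endian": socket.inet_ntoa(struct.pack(">I", value)),
        # Least significant byte is the first octet
        "little_endian": socket.inet_ntoa(struct.pack("<I", value)),
    }


# Hypothetical sample value, not the one from the WARNING line.
print(hex_to_ipv4_candidates("0100007f"))
```

Whichever of the two readings matches a real address on the host is the one the log line refers to.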

08/17/12 14:57:32 ******************************************************
08/17/12 14:57:32 ** condor_master (CONDOR_MASTER) STARTING UP
08/17/12 14:57:32 ** /usr/sbin/condor_master
08/17/12 14:57:32 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
08/17/12 14:57:32 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
08/17/12 14:57:32 ** $CondorVersion: 7.6.9 Aug 15 2012 BuildID: 58927 $
08/17/12 14:57:32 ** $CondorPlatform: x86_64_rhap_6.3 $
08/17/12 14:57:32 ** PID = 6768
08/17/12 14:57:32 ** Log last touched 8/17 14:51:35
08/17/12 14:57:32 ******************************************************
08/17/12 14:57:32 Using config source: /etc/condor/condor_config
08/17/12 14:57:32 Using local config sources:
08/17/12 14:57:32    /etc/condor/condor_config.local
08/17/12 14:57:32 DaemonCore: command socket at <xxx.xxx.xxx.xxx:9631>
08/17/12 14:57:32 DaemonCore: private command socket at <xxx.xxx.xxx.xxx:9631>
08/17/12 14:57:32 Setting maximum accepts per cycle 4.
08/17/12 14:57:32 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 6769
08/17/12 14:57:32 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 6770
08/17/12 14:57:32 WARNING: forward resolution of localhost6 doesn't match xxxxxxxx!
Stack dump for process 6768 at timestamp 1345211852 (18 frames)
/usr/sbin/condor_master(dprintf_dump_stack+0x63)[0x557df3]
/usr/sbin/condor_master(_Z18linux_sig_coredumpi+0x40)[0x4d50a0]
/lib64/libpthread.so.0[0x3d0ca0f500]
/lib64/libc.so.6(__nss_hostname_digits_dots+0x49)[0x3d0befc6a9]
/lib64/libc.so.6(gethostbyname+0x90)[0x3d0bf01f50]
/usr/sbin/condor_master(_Z18verify_name_has_ipPc7in_addr+0x31)[0x500371]
/usr/sbin/condor_master(_ZN8IpVerify6VerifyE12DCpermissionPK11sockaddr_inPKcP8MyStringS7_+0x4f4)[0x502864]
/usr/sbin/condor_master(_ZN10DaemonCore6VerifyEPKc12DCpermissionPK11sockaddr_inS1_+0x85)[0x4bc3b5]
/usr/sbin/condor_master(_ZN10DaemonCore9HandleReqEP6StreamS1_+0xc18)[0x4ca858]
/usr/sbin/condor_master(_ZN10DaemonCore22HandleReqSocketHandlerEP6Stream+0x5b)[0x4cd51b]
/usr/sbin/condor_master(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x6a4)[0x4cdc34]
/usr/sbin/condor_master(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x1a)[0x4cdd0a]
/usr/sbin/condor_master(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x40)[0x569980]
/usr/sbin/condor_master(_ZN10DaemonCore17CallSocketHandlerERib+0x135)[0x4c3bb5]
/usr/sbin/condor_master(_ZN10DaemonCore6DriverEv+0x2013)[0x4c8883]
/usr/sbin/condor_master(main+0x10eb)[0x4d6a8b]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3d0be1ecdd]
/usr/sbin/condor_master[0x4b0339]