[Condor-users] Condor not working after changing the central manager's hostname and adding it to a domain



Hi,
I had three Windows computers connected to a Linux server acting as the central manager, all on a LAN with static IPs. Condor was working fine until we decided to try adding the computers to a domain (until then they were just on a regular network). I gave the Windows computers dynamic IPs and gave the server a static IP that was added to the network's DNS. The organization wanted us to change the Linux server's hostname from master.soXXXX to masterso.

When I changed the name I had to reboot the Linux server (for the first time since the initial Condor installation). When it came back up, Condor did not start automatically, and when I started it manually I saw that it was trying to access the folder local.somaster instead of local.master, so I copied local.master to a new folder named local.somaster. I also replaced every occurrence of master.soXXXX in the config files with masterso.

After that, condor_status recognizes only the central manager itself; the Windows computers show up as two blank lines. I can ping between all of the computers, although from Linux to Windows only by IP, not by name.
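Since ping works by IP but not by name, it may help to check forward and reverse name resolution on each machine. Below is a minimal Python sketch using only the standard `socket` module; `check_name_resolution` is a hypothetical helper, and it only tests the local resolver, not Condor's own configuration:

```python
import socket
import sys


def check_name_resolution(hostname):
    """Forward- and reverse-resolve `hostname` with this machine's resolver.

    Returns a dict with the IPv4 address the name maps to (or None) and the
    name that address maps back to (or None).
    """
    result = {"hostname": hostname, "ip": None, "reverse_name": None}
    try:
        # Forward lookup: name -> IPv4 address.
        result["ip"] = socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return result
    try:
        # Reverse lookup: address -> canonical name.
        result["reverse_name"] = socket.gethostbyaddr(result["ip"])[0]
    except OSError:  # socket.herror / socket.gaierror
        pass
    return result


if __name__ == "__main__":
    # Pass the machine names to check on the command line.
    for name in sys.argv[1:]:
        print(check_name_resolution(name))
```

Saved as a script and run on a Windows machine with `masterso` as the argument, it shows whether the central manager's new name resolves there, and run on the Linux server with the Windows machines' names, whether the reverse direction works.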
How can I make Condor start up automatically every time the central manager restarts?
Why isn't it recognizing the Windows computers?

Thank you,
Levi

I've attached part of the MasterLog from the Linux machine and from one of the Windows machines.

Here is part of the MasterLog from the central manager:
7/28 12:36:47 ******************************************************
7/28 12:36:47 ** condor_master (CONDOR_MASTER) STARTING UP
7/28 12:36:47 ** /usr/local/condor/sbin/condor_master
7/28 12:36:47 ** $CondorVersion: 7.0.3 Jun 20 2008 BuildID: 91405 $
7/28 12:36:47 ** $CondorPlatform: I386-LINUX_RHEL5 $
7/28 12:36:47 ** PID = 5887
7/28 12:36:47 ** Log last touched 7/28 12:28:25
7/28 12:36:47 ******************************************************
7/28 12:36:47 Using config source: /usr/local/condor/etc/condor_config
7/28 12:36:47 Using local config sources:
7/28 12:36:47    /opt/condor-7.0.3/local.master/condor_config.local
7/28 12:36:47 DaemonCore: Command Socket at <172.24.0.9:33183>
7/28 12:36:47 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_collector", pid and pgroup = 5888
7/28 12:36:50 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_negotiator", pid and pgroup = 5891
7/28 12:36:50 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_schedd", pid and pgroup = 5892
7/28 12:36:50 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_startd", pid and pgroup = 5893
7/28 12:36:53 ******************************************************
7/28 12:36:53 ** condor_master (CONDOR_MASTER) STARTING UP
7/28 12:36:53 ** /usr/local/condor/sbin/condor_master
7/28 12:36:53 ** $CondorVersion: 7.0.3 Jun 20 2008 BuildID: 91405 $
7/28 12:36:53 ** $CondorPlatform: I386-LINUX_RHEL5 $
7/28 12:36:53 ** PID = 5904
7/28 12:36:53 ** Log last touched 7/28 12:36:50
7/28 12:36:53 ******************************************************
7/28 12:36:53 Using config source: /usr/local/condor/etc/condor_config
7/28 12:36:53 Using local config sources:
7/28 12:36:53    /opt/condor-7.0.3/local.master/condor_config.local
7/28 12:36:53 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
7/28 12:36:53 ERROR "Can't get lock on "/tmp/condor-lock.somaster0.898698379592137/InstanceLock"" at line 848 in file master.C
7/28 13:36:50 Preen pid is 6092
7/28 13:36:50 Child 6092 died, but not a daemon -- Ignored


Here is part of the MasterLog from one of the Windows computers:

7/28 12:45:13 ******************************************************
7/28 12:45:13 ** Condor (CONDOR_MASTER) STARTING UP
7/28 12:45:13 ** C:\condor\bin\condor_master.exe
7/28 12:45:13 ** $CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $
7/28 12:45:13 ** $CondorPlatform: INTEL-WINNT50 $
7/28 12:45:13 ** PID = 1592
7/28 12:45:13 ** Log last touched 7/28 12:45:51
7/28 12:45:13 ******************************************************
7/28 12:45:13 Using config source: C:\condor\condor_config
7/28 12:45:13 Using local config sources:
7/28 12:45:13    C:\condor/condor_config.local
7/28 12:45:14 DaemonCore: Command Socket at <172.24.1.66:1050>
7/28 12:56:02 WinFirewall: get_CurrentProfile failed: 0x800706d9
7/28 12:56:02 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 2740
7/28 12:56:02 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 3016
7/28 12:56:24 condor_read(): timeout reading 5 bytes from <172.24.1.66:1163>.
7/28 12:56:24 IO: Failed to read packet header
7/28 12:56:24 Sent signal 15 to SCHEDD (pid 2740)
7/28 12:56:33 Sent signal 15 to STARTD (pid 3016)
7/28 12:56:33 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1168>, fd is 840
7/28 12:56:33 Buf::write(): condor_write() failed
7/28 12:56:33 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "STARTD"
ParentUniqueID = "michs13611con2:1592:1217241914"
ServerPid = 3016
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1164>"
Command = 60010
AuthCommand = 60008
7/28 12:56:33 The SCHEDD (pid 2740) exited with status 0
7/28 12:56:33 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1172>, fd is 496
7/28 12:56:33 Buf::write(): condor_write() failed
7/28 12:56:33 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "SCHEDD"
ParentUniqueID = "michs13611con2:1592:1217241914"
ServerPid = 2740
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1163>"
Command = 60010
AuthCommand = 60008
ServerTime = 1217242564
7/28 12:56:33 Received child alive command from unknown pid 2740
7/28 12:56:33 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 12:56:33 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 12:56:33 The STARTD (pid 3016) exited with status 0
7/28 12:56:33 All daemons are gone.  Restarting.
7/28 12:56:33 Restarting master right away.
7/28 12:56:33 Running as NT Service = 1
7/28 12:56:33 Doing exec( "C:\WINDOWS\system32\cmd.exe /Q /C net stop Condor & net start Condor" )
7/28 12:56:34 SetEnvironmentVariable failed, errno=203
7/28 12:56:34 ******************************************************
7/28 12:56:34 ** Condor (CONDOR_MASTER) STARTING UP
7/28 12:56:34 ** C:\condor\bin\condor_master.exe
7/28 12:56:34 ** $CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $
7/28 12:56:34 ** $CondorPlatform: INTEL-WINNT50 $
7/28 12:56:34 ** PID = 3160
7/28 12:56:34 ** Log last touched 7/28 12:56:33
7/28 12:56:34 ******************************************************
7/28 12:56:34 Using config source: C:\condor\condor_config
7/28 12:56:34 Using local config sources:
7/28 12:56:34    C:\condor/condor_config.local
7/28 12:56:34 DaemonCore: Command Socket at <172.24.1.66:1210>
7/28 13:06:39 WinFirewall: get_CurrentProfile failed: 0x800706d9
7/28 13:06:39 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 4012
7/28 13:06:39 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 4024
7/28 13:06:59 condor_read(): timeout reading 5 bytes from <172.24.1.66:1221>.
7/28 13:06:59 IO: Failed to read packet header
7/28 13:06:59 Sent signal 15 to SCHEDD (pid 4012)
7/28 13:07:09 Sent signal 15 to STARTD (pid 4024)
7/28 13:07:09 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1231>, fd is 836
7/28 13:07:09 Buf::write(): condor_write() failed
7/28 13:07:09 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "STARTD"
ParentUniqueID = "michs13611con2:3160:1217242594"
ServerPid = 4024
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1222>"
Command = 60010
AuthCommand = 60008
7/28 13:07:09 The SCHEDD (pid 4012) exited with status 0
7/28 13:07:09 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1232>, fd is 608
7/28 13:07:09 Buf::write(): condor_write() failed
7/28 13:07:09 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "SCHEDD"
ParentUniqueID = "michs13611con2:3160:1217242594"
ServerPid = 4012
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1221>"
Command = 60010
AuthCommand = 60008
ServerTime = 1217243199
7/28 13:07:09 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 13:07:09 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 13:07:09 The STARTD (pid 4024) exited with status 0
7/28 13:07:09 All daemons are gone.  Restarting.
7/28 13:07:09 Restarting master right away.
7/28 13:07:09 Running as NT Service = 1
7/28 13:07:09 Doing exec( "C:\WINDOWS\system32\cmd.exe /Q /C net stop Condor & net start Condor" )
7/28 13:07:10 SetEnvironmentVariable failed, errno=203
7/28 13:07:10 ******************************************************
7/28 13:07:10 ** Condor (CONDOR_MASTER) STARTING UP
7/28 13:07:10 ** C:\condor\bin\condor_master.exe
7/28 13:07:10 ** $CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $
7/28 13:07:10 ** $CondorPlatform: INTEL-WINNT50 $
7/28 13:07:10 ** PID = 2652
7/28 13:07:10 ** Log last touched 7/28 13:07:09
7/28 13:07:10 ******************************************************
7/28 13:07:10 Using config source: C:\condor\condor_config
7/28 13:07:10 Using local config sources:
7/28 13:07:10    C:\condor/condor_config.local
7/28 13:07:10 DaemonCore: Command Socket at <172.24.1.66:1256>
7/28 13:17:09 WinFirewall: get_CurrentProfile failed: 0x800706d9
7/28 13:17:09 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 3356
7/28 13:17:09 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 3364
7/28 14:17:10 Preen pid is 2200
7/28 14:17:10 Child 2200 died, but not a daemon -- Ignored