
Re: [Condor-users] condor not working after changing central managers hostname and adding to domain [Sec=Unclassified]



Thank you for your help. I did have the name mixed up, but that wasn't the only problem.
The Windows computers weren't getting access to the central manager even though I had allow_write and allow_read set to *. Even when I set them to *,"domain_name" it didn't work. It only worked when I put in the numeric IP range. Do you have any idea why it does not recognize the domain name, and why setting it to * doesn't allow every computer to connect?
Thanks,
Levi
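
Since the numeric IP range is what ended up working, it can help to confirm that every machine's address really falls inside the range before putting it in the config. The following is only a minimal Python sketch: the 172.24.0.0/16 network is an assumption inferred from the two addresses that appear in the logs in this thread (172.24.0.9 and 172.24.1.66), and `in_range` is a hypothetical helper, not part of Condor.

```python
import ipaddress

def in_range(addr: str, network: str) -> bool:
    """Return True if addr falls inside the given CIDR network."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(network)

# Assumed subnet (a guess from the logs); substitute the pool's real range.
POOL_NETWORK = "172.24.0.0/16"

# Addresses taken from the command-socket lines in the logs.
for host in ("172.24.0.9", "172.24.1.66"):
    print(host, in_range(host, POOL_NETWORK))
```

This only checks the arithmetic of the range. It does not explain why the name-based entries were rejected, although the fact that ping only works by IP from Linux to Windows suggests name resolution is worth checking separately.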

On Mon, Aug 4, 2008 at 2:04 AM, Troy Robertson <Troy.Robertson@xxxxxxxxxx> wrote:
A couple of things that I can think of, Levi.

Have you changed the CONDOR_HOST value on each of the Windows machines to point to the new central manager name?
Also, why is the central manager trying to take out a lock on "/tmp/condor-lock.somaster0" when it should be masterso? Better check your config to make sure you haven't got somaster and masterso mixed up.


Troy
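
One way to check for the somaster/masterso mix-up is to scan the config text for both spellings and see which lines use which. This is a rough Python sketch under stated assumptions: `find_name_mentions` is a hypothetical helper, and the sample text is invented for illustration (it is not Levi's actual config).

```python
def find_name_mentions(config_text: str, names: list[str]) -> dict[str, list[int]]:
    """Map each name to the 1-based line numbers where it appears,
    ignoring lines that are entirely comments."""
    hits: dict[str, list[int]] = {name: [] for name in names}
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment-only lines
        for name in names:
            if name in line:
                hits[name].append(lineno)
    return hits

# Hypothetical excerpt, made up for this example.
SAMPLE = """\
# hypothetical config excerpt
CONDOR_HOST = somaster
LOCAL_CONFIG_FILE = /opt/condor-7.0.3/local.masterso/condor_config.local
"""

print(find_name_mentions(SAMPLE, ["somaster", "masterso"]))
```

In practice the text would be read from condor_config and condor_config.local on each machine. Any line reported under the wrong spelling is a candidate for the mix-up.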

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Levi G
Sent: Friday, 1 August 2008 6:47 PM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] condor not working after changing central managers hostname and adding to domain [Sec=Unclassified]

Hi,
I had 3 Windows computers connected to a Linux server acting as the central manager on a LAN, all with static IPs. Condor was working fine until we decided to try adding the computers to a domain (until now they were just on a regular network). I gave the Windows computers dynamic IPs and the server a static IP that was added to the network's DNS. The organization wanted us to change the Linux server's hostname from master.soXXXX to masterso.

When I changed the name I had to restart the Linux server (for the first time since the initial Condor installation). When it restarted, Condor did not start up automatically, and when I started it manually I saw that it was trying to access the folder local.somaster and not local.master, so I copied local.master to a folder named local.somaster. I also changed every place in the config files that said master.soXXXX to masterso. After I did that and ran condor_status, it only recognized the central manager itself. The Windows computers show two blank lines when I run condor_status. I am able to ping between all of the computers (from Linux to Windows only by IP, not by name).

How can I make Condor start up on each restart of the central manager?
Why isn't it recognizing the Windows computers?

Thank you,
Levi

I have attached part of the MasterLog from the Linux and Windows machines.

Here is part of the MasterLog from the central manager:
7/28 12:36:47 ******************************************************
7/28 12:36:47 ** condor_master (CONDOR_MASTER) STARTING UP
7/28 12:36:47 ** /usr/local/condor/sbin/condor_master
7/28 12:36:47 ** $CondorVersion: 7.0.3 Jun 20 2008 BuildID: 91405 $
7/28 12:36:47 ** $CondorPlatform: I386-LINUX_RHEL5 $
7/28 12:36:47 ** PID = 5887
7/28 12:36:47 ** Log last touched 7/28 12:28:25
7/28 12:36:47 ******************************************************
7/28 12:36:47 Using config source: /usr/local/condor/etc/condor_config
7/28 12:36:47 Using local config sources:
7/28 12:36:47    /opt/condor-7.0.3/local.master/condor_config.local
7/28 12:36:47 DaemonCore: Command Socket at <172.24.0.9:33183>
7/28 12:36:47 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_collector", pid and pgroup = 5888
7/28 12:36:50 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_negotiator", pid and pgroup = 5891
7/28 12:36:50 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_schedd", pid and pgroup = 5892
7/28 12:36:50 Started DaemonCore process "/opt/condor-7.0.3/sbin/condor_startd", pid and pgroup = 5893
7/28 12:36:53 ******************************************************
7/28 12:36:53 ** condor_master (CONDOR_MASTER) STARTING UP
7/28 12:36:53 ** /usr/local/condor/sbin/condor_master
7/28 12:36:53 ** $CondorVersion: 7.0.3 Jun 20 2008 BuildID: 91405 $
7/28 12:36:53 ** $CondorPlatform: I386-LINUX_RHEL5 $
7/28 12:36:53 ** PID = 5904
7/28 12:36:53 ** Log last touched 7/28 12:36:50
7/28 12:36:53 ******************************************************
7/28 12:36:53 Using config source: /usr/local/condor/etc/condor_config
7/28 12:36:53 Using local config sources:
7/28 12:36:53    /opt/condor-7.0.3/local.master/condor_config.local
7/28 12:36:53 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
7/28 12:36:53 ERROR "Can't get lock on "/tmp/condor-lock.somaster0.898698379592137/InstanceLock"" at line 848 in file master.C
7/28 13:36:50 Preen pid is 6092
7/28 13:36:50 Child 6092 died, but not a daemon -- Ignored


Here is part of the MasterLog from one of the Windows computers:

7/28 12:45:13 ******************************************************
7/28 12:45:13 ** Condor (CONDOR_MASTER) STARTING UP
7/28 12:45:13 ** C:\condor\bin\condor_master.exe
7/28 12:45:13 ** $CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $
7/28 12:45:13 ** $CondorPlatform: INTEL-WINNT50 $
7/28 12:45:13 ** PID = 1592
7/28 12:45:13 ** Log last touched 7/28 12:45:51
7/28 12:45:13 ******************************************************
7/28 12:45:13 Using config source: C:\condor\condor_config
7/28 12:45:13 Using local config sources:
7/28 12:45:13    C:\condor/condor_config.local
7/28 12:45:14 DaemonCore: Command Socket at <172.24.1.66:1050>
7/28 12:56:02 WinFirewall: get_CurrentProfile failed: 0x800706d9
7/28 12:56:02 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 2740
7/28 12:56:02 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 3016
7/28 12:56:24 condor_read(): timeout reading 5 bytes from <172.24.1.66:1163>.
7/28 12:56:24 IO: Failed to read packet header
7/28 12:56:24 Sent signal 15 to SCHEDD (pid 2740)
7/28 12:56:33 Sent signal 15 to STARTD (pid 3016)
7/28 12:56:33 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1168>, fd is 840
7/28 12:56:33 Buf::write(): condor_write() failed
7/28 12:56:33 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "STARTD"
ParentUniqueID = "michs13611con2:1592:1217241914"
ServerPid = 3016
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1164>"
Command = 60010
AuthCommand = 60008
7/28 12:56:33 The SCHEDD (pid 2740) exited with status 0
7/28 12:56:33 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1172>, fd is 496
7/28 12:56:33 Buf::write(): condor_write() failed
7/28 12:56:33 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "SCHEDD"
ParentUniqueID = "michs13611con2:1592:1217241914"
ServerPid = 2740
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1163>"
Command = 60010
AuthCommand = 60008
ServerTime = 1217242564
7/28 12:56:33 Received child alive command from unknown pid 2740
7/28 12:56:33 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 12:56:33 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 12:56:33 The STARTD (pid 3016) exited with status 0
7/28 12:56:33 All daemons are gone.  Restarting.
7/28 12:56:33 Restarting master right away.
7/28 12:56:33 Running as NT Service = 1
7/28 12:56:33 Doing exec( "C:\WINDOWS\system32\cmd.exe /Q /C net stop Condor & net start Condor" )
7/28 12:56:34 SetEnvironmentVariable failed, errno=203
7/28 12:56:34 ******************************************************
7/28 12:56:34 ** Condor (CONDOR_MASTER) STARTING UP
7/28 12:56:34 ** C:\condor\bin\condor_master.exe
7/28 12:56:34 ** $CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $
7/28 12:56:34 ** $CondorPlatform: INTEL-WINNT50 $
7/28 12:56:34 ** PID = 3160
7/28 12:56:34 ** Log last touched 7/28 12:56:33
7/28 12:56:34 ******************************************************
7/28 12:56:34 Using config source: C:\condor\condor_config
7/28 12:56:34 Using local config sources:
7/28 12:56:34    C:\condor/condor_config.local
7/28 12:56:34 DaemonCore: Command Socket at <172.24.1.66:1210>
7/28 13:06:39 WinFirewall: get_CurrentProfile failed: 0x800706d9
7/28 13:06:39 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 4012
7/28 13:06:39 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 4024
7/28 13:06:59 condor_read(): timeout reading 5 bytes from <172.24.1.66:1221>.
7/28 13:06:59 IO: Failed to read packet header
7/28 13:06:59 Sent signal 15 to SCHEDD (pid 4012)
7/28 13:07:09 Sent signal 15 to STARTD (pid 4024)
7/28 13:07:09 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1231>, fd is 836
7/28 13:07:09 Buf::write(): condor_write() failed
7/28 13:07:09 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "STARTD"
ParentUniqueID = "michs13611con2:3160:1217242594"
ServerPid = 4024
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1222>"
Command = 60010
AuthCommand = 60008
7/28 13:07:09 The SCHEDD (pid 4012) exited with status 0
7/28 13:07:09 condor_write(): Socket closed when trying to write 302 bytes to <172.24.1.66:1232>, fd is 608
7/28 13:07:09 Buf::write(): condor_write() failed
7/28 13:07:09 SECMAN: Error sending response classad!
MyType = "(unknown type)"
TargetType = "(unknown type)"
AuthMethods = "NTSSPI,KERBEROS"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "PREFERRED"
Authentication = "OPTIONAL"
Encryption = "OPTIONAL"
Integrity = "OPTIONAL"
Enact = "NO"
Subsystem = "SCHEDD"
ParentUniqueID = "michs13611con2:3160:1217242594"
ServerPid = 4012
SessionDuration = "8640000"
NewSession = "YES"
RemoteVersion = "$CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $"
ServerCommandSock = "<172.24.1.66:1221>"
Command = 60010
AuthCommand = 60008
ServerTime = 1217243199
7/28 13:07:09 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 13:07:09 ERROR: DC_AUTHENTICATE unable to receive auth_info!
7/28 13:07:09 The STARTD (pid 4024) exited with status 0
7/28 13:07:09 All daemons are gone.  Restarting.
7/28 13:07:09 Restarting master right away.
7/28 13:07:09 Running as NT Service = 1
7/28 13:07:09 Doing exec( "C:\WINDOWS\system32\cmd.exe /Q /C net stop Condor & net start Condor" )
7/28 13:07:10 SetEnvironmentVariable failed, errno=203
7/28 13:07:10 ******************************************************
7/28 13:07:10 ** Condor (CONDOR_MASTER) STARTING UP
7/28 13:07:10 ** C:\condor\bin\condor_master.exe
7/28 13:07:10 ** $CondorVersion: 7.0.2 Jun  9 2008 BuildID: 89891 $
7/28 13:07:10 ** $CondorPlatform: INTEL-WINNT50 $
7/28 13:07:10 ** PID = 2652
7/28 13:07:10 ** Log last touched 7/28 13:07:09
7/28 13:07:10 ******************************************************
7/28 13:07:10 Using config source: C:\condor\condor_config
7/28 13:07:10 Using local config sources:
7/28 13:07:10    C:\condor/condor_config.local
7/28 13:07:10 DaemonCore: Command Socket at <172.24.1.66:1256>
7/28 13:17:09 WinFirewall: get_CurrentProfile failed: 0x800706d9
7/28 13:17:09 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 3356
7/28 13:17:09 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 3364
7/28 14:17:10 Preen pid is 2200
7/28 14:17:10 Child 2200 died, but not a daemon -- Ignored
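
The Windows log above shows the master starting up repeatedly. To see how often that happens in a longer log, the startup banners can be pulled out and the gaps between them computed. This is a minimal Python sketch, not a Condor tool: `startup_times` and `restart_gaps` are hypothetical helpers, and the year 2008 is assumed from the dates on the emails because the log timestamps carry no year.

```python
import re
from datetime import datetime

# Matches banner lines such as "7/28 12:45:13 ** Condor (CONDOR_MASTER) STARTING UP".
BANNER = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) \*\* .*STARTING UP")

ASSUMED_YEAR = "2008"  # assumption: the log lines themselves have no year

def startup_times(log_text: str) -> list[datetime]:
    """Return the timestamp of every master startup banner in the log."""
    times = []
    for line in log_text.splitlines():
        match = BANNER.match(line)
        if match:
            stamp = ASSUMED_YEAR + "/" + match.group(1)
            times.append(datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"))
    return times

def restart_gaps(times: list[datetime]) -> list[float]:
    """Seconds elapsed between each pair of consecutive startups."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Banner lines copied from the Windows log in this message.
SAMPLE = """\
7/28 12:45:13 ** Condor (CONDOR_MASTER) STARTING UP
7/28 12:56:34 ** Condor (CONDOR_MASTER) STARTING UP
7/28 13:07:10 ** Condor (CONDOR_MASTER) STARTING UP
"""

print(restart_gaps(startup_times(SAMPLE)))  # prints [681.0, 636.0]
```

Gaps of roughly ten to eleven minutes between startups match the pattern visible in the log: the master starts, launches the schedd and startd about ten minutes later, then shuts them down and restarts.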