
Re: [HTCondor-users] Win7 Condor 8.0.4 "can't find address for local master"



It's hard to know exactly what the problem is from this, but there are some things you can do to gather more information if this happens again.

    condor_status -master -af Name MyAddress

will list the names and addresses of all of the masters that the collector knows about. Perhaps the condor_master was binding to the wrong IP address.
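
As a rough sketch of how to check that output for a badly bound master, the Python below flags any master whose address is loopback or link-local. It assumes each output line is a name followed by an address that starts with "<ip:port" (IPv4 only); the host names and addresses in the sample are made up for illustration, and the function name is hypothetical.

```python
import ipaddress
import re

# Matches the "<ip:port" prefix of an address such as "<10.0.0.15:9618>".
# Assumption: IPv4 only; anything else is skipped by this sketch.
ADDRESS_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")

def suspicious_masters(status_output):
    """Return (name, ip) pairs whose address is loopback or link-local."""
    flagged = []
    for line in status_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # not a "Name MyAddress" line
        name, address = parts
        match = ADDRESS_RE.search(address)
        if not match:
            continue
        try:
            ip = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # digits that do not form a valid IPv4 address
        if ip.is_loopback or ip.is_link_local:
            flagged.append((name, str(ip)))
    return flagged

# Hypothetical sample output, not taken from this thread.
sample = """\
VPC01.example.com <169.254.12.7:49152>
PC195.example.com <10.0.0.15:9618>
"""
print(suspicious_masters(sample))
```

A master that shows up with a 127.x.x.x or 169.254.x.x address is a likely candidate for having bound to the wrong interface.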

Another way to get similar information is to run

   condor_who -verbose

condor_who gets its information from scraping log files, so it will work even if the master can't contact the collector. It can only see information that is still in the log, so it works best shortly after HTCondor was started up.

You can also find out a lot about what condor_restart tried to do by setting this in condor_config:

TOOL_DEBUG = D_HOSTNAME D_FULLDEBUG

Then look at the ToolLog after you get the error message from condor_restart.
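
With D_FULLDEBUG the log can get long, so it may help to keep only the lines written after you ran the tool. The sketch below assumes the ToolLog uses the same "MM/DD/YY HH:MM:SS" line prefix as the MasterLog quoted further down; the function name is hypothetical, and the sample lines are copied from that MasterLog.

```python
from datetime import datetime

def lines_since(log_text, since):
    """Yield log lines whose 'MM/DD/YY HH:MM:SS' prefix is at or after `since`."""
    keep = False
    for line in log_text.splitlines():
        try:
            # The timestamp prefix is 17 characters, e.g. "12/06/13 03:11:23".
            stamp = datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S")
            keep = stamp >= since
        except ValueError:
            pass  # continuation line: inherit the previous line's decision
        if keep:
            yield line

# Sample lines taken from the MasterLog in this thread.
sample_log = (
    '12/06/13 03:00:19 PowerEventHander: Machine entering hibernation\n'
    '12/06/13 03:11:23 ERROR "IpVerify::Verify: called with unknown permission 10\n'
    '" at line 692 in file c:\\condor\\execute\\dir_18384\\userdir\\src\\condor_io\\condor_ipverify.cpp\n'
)
for entry in lines_since(sample_log, datetime(2013, 12, 6, 3, 11, 0)):
    print(entry)
```

Lines without a timestamp are treated as continuations of the previous entry, so a multi-line error message stays together.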

On 12/11/2013 9:11 AM, Andrew Mole wrote:
Follow-up:

I have since removed and reinstalled and the problem seems to have gone away for now. It would be good to know why this occurs though, as this is not the first time.

Best regards,

Andrew


On 9 Dec, 2013, at 4:28 PM, "Andrew Mole" <Andrew.Mole@xxxxxxxx> wrote:

I have recently done a fresh install of Condor 8.0.4 on Win7 (actually a virtual machine, identified below as VPC01).

I have modified the ALLOW_ADMINISTRATOR to include two additional machines

ALLOW_ADMINISTRATOR=PC195, PC016, VPC01

I have also added some allow_config lines in the condor_config.local

ALLOW_CONFIG = andrew.mole@*, *@PC195, *@PC016, *@VPC01

However, when I try to condor_restart or condor_reconfig I get the following error message…

C:\Users\andrew.mole>condor_restart
Can't find address for local master
Perhaps you need to query another pool.

What does this mean? I have attached the MasterLog below. Should all these machine names be listed as long versions e.g. PC195.clients.global.company.com ?

MasterLog

12/05/13 18:56:23 ******************************************************
12/05/13 18:56:23 ** condor (CONDOR_MASTER) STARTING UP
12/05/13 18:56:23 ** C:\condor\bin\condor_master.exe
12/05/13 18:56:23 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
12/05/13 18:56:23 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
12/05/13 18:56:23 ** $CondorVersion: 8.0.4 Oct 19 2013 BuildID: 189770 $
12/05/13 18:56:23 ** $CondorPlatform: x86_64_Windows7 $
12/05/13 18:56:23 ** PID = 3560
12/05/13 18:56:23 ** Log last touched time unavailable (No such file or directory)
12/05/13 18:56:23 ******************************************************
12/05/13 18:56:23 Using config source: C:\condor\condor_config
12/05/13 18:56:23 Using local config sources:
12/05/13 18:56:23    C:\condor/condor_config.local
[…]
12/06/13 02:30:21 Sent signal 15 to STARTD (pid 3488)
12/06/13 02:30:21 DefaultReaper unexpectedly called on pid 3488, status 0.
12/06/13 02:30:21 The STARTD (pid 3488) exited with status 0
12/06/13 02:30:21 restarting C:\condor/bin/condor_startd.exe in 10 seconds
12/06/13 02:30:22 PowerEventHander: Waking machine (APM)
12/06/13 02:30:31 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 3236
12/06/13 03:00:19 PowerEventHander: Machine entering hibernation
12/06/13 03:11:23 PowerEventHander: Waking machine to handle an event (Automatic)
12/06/13 03:11:23 ERROR "IpVerify::Verify: called with unknown permission 10
" at line 692 in file c:\condor\execute\dir_18384\userdir\src\condor_io\condor_ipverify.cpp
12/06/13 03:11:23 Sent SIGKILL to KBDD (pid 3892) and all its children.
12/06/13 03:11:23 Sent SIGKILL to SCHEDD (pid 3812) and all its children.
12/06/13 03:11:23 Sent SIGKILL to STARTD (pid 3236) and all its children.
12/06/13 03:11:23 **** condor (condor_MASTER) pid 3560 EXITING WITH STATUS 1
12/06/13 03:11:23 Sock::bind - _state is not correct
12/06/13 03:11:23 SafeSock::my_ip_str() failed to bind: _state = 0
12/06/13 03:11:23 Sock::bind - _state is not correct
12/06/13 03:11:23 SafeSock::my_ip_str() failed to bind: _state = 0
12/06/13 03:11:23 sendMsg:sendto failed - errno: 42
12/06/13 03:11:23 Failed to send non-blocking update to .


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
