
Re: [Condor-users] Translating command codes in MasterLog and StarterLog



Dear John

>
> Do you use host-based security?  


If by that you mean we only secure against hostnames, which we control, then yes

> is ALLOW_WRITE set to IP address(es) or DNS name(s)?


It is set to DNS names with various masks using * characters
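
To make clear what I mean by masks, here is a rough Python sketch of why a wildcard DNS entry depends on name resolution. This is not Condor's actual matching code, only an illustration using `fnmatch`; the mask `*.example.ac.uk`, the addresses and the function name `is_allowed` are all made up. The assumption is that a peer is known by its IP address plus a hostname only if the lookup succeeds, so when the lookup fails a DNS-name mask has nothing to match against.

```python
from fnmatch import fnmatch


def is_allowed(identifiers, allow_patterns):
    """Return True if any identifier for the peer matches any allow pattern.

    identifiers:    strings known for the peer (its IP address, plus its
                    hostname if the name lookup succeeded).
    allow_patterns: entries such as "*.example.ac.uk" (hypothetical).
    Simplified illustration only, not Condor's real algorithm.
    """
    return any(
        fnmatch(ident.lower(), pattern.lower())
        for ident in identifiers
        for pattern in allow_patterns
    )


allow_write = ["*.example.ac.uk"]  # hypothetical mask

# Lookup worked: both the IP address and the hostname are available.
print(is_allowed(["192.0.2.10", "ws01.example.ac.uk"], allow_write))

# Lookup failed: only the IP address is available, so the mask cannot match.
print(is_allowed(["192.0.2.10"], allow_write))
```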

> do you have a setting in your config file for ALLOW_DAEMON? (you
> wouldn't normally).


No

>
> One possible reason for this is that the daemon receiving the message
> doesn't recognise the IP address of the sender.
> This can happen with host-based security when your config file uses
> machine names rather than IP addresses in the ALLOW_WRITE and
> ALLOW_READ expressions, and the machine name isn't getting
> translated into an IP address correctly in all cases.


Aah, OK, this is starting to make sense. If there were intermittent network outages or other issues where those workstations live, then those errors would start cropping up. So that I can pass this on to our networking team, do you have a list of the cases where the translation might not work?
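
In the meantime, a small check along these lines might help gather evidence for the networking team. It is only a sketch under my own assumptions: it tests whether a machine name resolves to an address and whether that address maps back to the same name, which are the obvious ways name translation could go wrong. It is not a list of the cases Condor itself cares about. The function name is mine, and the `forward` and `reverse` parameters exist only so the function can be exercised without touching the network.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Check that hostname -> IP address -> hostname round-trips.

    Returns a list of problem descriptions; an empty list means the
    forward and reverse lookups agreed.
    """
    problems = []
    try:
        _, _, addresses = forward(hostname)
    except OSError as exc:  # gaierror and herror are OSError subclasses
        return [f"forward lookup of {hostname} failed: {exc}"]

    if len(addresses) > 1:
        problems.append(
            f"{hostname} resolves to several addresses: {addresses}")

    for addr in addresses:
        try:
            name, aliases, _ = reverse(addr)
        except OSError as exc:
            problems.append(f"reverse lookup of {addr} failed: {exc}")
            continue
        names = {n.lower() for n in [name, *aliases]}
        if hostname.lower() not in names:
            problems.append(
                f"{addr} maps back to {sorted(names)}, not {hostname}")
    return problems
```

Running `check_name_resolution(socket.getfqdn())` on an execute node at intervals, and logging any non-empty result with a timestamp, should show whether lookup failures line up with the PERMISSION DENIED entries.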

FYI, we use CNAMEs to point to our Condor manager. I intend to swap it out again soon for a VM, possibly running the CycleServer stack. It is beautiful to see a quick DNS change keep your pool alive while you hot-swap a central manager. However, I think the academic license is limited to 4k slots, is that right? We are a university, but we have just over 8k slots in our pool during term time, so perhaps I should talk to CycleServer directly...

> This name-lookup problem seems to be particularly common in the
> Windows version of Condor, especially on machines that have more
> than one network adapter.


Yes, those workstations are Windows, but I don't think they have 2 network adapters

>
> Alternatively there is also a known problem with the grid manager
> daemon sending DC_KEEPALIVE messages to the master when NOT using
> host-based security.


Thanks for clearing up those error messages
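
For anyone else chasing the same messages, here is a rough way to tally them per command code. It is a sketch that assumes the raw log file with one entry per line (not wrapped as in this mail) and the message format shown in my logs below; the function name `tally_denied` is my own.

```python
import re
from collections import Counter

# Matches e.g. "... PERMISSION DENIED ... for command 60008 (DC_CHILDALIVE), ..."
DENIED = re.compile(r"PERMISSION DENIED .* for command (\d+) \((\w+)\)")


def tally_denied(lines):
    """Count PERMISSION DENIED entries per (command code, command name)."""
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

Passing an open log file to it, for example `tally_denied(open(path))` with `path` pointing at a MasterLog, gives a count for each code and name pair.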

All the best

James

>
> -tj
>
>
> On 6/9/2011 7:54 AM, James Osborne wrote:

> Dear Administrators
>
> I run a pool of Windows XP machines and have used Linux submit
> machines in the past
>
> I would like to know if anybody has seen, or can explain log
> messages like the following:
>
> MasterLog
>
>         PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE)
>
> StarterLog
>
>         PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60000
> (DC_RAISESIGNAL), access level DAEMON: reason: DAEMON authorization
> policy contains no matching ALLOW entry for this request;
> identifiers used for this host:
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE>,<HOSTNAME_OF_THIS_EXECUTE_NODE>
>         PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60000
> (DC_RAISESIGNAL), access level DAEMON: reason: cached result for
> DAEMON; see first case for the full reason
>
> I know the log snippets don't match up in time, but only the context
> matters I think, and understanding those command codes e.g. code
> 60008 and 60000
>
> All of the jobs I submitted eventually ran on the pool, and all
> execute machines _should_ have identical Condor configurations
>
> Thanks in advance
>
> James
>
> # MasterLog
>
> 4/18 08:59:10 ******************************************************
> 4/18 08:59:10 ** Condor (CONDOR_MASTER) STARTING UP
> 4/18 08:59:10 ** c:\condor\bin\condor_master.exe
> 4/18 08:59:10 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
> 4/18 08:59:10 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
> 4/18 08:59:10 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
> 4/18 08:59:10 ** $CondorPlatform: INTEL-WINNT50 $
> 4/18 08:59:10 ** PID = 460
> 4/18 08:59:10 ** Log last touched 4/15 20:22:29
> 4/18 08:59:10 ******************************************************
> 4/18 08:59:10 Using config source: C:\Condor\condor_config
> 4/18 08:59:10 Using local config sources:
> 4/18 08:59:10    C:\Condor\condor_config_local
> 4/18 08:59:10 DaemonCore: Command Socket at
> <<IP_ADDRESS_OF_THIS_EXECUTE_NODE>:9613>
> 4/18 08:59:11 Started DaemonCore process "C:\Condor/bin/
> condor_startd.exe", pid and pgroup = 600
> 4/18 08:59:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: DAEMON authorization policy contains no
> matching ALLOW entry for this request; identifiers used for this
> host: <IP_ADDRESS_OF_THIS_EXECUTE_NODE>,<HOSTNAME_OF_THIS_EXECUTE_NODE>
> 4/18 09:19:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 09:38:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 09:58:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 09:59:11 Preen pid is 2820
> 4/18 09:59:12 Child 2820 died, but not a daemon -- Ignored
> 4/18 10:17:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 10:37:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 10:56:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 11:16:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 11:35:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 11:55:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 12:14:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 12:34:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 12:53:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 13:13:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 13:32:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 13:52:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 14:11:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 14:31:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 14:50:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 15:10:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 15:29:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 15:49:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 16:08:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 16:28:14 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/18 16:47:44 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60008 (DC_CHILDALIVE),
> access level DAEMON: reason: cached result for DAEMON; see first
> case for the full reason
> 4/19 08:53:43 UnsetEnv(NET_REMAP_ENABLE): SetEnvironmentVariable
> failed, errno=203
>
> # StarterLog.slot1
>
> 4/4 09:36:22 ** condor_starter (CONDOR_STARTER) STARTING UP
> 4/4 09:36:22 ** C:\Condor\bin\condor_starter.exe
> 4/4 09:36:22 ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
> 4/4 09:36:22 ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
> 4/4 09:36:22 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
> 4/4 09:36:22 ** $CondorPlatform: INTEL-WINNT50 $
> 4/4 09:36:22 ** PID = 2568
> 4/4 09:36:22 ** Log last touched 4/1 17:18:51
> 4/4 09:36:22 ******************************************************
> 4/4 09:36:22 Using config source: C:\Condor\condor_config
> 4/4 09:36:22 Using local config sources:
> 4/4 09:36:22    C:\Condor\condor_config_local
> 4/4 09:36:22 DaemonCore: Command Socket at
> <<IP_ADDRESS_OF_THIS_EXECUTE_NODE>:9613>
> 4/4 09:36:22 GLEXEC_JOB not supported on this platform; ignoring
> 4/4 09:36:22 Setting resource limits not implemented!
> 4/4 09:36:23 Communicating with shadow
> <<IP_ADDRESS_OF_LINUX_SUBMIT_HOST>:51634>
> 4/4 09:36:23 Submitting machine is "james.condor.cf.ac.uk"
> 4/4 09:36:23 setting the orig job name in starter
> 4/4 09:36:23 setting the orig job iwd in starter
> 4/4 09:36:25 File transfer completed successfully.
> 4/4 09:36:26 Job 7039.0 set to execute immediately
> 4/4 09:36:26 Starting a VANILLA universe job with ID: 7039.0
> 4/4 09:36:26 Tracking process family by login "condor-reuse-slot1"
> 4/4 09:36:26 IWD: C:\Condor\execute\dir_2568
> 4/4 09:36:26 Output file: C:\Condor\execute\dir_2568\job-out.txt
> 4/4 09:36:26 Error file: C:\Condor\execute\dir_2568\job-error.txt
> 4/4 09:36:26 Renice expr "19" evaluated to 19
> 4/4 09:36:26 About to exec C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat
> 4/4 09:36:26 Create_Process succeeded, pid=3604
> 4/4 09:44:20 Process exited, pid=3604, status=0
> 4/4 09:44:20 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60000
> (DC_RAISESIGNAL), access level DAEMON: reason: DAEMON authorization
> policy contains no matching ALLOW entry for this request;
> identifiers used for this host:
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE>,<HOSTNAME_OF_THIS_EXECUTE_NODE>
> 4/4 09:44:20 PERMISSION DENIED to unauthenticated user from host
> <IP_ADDRESS_OF_THIS_EXECUTE_NODE> for command 60000
> (DC_RAISESIGNAL), access level DAEMON: reason: cached result for
> DAEMON; see first case for the full reason
>
> -----
>
> Dr James Osborne
> Condor Service Manager & Application Support Engineer
> Advanced Research Computing Division
> Cardiff University, Redwood Building
> King Edward VII Avenue, Cardiff
> CF10 3NB
>
> Tel: +44(0)2920 874657
> Fax: +44(0)2920 870734

>
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
>
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
>
https://lists.cs.wisc.edu/archive/condor-users/