
[HTCondor-users] HTCondor-C SetEffectiveOwner - Permission denied [SEC=UNCLASSIFIED]



Still following up on this problem.

Below are a few lines from the submit machine's SchedLog, which I am hoping might be enlightening to someone, because I have no idea what is going on.
With D_FULLDEBUG enabled, the line about the queue super user now stands out.
I have uninstalled and reinstalled, written new config files, and removed the directory, all to no avail.
Job submission works fine as a Personal Pool (Windows 7), and as a vanilla submit in a wider pool (all Linux).
It is only under Condor-C, with the submitter configured as a Personal Pool and performing a grid job submission, that this authentication problem crops up.

Why?

Troy


SchedLog:
02/17/14 10:58:16 (pid:5504) Received TCP command 1111 (QMGMT_READ_CMD) from unauthenticated@unmapped <147.66.85.78:34702>, access level READ
02/17/14 10:58:16 (pid:5504) Number of Active Workers 0
02/17/14 10:58:16 (pid:5504) QMGR Connection closed
02/17/14 10:58:17 (pid:5504) Received TCP command 1112 (QMGMT_WRITE_CMD) from SYSTEM <147.66.85.78:34708>, access level WRITE
02/17/14 10:58:17 (pid:5504) Queue super user not allowed to set owner to troy@domain, because this instance of the schedd has never seen that user submit any jobs.
02/17/14 10:58:17 (pid:5504) SetEffectiveOwner security violation: setting owner to troy@domain when active owner is "SYSTEM"
02/17/14 10:58:17 (pid:5504) condor_read(): Socket closed when trying to read 5 bytes from <147.66.85.78:34708>
02/17/14 10:58:17 (pid:5504) IO: EOF reading packet header
02/17/14 10:58:17 (pid:5504) QMGR Connection closed
02/17/14 10:58:17 (pid:5504) Received TCP command 1111 (QMGMT_READ_CMD) from unauthenticated@unmapped <147.66.85.78:34715>, access level READ
02/17/14 10:58:17 (pid:5504) Number of Active Workers 0
02/17/14 10:58:17 (pid:5504) QMGR Connection closed
02/17/14 10:58:19 (pid:5504) Received TCP command 1111 (QMGMT_READ_CMD) from unauthenticated@unmapped <147.66.85.78:34732>, access level READ
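
For anyone who wants to pull the same lines out of their own SchedLog, here is a minimal Python sketch. It assumes the line layout seen in the excerpt above (timestamp, "(pid:N)", message) and matches only the two message fragments that appear there. The function name owner_failures is just an illustrative choice, not anything HTCondor provides.

```python
import re

# Line layout assumed from the SchedLog excerpt above:
#   MM/DD/YY HH:MM:SS (pid:N) message
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:(\d+)\) (.*)$")

# Message fragments copied verbatim from the excerpt above.
MARKERS = (
    "SetEffectiveOwner security violation",
    "Queue super user not allowed",
)


def owner_failures(lines):
    """Return (timestamp, pid, message) for each owner-related failure line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group(3) for marker in MARKERS):
            hits.append((match.group(1), int(match.group(2)), match.group(3)))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python owner_failures.py SchedLog
    with open(sys.argv[1], errors="replace") as log:
        for timestamp, pid, message in owner_failures(log):
            print(timestamp, pid, message)
```

Run against the excerpt above, this should report only the two 10:58:17 lines about the owner change and skip the QMGMT and worker-count noise.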


---------------------------------------------------------------------------------------------------
> On Tue, 11 Feb 2014 14:40:15 Troy Robertson wrote:
> 
> Hi,
> 
> I'm struggling with HTCondor-C.
> 
> This was originally working on our system, but during the 2 years I have
> been away something failed and users reverted to using it as a single
> pool.
> We are still running 7.8.8 across a number of dedicated Linux processors
> with Windows user submit machines.  I don't want to upgrade until I find
> the answer to this issue.
> 
> If a grid job is submitted, it sits locally Idle with: "Request has not
> been considered by the Matchmaker". The Gridmanager process keeps
> starting up, repeatedly failing to set permissions for something, and
> then exiting.  SchedLog shows something similar.
> 
> I have Googled my heart out to no avail, and have re-installed on the
> Windows submit machine.  What is it about the UIDs/permissions?
> As I said, jobs submitted as vanilla rather than grid to the same
> remote central manager run as normal.  The Gahp_worker never fires,
> so I think it is a problem locally.
> 
> Can anyone please be of assistance?
> 
> Troy
> 
> 
> GridmanagerLog:
> ...
> 02/11/14 14:23:40 [7608] TokenCache contents:
> troy@domain
> 02/11/14 14:23:40 [7608] DaemonCore: in SendAliveToParent()
> 02/11/14 14:23:40 [7608] DaemonCore::IsPidAlive(): OpenProcess failed
> 02/11/14 14:23:40 [7608] DaemonCore: in SendAliveToParent() - ppid 4740l disappeared!
> 02/11/14 14:23:40 [7608] Checking proxies
> 02/11/14 14:23:43 [7608] Initialized the following authorization table:
> 02/11/14 14:23:43 [7608] Authorizations yet to be resolved:
> 02/11/14 14:23:43 [7608] allow READ:  */* */*
> 02/11/14 14:23:43 [7608] allow WRITE:  */* */local@xxxxx */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] allow NEGOTIATOR:  */ local@xxxxx */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] allow ADMINISTRATOR:  */ local@xxxxx */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] allow OWNER:  */ local@xxxxx */NEW-50985.aad.gov.au */147.66.85.62 */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] allow DAEMON:  */* */ local@xxxxx */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] allow ADVERTISE_STARTD:  */* */ local@xxxxx */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] allow ADVERTISE_SCHEDD:  */* */ local@xxxxx */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] allow ADVERTISE_MASTER:  */* */ local@xxxxx */147.66.85.62 */147.66.85.62
> 02/11/14 14:23:43 [7608] Received ADD_JOBS signal
> 02/11/14 14:23:43 [7608] in doContactSchedd()
> 02/11/14 14:23:43 [7608] TokenCache contents:
> troy@domain
> 02/11/14 14:23:43 [7608] SetEffectiveOwner(troy@domain) failed with errno=13: Permission denied.
> 02/11/14 14:23:43 [7608] Failed to connect to schedd! Will retry
> 02/11/14 14:23:45 [7608] Evaluating staleness of remote job statuses.
> 02/11/14 14:23:48 [7608] in doContactSchedd()
> 02/11/14 14:23:48 [7608] TokenCache contents:
> troy@domain
> ...[SNIP]...
> 02/11/14 14:24:23 [7608] SetEffectiveOwner(troy@domain) failed with errno=13: Permission denied.
> 02/11/14 14:24:23 [7608] Failed to connect to schedd! Will retry
> 02/11/14 14:24:28 [7608] in doContactSchedd()
> 02/11/14 14:24:28 [7608] TokenCache contents:
> troy@domain
> 02/11/14 14:24:28 [7608] SetEffectiveOwner(troy@domain) failed with errno=13: Permission denied.
> 02/11/14 14:24:28 [7608] Failed to connect to schedd!
> 02/11/14 14:24:28 [7608] ERROR "Too many failures connecting to schedd!" at line 1246 in file c:\condor\execute\dir_11160\userdir\src\condor_gridmanager\gridmanager.cpp
> 02/11/14 14:28:40 init_user_ids: want user 'troy@domain', current is '(null)@(null)'
> 02/11/14 14:28:40 Found credential for user troy@domain'
> 02/11/14 14:28:40 LogonUser completed.
> 
> SchedLog:
> ...
> 02/11/14 13:36:11 (pid:4740) SetEffectiveOwner security violation: setting owner to troy@domain when active owner is "SYSTEM"
> 02/11/14 13:36:12 (pid:4740) Number of Active Workers 0
> 02/11/14 13:36:14 (pid:4740) Number of Active Workers 0
> 02/11/14 13:36:15 (pid:4740) Number of Active Workers 0
> 02/11/14 13:36:16 (pid:4740) SetEffectiveOwner security violation: setting owner to troy@domain when active owner is "SYSTEM"
> 02/11/14 13:36:17 (pid:4740) Number of Active Workers 0
> 02/11/14 13:36:18 (pid:4740) Number of Active Workers 0
> 02/11/14 13:36:20 (pid:4740) Number of Active Workers 0
> 02/11/14 13:36:21 (pid:4740) SetEffectiveOwner security violation: setting owner to troy@domain when active owner is "SYSTEM"
> 02/11/14 13:36:21 (pid:4740) condor_gridmanager (PID 7652, owner troy) exited with return code 4.
> 02/11/14 13:36:21 (pid:4740) Number of Active Workers 0
> 
> 
> Condor_config.local:
> ...
> UID_DOMAIN = $(FULL_HOSTNAME)
> 
> #TRUST_UID_DOMAIN = TRUE
> 
> HOSTALLOW_READ = *
> HOSTALLOW_WRITE = *
> 
> ##  Daemons
> DAEMON_LIST=MASTER SCHEDD COLLECTOR NEGOTIATOR
> 
> ## GRID PARAMS
> GRIDMANAGER_LOG = $(LOG)/GridLogs/GridmanagerLog.$(USERNAME)
> C_GAHP_LOG = $(LOG)/GridLogs/CGAHPLog.$(USERNAME)
> C_GAHP_WORKER_THREAD_LOG = $(LOG)/GridLogs/CGAHPWorkerLog.$(USERNAME)
> 
> ## DEBUGGING
> GRIDMANAGER_DEBUG          = D_FULLDEBUG
> C_GAHP_DEBUG = D_FULLDEBUG
> C_GAHP_WORKER_THREAD_DEBUG = D_FULLDEBUG
> 
> ## Security
> SEC_DEFAULT_NEGOTIATION = OPTIONAL
> SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
> 
> 
> Submit file:
> Universe = grid
> Executable = /usr/bin/R
> transfer_executable = False
> Arguments = --version
> Error = Error_$(Cluster).$(Process).txt
> Output = Output_$(Cluster).$(Process).txt
> Log = Condor_log.txt
> should_transfer_files = ALWAYS 
> when_to_transfer_output = ON_EXIT
> grid_resource = condor server1.a.b.c server1.a.b.c
> +remote_requirements = Arch == "X86_64" && OpSys == "LINUX"
> +remote_universe = vanilla
> +remote_shouldtransferfiles = "YES"
> +remote_whentotransferoutput = "ON_EXIT"
> Queue
> 

___________________________________________________________________________

    Australian Antarctic Division - Commonwealth of Australia
        Visit our web site at http://www.antarctica.gov.au/
___________________________________________________________________________