
Re: [HTCondor-users] issues getting with condor



Thanks! That got it!

-----Original Message-----
From: Todd Tannenbaum [mailto:tannenba@xxxxxxxxxxx] 
Sent: Wednesday, May 29, 2013 5:44 PM
To: HTCondor-Users Mail List
Cc: Dunn, George Jr
Subject: Re: [HTCondor-users] issues getting with condor

On 5/29/2013 3:31 PM, Dunn, George Jr wrote:
> Hi all,
>
> I have installed condor from source, the tarball, and the repo (I am 
> on CentOS 6) all with similar results.
>
> There are two questions I have at this point.
>
> 1) When people say that the daemons must be started as root, does this 
> mean that they should all show up as running as root?
>

By default, the ps command shows the effective uid of a process, not the "real" uid.  When you start the HTCondor daemons as root, they try to spend 99% of their time running as user "condor" and switch back to effective user root only when they need to do something as root (this is defensive programming).  So even if you start condor_master as root, "ps" will typically show it running as user "condor".


To verify that the daemons really have root access (e.g. that condor_master was started as root, as required for HTCondor to run jobs as the submitting user), you could do "ps axo pid,ruid,cmd" to display the real uid (ruid) for each process -- an ruid of 0 is root.

The RealUid also appears in the master ClassAd, so alternatively you could do

   condor_status -master -l | grep RealUid

and verify that RealUid is 0 for all machines.
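
As a rough sketch of the ruid check described above, a small shell filter can list any condor_* process whose real uid is not 0. The function name non_root_condor is hypothetical, and the filter assumes input in the "pid ruid cmd" column order produced by "ps axo pid,ruid,cmd".

```shell
# Print "pid ruid cmd" for every condor_* process whose real uid is not 0.
# Reads the output of `ps axo pid,ruid,cmd` on stdin (assumed column order).
# non_root_condor is an illustrative name, not an HTCondor tool.
non_root_condor() {
    # $1 = pid, $2 = real uid, $3 = first word of the command line.
    # The "PID RUID CMD" header is skipped because "CMD" does not match.
    awk '$3 ~ /condor_/ && $2 != 0 { print $1, $2, $3 }'
}

# Usage on a live system:
#   ps axo pid,ruid,cmd | non_root_condor
```

If this prints nothing, every condor_* process that ps reported has a real uid of 0.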

> 2) If so, and that is not the case (i.e. all but condor_procd are running 
> as the user condor), is this why I am getting
>
> Failed to open '/home/<user>/condor-test/simple.out' as standard output:
> Permission denied (errno 13)
>
> when I try the example here:
> http://research.cs.wisc.edu/htcondor/tutorials/intl-grid-school-3/submit_first.html
>
> Or here:
> http://spinningmatt.wordpress.com/2010/07/26/getting-started-installing-a-single-node-condor-pool/
>
> I saw an earlier mailing list question from 2007 that seems to address 
> this issue (hence question 1)
>
> https://lists.cs.wisc.edu/archive/htcondor-users/2007-April/msg00175.shtml
>
> It also mentions the UID_DOMAIN name matching but at this point this 
> node has a resolvable FQDN that is set as the hostname and is the only 
> node in the pool and has manager, submit, and execute roles.
>
> Can anyone please help? I REALLY want to use this! :)
>

Did you configure slot users in your config file with SLOT<N>_USER?  I am guessing you did not, but if you did, then the slot user you specified must have access to the job's output file.

Assuming you did not configure slot users, try setting
   TRUST_UID_DOMAIN = True
   SOFT_UID_DOMAIN = True
in your condor_config file(s), then run condor_reconfig.
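
One possible way to apply those two settings without duplicating them on repeated runs is sketched below. The function name add_uid_domain_settings is hypothetical, and the config path in the usage comment is only an assumption; use whichever config file your installation actually reads.

```shell
# Append the two UID-domain settings to a config file if not already present.
# add_uid_domain_settings is an illustrative helper, not an HTCondor tool.
add_uid_domain_settings() {
    cfg="$1"
    for kv in "TRUST_UID_DOMAIN = True" "SOFT_UID_DOMAIN = True"; do
        # -x matches the whole line, -F treats the setting as a fixed string
        grep -qxF "$kv" "$cfg" 2>/dev/null || printf '%s\n' "$kv" >> "$cfg"
    done
}

# Usage (as root; the path is an assumption about your install):
#   add_uid_domain_settings /etc/condor/condor_config.local
#   condor_reconfig
```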

regards,
Todd