
Re: [HTCondor-users] How to tell Condor not to run jobs as nobody?

The Debian package will start the condor_master as root. When the HTCondor daemons are started as root, they immediately switch their effective UID to ‘condor’. They switch their effective UID back to root only when necessary (usually to act as a job owner). The current effective UID is what ps reports.

Setting the following in the config file of the slave machines should do what you want:

SLOT1_USER = slaveadmin
SLOT2_USER = slaveadmin

If a slave machine has more than 2 slots, you’ll need to add more SLOTx_USER lines.
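For machines with many slots, the extra lines can be generated rather than typed by hand. The following is only a sketch: the helper name slot_user_lines and the slot count of 4 are assumptions for illustration, and the output still has to be pasted into the slave's config file.

```python
def slot_user_lines(num_slots, user):
    """Build one SLOT<n>_USER line per slot, numbered from 1.

    Hypothetical helper; num_slots and user are supplied by the admin.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    return ["SLOT%d_USER = %s" % (n, user) for n in range(1, num_slots + 1)]


if __name__ == "__main__":
    # Assumed example: a 4-slot machine with every slot running as slaveadmin.
    print("\n".join(slot_user_lines(4, "slaveadmin")))
```

With num_slots set to 2 this reproduces the two lines shown above.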

 -- Jaime Frey

On Mar 4, 2014, at 2:52 PM, J J <999iscool@xxxxxxxxx> wrote:



I'm running Ubuntu 12.04. I downloaded the deb packages and used dpkg to install them.

Here is ps aux from the master node:

condor     947  0.0  0.9  11268  4056 ?        Ss   18:03   0:00 /usr/sbin/condor_master -pidfile /var/run/condor/condor.pid
root      1161  0.0  0.4   5184  1800 ?        S    18:03   0:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 10000000 -S 60 -C 109
condor    1162  0.0  1.1  11540  5064 ?        Ss   18:03   0:00 condor_collector -f
condor    1163  0.0  1.1  11600  4944 ?        Ss   18:03   0:00 condor_negotiator -f
condor    1164  0.0  1.2  12544  5728 ?        Ss   18:03   0:00 condor_schedd -f
1001      2245  0.0  0.1   4388   840 pts/0    S+   20:45   0:00 grep --color=auto condor

I suppose condor_master is not running as root. So are you saying I must start the daemons as root? How come it started as condor from a default installation? (sudo dpkg -i condor.deb)

>  If (b), do your specified slot users exist in the /etc/passwd file of the system?

Yes. Each slave has a unix user called slaveadmin. And we are not using NFS.

Suppose we have just one master and one slave. Each slave has 2 slots (I guess the number doesn't matter here). I am hoping to send jobs from the master to the slave and have the slave run them as slaveadmin, so that the jobs can use slaveadmin's environment variables and ssh keys.



On Tue, Mar 4, 2014 at 1:56 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
Re all the below, I assume you are using Linux?

Also, what is it you are trying to achieve?  Do you want your jobs to a) run as the user who submitted them (common desire if there is a shared file system like NFS across all nodes and all job files live on the shared file system), or b) run as specific slot users?    If (a), is the /etc/passwd file synced across all your machines such that each machine has the same list of user logins with the same associated UIDs?  If (b), do your specified slot users exist in the /etc/passwd file of the system?

Are you starting the condor_master daemon as user root?  Below you say "master runs as adminFOO and slave has an admin user called adminBAR"; I do not know what you mean by this.  HTCondor cannot run jobs as different users unless the condor_master is started as root.


On 3/4/2014 12:29 PM, J J wrote:

I am using 7.9.2 on master and 8.0.3 on slave nodes. My requirement is
simple; all computations are done internally, not publicly facing at all.
We have zero security risk really, this is just a small cluster set up for
in-house testing and we trust our staff.

According to this [1] and [2],

* if UID_DOMAIN on master and slaves don't match, job is run as nobody
* if TRUST_UID_DOMAIN is TRUE, UID_DOMAIN check is skipped
* if UID_DOMAIN is * on both nodes, that's effectively the same as
* I can set a particular user for each slot by SLOT1_USER, SLOT2_USER.

I tried all the above and the method in [1] in condor_config.local on both
master and slave, and the jobs still run as nobody.

I can tell this by having a Python script
#!/usr/bin/env python
import getpass
print(getpass.getuser())

and the outcome is nobody.

Using the SLOTx_USER method, the job is disconnected from the slave and then
stays idle forever.

The master runs as adminFOO and slave has an admin user called adminBAR.
Can someone point out where my mistake is? I tried all combinations of