
Re: [HTCondor-users] startd doesn't start



Hi Alessandra,

I successfully reproduced your problem and understand what is happening.

When the HTCondor service is started as root, the HTCondor daemons have the ability to run as root, but they use it very sparingly. HTCondor wants to run 99% of the time with an effective uid of an account that is less privileged than root. This is a good thing :). By default, this less privileged account is the "condor" account, but you can override this by setting CONDOR_IDS to specify the uid/gid to use instead, for example if the "condor" account does not exist.

The issue is that you have CONDOR_IDS=0.0 set. That is an insecure configuration, as it tells HTCondor to use a uid of 0 (and a gid of 0) whenever it wants to run as user "condor". This effectively defeats the whole idea, as HTCondor will now always be running with full root powers, even at times when it does not need them.

The reason setting CONDOR_IDS=0.0 no longer works in v8.6.0 is this patch which appeared in v8.5.2:
  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5467
Effectively this patch is causing your startd to abort at startup when it attempts to spawn off processes to run benchmarks; the startd wants to run the benchmark processes as the less privileged "condor" account (as no root access is required), but CONDOR_IDS=0.0 tells the startd to use the root account in place of the condor account, and that is no longer permitted.

I think it would help for us to add a patch that has HTCondor immediately abort on startup with a clear/helpful error message if CONDOR_IDS=0.0.
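
To illustrate the kind of early check I have in mind, here is a minimal sketch in Python. It is not HTCondor code: the function name check_condor_ids and the error wording are made up for illustration, and rejecting a gid of 0 as well as a uid of 0 is my assumption about how strict such a check would be.

```python
def check_condor_ids(value):
    """Parse a CONDOR_IDS-style 'uid.gid' string and reject root ids.

    Hypothetical helper for illustration only; not part of HTCondor.
    Returns (uid, gid) as integers, or raises ValueError.
    """
    parts = value.strip().split(".")
    # Expect exactly two non-negative decimal integers separated by a dot.
    if len(parts) != 2 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"CONDOR_IDS must look like 'uid.gid', got {value!r}")
    uid, gid = int(parts[0]), int(parts[1])
    # Assumption: treat either id being 0 as a misconfiguration.
    if uid == 0 or gid == 0:
        raise ValueError(
            f"CONDOR_IDS={value} would make the 'condor' identity root; "
            "use a non-root uid/gid or remove the setting"
        )
    return uid, gid
```

Called with "0.0" this raises a ValueError carrying the explanatory message, so the problem would be reported at startup rather than later when the benchmarks are spawned.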

But the fix for you is to get rid of CONDOR_IDS=0.0 and just let HTCondor use the "condor" account, or if you don't have a condor account, pick CONDOR_IDS with any uid/gid other than 0. For instance, the HTCondor Manual suggests using the uid/gid of the "daemon" account if for some reason you do not allow a condor account; take a peek in the index of the Manual at the CONDOR_IDS entry for other ideas.
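
If you go the route of an alternative account, a small Python sketch like the following could look up the account's uid/gid and print the matching config line. The helper names are made up for illustration, and the numeric ids of "daemon" differ between systems, so look them up on the node itself rather than copying a number from elsewhere.

```python
import pwd


def condor_ids_line(uid, gid):
    """Format a CONDOR_IDS config line, refusing root ids (illustrative helper)."""
    if uid == 0 or gid == 0:
        raise ValueError("refusing to use uid/gid 0 (root) for CONDOR_IDS")
    return f"CONDOR_IDS = {uid}.{gid}"


def condor_ids_for_account(name):
    """Look up a local account and return the CONDOR_IDS line for it.

    pwd.getpwnam raises KeyError if the account does not exist.
    """
    entry = pwd.getpwnam(name)
    return condor_ids_line(entry.pw_uid, entry.pw_gid)


if __name__ == "__main__":
    # "daemon" is the account the Manual suggests as a fallback.
    print(condor_ids_for_account("daemon"))
```

The printed line can then go into your local HTCondor configuration in place of CONDOR_IDS=0.0.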

Hope this helps
Todd


On 3/18/2017 3:43 PM, Alessandra Forti wrote:
Looking at it, the 8.4.11 version logs errors too, but somehow it still
works
03/18/17 20:41:54 ERROR: Attempt to initialize user_priv with root
privileges rejected
03/18/17 20:41:54 set_user_egid() called when UserIds not inited!
03/18/17 20:41:54 set_user_euid() called when UserIds not inited!
03/18/17 20:41:54 Create_Process(/usr/libexec/condor/condor_kflops):
child failed because PRIV_USER_FINAL process was still root before exec()


On 18/03/2017 20:29, Alessandra Forti wrote:
Hi Greg,

thanks for your reply. We don't have LDAP/NIS; the users are local to
the grid cluster. Puppet creates a condor user as well. CONDOR_IDS is
set to 0.0 in both the 8.4.11 and 8.6.1 installations. I did enable
D_FULLDEBUG but I cannot find any information about which user is
used other than

03/18/17 20:23:47 Running as root.  Enabling specialized core dump
routines
03/18/17 20:23:47 Daemon Log is logging: D_FULLDEBUG D_ALWAYS D_ERROR
D_COMMAND

and then eventually the error reported.

cheers
alessandra

On 18/03/2017 16:11, Greg Thain wrote:
On 03/17/2017 04:28 AM, Alessandra Forti wrote:

In the StartLog files I have this error

03/17/17 08:20:35 ERROR: Attempt to initialize user_priv with root
privileges rejected
03/17/17 08:20:35 ERROR "Programmer Error: attempted switch to user
privilege, but user ids are not initialized" at line 1500 in file

When started as root, the startd spends most of its runtime as a
non-root user, for security reasons.  In those places where it needs
root, it will setuid back to root temporarily, but then switch back
to a non-root uid.

This error says that the startd is trying to switch effective user id
to some non-root user, but the numeric id (or gid) of that non-root
user is zero, which is clearly an error, and rather than run with
improperly elevated privileges, the startd aborts.

So, it would be useful to know which user it is trying to run as.
A bit more of the log above this would show that, especially if the
startd is run with D_FULLDEBUG.  Also, the value of the config setting
CONDOR_IDS may be involved.  This error can be caused if you are
getting your passwd file entries from NIS or LDAP, and somehow the
startd didn't load that library or configuration.

-greg


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

--
Respect is a rational process. \\//
Fatti non foste a viver come bruti, ma per seguir virtute e canoscenza(Dante)
For Ur-Fascism, disagreement is treason. (U. Eco)
But but but her emails... (Anonymous)








--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685