
Re: [HTCondor-users] startd doesn't start



I tried installing an 8.6.1 node on the testbed that works, and the startd doesn't start there either; if I downgrade to 8.4.11 it does. So it seems to be something peculiar to 8.6.1.

Is anyone on this list using 8.6.1? For now I'm going to stick with 8.4.11, but if anyone knows how to solve this please let me know.

cheers
alessandra

On 17/03/2017 09:46, Alessandra Forti wrote:
I've attached a diff of the condor_config_val -dump output in case it helps.
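
For anyone who wants to make the same comparison, here is a minimal sketch in Python. It assumes each dump has been saved to text and consists of "NAME = value" lines plus "#" comment lines. The function names and the sample knob names are hypothetical, and parameter names are upper-cased on the assumption that they are case-insensitive.

```python
def parse_dump(text):
    """Parse 'NAME = value' lines into a dict, skipping blanks and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            # Assumption: parameter names are case-insensitive, so normalise them.
            params[name.strip().upper()] = value.strip()
    return params


def diff_dumps(old_text, new_text):
    """Return {NAME: (old_value, new_value)} for every parameter that differs.

    A value of None means the parameter is absent from that dump.
    """
    old, new = parse_dump(old_text), parse_dump(new_text)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


if __name__ == "__main__":
    # Made-up sample data, not taken from either testbed.
    old_dump = "# sample header\nEXAMPLE_A = 1\nEXAMPLE_B = x\n"
    new_dump = "EXAMPLE_A = 2\nEXAMPLE_C = y\n"
    for name, (before, after) in diff_dumps(old_dump, new_dump).items():
        print(f"{name}: {before!r} -> {after!r}")
```

Comparing by parameter name rather than line by line keeps the result readable when the two versions print their parameters in a different order.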

On 17/03/2017 09:28, Alessandra Forti wrote:
Hi,

I'm in a bit of a pickle and can't understand what I'm doing wrong. I have two small testbeds which should have the same configuration, yet one works and the other doesn't. Both are configured with Puppet.

The one that doesn't work runs condor-8.6.1; the one that works runs condor-8.4.11.

They are both started by root. On both, the UID domain is set to the same value on the head node and on the pool node (as a matter of fact, startd doesn't start on the head node either), and they both have the same pool_password, but there are some differences. For example, in 8.6.1 condor_shared_p starts automatically, while in 8.4.11 it doesn't. We don't The pool_password files are created differently, which is why I stuck with the one that worked on at least one testbed. I can see startd start for a few seconds and then die or, according to the logs, get killed.

In the StartLog file I have this error:

03/17/17 08:20:35 ERROR: Attempt to initialize user_priv with root privileges rejected
03/17/17 08:20:35 ERROR "Programmer Error: attempted switch to user privilege, but user ids are not initialized" at line 1500 in file

while in the MasterLog I have an endless series of these messages:

03/17/17 03:20:33 restarting /usr/sbin/condor_startd in 3600 seconds
03/17/17 04:20:33 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 2717119
03/17/17 04:20:34 DefaultReaper unexpectedly called on pid 2717119, status 1024.
03/17/17 04:20:34 The STARTD (pid 2717119) exited with status 4
03/17/17 04:20:34 restarting /usr/sbin/condor_startd in 3600 seconds
03/17/17 05:20:34 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 2723991
03/17/17 05:20:35 DefaultReaper unexpectedly called on pid 2723991, status 1024.
03/17/17 05:20:35 The STARTD (pid 2723991) exited with status 4
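
The "status 1024" and "exited with status 4" lines report the same event in two forms. Here is a minimal sketch of the decoding, assuming the conventional Linux wait-status layout (exit code in bits 8-15, terminating signal in the low 7 bits). The function name is hypothetical, and stopped or continued states are ignored.

```python
def describe_wait_status(status: int) -> str:
    """Decode a raw wait status, assuming the conventional Linux encoding."""
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    if signal_number == 0:
        exit_code = (status >> 8) & 0xFF   # bits 8-15: exit code
        return f"exited with status {exit_code}"
    return f"killed by signal {signal_number}"


if __name__ == "__main__":
    # 1024 is the raw status the MasterLog reports for the startd.
    print(describe_wait_status(1024))
```

Under that assumption 1024 decodes to a normal exit with code 4, which would mean the startd exited on its own rather than being killed by a signal.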

I can only find references to these errors that are pretty old or not applicable.

thanks for any help

cheers
alessandra

-- 
Respect is a rational process. \\//
Fatti non foste a viver come bruti, ma per seguir virtute e canoscenza(Dante)
For Ur-Fascism, disagreement is treason. (U. Eco)
But but but her emails... (Anonymous)


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
