
Re: [HTCondor-users] Help with setting up a new cluster



The installation went ok and all the services are running and healthy, but
the two machines cannot communicate with each other.

What did you do to test this? By default, condor_reconfig and condor_restart only communicate with the local daemons.

I also suspect something is very wrong since neither sudo condor_reconfig or sudo condor_restart works on either machine.

This indicates a local authentication or authorization problem, but both of those are tricky. :)

There are no firewalls enabled on either machine. (I can ssh for example between the machines)

For your reference, by default, HTCondor only needs port 9618 to be open inbound.
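
As a rough way to confirm that port from the other machine, here is a minimal sketch using bash's /dev/tcp pseudo-device. The function name check_port and the hostname cm.example.org are placeholders of mine, not anything from your setup; substitute the central manager's real hostname.

```shell
# Hypothetical helper: try a TCP connect to host:port (default 9618).
check_port() {
    local host="$1" port="${2:-9618}"
    # bash's /dev/tcp attempts a TCP connect; timeout bounds the wait to 3 seconds.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open: ${host}:${port}"
    else
        echo "unreachable: ${host}:${port}"
        return 1
    fi
}

# Example (placeholder hostname), run from the execute node:
#   check_port cm.example.org 9618
```

A successful connect only shows that something is listening on that port; it says nothing about whether HTCondor will authorize the connection.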

volcano@volcano:~$ sudo condor_restart
ERROR
SECMAN:2010:Received "DENIED" from server for user condor_pool@ using
method IDTOKENS.
Can't send Restart command to local master

Looking at your condor_config.central, and what the error is saying, it looks like you took the advice "## To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts" without following the instructions in the preceding (admittedly very long) comment. If you're very confident in the security of your network, your current config will probably work if you enable host-based security; the comment has specific instructions.

It is more secure to configure HTCondor with user-based security, which is why it is the default. If you look at the error message above,
it specifies the user you authenticated as (condor_pool@).  I'm pretty
sure condor_restart requires ADMINISTRATOR access, so you should set

ALLOW_ADMINISTRATOR = $(ALLOW_ADMINISTRATOR) condor_pool@

In fact, you should probably unset all of the other ALLOW_* values
you set; they should all already be set correctly as a result of
running get_htcondor.
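
To see which ALLOW_* values are actually in effect before and after cleaning up, something like the following sketch may help. The function name show_allow_settings is just illustrative, and the grep pattern assumes the dump prints one "NAME = value" pair per line.

```shell
# Hypothetical helper: list every ALLOW_* setting in the effective configuration.
show_allow_settings() {
    # condor_config_val -dump prints the whole effective config;
    # keep only the lines that start with ALLOW_.
    condor_config_val -dump | grep '^ALLOW_'
}

# To see which file and line a single value comes from:
#   condor_config_val -verbose ALLOW_ADMINISTRATOR
```

Since condor_reconfig is currently being denied, the edited config will probably need to be picked up by restarting the service through the init system (for example, sudo systemctl restart condor, assuming the usual service name).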

kenway@haleakala:/etc/condor$ sudo condor_restart
[sudo] password for kenway:
ERROR
SECMAN:2010:Received "DENIED" from server for user condor_pool@ using
method IDTOKENS.
Can't send Restart command to local master

It looks like you were consistent (yay!) between the different nodes' security configurations, so the same advice applies here. That
should get the administrative commands working.

The next step is probably checking condor_status, to see if the EPs can report to the CM.
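
A small sketch of that check, assuming it is run on the CM (or any host whose config points at it). The function name count_reporting_machines is illustrative only.

```shell
# Hypothetical helper: count distinct machines with slots in the collector.
count_reporting_machines() {
    # -af (autoformat) prints just the Machine attribute of each slot ad;
    # sort -u removes duplicates, grep -c . counts non-empty lines.
    condor_status -af Machine | sort -u | grep -c .
}

# Plain condor_status gives the usual per-slot table:
#   condor_status
```

If the count is lower than the number of EPs you expect, the missing ones are not reaching the collector, and their logs are the next place to look.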

- ToddM