Re: [HTCondor-users] [External] - Re: Debugging HTCondor Authentication Errors



We have updated the quick start guide for installing an HTCondor pool.

https://htcondor.readthedocs.io/en/latest/admin-manual/quick-start-condor-pool.html

Thank you for helping us fix its shortcomings.

...Tim

On 6/23/20 1:52 PM, tim@xxxxxxxxxxx wrote:

I think that the instructions need improvement.

Your log indicates that the file /etc/condor/password.d/POOL does not exist.

The directions incorrectly tell you to start HTCondor before configuration is complete.

You need to run the condor_store_cred command before trying to start HTCondor.

Also, a stray "\" appears in the security configuration. In my presentation, this line was split because of the large font; in the manual, we should remove it.

I hope you find this advice helpful.

...Tim

P.S. The default location for the pool password is /etc/condor/passwords.d/POOL (That should also be changed in the manual.)
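A quick pre-start check can catch the missing-file case before the daemons are launched. The following is only a minimal sketch: the helper name `missing_password_files` is made up for illustration, and the two sample paths are simply the two spellings that appear in this thread. It says nothing about how HTCondor itself locates the file.

```python
import os


def missing_password_files(candidates):
    """Return the candidate paths that are not existing regular files.

    Note: os.path.isfile() follows symlinks and also returns False when a
    parent directory cannot be traversed, so run this as the same user
    that will start the daemons (root, in this thread).
    """
    return [path for path in candidates if not os.path.isfile(path)]


if __name__ == "__main__":
    # Both spellings appear in this thread; they differ by a single "s".
    paths = [
        "/etc/condor/password.d/POOL",
        "/etc/condor/passwords.d/POOL",
    ]
    for path in missing_password_files(paths):
        print("missing:", path)
```

If the path named in the log is reported as missing, the pool password has either not been stored yet or was written to a different location than the one the daemons are reading.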

On 6/23/20 1:37 PM, wesley.taylor@xxxxxxxxxxx wrote:
Since my last email came through as a blank message, I am going to try again. Sorry about that.

"""
I turned on the insane debugging and got the following:
_________________________________________________________________________________________________________________
06/23/20 09:56:46 (fd:13) (pid:5857) (D_ALWAYS) read_password_from_filename():
read_secure_file(/etc/condor/password.d/POOL) failed!
06/23/20 09:56:46 (fd:13) (pid:5857) (D_PRIV) PRIV_CONDOR --> PRIV_ROOT at
/var/lib/condor/execute/slot3/dir_3977/userdir/.tmpEsbepJ/BUILD/condor-8.9.7/src/condor_utils/secure_file.cpp:213
06/23/20 09:56:46 (fd:13) (pid:5857) (D_PRIV) PRIV_ROOT --> PRIV_CONDOR at
/var/lib/condor/execute/slot3/dir_3977/userdir/.tmpEsbepJ/BUILD/condor-8.9.7/src/condor_utils/secure_file.cpp:216
06/23/20 09:56:46 (fd:13) (pid:5857) (D_ALWAYS:2) ERROR:
read_secure_file(/etc/condor/password.d/POOL): open() failed: No such file or
directory (errno: 2)
06/23/20 09:56:46 (fd:13) (pid:5857) (D_ALWAYS) read_password_from_filename():
read_secure_file(/etc/condor/password.d/POOL) failed!
06/23/20 09:56:46 (fd:13) (pid:5857) (D_SECURITY) PW: Server sending.
06/23/20 09:56:46 (fd:13) (pid:5857) (D_SECURITY) In server_send: -1.
06/23/20 09:56:46 (fd:13) (pid:5857) (D_SECURITY) Server send '', '', 0 0 0
__________________________________________________________________________________________________________________

No symlinks in the hierarchy, everything owned by root:root with exec
permissions all the way down. I believe I am starting the daemons as root? It
looks like the privilege escalation is going fine based on what the log shows
here. At least I have something to go off of now, but at first glance I don't
understand why open() would fail with code 2 on that file. I double checked
with a python interactive shell as root and I was able to open and spit out
bytes from that file just fine.
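To make the "code 2" in the log above easier to interpret, the errno values can be pulled out of the log text and mapped to their symbolic names. This is a rough sketch that assumes the `(errno: N)` format shown in the excerpt; the function name `explain_errnos` is hypothetical. On Linux, errno 2 is ENOENT (the path does not exist), which is a different failure from EACCES (permission denied).

```python
import errno
import os
import re


def explain_errnos(log_text):
    """Return (number, symbolic name, description) for each errno in the log.

    The email wrapped long log lines, so whitespace is collapsed first to
    let the pattern match across the original line breaks.
    """
    flat = " ".join(log_text.split())
    results = []
    for match in re.findall(r"\(errno:\s*(\d+)\)", flat):
        number = int(match)
        name = errno.errorcode.get(number, "UNKNOWN")
        results.append((number, name, os.strerror(number)))
    return results


if __name__ == "__main__":
    sample = (
        "read_secure_file(/etc/condor/password.d/POOL): open() failed: "
        "No such file or\ndirectory (errno: 2)"
    )
    for number, name, text in explain_errnos(sample):
        print(number, name, text)
```

An ENOENT result points at the path itself (a typo, or a file that was never created) rather than at ownership or mode bits.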

My thinking is that maybe I am not starting condor correctly, so it can't try
to open the file as root? I am on CentOS7, and started condor with 'sudo
systemctl start condor'.

I also double-checked my security config, and the only differences between my 
config file and the readthedocs are:
______________________________
ALLOW_DAEMON = *
ALLOW_NEGOTIATOR = *
______________________________

Which I do not think are relevant to this issue. Does anyone have any more 
ideas on things I could check to see what's going on?

-Wes

"""

Wesley Taylor - Cluster Manager
Numerica Corporation (www.numerica.us)
5042 Technology Parkway #100
Fort Collins, Colorado 80528
(970) 207 2232
wesley.taylor@xxxxxxxxxxx


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Zach Miller
Sent: Tuesday, June 23, 2020 9:30 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] [External] - Re: Debugging HTCondor Authentication Errors


Hello,

You might also check the permissions and ownership on the parent directories.  read_secure_file() wants to make sure the file couldn't have been tampered with.

Are there symlinks in the directory hierarchy?

You are starting the daemons as root?
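The questions above about parent directories and symlinks can be answered mechanically. The sketch below walks from the file up to the filesystem root and reports any component that is a symlink or is writable by group or others. It is an illustration only: `audit_path` is a hypothetical helper, and the checks are an assumption about what "could have been tampered with" might mean, not a copy of what read_secure_file() actually does.

```python
import os
import stat


def audit_path(path):
    """Return (component, problem) pairs for the path and each ancestor."""
    problems = []
    current = os.path.abspath(path)  # normalizes, but does not resolve symlinks
    while True:
        info = os.lstat(current)  # lstat inspects the link itself, not its target
        if stat.S_ISLNK(info.st_mode):
            problems.append((current, "is a symlink"))
        elif info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append((current, "group- or world-writable"))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return problems


if __name__ == "__main__":
    try:
        for component, problem in audit_path("/etc/condor/passwords.d/POOL"):
            print(component, "->", problem)
    except FileNotFoundError as exc:
        print("path does not exist:", exc.filename)
```

An empty result means none of the components tripped these particular checks; it does not prove that HTCondor will accept the file.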

You can turn on more (insane) debugging by setting "COLLECTOR_DEBUG = D_ALL:2".  It is A LOT, but that will get you every message.  You may also need to increase the size of the log if it rotates too quickly (MAX_COLLECTOR_LOG).

Looking at the code, though, read_secure_file() should already be logging why it failed even without changing the debug level.  Is there anything earlier in the log that gives a hint?  (Feel free to send the whole log offline if you'd like).


Cheers,
-zach


On 6/23/20, 10:20 AM, "HTCondor-users on behalf of wesley.taylor@xxxxxxxxxxx" <htcondor-users-bounces@xxxxxxxxxxx on behalf of wesley.taylor@xxxxxxxxxxx> wrote:

    Hi Zach!

    Thanks for that explanation; I was obviously looking in the wrong direction there. I changed the permissions as you suggested, but I am still getting the same error from my collector about reading the file. Is there a debug setting I can add to get it to tell me more about why the read_secure_file call failed?

    -Wes


    -----Original Message-----
    From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Zach Miller
    Sent: Monday, June 22, 2020 10:02 PM
    To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
    Subject: [External] - Re: [HTCondor-users] Debugging HTCondor Authentication Errors


    Hi Wes,

    (First, no problem sending email any time you have a question!)

    You wrote:
    > It looks like for some reason condor can't read the POOL file, even
    > though the file (and its parent directory) are owned by the user and
    > group condor:condor, and everyone has execute permissions on /etc and
    > /etc/condor. I also made selinux permissive just in case that was the
    > issue.

    Clearly the error message here should be better.  When HTCondor says, "read_secure_file(/etc/condor/password.d/POOL) failed!", in this case the cause is not that it couldn't read the file, but that the file's permissions were TOO permissive.

    The file for PASSWORD authentication should be chmod 600 and owned by root:root.  Try fixing that and let us know if that did the trick.
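The ownership and mode requirement stated above (owned by root, chmod 600) can be verified with a few lines of Python. This is a minimal sketch: `check_secret_file` is a hypothetical helper, and the `expected_uid` parameter exists only so the check can be exercised without being root. It tests the two conditions named in this message and nothing more.

```python
import os
import stat


def check_secret_file(path, expected_uid=0):
    """Return a list of human-readable problems; empty if none are found."""
    info = os.stat(path)
    issues = []
    if info.st_uid != expected_uid:
        issues.append(f"owner uid is {info.st_uid}, expected {expected_uid}")
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    if mode & 0o077:  # any group or other permission bit is set
        issues.append(f"mode is {mode:03o}, expected no group/other bits (600)")
    return issues


if __name__ == "__main__":
    try:
        for issue in check_secret_file("/etc/condor/passwords.d/POOL"):
            print(issue)
    except OSError as exc:
        print("cannot stat file:", exc)
```

A file owned by condor:condor, as in the original report, would be flagged by the owner check even if its mode were already 600.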


    Cheers,
    -zach



    On 6/22/20, 7:11 PM, "HTCondor-users on behalf of wesley.taylor@xxxxxxxxxxx" <htcondor-users-bounces@xxxxxxxxxxx on behalf of wesley.taylor@xxxxxxxxxxx> wrote:

        Hello,

        I feel a little bad for emailing everyone twice in the same day, but I am still getting familiar with HTCondor.

        I tested a minicondor last week and was having a ball, but now I am looking to scale up and have hit some hiccups. I am following the "Setting up an HTCondor Pool" guide, but with some minor modifications to make the test setup better match the production system's architecture. I simply changed the configuration so that one machine has the roles of both "Submit" and "Central Manager", and I have two "Execute" machines located on the same network.

        I went through the guide, but when I started everything up they weren't authenticating with one another. On both the "Execute" machines I am getting the following from my StartLog (I set STARTD_DEBUG  = D_SECURITY:2 in my config):

        __________________________________________________________________________________________________________________________________________________________________________________________________
        06/22/20 17:33:14 SECMAN: new session, doing initial authentication.
        06/22/20 17:33:14 SECMAN: authenticating RIGHT NOW.
        06/22/20 17:33:14 SECMAN: AuthMethodsList: PASSWORD
        06/22/20 17:33:14 SECMAN: Auth methods: PASSWORD
        06/22/20 17:33:14 AUTHENTICATE: setting timeout for <192.168.0.69:9618> to 20.
        06/22/20 17:33:14 AUTHENTICATE: in authenticate( addr == '<192.168.0.69:9618>', methods == 'PASSWORD')
        06/22/20 17:33:14 AUTHENTICATE: can still try these methods: PASSWORD
        06/22/20 17:33:14 HANDSHAKE: in handshake(my_methods = 'PASSWORD')
        06/22/20 17:33:14 HANDSHAKE: handshake() - i am the client
        06/22/20 17:33:14 HANDSHAKE: sending (methods == 512) to server
        06/22/20 17:33:14 HANDSHAKE: server replied (method = 512)
        06/22/20 17:33:14 AUTHENTICATE: will try to use 512 (PASSWORD)
        06/22/20 17:33:14 AUTHENTICATE: do_authenticate is 1.
        06/22/20 17:33:14 PW.
        06/22/20 17:33:14 PW: getting name.
        06/22/20 17:33:14 PW: Generating ra.
        06/22/20 17:33:14 PW: Client sending.
        06/22/20 17:33:14 Client sending: 0, 19(condor_pool@worker1), 256
        06/22/20 17:33:14 PW: Client receiving.
        06/22/20 17:33:14 Server sent status indicating not OK.
        06/22/20 17:33:14 PW: Client received ERROR from server, propagating
        06/22/20 17:33:14 PW: CLient sending two.
        06/22/20 17:33:14 In client_send_two.
        06/22/20 17:33:14 Can't send null for random string.
        06/22/20 17:33:14 Client sending: 0() 0 0
        06/22/20 17:33:14 Sent ok.
        06/22/20 17:33:14 AUTHENTICATE: method 512 (PASSWORD) failed.
        06/22/20 17:33:14 AUTHENTICATE: can still try these methods:
        06/22/20 17:33:14 HANDSHAKE: in handshake(my_methods = '')
        06/22/20 17:33:14 HANDSHAKE: handshake() - i am the client
        06/22/20 17:33:14 HANDSHAKE: sending (methods == 0) to server
        06/22/20 17:33:14 HANDSHAKE: server replied (method = 0)
        06/22/20 17:33:14 AUTHENTICATE: no available authentication methods succeeded!
        06/22/20 17:33:14 SECMAN: required authentication with collector 192.168.0.69 failed, so aborting command DC_START_TOKEN_REQUEST.
        06/22/20 17:33:14 Failed to request a new token: DAEMON:1:failed to start command for token request with remote daemon at '<192.168.0.69:9618>'.|AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
        __________________________________________________________________________________________________________________________________________________________________________________________________


        So then I went and looked at the CollectorLog on the Manager:
        __________________________________________________________________________________________________________________________________________________________________________________________________
        06/22/20 18:03:15 DC_AUTHENTICATE: required authentication of 192.168.0.70 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
        06/22/20 18:03:15 read_password_from_filename(): read_secure_file(/etc/condor/password.d/POOL) failed!
        06/22/20 18:03:15 read_password_from_filename(): read_secure_file(/etc/condor/password.d/POOL) failed!
        ___________________________________________________________________________________________________________________________________________________________________________________________________

        (Ignore the fact that the timestamps are far apart; I have just been trying some more things in the meantime.)

        It looks like for some reason condor can't read the POOL file, even though the file (and its parent directory) are owned by the user and group condor:condor, and everyone has execute permissions on /etc and /etc/condor. I also made selinux permissive just in case that was the issue.

        Does anyone have any further steps I can take to figure out why this read is failing?

        Thank you!
        -Wes


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
-- 
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736
