
Re: [HTCondor-users] Going from Condor 7.7 to HTCondor 8.8



Thanks, Greg. That solution got me through that problem, but revealed others.

I'll post detailed errors on request, but the short version is that our condor master is a dual-homed host. This caused no problems in Condor 7.7, and I have another Condor pool running 8.3.8 that has no problems with it either.

But with 8.8, host-based security appears to get confused about which interface to use, regardless of which of the master's two interfaces I set CONDOR_HOST to. It also gets confused because a reverse DNS lookup doesn't give consistent results for our condor master.

I want to turn HTCondor's security completely and utterly off. It's not necessary for our small site.

However, as I noted before and Greg confirmed through his suggestion, the configuration line

SEC_DEFAULT_AUTHENTICATION = NEVER

doesn't turn security completely off. There's another configuration line in the default condor_config:

use SECURITY : HOST_BASED

I've searched the HTCondor documentation on the web, but I can't find any documented alternatives to "HOST_BASED". Commenting out the line doesn't change anything.
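For concreteness, this is roughly the wide-open policy I have in mind. It is an untested sketch for 8.8, not a known fix. It assumes the lines go in condor_config.local, which is read after condor_config (so these values override anything set by "use SECURITY : HOST_BASED"), and that every node and every tool gets identical settings.

    # Untested sketch: no authentication, encryption or integrity checks anywhere
    SEC_DEFAULT_AUTHENTICATION = NEVER
    SEC_DEFAULT_ENCRYPTION = NEVER
    SEC_DEFAULT_INTEGRITY = NEVER
    # "*" matches any host, so authorization should not depend on which of the
    # master's interfaces a connection arrives on, or on reverse DNS
    ALLOW_READ = *
    ALLOW_WRITE = *
    ALLOW_NEGOTIATOR = *
    ALLOW_ADMINISTRATOR = *
    ALLOW_DAEMON = *

With the wildcards, any host that can reach port 9618 gets full access, so the Cisco firewall would be the only protection.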

How do I completely turn off security?

On 5/23/19 9:13 PM, Hitchen, Greg (IM&T, Kensington WA) wrote:
Hi William

We run a Windows pool: mainly Windows execute nodes (some Linux) and only Windows submit nodes.
Our central managers are all Linux.

When going from 8.4 to 8.6, things looked OK until we tried to submit jobs, and then we got similar authentication errors.
We needed the following:

SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_NEGOTIATION = OPTIONAL
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = True

on ALL nodes, i.e. the CM, execute and submit nodes. If these settings are not on every node, the CM will NOT
be able to communicate.
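The same block with a comment on each line, as a sketch for the local config of every node. The comments summarize the HTCondor manual's descriptions of these settings and are worth checking against it.

    # Identical on the CM, execute and submit nodes
    # Every connection must authenticate
    SEC_DEFAULT_AUTHENTICATION = REQUIRED
    # Security negotiation is used only if the other side requires or prefers it
    SEC_DEFAULT_NEGOTIATION = OPTIONAL
    # Encryption is used only if the other side requires or prefers it
    SEC_DEFAULT_ENCRYPTION = OPTIONAL
    # The peer's claimed identity is accepted without proof
    SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
    # Schedd and startd set up their session with a secret passed along with
    # the match, instead of going through the usual security negotiation
    SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = True

After editing, run condor_reconfig on each node, or restart the daemons if that isn't enough.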

Not sure if it will fix your problem, but it may be worth a try.

Cheers

Greg

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of William Seligman
Sent: Friday, 24 May 2019 4:23 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Going from Condor 7.7 to HTCondor 8.8

Background: I'm the sysadmin of a small CentOS 6 computing farm. For years our
small condor pool was running Condor 7.7; higher versions offered no new
features we needed. Then the user required a new (unrelated) software
installation that was incompatible with the old CentOS 5 Condor 7.7 libraries,
and they asked me to upgrade to HTCondor 8.8.

From that point until now, I have not been able to get HTCondor 8.8 to run
fully on the farm. My debugging steps included erasing the condor_config* files,
replacing them with those from the RPMs, and completely wiping the contents
of LOCAL_DIR.

Where I'm at now: Although the condor services start up properly, I can't submit
any jobs. The error is:

# condor_submit myfile.cmd
Submitting job(s)
ERROR: Failed to connect to local queue manager
SECMAN:2007:Failed to end classad message.

The results of web searches on this error have not helped. For the record:

- I've followed the instructions at
<https://lists.cs.wisc.edu/archive/htcondor-users/2008-March/msg00178.shtml>
multiple times. Since I had started with a fresh LOCAL_DIR, the file
LOCAL_DIR/spool/job_queue.log had no invalid entries, but I gave it a try anyway.

- At present, the users are not submitting any condor jobs, so schedd is not busy.

- Schedd is running:

# ps -elf | grep schedd
4 S condor     60019   59973  0  80   0 - 13065 poll_s May22 ?        00:00:07 condor_schedd -f

- The firewall is off: neither iptables nor netfilter is running. (Our site has a
Cisco firewall that I've configured to block port 9618 from the outside, so
I'm not concerned.)

- nmap tells me that port 9618 on the CONDOR_HOST is open.

- The only error in SchedLog is
DC_AUTHENTICATE: Unable to reconcile!

- I turned on debugging in condor_config.local:
    TOOL_DEBUG = D_ALL
    SUBMIT_DEBUG = D_ALL

and ran the job with
# condor_submit -debug myfile.cmd

I can post the results on request. I'm no expert, but the relevant lines appear
to be:

05/23/19 15:57:02 (fd:5) (pid:863797) (D_SECURITY) SECMAN: command 1112 QMGMT_WRITE_CMD to schedd at <129.236.252.84:9618> from TCP port 19038 (blocking).
05/23/19 15:57:02 (fd:5) (pid:863797) (D_SECURITY) SECMAN:: default CLIENT meths: FS,KERBEROS,GSI,CLAIMTOBE
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) condor_write(fd=4 schedd at <129.236.252.84:9618>,,size=416,timeout=0,flags=0,non_blocking=0)
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) condor_read(fd=4 schedd at <129.236.252.84:9618>,,size=5,timeout=0,flags=0,non_blocking=0)
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) Stream::get(int) failed to read padding
05/23/19 15:57:02 (fd:5) (pid:863797) (D_ALWAYS) SECMAN: no classad from server, failing


- The only non-default lines in the condor_config file are:

BIND_ALL_INTERFACES = TRUE
SEC_DEFAULT_AUTHENTICATION = NEVER
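
Since "DC_AUTHENTICATE: Unable to reconcile!" is logged by the schedd, the schedd's
own security debugging may show more than the tool's. A sketch, assuming it goes
in condor_config.local on the submit host and is followed by condor_reconfig:

    # Server-side counterpart of TOOL_DEBUG/SUBMIT_DEBUG: more security detail in SchedLog
    SCHEDD_DEBUG = D_SECURITY D_FULLDEBUG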


Is there anything else I can do?

Thanks!



