[HTCondor-users] condor_q -global stopped working after condor upgrade



Hello.

'condor_q -global' stopped working after we upgraded condor from 8.3.8
to 8.7.1 on our password-protected condor pool. Disabling authentication
fixes the problem in 8.7.2 (I assume it would also work in 8.7.1), so I
am guessing the problem has something to do with my security settings. I
don't see any authentication errors in the logs, but condor_q -global
prints only the local queue. condor_status -schedd works fine, and
everything else seems to work fine.

Could somebody help?

This is what used to work in 8.3.8:

Central manager (also schedd):
ALLOW_DAEMON = $(FULL_HOSTNAME) $(IP_ADDRESS) condor_pool@*/*
SEC_PASSWORD_FILE = ...
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, Password
SEC_DEFAULT_AUTHENTICATION = Required
SEC_DEFAULT_INTEGRITY = Required
# Make condor_status work from remote hosts for non-root users
SEC_READ_AUTHENTICATION = Optional
SEC_READ_INTEGRITY = Optional
# Make condor_q -global work from this host for non-root users
SEC_CLIENT_AUTHENTICATION = Optional
SEC_CLIENT_INTEGRITY = Optional

Several submitters:
ALLOW_DAEMON = $(FULL_HOSTNAME) $(IP_ADDRESS) *.icecube.wisc.edu
SEC_PASSWORD_FILE = ...
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD, GSI
SEC_DEFAULT_AUTHENTICATION = OPTIONAL
SEC_DEFAULT_INTEGRITY = OPTIONAL


# Defaults do not apply to negotiator security subsystem
SEC_NEGOTIATOR_AUTHENTICATION_METHODS = FS, PASSWORD, GSI
SEC_NEGOTIATOR_AUTHENTICATION = OPTIONAL
SEC_NEGOTIATOR_INTEGRITY = OPTIONAL


I see nothing obviously wrong in the logs. Looking at the collector log,
it seems condor_q -global is able to retrieve the list of schedds, but then
nothing happens in the schedd log:
==> CollectorLog <==
09/11/17 12:13:00 Got QUERY_SCHEDD_ADS
09/11/17 12:13:00 (Sending 2 ads in response to query)
09/11/17 12:13:00 Query info: matched=2; skipped=0; query_time=0.000093;
send_time=0.000150; type=Scheduler; requirements={((TotalRunningJobs > 0
|| TotalIdleJobs > 0 || TotalHeldJobs > 0 || TotalRemovedJobs > 0 ||
TotalJobAds > 0))}; locate=0; limit=0; from=TOOL;
peer=<172.16.223.27:42402>; projection={ScheddIpAddr CondorVersion Name
Machine}

==> SchedLog <==
09/11/17 12:13:00 (pid:3348) Number of Active Workers 0
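
The excerpts above are at the default log level, which may hide the
security negotiation between condor_q and the remote schedds. A minimal
sketch of a configuration fragment that could expose it is below. It is
an assumption about what would help, not part of the setup described
here, and the debug levels should be reverted afterwards since they make
the logs much larger:

```
# Hypothetical debugging additions, not from the original configuration.
# Log security negotiation details in the daemons involved.
SCHEDD_DEBUG = D_SECURITY
COLLECTOR_DEBUG = D_SECURITY
# Controls what command-line tools print when run with -debug.
TOOL_DEBUG = D_SECURITY
```

After a condor_reconfig on the affected hosts, running
'condor_q -global -debug' should send the tool-side messages to stderr,
which can then be compared against SchedLog on each remote schedd.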

One strange entry does pop up regularly in CollectorLog, but I don't
know if it's related:
DaemonCore: Can't receive command request from 172.16.223.23 (perhaps a
timeout?)
(172.16.223.23 is the local address).


Any help would be appreciated!


Thanks,

Vlad