
Re: [HTCondor-users] condor_q -global stopped working after condor upgrade



On 9/11/2017 3:21 PM, John M Knoeller wrote:
When you don't specify a username or -allusers to condor_q, then it needs to authenticate in order to know who to query jobs for.

So you need to either specify a username or -allusers with condor_q -global, or you need to have a method for READ authentication that
works remotely.
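
A minimal shell sketch of the two invocations described above. The CONDOR_Q variable and both function names are hypothetical helpers introduced here for illustration (they allow a dry run with echo); only condor_q, -global and -allusers come from this thread.

```shell
#!/bin/sh
# CONDOR_Q is a hypothetical override so the commands can be dry-run
# (for example CONDOR_Q=echo); by default the real tool is used.
CONDOR_Q="${CONDOR_Q:-condor_q}"

# Query every schedd in the pool for jobs belonging to all users.
global_all_users() {
    $CONDOR_Q -global -allusers
}

# Query every schedd for one named user; defaults to the current user.
global_for_user() {
    $CONDOR_Q -global "${1:-$(whoami)}"
}
```

Both forms state explicitly whose jobs to query, so condor_q does not have to authenticate just to learn the user name.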


Note that as of HTCondor v8.6.1, most sites do not need to add an authentication method for condor_q -global to work, because the patch associated with this ticket sets things up correctly by default for the vast majority of installations:
  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6125

I am guessing that Vlad's configuration explicitly sets SEC_READ_AUTHENTICATION_METHODS, SEC_CLIENT_AUTHENTICATION_METHODS, or SEC_DEFAULT_AUTHENTICATION_METHODS in a way that does not let the HTCondor default policy apply. If so, Vlad, take a look at the comment in the above ticket that starts with "Config-based work-around".
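
One way to check that guess is to ask condor_config_val where each of those three knobs is defined. A rough sketch, assuming condor_config_val and its -verbose option are available on the host; the CONDOR_CONFIG_VAL variable and the function name are hypothetical, added only so the loop can be dry-run:

```shell
#!/bin/sh
# CONDOR_CONFIG_VAL is a hypothetical override for dry runs
# (for example CONDOR_CONFIG_VAL=echo); by default the real tool is used.
CONDOR_CONFIG_VAL="${CONDOR_CONFIG_VAL:-condor_config_val}"

# Ask for the value of each authentication-methods knob named above,
# with -verbose so the output also says where the value comes from.
show_auth_methods() {
    for knob in \
        SEC_READ_AUTHENTICATION_METHODS \
        SEC_CLIENT_AUTHENTICATION_METHODS \
        SEC_DEFAULT_AUTHENTICATION_METHODS
    do
        # Keep going even if the lookup for one knob fails.
        $CONDOR_CONFIG_VAL -verbose "$knob" || true
    done
}
```

Running show_auth_methods on the central manager and on a submitter should show whether any of the three is set in a local config file rather than left at the default.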

regards,
Todd



-tj

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Vladimir Brik
Sent: Monday, September 11, 2017 2:55 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_q -global stopped working after condor upgrade

Actually, something strange is going on.

When I run "condor_q -global" as a non-root user, I get a bunch of
authentication errors, but if I run "condor_q -global -all" or "condor_q
-global $(whoami)" as the same non-root user, everything works fine.

Why would this be?


Vlad



On 09/11/2017 12:33 PM, Vladimir Brik wrote:
Apologies. Realized my mistake as soon as I sent the email. "condor_q
-global -all" works fine. I guess the reason "condor_q -global" (run as
root) only shows local queue is that it's looking for jobs owned by root
in remote queues. I think it's a bit inconsistent maybe? It would be
nice if "condor_q -global" when run as root showed *all* jobs in remote
queues, to match its behavior for the local queue.


Vlad



On 09/11/2017 12:23 PM, Vladimir Brik wrote:
Hello.

'condor_q -global' stopped working after we upgraded condor from 8.3.8
to 8.7.1 on our password-protected condor pool. Disabling authentication
fixes the problem in 8.7.2 (I assume it would also work in 8.7.1), so I
am guessing the problem has something to do with my security settings. I
don't see any authentication errors in the logs, but condor_q -global
only prints the local queue. condor_status -schedd works fine, and
everything else seems to work fine.

Could somebody help?

This is what used to work in 8.3.8:

Central manager (also schedd):
ALLOW_DAEMON = $(FULL_HOSTNAME) $(IP_ADDRESS) condor_pool@*/*
SEC_PASSWORD_FILE = ...
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, Password
SEC_DEFAULT_AUTHENTICATION = Required
SEC_DEFAULT_INTEGRITY = Required
# Make condor_status work from remote hosts for non-root users
SEC_READ_AUTHENTICATION = Optional
SEC_READ_INTEGRITY = Optional
# Make condor_q -global work from this host for non-root users
SEC_CLIENT_AUTHENTICATION = Optional
SEC_CLIENT_INTEGRITY = Optional

Several submitters:
ALLOW_DAEMON = $(FULL_HOSTNAME) $(IP_ADDRESS) *.icecube.wisc.edu
SEC_PASSWORD_FILE = ...
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD, GSI
SEC_DEFAULT_AUTHENTICATION = OPTIONAL
SEC_DEFAULT_INTEGRITY = OPTIONAL


# Defaults do not apply to negotiator security subsystem
SEC_NEGOTIATOR_AUTHENTICATION_METHODS = FS, PASSWORD, GSI
SEC_NEGOTIATOR_AUTHENTICATION = OPTIONAL
SEC_NEGOTIATOR_INTEGRITY = OPTIONAL


I see nothing obviously wrong in the logs. Looking at the collector log,
it seems condor_q -global is able to retrieve the list of schedds, but then
nothing happens in the schedd log:
==> CollectorLog <==
09/11/17 12:13:00 Got QUERY_SCHEDD_ADS
09/11/17 12:13:00 (Sending 2 ads in response to query)
09/11/17 12:13:00 Query info: matched=2; skipped=0; query_time=0.000093;
send_time=0.000150; type=Scheduler; requirements={((TotalRunningJobs > 0
|| TotalIdleJobs > 0 || TotalHeldJobs > 0 || TotalRemovedJobs > 0 ||
TotalJobAds > 0))}; locate=0; limit=0; from=TOOL;
peer=<172.16.223.27:42402>; projection={ScheddIpAddr CondorVersion Name
Machine}

==> SchedLog <==
09/11/17 12:13:00 (pid:3348) Number of Active Workers 0

One strange entry does pop up regularly in CollectorLog, but I don't
know if it's related:
DaemonCore: Can't receive command request from 172.16.223.23 (perhaps a
timeout?)
(172.16.223.23 is the local address).


Any help would be appreciated!


Thanks,

Vlad

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/




--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685