Re: [HTCondor-users] Inconsistent output of "condor_q -glo"?
- Date: Fri, 19 Nov 2021 17:27:49 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Inconsistent output of "condor_q -glo"?
On 11/18/2021 2:50 AM, Steffen Grunewald wrote:
Good morning,
Hi Steffen,
Some ideas inline below...
after a major reconfig of our Hypatia cluster, with a couple of jobs having
been held before, I'm now getting somewhat inconsistent output from condor_q:
root@condormaster:.# condor_status -schedd
Name                                       Machine             RunningJobs  IdleJobs  HeldJobs
hypatia1.hypatia.local@xxxxxxxxxxxxxxxxxx  hypatia1.my.domain            0         0         0
hypatia2.hypatia.local@xxxxxxxxxxxxxxxxxx  hypatia2.my.domain            0         0       183
hypatia3.hypatia.local@xxxxxxxxxxxxxxxxxx  hypatia3.my.domain            0         0         0

       TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
Total                 0              0            183
root@condormaster:.# condor_q -schedd hypatia1.my.domain
All queues are empty
root@condormaster:.# condor_q -schedd hypatia2.my.domain
All queues are empty
root@condormaster:.# condor_q -schedd hypatia3.my.domain
All queues are empty
For the above commands, does the following work:
condor_q -allusers -name hypatia2.my.domain
?
Note the use of "-name" instead of "-schedd"; I think you wanted
"-name" here.
Also by default, the schedd will only show the jobs owned by the
user making the query. Adding "-allusers" will give information for
all users, regardless of who issued the condor_q command.
(same if I use "hypatia*.hypatia.local")
root@condormaster:.# condor_q -glo
-- Failed to fetch ads from: <10.150.100.102:4597?addrs=10.150.100.102-4597&alias=hypatia2.my.domain> : hypatia2.my.domain
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using FS
root@condormaster:.#
I have compared the output of "condor_config_val -dump" for hypatia1 and hypatia2,
and see no difference (except the few machine-/IP-specific lines).
What's behind those AUTHENTICATE:100{3,4} failures?
So instead of the above, does the following command work:
condor_q -global -allusers
?
If adding "-allusers" works, here is an explanation: by default, the
schedd will only show jobs owned by the user who issued condor_q. To
do this, the schedd needs to learn who issued the condor_q command
via authentication, and the error most likely means that no
authentication method that works over the network is configured.
With "-allusers", the schedd does not need to know who issued
the command, and can just return all the jobs (assuming the host has
READ authorization).
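
To make the distinction concrete, here is a toy Python model of that decision. It is only a sketch of the behaviour described above, not HTCondor code: the function `answer_query`, the exception `AuthenticationError`, and the job dictionaries are all made up for illustration.

```python
from typing import Dict, List, Optional


class AuthenticationError(Exception):
    """Raised when the caller's identity is needed but unknown (hypothetical)."""


def answer_query(
    jobs: List[Dict[str, object]],
    authenticated_user: Optional[str],
    all_users: bool,
) -> List[Dict[str, object]]:
    """Toy model of a schedd answering a queue query.

    With all_users=True no identity is needed, so an unauthenticated
    caller still gets an answer. Otherwise the result is filtered by
    owner, which is only possible once the caller has been authenticated.
    """
    if all_users:
        return list(jobs)
    if authenticated_user is None:
        raise AuthenticationError("cannot filter by owner: caller is unknown")
    return [job for job in jobs if job["Owner"] == authenticated_user]


if __name__ == "__main__":
    queue = [
        {"Owner": "alice", "ClusterId": 1},
        {"Owner": "bob", "ClusterId": 2},
    ]
    print(answer_query(queue, None, all_users=True))
    print(answer_query(queue, "alice", all_users=False))
```

In this model, the failing case from the original report corresponds to calling `answer_query(queue, None, all_users=False)`.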
If you want "-allusers" to be the default whenever a condor_q
command is issued, you can add the following to the condor_config:
CONDOR_Q_ONLY_MY_JOBS = False
Another way to handle this would be to allow READ access using
CLAIMTOBE authentication. CLAIMTOBE is not secure (the client can
claim to be anybody), but the idea here is to only allow it for READ
operations. This would allow users to issue a condor_q command from
a remote machine and still see only their jobs.
In the ScheddLog, I see
DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXvkEMCP)
Since /tmp has permissions 1777, what causes the lstat() error?
You are issuing "condor_q" on machine A, and it is trying to talk
to a schedd on machine B. The schedd is trying to authenticate the
person who issued the "condor_q" command as explained above (unless
you use -allusers). The "FS" authentication method (for FileSystem
authentication) works as follows: the schedd asks the client to
create a file in /tmp, and then the schedd does an lstat() on the
file to read the file ownership and thus authenticate the identity
of the person issuing condor_q. This lstat() failed because /tmp is
not shared between machine A and machine B: the file that condor_q
created exists only on machine A, so the schedd on machine B cannot
find it.
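
The handshake described above can be imitated in a few lines of Python. This is a simplified sketch that follows the description in this message, not HTCondor's actual implementation; the function names are hypothetical, and two ordinary directories stand in for /tmp on machine A and machine B.

```python
import os
import tempfile
from typing import Optional


def client_create_file(client_tmp: str, filename: str) -> None:
    """Client side: create the requested file in the client's own /tmp."""
    path = os.path.join(client_tmp, filename)
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def server_check_owner(server_tmp: str, filename: str) -> Optional[int]:
    """Server side: lstat() the file in the server's /tmp.

    Returns the owning uid, or None if the file is not visible there.
    """
    try:
        return os.lstat(os.path.join(server_tmp, filename)).st_uid
    except FileNotFoundError:
        return None


def fs_authenticate(client_tmp: str, server_tmp: str) -> Optional[int]:
    """Run one round of the simplified handshake; return the uid or None."""
    # The server picks a file name and asks the client to create it.
    filename = "FS_XXX" + os.urandom(4).hex()
    client_create_file(client_tmp, filename)
    try:
        return server_check_owner(server_tmp, filename)
    finally:
        os.remove(os.path.join(client_tmp, filename))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_a, tempfile.TemporaryDirectory() as tmp_b:
        # Same machine: both sides see the same directory, so the uid is found.
        print("shared /tmp:  ", fs_authenticate(tmp_a, tmp_a))
        # Different machines: the server looks in its own /tmp and finds nothing.
        print("separate /tmp:", fs_authenticate(tmp_a, tmp_b))
```

The second call returns None for the same reason the schedd logged "Unable to lstat": the file exists, but only in the directory the client can see.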
Hope the above helps,
regards
Todd