
[HTCondor-users] Permission Denied caching



Dear HTCondor experts,

After a short (~5 minute) DNS and partial network outage today, we observed several cases of:

PERMISSION DENIED to condor_pool@xxxxxxxxxx from host XXX.YYY.ZZZ.XXX for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: cached result for ADVERTISE_MASTER; see first case for the full reason

on the Central Manager (i.e. the collector), which persisted for hours. It seems the cache entries never expire?

The first message was:
PERMISSION DENIED to condor_pool@xxxxxxxxxx from host XXX.YYY.ZZZ.XXX for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD: reason: ADVERTISE_SCHEDD authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: XXX.YYY.ZZZ.XXX, hostname size = 0, original ip address = XXX.YYY.ZZZ.XXX
which is of course expected and fine if DNS fails, since Kerberos authentication needs hostname verification via DNS.

This was easy to fix by restarting the condor services on the central manager. However, looking at the code:
https://github.com/htcondor/htcondor/blob/master/src/condor_io/condor_ipverify.cpp
I cannot find any automatic expiration of such DENY entries that result from temporary DNS failures.
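
To illustrate the kind of expiration meant here, the following is a minimal sketch in Python. It is not HTCondor code and does not reflect how condor_ipverify.cpp is structured; the class name DenyCache, its methods, and the ttl_seconds parameter are all hypothetical and exist only for illustration. The idea is that a cached DENY is honoured only for a limited time, after which the request is evaluated from scratch again.

```python
import time


class DenyCache:
    """Hypothetical negative cache: remembers DENY decisions for a limited time."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        # ttl_seconds is an assumed knob, not an existing HTCondor setting.
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # (host, access_level) -> (reason, time stored)

    def remember_deny(self, host, access_level, reason):
        """Store a DENY decision together with the time it was made."""
        self._entries[(host, access_level)] = (reason, self._clock())

    def cached_deny(self, host, access_level):
        """Return the cached reason, or None if there is no unexpired entry."""
        key = (host, access_level)
        entry = self._entries.get(key)
        if entry is None:
            return None
        reason, stored_at = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            # Entry is too old: drop it so the next request is re-evaluated.
            del self._entries[key]
            return None
        return reason
```

With something like this, a DENY caused by a five-minute DNS outage would stop being served from the cache once the TTL has passed, without restarting the daemon.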

Is the only way to recover from something like this a restart of the collector, or am I missing something?

Cheers,
Oliver
