[HTCondor-users] Jobs remaining idle due to permission denied issue



I am new to HTCondor administration and am having trouble getting a new HTCondor setup working. The system runs Ubuntu 18.04 and has one central node and many execute nodes, which were set up following https://www-auth.cs.wisc.edu/lists/htcondor-users/2019-December/msg00000.shtml, including a security configuration identical (except for host names) to the one on slide 13 here: https://agenda.hep.wisc.edu/event/1325/session/16/contribution/41/material/slides/0.pdf. condor_status shows the expected execute nodes. However, when I submit jobs, they remain idle indefinitely.

On the central node, I have the following issues showing up in the logs:

SchedLog:
Can't find address for startd kremlin
SECMAN: FAILED: Received "DENIED" from server for user condor_pool@kremlin using method PASSWORD.
ERROR: SECMAN:2010:Received "DENIED" from server for user condor_pool@kremlin using method PASSWORD.
Failed to start non-blocking update to <<< ip address >>>.

CollectorLog:
PERMISSION DENIED to condor_pool@kremlin from host <<< ip address >>> for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first case for the full reason
DC_AUTHENTICATE: Command not authorized, done!

NegotiatorLog:
PERMISSION DENIED to condor_pool@kremlin from host <<< ip address >>> for command 421 (Reschedule), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: <<< ip address >>>,<<< host name >>>, hostname size = 1, original ip address = <<< ip address >>>
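
In case it helps anyone compare the denials across the two logs, here is a minimal Python sketch that pulls the user, host, command, and access level out of "PERMISSION DENIED" lines. It assumes the line format quoted above; the names parse_denied and summarize are hypothetical helpers, not part of HTCondor.

```python
import re

# Matches the "PERMISSION DENIED" lines quoted above from CollectorLog and
# NegotiatorLog. The format is assumed from those two samples only.
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>\S+) from host\s+(?P<host>.+?)\s+"
    r"for command (?P<num>\d+) \((?P<name>[^)]+)\), "
    r"access level (?P<level>\w+)"
)


def parse_denied(line):
    """Return a dict of fields from a PERMISSION DENIED line, or None."""
    match = DENIED.search(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Return the distinct (user, access level, command name) triples denied."""
    seen = set()
    for line in lines:
        fields = parse_denied(line)
        if fields:
            seen.add((fields["user"], fields["level"], fields["name"]))
    return sorted(seen)
```

Feeding summarize the lines of both logs should list each access level (here ADVERTISE_SCHEDD and DAEMON) at which condor_pool@kremlin was refused, which is what the ALLOW entries in the security configuration would need to cover.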

I have double-checked that the central node and the execute node have the same POOL password. I have also tried disabling the authentication requirements set in the security config, but this only caused the execute node to disappear from the output of condor_status (even after regenerating POOL and running condor_reconfig and/or restarting on both the central and execute nodes).

Any help would be appreciated.

Thank you,
Jonathan Bailey