
Re: [HTCondor-users] Jobs remaining idle due to permission denied issue



Hi Jonathan,

From the error messages, it looks like authentication worked (i.e., the password is right) but authorization was denied. What's in your various ALLOW_* and DENY_* configurations? In particular, I suspect you want to double-check the value of ALLOW_DAEMON.
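
As a rough illustration of what to look for: the logs show the denied identity as condor_pool@kremlin, so whatever ALLOW_DAEMON is set to on the central node has to match that user from that host. Running condor_config_val ALLOW_DAEMON there shows the value currently in effect. The fragment below is only a sketch of a permissive setting, not your actual configuration; the wildcards are assumptions and should be narrowed to your own domain and host names for a real pool.

```
# Hypothetical sketch for the central node's local configuration.
# Authorization entries take the form user@domain/host; the wildcards
# here are illustrative assumptions, not a recommended production value.
ALLOW_DAEMON = condor_pool@*/*

# The CollectorLog denial is at the ADVERTISE_SCHEDD access level, so
# that list needs a matching entry as well; here it reuses the value above.
ALLOW_ADVERTISE_SCHEDD = $(ALLOW_DAEMON)
```

After changing the configuration, run condor_reconfig (or restart the daemons) so the new lists take effect, and check that no DENY_* entry matches the same identity, since a DENY match overrides an ALLOW match.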

Brian

On Feb 3, 2020, at 7:02 PM, Jonathan Bailey <jbaile@xxxxxxxxx> wrote:

I am new to condor administration and am having trouble getting a new condor setup working. The system runs Ubuntu 18.04 and has one central node and many execute nodes, which have been set up following https://www-auth.cs.wisc.edu/lists/htcondor-users/2019-December/msg00000.shtml, including a security configuration identical (except for host names) to the one in slide 13 here: https://agenda.hep.wisc.edu/event/1325/session/16/contribution/41/material/slides/0.pdf. condor_status shows the expected execute nodes. However, when I submit jobs, they remain idle indefinitely.

On the central node, I have the following issues showing up in the logs:

SchedLog:
Can't find address for startd kremlin
SECMAN: FAILED: Received "DENIED" from server for user condor_pool@kremlin using method PASSWORD.
ERROR: SECMAN:2010:Received "DENIED" from server for user condor_pool@kremlin using method PASSWORD.
Failed to start non-blocking update to <<< ip address >>>.

CollectorLog:
PERMISSION DENIED to condor_pool@kremlin from host <<< ip address >>> for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first case for the full reason
DC_AUTHENTICATE: Command not authorized, done!

NegotiatorLog:
PERMISSION DENIED to condor_pool@kremlin from host <<< ip address >>> for command 421 (Reschedule), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: <<< ip address >>>,<<< host name >>>, hostname size = 1, original ip address = <<< ip address >>>

I have double-checked that the central node and execute node have the same password POOL. I have also tried disabling the authentication requirements set in the security config, but this only caused the execute node to disappear from condor_status's output (even after regenerating POOL and running condor_config and/or restarting on both central and execute nodes).

Any help would be appreciated.

Thank you,
Jonathan Bailey
