
Re: [HTCondor-users] Starter does not recognize job script as executable when ACL is used to set access rights.

Hi Sergey,

When you log in to the execute machine as user20000 and run "groups" on the command line, what do you see?

I think what is happening is that HTCondor is switching the user ID but is not switching to group ID 1001 as you expect. My guess is that user20000 belongs to multiple groups. Let me know what the above command returns.
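
To make the hypothesis concrete, here is a minimal Python sketch of the classic POSIX permission-class check. It is only an illustration, not HTCondor's code. It ignores named ACL entries and the ACL mask, and the numeric IDs used for user20000 (20000) and the "users" group (100) are hypothetical, since the real values are not shown in this thread.

```python
import stat


def permission_class(file_uid, file_gid, proc_uid, proc_gids):
    """Pick the permission class the kernel would apply (simplified, no ACL entries)."""
    if proc_uid == file_uid:
        return "owner"
    if file_gid in proc_gids:
        return "group"
    return "other"


def can_execute(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Return True if the execute bit of the applicable class is set."""
    cls = permission_class(file_uid, file_gid, proc_uid, proc_gids)
    bit = {
        "owner": stat.S_IXUSR,
        "group": stat.S_IXGRP,
        "other": stat.S_IXOTH,
    }[cls]
    return bool(mode & bit)


# start.sh from the listing: mode rwxrwx--- (0o770), owner 10131, group 1001.
MODE = 0o770
FILE_UID, FILE_GID = 10131, 1001

# Hypothetical numeric IDs; the thread only gives the names.
JOB_UID = 20000
USERS_GID = 100

# Job process without group 1001: falls into "other", no execute bit.
print(can_execute(MODE, FILE_UID, FILE_GID, JOB_UID, {USERS_GID}))

# Job process with group 1001 in its group list: "group" class applies.
print(can_execute(MODE, FILE_UID, FILE_GID, JOB_UID, {USERS_GID, FILE_GID}))

# After 'chmod o+x': "other" now has the execute bit.
print(can_execute(MODE | stat.S_IXOTH, FILE_UID, FILE_GID, JOB_UID, {USERS_GID}))
```

If the job process lacks group 1001, the check falls through to the "other" class. That would match both the "Permission denied" error and the fact that 'chmod o+x' makes the job start.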


On 11/22/19, 11:36 AM, "HTCondor-users on behalf of Sergey A. Komissarov via HTCondor-users" <htcondor-users-bounces@xxxxxxxxxxx on behalf of htcondor-users@xxxxxxxxxxx> wrote:

    We are using a shared filesystem to prepare condor jobs, and ACLs to control user access rights.
    The problem is that the workstation where the job is prepared does not know anything about the users on the condor machines.
    The job script is created under some user and group, with the executable flag set for user and group.
    The job script has owner uid 10131 and group 1001, and is submitted to condor with the +Owner=user20000 option.
    The Startd log shows the following:
    11/22/19 13:17:09 (fd:19) (pid:56) (D_ALWAYS) Running job as user user20000
    11/22/19 13:17:09 (fd:19) (pid:56) (D_ALWAYS) About to exec /shared/job-dir/start.sh
    11/22/19 13:17:09 (fd:19) (pid:56) (D_PRIV) PRIV_USER --> PRIV_CONDOR at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_starter.V6.1/os_proc.cpp:568
    11/22/19 13:17:09 (fd:19) (pid:56) (D_DAEMONCORE) In DaemonCore::Create_Process(/shared/job-dir/start.sh,...)
    11/22/19 13:17:09 (fd:21) (pid:56) (D_PRIV) PRIV_CONDOR --> PRIV_USER at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_daemon_core.V6/daemon_core.cpp:7654
    11/22/19 13:17:09 (fd:21) (pid:56) (D_ALWAYS) Create_Process: Cannot access specified executable "/shared/job-dir/start.sh": errno = 13 (Permission denied)
    11/22/19 13:17:09 (fd:21) (pid:56) (D_PRIV) PRIV_USER --> PRIV_CONDOR at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_daemon_core.V6/daemon_core.cpp:7669
    This is how the job directory looks from the condor execute host after the job was submitted and failed to start:
    root@execute# ls -la /shared/job-dir/
    total 12
    drwxrwx---+ 2     10131  1001 4096 Nov 22 14:39 .
    drwxr-xr-x  3     10131  1001 4096 Nov 22 14:49 ..
    -rw-rw----+ 1 user20000 users    0 Nov 22 14:39 stdout
    -rwxrwx---+ 1     10131  1001 1009 Nov 22 14:39 start.sh
    -rw-rw----+ 1 user20000 users    0 Nov 22 14:39 stderr
    root@execute# getfacl /shared/job-dir/start.sh 
    # file: shared/job-dir/start.sh 
    # owner: 10131
    # group: 1001
    If I set 'chmod o+x' on the job script, everything works. But it seems like a bug, because when I log in
    to the execute host as user20000 I can start the job script without the executable flag for others.
    We have HTCondor 8.9.2 running inside a Docker cluster; the host and the Docker containers use Ubuntu 16.04.1.
    Sergey Komissarov
    Senior Software Developer