
[HTCondor-users] Starter does not recognize job script as executable when ACL is used to set access rights.


We use a shared filesystem to prepare condor jobs and ACLs to control user access rights.

The problem is that the workstation where a job is prepared knows nothing about the users on the condor machines.
The job script is created under a local user and group, with the executable flag set for user and group.

The job script is owned by uid 10131 and group 1001, and is submitted to condor with the +Owner=user20000 option.

The starter log shows the following:
11/22/19 13:17:09 (fd:19) (pid:56) (D_ALWAYS) Running job as user user20000
11/22/19 13:17:09 (fd:19) (pid:56) (D_ALWAYS) About to exec /shared/job-dir/start.sh
11/22/19 13:17:09 (fd:19) (pid:56) (D_PRIV) PRIV_USER --> PRIV_CONDOR at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_starter.V6.1/os_proc.cpp:568
11/22/19 13:17:09 (fd:19) (pid:56) (D_DAEMONCORE) In DaemonCore::Create_Process(/shared/job-dir/start.sh,...)
11/22/19 13:17:09 (fd:21) (pid:56) (D_PRIV) PRIV_CONDOR --> PRIV_USER at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_daemon_core.V6/daemon_core.cpp:7654
11/22/19 13:17:09 (fd:21) (pid:56) (D_ALWAYS) Create_Process: Cannot access specified executable "/shared/job-dir/start.sh": errno = 13 (Permission denied)
11/22/19 13:17:09 (fd:21) (pid:56) (D_PRIV) PRIV_USER --> PRIV_CONDOR at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_daemon_core.V6/daemon_core.cpp:7669

This is how the job directory looks from the condor execute host after the job was submitted and failed to start:
root@execute# ls -la /shared/job-dir/
total 12
drwxrwx---+ 2     10131  1001 4096 Nov 22 14:39 .
drwxr-xr-x  3     10131  1001 4096 Nov 22 14:49 ..
-rw-rw----+ 1 user20000 users    0 Nov 22 14:39 stdout
-rwxrwx---+ 1     10131  1001 1009 Nov 22 14:39 start.sh
-rw-rw----+ 1 user20000 users    0 Nov 22 14:39 stderr

root@execute# getfacl /shared/job-dir/start.sh 
# file: shared/job-dir/start.sh 
# owner: 10131
# group: 1001

If I run 'chmod o+x' on the job script, everything works. But this seems like a bug, because when I log in
to the execute host as user20000 I can run the job script without the executable flag for others.
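
To help narrow this down, here is a minimal diagnostic sketch, assuming Python 3 is available on the execute host. It reports how the calling process sees execute permission on a file, checked once with the real uid/gid and once with the effective uid/gid. The function name describe_exec_access is made up for illustration, and I do not know which of these checks the starter actually performs, so this is only a comparison aid, not a statement of the cause.

```python
import os
import stat
import sys


def describe_exec_access(path):
    """Summarize execute permission on `path` as seen by this process.

    Hypothetical helper for illustration; not part of HTCondor.
    """
    st = os.stat(path)
    report = {
        # Classic mode bits only; ACL entries are not reflected here.
        "mode": stat.filemode(st.st_mode),
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "other_x": bool(st.st_mode & stat.S_IXOTH),
        # Identity of the calling process.
        "real_uid": os.getuid(),
        "effective_uid": os.geteuid(),
        "groups": sorted(os.getgroups()),
        # Check performed with the real uid/gid.
        "real_ids_x": os.access(path, os.X_OK),
    }
    # The effective-id check is not available on every platform.
    if os.access in os.supports_effective_ids:
        report["effective_ids_x"] = os.access(
            path, os.X_OK, effective_ids=True
        )
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in describe_exec_access(sys.argv[1]).items():
        print(f"{key}: {value}")
```

Saved under some name (check_exec.py here is an arbitrary choice), it could be run as the job user, for example with 'sudo -u user20000 python3 check_exec.py /shared/job-dir/start.sh', and the result compared with the same check run as root. If the two checks disagree, or the group list lacks the group that the ACL grants access to, that would point at how the process identity is set up rather than at the file itself.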

We have HTCondor 8.9.2 running inside a Docker cluster; the host and the Docker containers use Ubuntu 16.04.1.

Sergey Komissarov
Senior Software Developer
