
Re: [HTCondor-users] Starter does not recognize job script as executable when ACL is used to set access rights.



On 11/25/2019 12:55 AM, Sergey A. Komissarov via HTCondor-users wrote:
> Hello Zach,
> 
> user20000 belongs to the single group 'users' with group id 100. Group with id 1001 does not exist on the execute machine.
> 
> user20000@483d941bd1ee:/$ groups
> users
> user20000@483d941bd1ee:/$ cat /etc/passwd | grep user20000
> user20000:x:20000:100::/home/user20000:/bin/false
> user20000@483d941bd1ee:/$ cat /etc/group | grep 10001
> user20000@483d941bd1ee:/$
> 

I think the issue could be that HTCondor only deals with primary and 
supplemental group permissions, and does not (properly) understand/support 
the extended POSIX Access Control List entries for so-called "named user" 
and "named group" controls.

So on the execute machine, HTCondor is launching the job as UID 20000 
(user20000) as you requested.  HTCondor should also load that 
user's primary and supplemental groups, which in this case is just GID 100.
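
To double-check which groups an account resolves to on the execute machine, 
a minimal Python sketch along these lines could be used. It assumes a Unix 
host, and groups_for_user is a hypothetical helper name for illustration, 
not part of HTCondor; it only reports what the system databases say, not 
what the starter actually does.

```python
import os
import pwd


def groups_for_user(username):
    """Return (primary_gid, sorted list of all gids) for a local account."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user is unknown
    # getgrouplist returns the primary gid plus all supplemental gids
    gids = os.getgrouplist(username, entry.pw_gid)
    return entry.pw_gid, sorted(set(gids))
```

Calling groups_for_user("user20000") on the execute machine should, if the 
output of "groups" above is accurate, report a primary GID of 100 and no 
other groups.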

Going by the traditional owner/group/other mode bits alone, the file 
start.sh is only executable by a process with UID 10131 or a GID of 1001. 
The job runs with neither, so it does not have permission to execute, 
which is what you are seeing.
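
As an illustration of that reasoning, here is a small Python sketch of a 
check that looks only at the classic mode bits and ignores ACL entries, 
next to the answer the kernel gives via access(2). The function names are 
hypothetical, and I am not claiming this is how HTCondor implements its 
check; it only shows how the two answers can differ.

```python
import os
import stat


def mode_bits_allow_exec(st_mode, st_uid, st_gid, uid, gids):
    """Classic owner/group/other execute check that ignores ACL entries."""
    if uid == st_uid:
        return bool(st_mode & stat.S_IXUSR)
    if st_gid in gids:
        return bool(st_mode & stat.S_IXGRP)
    return bool(st_mode & stat.S_IXOTH)


def compare_checks(path):
    """Return (mode-bit answer, kernel answer) for the calling process.

    os.access() asks the kernel, which honors ACLs on filesystems that
    support them. Note that root bypasses most permission checks, so run
    this as the job user for a meaningful comparison.
    """
    st = os.stat(path)
    gids = set(os.getgroups()) | {os.getgid()}
    by_mode = mode_bits_allow_exec(st.st_mode, st.st_uid, st.st_gid,
                                   os.getuid(), gids)
    by_kernel = os.access(path, os.X_OK)
    return by_mode, by_kernel
```

With the values from this thread (mode rwxrwx---, owner 10131, group 1001, 
job UID 20000, GID 100), mode_bits_allow_exec returns False, and it returns 
True once "chmod o+x" has been applied.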

Are you setting permissions on start.sh via the setfacl command? If so, 
what is the setfacl command you are using?  Is the file system on the 
execute machine mounted with the "acl" option (you can check this by 
running /bin/mount)?
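
For comparison, the getfacl output quoted below can be evaluated in the 
order described in the Linux acl(5) manual page: owner, then named user, 
then owning or named groups, then other, with the mask applied to named 
user and group entries. The following is only a rough sketch under those 
assumptions; parse_getfacl and acl_allows are hypothetical names, and the 
caller has to pass user and group identifiers in the same form getfacl 
printed them (names where they resolve, numbers otherwise).

```python
SAMPLE_ACL = """\
# file: shared/job-dir/start.sh
# owner: 10131
# group: 1001
user::rwx
user:user20000:rwx
group::---
mask::rwx
other::---
"""


def parse_getfacl(text):
    """Parse getfacl text into (owner, group, [(tag, qualifier, perms)])."""
    owner = group = None
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# owner:"):
            owner = line.split(":", 1)[1].strip()
        elif line.startswith("# group:"):
            group = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            tag, qualifier, perms = line.split(":")[:3]
            entries.append((tag, qualifier, perms[:3]))
    return owner, group, entries


def acl_allows(text, user, groups, want="x"):
    """Apply the acl(5) access-check order for one permission letter."""
    owner, group, entries = parse_getfacl(text)
    mask = next((p for t, q, p in entries if t == "mask"), None)

    def masked(perms):
        # Named user and group entries are limited by the mask, if present.
        return want in perms and (mask is None or want in mask)

    if user == owner:
        return any(want in p for t, q, p in entries if t == "user" and q == "")
    for tag, qualifier, perms in entries:
        if tag == "user" and qualifier == user:
            return masked(perms)
    group_hits = [p for t, q, p in entries
                  if t == "group" and (q in groups or (q == "" and group in groups))]
    if group_hits:
        return any(masked(p) for p in group_hits)
    return any(want in p for t, q, p in entries if t == "other")
```

Evaluated this way, the named-user entry for user20000 grants execute 
permission, which is consistent with the script being runnable from an 
interactive login as that user.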

regards
Todd



> ----------
> Sergey Komissarov
> Senior Software Developer
> DATADVANCE
> 
> This message may contain confidential information
> constituting a trade secret of DATADVANCE. Any distribution,
> use or copying of the information contained in this
> message is ineligible except under the internal
> regulations of DATADVANCE and may entail liability in
> accordance with the current legislation of the Russian
> Federation. If you have received this message by mistake
> please immediately inform me of it. Thank you!
> 
> 
> 
> ----- Original Message -----
> From: "Zach Miller" <zmiller@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Cc: "Sergey A. Komissarov" <sergey.komissarov@xxxxxxxxxxxxxx>
> Sent: Friday, November 22, 2019 9:54:30 PM
> Subject: Re: [HTCondor-users] Starter does not recognize job script as executable when ACL is used to set access rights.
> 
> Hi Sergey,
> 
> When you log in to the execute machine as user20000, and run "groups" on the command line, what do you see?
> 
> I think what is happening is HTCondor is switching user ID but is not switching to the 1001 group ID as you are expecting.  My guess is user20000 belongs to multiple groups... let me know what the above command returns.
> 
> 
> Cheers,
> -zach
> 
> 
> On 11/22/19, 11:36 AM, "HTCondor-users on behalf of Sergey A. Komissarov via HTCondor-users" <htcondor-users-bounces@xxxxxxxxxxx on behalf of htcondor-users@xxxxxxxxxxx> wrote:
> 
>      Hello,
>      
>      We are using a shared filesystem to prepare condor jobs and ACLs to control user access rights.
>      
>      The problem is that the workstation where the job is prepared does not know anything about the users on the condor machines.
>      The job script is created under some user and group, with the executable flag set for user and group.
>      
>      The job script is owned by uid 10131 and group 1001, and is submitted to condor with the +Owner=user20000 option.
>      
>      Startd log is the following:
>      11/22/19 13:17:09 (fd:19) (pid:56) (D_ALWAYS) Running job as user user20000
>      11/22/19 13:17:09 (fd:19) (pid:56) (D_ALWAYS) About to exec /shared/job-dir/start.sh
>      11/22/19 13:17:09 (fd:19) (pid:56) (D_PRIV) PRIV_USER --> PRIV_CONDOR at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_starter.V6.1/os_proc.cpp:568
>      11/22/19 13:17:09 (fd:19) (pid:56) (D_DAEMONCORE) In DaemonCore::Create_Process(/shared/job-dir/start.sh,...)
>      11/22/19 13:17:09 (fd:21) (pid:56) (D_PRIV) PRIV_CONDOR --> PRIV_USER at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_daemon_core.V6/daemon_core.cpp:7654
>      11/22/19 13:17:09 (fd:21) (pid:56) (D_ALWAYS) Create_Process: Cannot access specified executable "/shared/job-dir/start.sh": errno = 13 (Permission denied)
>      11/22/19 13:17:09 (fd:21) (pid:56) (D_PRIV) PRIV_USER --> PRIV_CONDOR at /slots/02/dir_19946/userdir/.tmpWrq8Vb/condor-8.9.2/src/condor_daemon_core.V6/daemon_core.cpp:7669
>      
>      This is how the job directory looks from the condor execute host after the job was submitted and failed to start:
>      root@execute# ls -la /shared/job-dir/
>      total 12
>      drwxrwx---+ 2     10131  1001 4096 Nov 22 14:39 .
>      drwxr-xr-x  3     10131  1001 4096 Nov 22 14:49 ..
>      -rw-rw----+ 1 user20000 users    0 Nov 22 14:39 stdout
>      -rwxrwx---+ 1     10131  1001 1009 Nov 22 14:39 start.sh
>      -rw-rw----+ 1 user20000 users    0 Nov 22 14:39 stderr
>      
>      root@execute# getfacl /shared/job-dir/start.sh
>      # file: shared/job-dir/start.sh
>      # owner: 10131
>      # group: 1001
>      user::rwx
>      user:user20000:rwx
>      group::---
>      mask::rwx
>      other::---
>      
>      If I set 'chmod o+x' for the job script everything works. But it seems like a bug, because when I log in
>      to the execute host as user20000 I can start the job script without the executable flag for others.
>      
>      We have HTCondor 8.9.2 running inside a docker cluster; the host and the docker containers use Ubuntu 16.04.1.
>      
>      ----------
>      Sergey Komissarov
>      Senior Software Developer
>      DATADVANCE
>      
>      
>      
>      _______________________________________________
>      HTCondor-users mailing list
>      To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>      subject: Unsubscribe
>      You can also unsubscribe by visiting
>      https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>      
>      The archives can be found at:
>      https://lists.cs.wisc.edu/archive/htcondor-users/
>      
> 
> 
> 


-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685