We have recently moved from testing into production with pool_password authentication, a credential server (credd) running,
and users submitting jobs with run_as_owner. This all works fine (after a couple of unexpected gotchas).
I have now gone back to looking at encrypt_execute_directory (set on the submit side). We had tested this before enabling
run_as_owner, and things worked fine as long as you allow for non-Windows file servers that store input and output data, and use the cipher
command before uploading output.
That testing was done before the run_as_owner option went into production. run_as_owner was tested at the time, but only on test VM execute nodes
that I had created and on which I was in the admin group.
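For reference, the submit-side settings involved look roughly like the fragment below. This is a minimal sketch rather than our actual submit file: the executable name is made up for illustration, and the file transfer lines are an assumption based on the transfers shown in the starter log.

```
# Sketch of a submit description combining the two options discussed here.
# myjob.exe is a hypothetical name, not our real executable.
universe                  = vanilla
executable                = myjob.exe

# Run the job under the submitting user's account instead of condor-slotN.
run_as_owner              = True

# Ask the starter to encrypt the job's scratch (execute) directory.
encrypt_execute_directory = True

# Assumed: input and output are moved by HTCondor file transfer.
should_transfer_files     = YES
when_to_transfer_output   = ON_EXIT

queue
```

With run_as_owner removed (or set to False) the job runs as the slot account, which is the first case in the logs; with both options set it is the second, failing case.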
Without run_as_owner the StarterLog.slot1 log file has entries like:
03/03/22 09:59:50 setting the orig job name in starter
03/03/22 09:59:50 setting the orig job iwd in starter
03/03/22 09:59:50 Encrypting execute directory "C:\PROGRA~1\condor\execute\dir_13488" to user condor-slot1
03/03/22 09:59:50 Loaded Registry hives for condor-slot1
03/03/22 09:59:50 Chirp config summary: IO false, Updates false, Delayed updates true.
03/03/22 09:59:50 Initialized IO Proxy.
03/03/22 09:59:50 Setting resource limits not implemented!
03/03/22 09:59:50 File transfer completed successfully.
03/03/22 09:59:51 Job 251.7 set to execute immediately
03/03/22 09:59:51 Starting a VANILLA universe job with ID: 251.7
03/03/22 09:59:51 Tracking process family by login "condor-slot1"
With run_as_owner the entries show:
03/03/22 19:02:49 setting the orig job name in starter
03/03/22 19:02:49 setting the orig job iwd in starter
03/03/22 19:02:49 Encrypting execute directory "C:\PROGRA~1\condor\execute\dir_12664" to user hit023
03/03/22 19:02:49 Chirp config summary: IO false, Updates false, Delayed updates true.
03/03/22 19:02:49 IOProxy: couldn't write to C:\PROGRA~1\condor\execute\dir_12664\.chirp.config: Permission denied
03/03/22 19:02:49 Couldn't initialize IO Proxy.
03/03/22 19:02:49 Setting resource limits not implemented!
03/03/22 19:02:49 get_file(): Failed to open file C:\PROGRA~1\condor\execute\dir_12664\condor_exec.exe, errno = 13: Permission denied.
03/03/22 19:02:49 get_file(): consumed 1803 bytes of file transmission
03/03/22 19:02:49 DoDownload: consuming rest of transfer and failing after encountering the following error: STARTER at 188.8.131.52 failed to write to file C:\PROGRA~1\condor\execute\dir_12664\condor_exec.exe: (errno 13) Permission denied
03/03/22 19:02:49 Failed to set execute bit on C:\PROGRA~1\condor\execute\dir_12664\condor_exec.exe, errno=2 (No such file or directory)
03/03/22 19:02:49 File transfer failed (status=0).
03/03/22 19:02:49 ERROR "Failed to transfer files" at line 2468 in file D:\execute\dir_10492\sources\src\condor_starter.V6.1\jic_shadow.cpp
03/03/22 19:02:49 ShutdownFast all jobs.
03/03/22 19:02:49 Failed to open '.update.ad' to read update ad: No such file or directory (2).
03/03/22 19:02:49 condor_read(): Socket closed abnormally when trying to read 21 bytes from <184.108.40.206:61271>, errno=10054
In production, though, the “users” group on each execute node (all in a single AD domain) contains “ourdomain\Domain Users”, which includes
all our HTCondor users. The allow permissions on the condor “execute” folder on the execute nodes are:
Read & execute
List folder contents
There is no allow entry for Full control, Modify, or Write.
For testing I manually added Full control, Modify, and Write permissions on a single execute node, but the errors are the same.
SYSTEM on the execute node has full control of the execute folder as well.
Thanks for any info/insights/suggestions/comments.