
Re: [HTCondor-users] Move EXECUTE location




Hi Ale,

Re the below, some thoughts:

It looks like you changed EXECUTE to equal /some/path/on/NFS.

Given how Linux filesystem permissions work, note that user "condor" must also have execute (x) permission on every directory along this path.
I am guessing this is the problem. See this URL if you do not understand this sentence:
  https://unix.stackexchange.com/questions/13858/do-the-parent-directorys-permissions-matter-when-accessing-a-subdirectory
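
As a rough illustration of the point above, here is a minimal Python sketch that walks from / down to a directory and lists every component whose classic mode bits do not grant search (x) permission to a given user. The function names are hypothetical, and the check is only an approximation: it ignores ACLs, root privileges, and anything the NFS server does with root_squash or ID mapping.

```python
import os
import pwd
import stat
from pathlib import Path


def search_bit_set(st, uid, gids):
    """Return True if the classic mode bits grant search (x) to uid/gids."""
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IXUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IXGRP)
    return bool(st.st_mode & stat.S_IXOTH)


def blocked_dirs(path, uid, gids):
    """List every directory from / down to path that uid cannot traverse."""
    p = Path(path).resolve()
    chain = [p, *p.parents][::-1]  # root first, target directory last
    return [str(d) for d in chain if not search_bit_set(d.stat(), uid, gids)]


def blocked_dirs_for_user(path, username):
    """Same check, looking up the uid and group list of a named account."""
    pw = pwd.getpwnam(username)
    gids = set(os.getgrouplist(username, pw.pw_gid))
    return blocked_dirs(path, pw.pw_uid, gids)
```

Run on the execute node by a user who can already reach the path, a call such as blocked_dirs_for_user("/some/path/on/NFS", "condor") should return an empty list if the mode bits are fine all the way down; any directory it does list is a candidate for the missing x bit.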

I really suggest you reconsider the decision to store EXECUTE on NFS; if you have sufficient local disk space on your execute nodes, I suggest you use that before NFS.  My guess is you do not have local space, and thus NFS is your only choice.... (bummer)

Hope the above helps,
Todd



On 10/26/2021 9:01 AM, Alejandro Acuña wrote:
Thanks Todd.

I changed only the EXECUTE value on each node and on the master, and restarted the service with systemctl as well. Now I get a sensible response from condor_config_val: the value of the variable was updated.
BUT... after new submits, jobs are quickly placed in the hold state, and condor shows the following reasons:

condor_q -hold:
ID       OWNER     HELD_SINCE  HOLD_REASON
589.0     me         10/26 10:42 Error from slot1@node2: Failed to execute [nfs_location]/execute/dir_2707175/condor_exec.exe' with arguments 999000: (errno=13: 'Permission denied')
589.1     me         10/26 10:42 Error from slot2@node2: Failed to execute [nfs_location]/execute/dir_2707176/condor_exec.exe' with arguments 999001: (errno=13: 'Permission denied')

Exports FILE on master:
[nfs_location] node2(rw,sync,root_squash,no_subtree_check)
 
Fstab FILE on node2:
master:[nfs_location] [nfs_local_location] nfs root_squash,rw,sync,hard,intr 0 0

Permissions on [nfs_location]:
drwxrwxrwt (1777)

The permission errors tell me we are near the end of the problem, but what am I missing?

Regards,
Ale


From: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>, "alejandro acunia" <alejandro.acunia@xxxxxxxxxxxxxxxx>
Sent: Monday, October 25, 2021 12:52:01
Subject: Re: [HTCondor-users] Move EXECUTE location

On 10/23/2021 7:03 AM, Alejandro Acuña wrote:
Hi all,
Linux help needed: I wonder if it is possible to move just the EXECUTE path (generally placed in /var or LOCAL_DIR) to a different location with better hardware (basically more disk space for execution files).

If I change the value of this variable in condor_config (the master config) and run condor_reconfig, the nodes refresh their local config correctly, but when I submit a test job, it stays in held status forever.
condor_q -analyze reasons: logs lost because of references to their original locations (which I never changed, because I don't need to).

I prefer to change just the execution location, because if I change more important global variables such as LOCAL_DIR, I am forced to configure more variables.

Is there any "elegant" way to move this location post-installation and reconfigure execution?

Thanks!!!
Alex

Hi Alex,

You can change the location of EXECUTE alone; no need to change LOCAL_DIR.

Some advice:

You cannot change the location of EXECUTE on-the-fly with just condor_reconfig.  You will need to restart HTCondor on each node where you change the location of EXECUTE, either by restarting the HTCondor service (i.e. systemctl restart condor) or with "condor_restart -master".  If you are running HTCondor v9.x or above, then upon restarting, the condor_master will create the EXECUTE directory in the new location for you with the proper ownership/permissions.  If you are running an older version of HTCondor, then you will need to create the EXECUTE directory yourself with the same ownership/permissions as the original EXECUTE directory.
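
For the older-version case, where the directory has to be created by hand, the following is a minimal Python sketch of one way to mirror the original directory's mode and ownership onto a new one. The function name and the copy_owner flag are hypothetical, and this is only an illustration of "same ownership/permissions", not something HTCondor ships. Copying ownership to a different account normally requires root.

```python
import os
import stat


def clone_dir_attrs(old_dir, new_dir, copy_owner=True):
    """Create new_dir (if needed) with the mode and owner of old_dir.

    Returns the resulting permission bits of new_dir.
    """
    st = os.stat(old_dir)
    os.makedirs(new_dir, exist_ok=True)
    if copy_owner:
        # chown first: changing ownership can clear some mode bits
        os.chown(new_dir, st.st_uid, st.st_gid)
    # S_IMODE keeps only the permission bits (including sticky/setgid)
    os.chmod(new_dir, stat.S_IMODE(st.st_mode))
    return stat.S_IMODE(os.stat(new_dir).st_mode)
```

Called as root with the original and the new EXECUTE paths, this would leave the new directory owned and permissioned like the old one; HTCondor still needs to be restarted afterwards, as described above.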

It is advised that you keep EXECUTE on a local filesystem.  If you are trying to relocate EXECUTE onto a shared filesystem, such as an NFS volume with root-squash enabled, you will need to chmod EXECUTE to permissions 1777.  See details here: https://opensciencegrid.atlassian.net/browse/HTCONDOR-73
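
To make the 1777 value concrete, here is a small Python sketch that sets that mode on a directory and shows how it renders in ls-style notation. The helper names are hypothetical; the mode value is the one given above. 1777 means read/write/execute for owner, group, and others, plus the sticky bit, the same combination commonly used for /tmp.

```python
import os
import stat

EXECUTE_MODE = 0o1777  # rwx for everyone, plus the sticky bit


def describe_mode(mode):
    """Render permission bits the way `ls -ld` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def set_execute_mode(path, mode=EXECUTE_MODE):
    """Apply the mode to path and return the permission bits now in effect."""
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)
```

describe_mode(0o1777) gives "drwxrwxrwt", which is the string to look for in ls -ld output on the relocated EXECUTE directory.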

Hope the above helps,
Todd




-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685