
Re: [Condor-users] Cannot execute Job on remote host, permission denied to write condor_exec.exe



On Aug 18, 2010, at 1:56 PM, Lee Mitchell wrote:

> Hello All,  Does anyone have a suggestion for how to get past this issue?
> 
> I submit from my negotiator host, and jobs can run on my
> negotiator host. But if I force a job to run on some other machine
> (not on the negotiator) in the job submission requirements, e.g.
> 
> ( machine != "uskyarpds0310.air.ups.com" )
> 
> then the job runs for 2 seconds and goes into the Hold state.
> 
> condor_q -better says:
> 
> -- Submitter: uskyarpds0310.air.ups.com : <10.224.217.231:8452> :
> uskyarpds0310.air.ups.com
> ---
> 2891.000:  Request is held.
> 
> Hold reason: Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
> 10.224.176.128 failed to write to file
> /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
> (errno 13) Permission denied
> 
> -------------
> 
> In the logs I see:
> 
> == Shadow Log on submit machine ==
> 
> 08/18 17:47:26 Initializing a VANILLA shadow for job 2891.0
> 08/18 17:47:27 (2891.0) (18621): Request to run on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxx <10.224.176.128:50433> was ACCEPTED
> 08/18 17:47:28 (2891.0) (18621): DoUpload: (Condor error code 12,
> subcode 13) SHADOW at 10.224.217.231 failed to send file(s) to
> <10.224.176.128:53124>; STARTER at 10.224.176.128 failed to write to
> file /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
> (errno 13) Permission denied
> 08/18 17:47:28 (2891.0) (18621): Job 2891.0 going into Hold state
> (code 12,13): Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
> 10.224.176.128 failed to write to file
> /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
> (errno 13) Permission denied
> 08/18 17:47:28 (2891.0) (18621): **** condor_shadow (condor_SHADOW)
> pid 18621 EXITING WITH STATUS 112
> 
> == StarterLog.slot1 on the remote execute node ==
> 
> 08/18 17:47:27 get_file(): Failed to open file
> /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe,
> errno = 13: Permission denied.
> 08/18 17:47:28 get_file(): consumed 18446296 bytes of file transmission
> 08/18 17:47:28 DoDownload: consuming rest of transfer and failing
> after encountering the following error: STARTER at 10.224.176.128
> failed to write to file
> /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
> (errno 13) Permission denied
> 08/18 17:47:28 WARNING: File
> /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe
> can not be accessed by Quill file transfer tracking.
> 08/18 17:47:28 File transfer failed (status=0).
> 08/18 17:47:28 ERROR "Failed to transfer files" at line 1882 in file
> jic_shadow.cpp
> 
> ---------
> Actually, the failure to write the file to the execute subdirectory
> happens for all files transferred, not just the exe. I see the same
> block of messages in the StarterLog.slot1 for every file that is
> specified in my submit file's transfer_input_files value.
> 
> On the remote execute machine, the permissions for the directory
> 
> /opt/condor/app/installation/local.compute-node/execute/dir_10143/
> 
> were: (from ls -l )
> 
> drwxr-xr-x 2 nobody nobody 4096 Aug 18 17:47 dir_10143
> 
> To start condor, I call condor_master as root, and condor has a umask of 0077.
> 
> The filesystem has the following properties output from the  command: mount
> /dev/mapper/vg00-lv_condor_app on /opt/condor/app type ext3 (rw)
> It is a local filesystem, not NFS.
> 
> All machines are the same regarding:  x86_64,  running condor 7.4.2 on RHEL 5.5
> 
> Any requests for further information or suggestions on how to track
> down the problem would be greatly appreciated.

Make sure all of the parent directories of the Condor execute directory have world execute permission enabled. Condor is trying to write the job files as the user that it will start the job under ('nobody' in this case).
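
One way to check this on the execute node is sketched below. A user such as 'nobody' needs the execute (search) bit on every directory along the path to reach the files beneath it, so the script walks from the execute directory up to the root and lists each directory whose world-execute bit is off. The function name is made up for illustration and is not part of Condor, the path is the one from the logs quoted above, and only the plain mode bits are inspected (not ACLs or SELinux).

```python
import os
import stat


def dirs_missing_world_execute(path):
    """Return `path` and every ancestor directory lacking the world-execute bit.

    Hypothetical helper for illustration; it only reads the mode bits.
    """
    missing = []
    current = os.path.abspath(path)
    while True:
        mode = os.stat(current).st_mode
        if stat.S_ISDIR(mode) and not (mode & stat.S_IXOTH):
            missing.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return missing


if __name__ == "__main__":
    # Execute directory as it appears in the quoted StarterLog output.
    execute_dir = "/opt/condor/app/installation/local.compute-node/execute"
    if os.path.isdir(execute_dir):
        for d in dirs_missing_world_execute(execute_dir):
            print("missing world execute:", d)
    else:
        print("not found on this machine:", execute_dir)
```

Any directory it reports can then be fixed with something like `chmod o+x <dir>`, run as root.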

Thanks and regards,
Jaime Frey
UW-Madison Condor Team