
Re: [Condor-users] Cannot execute Job on remote host, permission denied to write condor_exec.exe



I am not sure, but I think the problem is that Condor runs as root and cannot access the mount point.

I had the same type of problem on Windows, where it was caused by Condor running in a separate desktop.
So try running Condor as nobody, or try changing the user's group to root.

Cheers
Kuldeep Singh Meel
Junior Undergraduate
Department of Computer Science and Engineering
Rice University '12 
IIT Bombay '12



On Wed, Aug 18, 2010 at 1:56 PM, Lee Mitchell <mr.lee.mitchell@xxxxxxxxx> wrote:
Hello All,  Does anyone have a suggestion for how to get past this issue?

When I submit from my negotiator host, jobs can run on my
negotiator host. But if I force a job to run on some other machine
(not the negotiator) in the job submission requirements, e.g.

( machine != "uskyarpds0310.air.ups.com" )

then the job runs for 2 seconds and goes into the Hold state.

condor_q -better says:

-- Submitter: uskyarpds0310.air.ups.com : <10.224.217.231:8452> :
uskyarpds0310.air.ups.com
---
2891.000:  Request is held.

Hold reason: Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
10.224.176.128 failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
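
As an aside, the hold reason above can be decoded mechanically. The following is a minimal Python sketch, not part of Condor: the function name parse_hold_reason and the regular expression are hypothetical and assume the message keeps the "failed to write to file PATH: (errno N)" shape shown here.

```python
import errno
import os
import re

# Assumed message shape, taken from the hold reason quoted above.
HOLD_RE = re.compile(r"failed to write to file\s+(\S+):\s+\(errno (\d+)\)")


def parse_hold_reason(text):
    """Return (path, errno number, errno name, OS message), or None if no match."""
    # The mail client wrapped the message, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = HOLD_RE.search(flat)
    if match is None:
        return None
    code = int(match.group(2))
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group(1), code, name, os.strerror(code)


reason = """Error from slot1: STARTER at
10.224.176.128 failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied"""

print(parse_hold_reason(reason))
```

On Linux, errno 13 maps to EACCES, which means the process lacked permission on the file or on a directory in its path, rather than the disk being full or the path missing.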

-------------

In the logs I see:

== Shadow Log on submit machine ==

08/18 17:47:26 Initializing a VANILLA shadow for job 2891.0
08/18 17:47:27 (2891.0) (18621): Request to run on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx <10.224.176.128:50433> was ACCEPTED
08/18 17:47:28 (2891.0) (18621): DoUpload: (Condor error code 12,
subcode 13) SHADOW at 10.224.217.231 failed to send file(s) to
<10.224.176.128:53124>; STARTER at 10.224.176.128 failed to write to
file /opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 (2891.0) (18621): Job 2891.0 going into Hold state
(code 12,13): Error from slot1@xxxxxxxxxxxxxxxxxxxxxxxxx: STARTER at
10.224.176.128 failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 (2891.0) (18621): **** condor_shadow (condor_SHADOW)
pid 18621 EXITING WITH STATUS 112

== StarterLog.slot1 on the remote execute node ==

08/18 17:47:27 get_file(): Failed to open file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe,
errno = 13: Permission denied.
08/18 17:47:28 get_file(): consumed 18446296 bytes of file transmission
08/18 17:47:28 DoDownload: consuming rest of transfer and failing
after encountering the following error: STARTER at 10.224.176.128
failed to write to file
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe:
(errno 13) Permission denied
08/18 17:47:28 WARNING: File
/opt/condor/app/installation/local.compute-node/execute/dir_10143/condor_exec.exe
can not be accessed by Quill file transfer tracking.
08/18 17:47:28 File transfer failed (status=0).
08/18 17:47:28 ERROR "Failed to transfer files" at line 1882 in file
jic_shadow.cpp

---------
Actually, the failure to write to the execute subdirectory
happens for all transferred files, not just the executable. I see the same
block of messages in StarterLog.slot1 for every file that is
specified in my submit file's transfer_input_files value.

On the remote execute machine, the permissions for the directory

/opt/condor/app/installation/local.compute-node/execute/dir_10143/

were: (from ls -l )

drwxr-xr-x 2 nobody nobody 4096 Aug 18 17:47 dir_10143
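
To reason about who can write into a directory with that listing, here is a minimal Python sketch of the standard Unix owner/group/other check. The function can_create_in and the example user "condor" are hypothetical; the sketch ignores ACLs, SELinux, and read-only mounts, and it says nothing about which user the starter actually runs as.

```python
def can_create_in(mode_str, owner, group, user, user_groups):
    """Apply the classic Unix permission check to an `ls -l` mode string.

    Creating a file in a directory needs both write and search (execute)
    permission on that directory. Root bypasses the check.
    """
    if user == "root":
        return True
    if user == owner:
        bits = mode_str[1:4]   # owner triple
    elif group in user_groups:
        bits = mode_str[4:7]   # group triple
    else:
        bits = mode_str[7:10]  # other triple
    writable = bits[1] == "w"
    searchable = bits[2] in ("x", "s", "t")
    return writable and searchable


listing_mode = "drwxr-xr-x"  # from the ls -l output above

# The owner (nobody) can create files; any other non-root user cannot.
print(can_create_in(listing_mode, "nobody", "nobody", "nobody", {"nobody"}))
print(can_create_in(listing_mode, "nobody", "nobody", "condor", {"condor"}))
```

With mode drwxr-xr-x, only the owner and root pass this check, so a Permission denied error suggests the writing process is neither, or that something outside plain mode bits is involved.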

To start condor, I call condor_master as root, and condor has a umask of 0077.
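
For reference, the effect of a umask of 0077 on default creation modes is plain bit arithmetic. The helper below is a hypothetical Python sketch; it does not describe how Condor itself creates the execute directory.

```python
def mode_after_umask(requested, umask):
    """Return the permission bits left after the umask clears its bits."""
    return requested & ~umask & 0o7777


UMASK = 0o077  # the umask reported above

# Typical defaults requested by mkdir (0777) and by file creation (0666).
print(oct(mode_after_umask(0o777, UMASK)))  # directories
print(oct(mode_after_umask(0o666, UMASK)))  # regular files
```

Under this umask a plain mkdir would leave no group or other bits set, so the drwxr-xr-x listing above was presumably produced some other way (for example an explicit chmod); that is an inference, not something the logs confirm.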

The mount command reports the following for the filesystem:
/dev/mapper/vg00-lv_condor_app on /opt/condor/app type ext3 (rw)
It is a local filesystem, not NFS.

All machines are identical: x86_64, running Condor 7.4.2 on RHEL 5.5.

Any requests for further information or suggestions on how to track
down the problem would be greatly appreciated.

Thank You,

Lee