
[Condor-users] standard output: Permission denied (errno 13)



Hi,

I just set up a new Condor cluster and have been having trouble running even simple jobs. Jobs are matched and begin to run, but immediately go into the HOLD state, complaining that the output file cannot be opened. Strangely, the log file, which is located in the same directory as the output file, is updated just fine (contents included below). I have no trouble writing to files in the output directory as the user in question. This is a diskless cluster and all filesystems are NFS mounted, but NFS lock support is enabled.

Any help fixing this would be great - details below.

Thanks,

Chance

The submit description file:

executable   = test.sh
universe     = vanilla

Log          = test.log.$(Process)
Output       = test.out.$(Process)
Error        = test.err.$(Process)
Arguments    = firstrun
queue


The executable:

#!/bin/sh

MyOutput=$1
N=0
while [ "$N" -lt 6 ]; do
    # echo writes to stdout; the timing report from time goes to stderr
    time echo "$MyOutput"
    sleep 10
    # POSIX arithmetic expansion; plain sh has no typeset -i
    N=$((N + 1))
done



The log file:

...
000 (016.000.000) 04/24 18:24:11 Job submitted from host: <192.168.99.254:44966>
...
007 (016.000.000) 04/24 18:24:16 Shadow exception!
Error from starter on vm1@xxxxxxxxxxxx: Failed to open '/work1/possu/test.out.0' as standard output: Permission denied (errno 13)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
012 (016.000.000) 04/24 18:24:16 Job was held.
Error from starter on vm1@xxxxxxxxxxxx: Failed to open '/work1/possu/test.out.0' as standard output: Permission denied (errno 13)
        Code 7 Subcode 13
...



The Master's ShadowLog:

4/24 18:24:16 ******************************************************
4/24 18:24:16 ** condor_shadow (CONDOR_SHADOW) STARTING UP
4/24 18:24:16 ** /work1/condor/sbin/condor_shadow
4/24 18:24:16 ** $CondorVersion: 6.8.4 Feb  1 2007 $
4/24 18:24:16 ** $CondorPlatform: I386-LINUX_RHEL3 $
4/24 18:24:16 ** PID = 10520
4/24 18:24:16 ** Log last touched 4/24 18:11:55
4/24 18:24:16 ******************************************************
4/24 18:24:16 Using config source: /work1/condor/condor_config
4/24 18:24:16 Using local config sources:
4/24 18:24:16    /work1/condor/hosts/syd/condor_config.local
4/24 18:24:16 DaemonCore: Command Socket at <192.168.99.254:45033>
4/24 18:24:16 Initializing a VANILLA shadow for job 16.0
4/24 18:24:16 (16.0) (10520): Request to run on <192.168.99.1:32780> was ACCEPTED
4/24 18:24:16 (16.0) (10520): Job 16.0 going into Hold state (code 7,13): Error from starter on vm1@xxxxxxxxxxxx: Failed to open '/work1/possu/test.out.0' as standard output: Permission denied (errno 13)
4/24 18:24:16 (16.0) (10520): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 112
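
A possible diagnostic, offered as a sketch rather than a confirmed fix: the job log is written on the submit side, while standard output is opened by the starter on the execute node, so the two can behave differently if the job runs there under a different account. The helper below is hypothetical (the name check_writable and the probe file name are made up for illustration). It tries to create a file in a directory and reports the numeric uid it ran as.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current account can create
# a file in the given directory, and which uid made the attempt.
check_writable() {
    dir=$1
    probe="$dir/.write_probe.$$"    # made-up probe file name
    # The subshell keeps a failed redirection from aborting the caller.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir (uid=$(id -u))"
        return 0
    else
        echo "NOT writable: $dir (uid=$(id -u))"
        return 1
    fi
}
```

Calling check_writable /work1/possu from test.sh itself, or from a shell on the execute node, and comparing the reported uid with the submitting user's uid would show whether the two sides see the directory the same way. If they differ, comparing the output of condor_config_val UID_DOMAIN on the submit and execute machines may be a reasonable next step.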

--
Chance Reschke
Department of Biochemistry
University of Washington