
Re: [Condor-users] Condor Shadow exception: Permission denied - why?



Hi Ian,
 To answer your questions:

- Condor version:
$CondorVersion: 7.0.4 Jul 16 2008 BuildID: 95033 $
$CondorPlatform: X86_64-LINUX_RHEL5 $

- my platform:
Linux version 2.6.18-92.1.22.el5 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42))

- The shadow exception occurs while transferring a different file each time, so the size varies: sometimes
a few bytes, sometimes a few kilobytes.

- I cannot check the size of the /var/..spool dir; it says permission denied.
- I think root is the owner of /var/....spool.
- I am not sure which user account is being used to run Condor, but if it is of
any help, I sometimes see 'condor' and sometimes 'nobody' as the username in top.
I am not sure which one it is when my jobs run.
- Universe: vanilla

 I was actually trying to dig a little more before replying.

-Tan

On Tue, Jan 27, 2009 at 11:14 AM, Ian D. Alderman <ialderman@xxxxxxxxxxxxxxxxxx> wrote:

On Jan 26, 2009, at 9:37 PM, Tanzima Zerin Islam wrote:

Thanks, Ian, for the quick reply. Since the Shadow on the submitter's machine failed to write a file to the /var/local/condor/spool/ directory,
I checked its permissions, and the spool directory is set to drwx------. Could this be
the source of the problem? I also checked the usage of /var, and only 5% has been used.

Some more questions:

What version of Condor are you using?
What platform are you running on?
How large is the data being transferred?
How large is the /var/local/condor/spool partition?
What user owns /var/local/condor/spool?
What user account is being used to run Condor?
What universe is the job in question running under?

-Ian
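
The ownership, permission, and disk-space questions above can be answered in one pass with a small script. This is only a minimal sketch using the Python standard library: `describe_dir` is a hypothetical helper name, not part of Condor, and the path is the one quoted in the error message. It may need to be run as root or as the account that owns the spool directory.

```python
import os
import pwd
import shutil
import stat


def describe_dir(path):
    """Return owner, mode string, and disk usage for a directory."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry; report the number
    usage = shutil.disk_usage(path)
    percent_used = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "owner": owner,
        "mode": stat.filemode(st.st_mode),  # e.g. "drwx------" for mode 0700
        "free_bytes": usage.free,
        "percent_used": percent_used,
    }


if __name__ == "__main__":
    # Path taken from the error message in this thread (adjust for your site).
    spool = "/var/local/condor/spool"
    try:
        print(describe_dir(spool))
    except (PermissionError, FileNotFoundError) as exc:
        print(f"cannot inspect {spool}: {exc}")
```

A mode of drwx------ means only the owning account can create files in the directory, so the reported owner is worth comparing with the account the shadow actually runs as.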


--Tan

On Mon, Jan 26, 2009 at 12:48 PM, Ian D. Alderman <ialderman@xxxxxxxxxxxxxxxxxx> wrote:
Hi, Tan,

What are the permissions on the directory /var/local/condor/spool/cluster2.proc0.subproc0.tmp when this occurs?  How much disk space is free on that partition?

-Ian


On Jan 26, 2009, at 11:11 AM, Tanzima Zerin Islam wrote:

Hi all,
 I am submitting a job to the vanilla universe. I need to send condor_vacate_job from time to time.
But recently, my jobs end up in the held state after I send the condor_vacate_job signal.
The error message that I find in the log file is:


007 (002.000.000) 01/26 12:00:30 Shadow exception!
  Error from starter on slot1@xxxxxxxxxx: STARTER at xxxxxx failed to send file(s) to <xxxxxxxxxxxxxx>; SHADOW at xxxxxxxxxx failed to write to file /var/local/condor/spool/cluster2.proc0.subproc0.tmp/tufa420.hmm: (errno 13) Permission denied
  424871712  -  Run Bytes Sent By Job
  424874688  -  Run Bytes Received By Job
...
012 (002.000.000) 01/26 12:00:30 Job was held.
  Error from starter on slot1@xxxxxxxxxxxx: STARTER at xxxxxxxxxxx failed to send file(s) to <xxxxxxxxxxx> SHADOW at xxxxxxxxx failed to write to file /var/local/condor/spool/cluster2.proc0.subproc0.tmp/tufa420.hmm: (errno 13) Permission denied
  Code 12 Subcode 13
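
For anyone scanning a user log for the same failure, here is a hedged sketch of pulling the file path and errno out of a message like the one above. The regular expression only assumes the wording shown in this excerpt, and `parse_write_failure` is a hypothetical helper, not a Condor tool. It uses Python's standard `errno` table to turn the number into its symbolic name.

```python
import errno
import os
import re

# Matches the tail of the hold message quoted above (format assumed from this excerpt).
WRITE_FAILURE = re.compile(
    r"failed to write to file (?P<path>\S+): \(errno (?P<errno>\d+)\)"
)


def parse_write_failure(line):
    """Extract path, errno, and the errno's symbolic name, or None if no match."""
    match = WRITE_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group("errno"))
    path = match.group("path")
    return {
        "path": path,
        "directory": os.path.dirname(path),  # the directory the shadow must write into
        "errno": code,
        "symbol": errno.errorcode.get(code, "UNKNOWN"),
    }


if __name__ == "__main__":
    sample = (
        "SHADOW at xxxxxxxxx failed to write to file "
        "/var/local/condor/spool/cluster2.proc0.subproc0.tmp/tufa420.hmm: "
        "(errno 13) Permission denied"
    )
    print(parse_write_failure(sample))
```

Errno 13 is EACCES, which points at the permissions or ownership of the target directory rather than at a full disk (that would normally show up as ENOSPC, errno 28).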

I did not have this problem before; only recently have I seen this shadow exception occur. Does anyone have any idea what's
causing this problem? Just to test, I have submitted all my files with permissions set to 777, but the problem still persists.

Thanks in advance.

--Tan

--
--
Tanzima Zerin Islam
Graduate Student
School of Electrical & Computer Engineering
Purdue University
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

--
===================================
Ian D. Alderman
office: 608.554.4605
cell: 608.217.9959
main: 888.292.5320

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools



