
Re: [Condor-users] Standard Universe Jobs - File Permission Errors



Unfortunately, I am still having problems running vanilla universe jobs across machines that share NFS-mounted directories.

The UID & GID of the Condor user are now the same across all machines, but I still get the same problem.

I submit this simple hello world script:

#!/bin/bash
echo Hello World > /home/star/pb337/condortest/sh/helloworld_sh.txt

with this submit file:

universe = vanilla
initialdir = /home/star/pb337/condortest/sh
executable = helloworld_sh.sh
requirements = (Machine == "lucky4.st-and.ac.uk")
log = helloworld_sh_x64.log
error = helloworld_sh_x64.err
output = helloworld_sh_x64.out
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
queue 1

The requirement is that the job executes on a machine other than the submit machine, but writes its output files to a location that is NFS-mounted on all the machines.

I then get this error file back when the job is submitted:

/var/lib/condor/execute/dir_19606/condor_exec.exe: line 3: /home/star/pb337/condortest/sh/helloworld_sh.txt: Permission denied

The local UID:GID for the Condor user is synced across all machines, and the UID:GID for the submitting user (in this case, pb337) is synced across all machines from NIS. I don't understand how or why this is going wrong.
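
One way to narrow this down is to have the job itself record which account it runs as on the execute machine and whether that account can write to the target directory. The following is a minimal diagnostic sketch, not something from the thread: the function name `report_write_access` is hypothetical, and the path in the last line is simply the directory used by the script above.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (an assumption, not part of the original job).
# Prints the numeric UID and GID of the current process and whether the given
# directory exists and appears writable to it.
report_write_access() {
    local dir="$1"
    local writable="no"
    # -d: the path is a directory; -w: the shell reports it as writable.
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        writable="yes"
    fi
    echo "uid=$(id -u) gid=$(id -g) dir=$dir writable=$writable"
}

# Example: check the directory used by the hello world script.
report_write_access /home/star/pb337/condortest/sh
```

Run as the job's executable, this would put one line on standard output, which the submit file above sends to the `output` file. The recorded uid and gid can then be compared with the owner and mode of the target directory (for example with `ls -ld`). The `-w` test is only an indication, particularly on NFS, so an actual write attempt remains the definitive check.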

Standard universe jobs would work, but we can't use them for what we need.

Regards,
Paul

On 25 July 2012 11:18, Paul Browne <pb337@xxxxxxxxxxxxxxxx> wrote:
OK, thanks.

I will try syncing the UID:GID of the Condor user across the machines in the pool & see if that resolves the issue.

If not, perhaps someone else on this mailing list with more experience of using Condor in an NFS setting might have some information.

Regards,
Paul

On 25 July 2012 00:55, <jhowes@xxxxxxxxxxxxxxxx> wrote:

I am pretty sure this is your problem, but I am not knowledgeable enough about Linux/UNIX to tell you how to resolve it.


On 2012-07-24 17:58, Paul Browne wrote:
Sorry, I should also say the errors are write permission errors & not read.

- Paul

On 24 July 2012 23:57, Paul Browne <pb337@xxxxxxxxxxxxxxxx> wrote:

The UID:GID for the Condor user are indeed different on the different systems. The UID:GID for the other users on the system should be set by NIS, so they should be in sync across the machines in the pool.

Is this something I can edit manually, i.e. stop the Condor daemons, edit the Condor user's UID:GID in /etc/passwd & /etc/group, then restart Condor?

Or would a re-install of Condor be required, with a pre-existing Condor user of synced UID:GID already on the system?

Thanks for your help,
Paul 

On 24 July 2012 17:20, <jhowes@xxxxxxxxxxxxxxxx> wrote:

If you have not already done this, you should check which file operation causes the error: read or write?

Have you checked the UID numbers for the user IDs on each machine? Is it possible they are not in sync? Group IDs also?

Just a couple things I would check for.
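
A quick way to do the comparison suggested above is to collect each machine's passwd entry for the account and compare the UID:GID fields. The sketch below is an illustration only: the function names `uid_gid_from_passwd_line` and `ids_in_sync` are hypothetical, and the sample entries are made-up values, not data from this pool. On a real machine the entry could come from `getent passwd condor`.

```shell
#!/bin/bash
# Hypothetical helpers (assumptions for illustration, not from the thread).

# Extract "UID:GID" from a single passwd-format line
# (name:password:UID:GID:gecos:home:shell).
uid_gid_from_passwd_line() {
    echo "$1" | awk -F: '{ print $3 ":" $4 }'
}

# Return success if every passwd-format line given as an argument
# carries the same UID:GID pair, failure otherwise.
ids_in_sync() {
    local first="" current line
    for line in "$@"; do
        current="$(uid_gid_from_passwd_line "$line")"
        if [ -z "$first" ]; then
            first="$current"
        elif [ "$current" != "$first" ]; then
            return 1
        fi
    done
    return 0
}

# Made-up sample entries standing in for the output of two machines.
machine_a='condor:x:64:64:Condor:/var/lib/condor:/sbin/nologin'
machine_b='condor:x:501:501:Condor:/var/lib/condor:/sbin/nologin'

if ids_in_sync "$machine_a" "$machine_b"; then
    echo "in sync"
else
    echo "mismatch"
fi
```

The same check applies to the submitting user's entry as well as the Condor user's, since both accounts are involved when a job writes to a shared directory.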

On 2012-07-24 10:41, Paul Browne wrote:

Hi, 

Thanks for replying.

These are all Linux nodes, all running Condor.x86_64.7.8.1.rhel4 on Scientific Linux.

The problem is that Vanilla Universe jobs running on a different execute machine than submit machine can't write to a network mounted directory, despite this directory being present in the same place on all machines. It must be a permissions issue of some kind, but I can't work it out.

Regards,
Paul Browne

On 24 July 2012 15:33, <jhowes@xxxxxxxxxxxxxxxx> wrote:

Are your nodes running Linux or Windows?

On 2012-07-23 12:18, Paul Browne wrote:

Apologies, I am referring to network problems running Vanilla universe jobs, not Standard universe.

Standard universe jobs run fine, but can't be used for our needs.

Regards,
Paul Browne

On 23 July 2012 18:16, Paul Browne <pb337@xxxxxxxxxxxxxxxx> wrote:

We have a small Condor 7.8.1 x86_64 pool of one central manager (also a submit & execute machine) & two submit/execute machines.

When a standard universe job is submitted from one machine that requires I/O access to directories which are NFS mounted in the same place on each machine, the jobs will not run or produce output due to file permission errors.

So a job submitted on one machine will only run on the machine it was submitted from, & will not run on any other machine. I have tried to read the admin manual about how to resolve this issue without making network mounted directories world-writeable (which would certainly work), but haven't made progress.

Might anyone have ideas about how our pool configuration might be resolved to allow Condor jobs which are submitted from one machine to execute on another machine, when they need I/O access to directories which have been NFS mounted in the same places on all machines in the pool?

This is a major problem, reducing our capability for time-sensitive computations by (at present) a full two thirds, so any help would be very, very welcome.

Kind regards,
Paul Browne

 

--
__________________________________
Mr. Paul Browne
School of Physics & Astronomy,
University of St Andrews,
North Haugh, St Andrews,
Fife, KY16 9SS,
Scotland, UK

t:  +44 (0)1334 46 3152
e:  pb337@xxxxxxxxxxxxxxxx
__________________________________


_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxedu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
