
[Condor-users] 6.8.0 and NFS Problem



Hi,

I have a strange problem with Condor 6.8.0 and NFS:

The central manager runs 6.8.0 on Linux and is also the NFS server.
The submit and execute machine runs 6.8.0 on Linux and is an NFS client
of the above machine.

In the global config I have:
USE_NFS         = True
and 
FILESYSTEM_DOMAIN = a.b.c
while the machines are called serv.a.b.c and cli1.a.b.c

Since both machines have two network cards, each machine's local config 
sets that machine's own IP address in a statement of the form
NETWORK_INTERFACE = 1.2.3.4

When I submit a job on the NFS client and its log file is on NFS, 
condor_submit complains:
~> condor_submit ls.job
Submitting job(s)
WARNING: Log file /home/vetter/test.log is on NFS.
This could cause log file corruption and is _not_ recommended.
.
Logging submit event(s).
1 job(s) submitted to cluster 43.

When I start the job on the NFS server it works well. 

The warning from condor_submit also results in condor_run not working:
~> condor_run hostname
Condor does not have write permission to this directory.

If I cd to /tmp, it works:
/tmp> condor_submit ~/ls.job
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 44.
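
To double-check which filesystem each directory really lives on, something 
like the following minimal sketch could be used. It assumes the mount table 
has the Linux /proc/mounts layout (device, mount point, type, ...). The 
function names and the sample mount table are hypothetical and only 
illustrate the /home versus /tmp difference seen above; they are not taken 
from the real machines.

```python
import posixpath

# Hypothetical mount table in /proc/mounts layout; not copied from a real host.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
serv.a.b.c:/home /home nfs rw,hard,intr 0 0
"""

def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    path must be absolute; mounts_text is text in /proc/mounts layout.
    """
    path = posixpath.normpath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the most specific (longest) matching mount point.
            if len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best

def is_on_nfs(path, mounts_text):
    """True if the filesystem type for path starts with 'nfs' (nfs, nfs4)."""
    return filesystem_type(path, mounts_text)[1].startswith("nfs")

if __name__ == "__main__":
    for candidate in ("/home/vetter/test.log", "/tmp/test.log"):
        print(candidate, filesystem_type(candidate, SAMPLE_MOUNTS))
```

On a real machine, the contents of /proc/mounts would be passed in place of 
SAMPLE_MOUNTS.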

Questions:
Is there anything wrong with my setup? I have a 6.7.6 installation on 
different machines, with a different central manager, that uses this same 
NFS server without problems.

Has anything changed in Condor since 6.7.6 regarding chown-ing files to and 
from the condor user? I found the following in the SchedLog of the client 
(I changed numbers to names):

8/3 10:04:25 (pid:29713) Error: Unable to chown 
'/home/condor/hosts/dc09/spool/cluster44.proc0.subproc0'
from condor to vetter.magic
8/3 10:04:25 (pid:29713) (44.0) Failed to chown 
/home/condor/hosts/dc09/spool/cluster44.proc0.subproc0 from condor 
to vetter.magic. Job may run into permissions problems when it starts.
8/3 10:04:25 (pid:29713) Error: Unable to chown 
'/home/condor/hosts/dc09/spool/cluster44.proc0.subproc0.tmp' from 
condor to vetter.magic
8/3 10:04:25 (pid:29713) (44.0) Failed to chown 
/home/condor/hosts/dc09/spool/cluster44.proc0.subproc0.tmp from condor to 
vetter.magic. Job may run into permissions problems when it starts.
8/3 10:04:25 (pid:29566) Starting add_shadow_birthdate(44.0)
8/3 10:04:25 (pid:29566) Started shadow for job 44.0 on 
"<132.187.47.29:18672>", (shadow pid = 29714)
8/3 10:04:25 (pid:29566) Shadow pid 29714 for job 44.0 exited with status 
100
8/3 10:04:25 (pid:29566) match (<1.2.3.4:18672>#1154544604#60) out 
of jobs (cluster id 44); relinquishing
8/3 10:04:25 (pid:29566) Sent RELEASE_CLAIM to startd on 
<132.187.47.29:18672>
8/3 10:04:25 (pid:29566) Match record (<1.2.3.4:18672>, 44, -1) 
deleted
8/3 10:04:25 (pid:29722) Error: Unable to chown 
'/home/condor/hosts/dc09/spool/cluster44.proc0.subproc0'
from vetter to condor.condor
8/3 10:04:25 (pid:29722) (44.0) Failed to chown 
/home/condor/hosts/dc09/spool/cluster44.proc0.subproc0 from vetter to 
condor.condor.  User may run into permissions problems when 
fetching sandbox.



-- 
Andreas Vetter				Tel: +49 (0)931 888-5890
Fakultaet fuer Physik und Astronomie	Fax: +49 (0)931 888-5508
Universitaet Wuerzburg