Re: [Condor-users] New NFS warning with condor 6.8.1



On Thu September 28 2006 2:38 pm, Steven Timm wrote:
> I put condor 6.8.1 on my first few test nodes and submitted the same
> test vanilla universe job that I always do for testing.
>
> [timm@fnpcg ~]$ condor_submit recon1_1.run
> Submitting job(s)
> WARNING: Log file /home/timm/recon1.log.47070.0 is on NFS.
> This could cause log file corruption and is _not_ recommended.
> .
> Logging submit event(s).
> 1 job(s) submitted to cluster 47070.
>
>
> The log file in question is indeed on nfs, but it has been on nfs
> throughout the whole life of my cluster and I don't see why we
> are just now getting warnings about this.  There haven't been problems
> up until now.

This isn't a new problem, just a new warning about an old problem.

File locking on NFS is inherently unreliable.  We've seen enough cases of
NFS-based job logs getting corrupted (from multiple processes updating the
log file) that we decided to add the warning.  Corrupted job logs, in
particular, tend to make DAGMan very unhappy.  I suspect that the risk of
such corruption is reduced, possibly even eliminated, if all writers are on
the same machine, but I don't know for certain.

Ultimately, we'd like to implement a more advanced locking mechanism (using a 
separate lock file), but we haven't had time to add this yet.
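
To give a rough idea of what locking with a separate lock file can look like,
here is a minimal Python sketch.  It is not Condor code and not necessarily
what we would end up implementing.  It uses the well-known hard-link trick:
create a uniquely named file, link() it to the lock name, and trust the link
count rather than the return value of link().  The names acquire_lock,
release_lock, append_event and the ".lock" suffix are all made up for this
example.

```python
import os
import socket
import time


def acquire_lock(lock_path, timeout=10.0, poll=0.1):
    """Try to take lock_path using a hard link; return True on success."""
    # A name that only this process uses (host name + pid).  Threads in the
    # same process would collide; this sketch ignores that case.
    unique = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                os.link(unique, lock_path)
            except OSError:
                pass  # do not trust the result; check the link count instead
            # Two links to our unique file means lock_path points at it.
            if os.stat(unique).st_nlink == 2:
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
    finally:
        # The lock, if we got it, lives on under lock_path.
        os.unlink(unique)


def release_lock(lock_path):
    """Release a lock previously taken with acquire_lock()."""
    os.unlink(lock_path)


def append_event(log_path, text, timeout=10.0):
    """Append text to log_path while holding log_path + '.lock'."""
    lock_path = log_path + ".lock"
    if not acquire_lock(lock_path, timeout=timeout):
        return False
    try:
        with open(log_path, "a") as log:
            log.write(text)
            log.flush()
            os.fsync(log.fileno())
    finally:
        release_lock(lock_path)
    return True
```

A writer would call something like append_event("/home/timm/recon1.log",
"...event text...") and retry or complain if it returns False.  Note that the
sketch does nothing about stale locks left behind by a crashed writer, which
any real implementation would have to handle.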

-Nick

-- 
           <<< There is no spoon. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences