
Re: [Condor-users] mixed pool: NFS and non-NFS directories

Alain Roy wrote:

On Mar 21, 2008, at 2:59 PM, Ian Stokes-Rees wrote:
I am running OSG on a small (20 node) Condor pool with NFS and shared home directories for VOs. I am interested in finding out if it is possible and practical to add into the pool other execute nodes which don't have shared NFS or user home directories. Can anyone offer any tips or suggestions regarding this?

Condor is happy to run without a shared filesystem. If a user doesn't request file transfers for a job, Condor assumes the files are on a shared filesystem. Condor decides whether machines share a filesystem by comparing their FILESYSTEM_DOMAIN settings. Details are in the manual; I can point you to specifics if you need them. If a user specifies that files should be transferred "if needed", then FILESYSTEM_DOMAIN is used to decide whether the job is on the shared filesystem or the files need to be transferred.
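As a rough illustration (the domain name and file names here are made up, not from your site), the relevant pieces look like this. In the Condor configuration on each machine:

```
## condor_config (example): machines that advertise the same
## FILESYSTEM_DOMAIN are assumed to share a filesystem
FILESYSTEM_DOMAIN = cluster.example.edu
```

and in the user's submit description file:

```
# Let Condor decide per-match, based on FILESYSTEM_DOMAIN, whether
# files must be transferred or can be read from the shared filesystem
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
```

With IF_NEEDED, a job matched to an execute node in the same FILESYSTEM_DOMAIN reads files in place; matched to any other node, Condor transfers them.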

Out of the box, the OSG Globus installation assumes that Condor is using a shared filesystem, and when it submits jobs to Condor it doesn't tell Condor to transfer files. There is an alternate Condor job manager that does tell Condor to transfer files. Some VDT-specific documentation is at:

I don't think it is a documented solution in OSG, but several sites are using it with success. You'll need to install it from the VDT cache instead of the OSG cache.
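For comparison, what the alternate job manager arranges is essentially what a user would write by hand to force transfers regardless of FILESYSTEM_DOMAIN; a sketch of such a submit file (input file names are placeholders):

```
# Always transfer files, even if the execute node claims to share
# a filesystem with the submit machine
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat, config.txt
```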

Note that the NFSLite configuration does not remove all dependence on a shared filesystem in OSG. Typical OSG jobs rely on software pre-installed by the grid user in $OSG_APP, which is assumed to be readable from all of the worker nodes. However, if you only wish to support specific OSG jobs that you know do not depend on $OSG_APP, then you can get by without it. I suppose you could also rsync it to a local disk on the worker nodes, at the risk of a temporary condition in which new software has been installed but is not yet available from some nodes.

The shared writable $OSG_DATA cannot so easily be faked without a shared filesystem, but it is rarely used by OSG jobs, in my experience.

Also note that the worker node client $OSG_GRID is assumed to be accessible from all worker nodes. Since this is just a one-time installation, it can simply be installed locally on the worker nodes without any problem.