
Re: [Condor-users] Jobs are not spread across various machines



On Jan 31, 2005, at 11:24 AM, Alain Roy wrote:


However, when I submit some jobs, they run only on the node they are
submitted on.

The most common reason that I have seen is that Condor has been configured with a different FILESYSTEM_DOMAIN on each machine, typically because it is set to the full hostname. If you don't tell Condor to transfer files, it assumes the job needs a shared filesystem (like NFS), and it treats two machines as sharing one only if their FILESYSTEM_DOMAINs are the same. With a different value on every machine, jobs can only match the machine they were submitted from.


So: if you have a shared filesystem, FILESYSTEM_DOMAIN should be the same. If you don't have a shared filesystem, you need to tell Condor to transfer files.
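
As a rough sketch of the two options above (the domain name, executable, and file names are made up for illustration, so adjust them to your site): with a shared filesystem, the same value goes into the Condor configuration on every machine, and without one, the submit description file asks Condor to transfer files.

```
# Option 1: in the Condor configuration on every machine that shares the filesystem
# ("example.org" is a placeholder domain, not a value from this thread)
FILESYSTEM_DOMAIN = example.org

# Option 2: in the submit description file, when there is no shared filesystem
# ("my_program" and "input.dat" are hypothetical names)
universe                = vanilla
executable              = my_program
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
queue
```

After a configuration change, the daemons need to reread their configuration (for example with condor_reconfig) before the new FILESYSTEM_DOMAIN takes effect.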

What about standard universe jobs?

On my cluster, if I set FILESYSTEM_DOMAIN to the local network domain, Condor runs standard universe jobs just fine, even though we don't have a common filesystem. (Setting FILESYSTEM_DOMAIN to the full hostname, however, causes the problem that Raghu describes.)

-Tim