[Condor-users] Condor on NFS



Hi. I currently have the Condor home directory shared over NFS with all members of our cluster. This is great for centralized configuration. However, it seems that even a brief NFS outage (under 1-2 minutes) is enough to kill all running jobs. They do restart when NFS comes back.

We use NFS over UDP with the mount options "hard" and "intr" so that clients can withstand server reboots, and so that jobs simply hang until the server comes back. Rather than waiting, Condor kills the jobs. Is there a configurable timeout I should have set? If not, how else can I make Condor resilient to such NFS outages?

Thanks,
Jacob