
Re: [HTCondor-users] NFS mounted home dirs





On 10/10/2016 15:50, Winnie Lacesso wrote:
Dear HTCondor gurus,

In traditional LHC ComputingGrid, users mapped to VO-pool accounts had a
local home dir made everywhere, an empty stub sort of thing.

But in our HTCondor cluster, local users can submit jobs as themselves, &
their home dirs are NFS-mounted to all the WN. Sometimes those NFS servers
need a reboot. What happens to a running HTCondor job of a user?

Does it just "pause" while the WN logs "NFS server calgary.phy not
responding, still trying.... ditto... NFS server calgary.phy ok"
and then all is well?

or: is the job more negatively impacted = gets discombobulated & goes
strange, or might die?

Much obliged for any enlightenment or pointers.

Hello Winnie,

We have a similar setup (home directories served over NFS4 from two
servers).

Condor copes well with NFS servers being unavailable for a short
time, i.e. about as long as a reboot takes. Running jobs simply
hang whenever they need data that is not available, and resume
once the server is back.

So unless you plan on very frequent rebooting of the NFS server(s),
I'd not be worried.
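
One caveat: the hang-and-resume behaviour described above is what Linux
NFS "hard" mounts do. With "soft" mounts, I/O returns an error after the
retries run out, and a job may then fail. The thread does not say which
options are in use, so the following is only a minimal sketch, assuming
Linux worker nodes, of how the mode could be checked. The function name
nfs_mount_modes is made up for illustration; the input is text in the
/proc/mounts format.

```python
def nfs_mount_modes(mounts_text):
    """Map each NFS mount point to 'hard', 'soft', or 'unknown'.

    mounts_text is expected to look like the contents of /proc/mounts:
    device, mount point, filesystem type, comma-separated options, ...
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS client mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        if "soft" in options or "softerr" in options:
            modes[mount_point] = "soft"
        elif "hard" in options:
            modes[mount_point] = "hard"
        else:
            # Assumption: report rather than guess when neither is listed.
            modes[mount_point] = "unknown"
    return modes
```

On a worker node this could be called as
nfs_mount_modes(open("/proc/mounts").read()); any home directory reported
as "soft" would be worth a closer look before rebooting its server.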

Greetings, B.