
Re: [HTCondor-users] Some jobs held on HTcondor 8.0



On 7/11/2013 11:38 AM, Russell Poyner wrote:
 From earlier testing I think some jobs would be held and others succeed
on the same execute node.

Would be great to confirm this...

The current collection of machines should be
nearly identical in that they all implement the same policy from the
CMS. Specifically, the hard mounts are the same, and the maps for autofs
are the same.


Good to know. That does not rule out the possibility that mounts on some machines are stale, but at least we don't have to worry about multiple file system domains.
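
If it helps to check for stale or slow mounts on an execute node, something along these lines might do as a rough probe. This is only a sketch: the function name check_mount and the 5 second timeout are made up for illustration and are not part of HTCondor. It runs stat in a child process so that a hung mount cannot block the script itself.

```python
import subprocess
import time


def check_mount(path, timeout=5.0):
    """Probe a path and return (status, seconds).

    status is 'ok' if stat succeeded, 'error' if stat failed
    (for example the path does not exist), or 'timeout' if stat
    did not finish in time, which may point to a stale or slow mount.
    The 5 second default is an arbitrary illustrative value.
    """
    start = time.monotonic()
    # Run stat in a child process so a hung file system only blocks
    # the child, not this script.
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in I/O may linger.
        proc.kill()
        return "timeout", time.monotonic() - start
    return ("ok" if rc == 0 else "error"), time.monotonic() - start
```

Calling check_mount("/home/user") on each execute node and comparing the status and elapsed time would show whether any machine stands out.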

autofs mount latency might be an issue since /home/user resides on a SAN
that sometimes has latency issues. However, that didn't seem to be a
problem when this same group of machines was running condor 7.4.4 which
was our previous version.


So /home/user is indeed an autofs mount?

Just to rule out the possibility of HTCondor's fancy new per-job file namespaces mechanism conflicting with autofs (something we've run into in the past, though it has supposedly been improved), please try putting the following into your condor_config on all execute nodes (or all nodes):

  PER_JOB_NAMESPACES = False

and let me know if that improves things. Note that the per-job namespace mechanism did not exist in v7.4.4, so it was effectively off back then :).

Thanks,
Todd