Re: [Condor-users] Re: Condor and PATH_MAX



On Mon, Feb 14, 2005 at 05:11:32PM -0600, Daniel Forrest wrote:
> Following up on my own post...
> 
> >> I have hit a problem with Condor and PATH_MAX.
> >> 
> >> When opening a file, if the path length is >= 239 and <= 243 or ==
> >> 246 then it exits with signal 11.  If the path length is >= 244 and
> >> <= 252 and != 246 then it goes into an infinite loop where the
> >> ShadowLog shows over and over again "Requesting Primary Starter",
> >> but there is never any indication of why the shadow exits.
> >> 
> >> I am guessing there is some kind of buffer overrun which is causing
> >> various kinds of problems depending on how much is overwritten.  My
> >> understanding is that Condor supports POSIX's PATH_MAX of 256.
> >> 
> >> This is Condor 6.6.2.  Is this a known problem?
> 
> Looking through the release notes for 6.6.6 I see:
> 
>  Fixed a problem where the condor_starter could crash if the job it
>  was running used Condor's file transfer mechanism and the full path
>  names to the job's files became longer than a few hundred characters.
> 
> So I updated to 6.6.8 on Friday and relinked my executables.
> 
> Now if the path length is >= 242 and <= 246 it exits with signal 11.
> If the path length is >= 247 it goes into the infinite loop.  So the
> behavior has changed, but it isn't fixed.
> 

I know it's a problem, but I don't know the exact details. 

What's happening is that, internally, the remote syscall library rewrites
some of the pathnames into URL-like names: instead of opening '/tmp/foo',
it opens something like 'remote:/tmp/foo'. The 'remote:' prefix counts
against your _POSIX_PATH_MAX.
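
To make the arithmetic concrete, here is a minimal sketch in C. It is not
Condor's actual code: the helper names rewrite_path and max_user_path_len
and the 256-byte buffer size are assumptions for illustration, and only the
'remote:' prefix comes from the description above. The real thresholds
reported earlier in the thread will depend on whichever prefixes and buffers
the syscall library really uses.

```c
#include <stdio.h>
#include <string.h>

#define BUF_MAX 256        /* assumed buffer size, standing in for _POSIX_PATH_MAX */
#define PREFIX  "remote:"  /* the prefix mentioned above */

/* Hypothetical helper: writes "remote:<path>" into out, which must hold
 * BUF_MAX bytes. Returns 0 on success, -1 if the rewritten name would
 * not fit (snprintf never writes past BUF_MAX bytes). */
static int rewrite_path(const char *path, char *out)
{
    int n = snprintf(out, BUF_MAX, "%s%s", PREFIX, path);
    return (n < 0 || n >= BUF_MAX) ? -1 : 0;
}

/* Longest user-supplied path that still fits once the prefix and the
 * terminating NUL are accounted for. */
static size_t max_user_path_len(void)
{
    return BUF_MAX - strlen(PREFIX) - 1;
}
```

Under these assumptions a user path can be at most 248 characters, even
though the nominal limit is 256. Code that copied the rewritten name with an
unchecked strcpy instead of a bounded call would overrun the buffer for
anything longer.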

Where the details get sketchy for me is why we don't internally allocate
something like CONDOR_POSIX_PATH_MAX that is bigger than _POSIX_PATH_MAX.
I'll ask around.
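
A rough sketch of that idea in C, purely hypothetical: CONDOR_POSIX_PATH_MAX
is only the name floated above, while URL_PREFIX_ROOM and make_internal_name
are made up for illustration and are not part of Condor.

```c
#include <limits.h>
#include <stdio.h>

#ifndef _POSIX_PATH_MAX
#define _POSIX_PATH_MAX 256   /* fallback if <limits.h> does not expose it */
#endif

/* Assumed headroom for an internal "scheme:" prefix. */
#define URL_PREFIX_ROOM 16
#define CONDOR_POSIX_PATH_MAX (_POSIX_PATH_MAX + URL_PREFIX_ROOM)

/* Hypothetical helper: builds "<scheme>:<path>" into out (outsz bytes).
 * Returns 0 on success, -1 if the result would be truncated. */
static int make_internal_name(const char *scheme, const char *path,
                              char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s:%s", scheme, path);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

With internal buffers declared as char name[CONDOR_POSIX_PATH_MAX], a user
path right up to the POSIX limit would still fit after the prefix is added,
and anything longer would be rejected with an error rather than overrunning
memory.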

-Erik