
Re: [Condor-users] startd hangs when using job hooks



On Wed, Mar 03, 2010 at 11:24:28AM -0500, Ian Chesal wrote:
> Hi Michael,
> 
> > In my case the issue mysteriously resolved itself by changing the way
> > STDIN was read in the fetch script. I had been doing, in Perl, a join()
> > on STDIN. When I switched to using a while(<STDIN>) and appending the
> > input to a temporary string the issue went away. Unfortunately, I was
> > not able to create a simple case where I could isolate what was causing
> > the issue. In testing just a join() versus a while() in a fetch script
> > it didn't exhibit the startd hang in either case.
> 
> I'm going to spend some time this afternoon with 7.4.1 trying to
> isolate this. I'm reading STDIN with:
> 
> parse_condor_slot_information_and_populate_global_vars(<STDIN>);

That is almost identical to what I was doing except I passed 
join(<STDIN>) to another function. Locally grabbing stdin (as it sounds 
like you are going to do) is what made it magically work.
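
For reference, a minimal sketch of the "read STDIN locally in a while loop" 
approach described above. The function names slurp_lines() and 
parse_slot_ad() are hypothetical and are not taken from either of our 
hooks; the parser only assumes the slot ClassAd arrives as 
"Attribute = Value" lines. One Perl detail that may be relevant, though 
nothing in this thread confirms it is the cause: join() takes its 
separator as the first argument, evaluated in scalar context, so 
join(<STDIN>) written literally reads a single line rather than the whole 
input.

```perl
use strict;
use warnings;

# Read every line from the given filehandle into one string.
# The read happens locally, line by line, until EOF.
sub slurp_lines {
    my ($fh) = @_;
    my $text = '';
    while (my $line = <$fh>) {
        $text .= $line;
    }
    return $text;
}

# Hypothetical parser: turn "Attribute = Value" lines into a hash ref.
# Lines that do not match that shape are skipped.
sub parse_slot_ad {
    my ($text) = @_;
    my %ad;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
        $ad{$1} = $2;
    }
    return \%ad;
}

# In a fetch hook this would be called as:
#   my $ad = parse_slot_ad(slurp_lines(\*STDIN));
```

The point of the sketch is only that all of STDIN is consumed inside the 
hook before anything else happens; what the hook then prints to STDOUT is 
unchanged.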

> 
> Which is essentially the same as:
> 
> my @array = <STDIN>;
> my_function(@array);
> 
> So I'm pulling all STDIN into an array first. I'll try modifying my
> function so it reads STDIN in a while loop instead. Thanks for the
> tip.
> 
> What's weird is that, looking at my hook file's log output, I can see
> hooks trying to hand off work to Condor. But only 3 out of 8 of them
> try, and Condor never seems to get the work. I'm just print'ing the
> class ad to STDOUT. You?

For me, testing on 1 slot and 4 slots, the fetch hook would return work 
and the starter would correctly execute the work. However, at some point 
one of the fetch hooks would cause the pipeFullWrite() error and the 
startd would hang shortly afterward. With 7.4.1 it seemed to occur 
immediately or within the first 4 fetch hooks.

Michael
> 
> > As an additional note, I was seeing the exact same errors as the
> > previous bug hanging startd with just the simple 'exit 0' fetch hook.
> 
> - Ian