
Re: [Condor-users] startd hangs when using job hooks



>> What's weird is that, looking at my hook file's log output, I can see
>> hooks trying to hand off work to Condor. But only 3 out of 8 of them
>> try and Condor never seems to get the work. I'm just print'ing the
>> class ad to STDOUT. You?
>
> For me, testing on 1 slot and 4 slots, the fetch hook would return work
> and the starter would correctly execute the work. However, at some point
> one of the fetch hooks would cause the pipeFullWrite() error and startd
> would shortly become hung. With 7.4.1 it seemed to occur immediately or
> within the first 4 fetch hooks.

So you're still seeing the pipe errors no matter how you hand off work
on Windows? Nuts.

On my 7.2.2 live farms now, my Windows machines start up and, when the
hook runs, they immediately pass off a job that runs my hook in an
infinite loop. :) It's complete cheating but it seems to avoid the
pipe error issues. The infinite loop has to do more work (cleanup that
Condor's starter process would otherwise do) and it has to check the
classad of the machine periodically to make sure the machine isn't
being shut down, but it does seem to keep me away from that pipe error.
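
A minimal sketch of that kind of loop, in Python, might look like the
following. It is not the actual hook script: run_hook_loop, fetch_job,
run_job, clean_up and should_stop are hypothetical names, and how work
is fetched or how the machine classad is read is left to the caller.

```python
import time
from typing import Callable, Optional


def run_hook_loop(
    fetch_job: Callable[[], Optional[dict]],
    run_job: Callable[[dict], None],
    clean_up: Callable[[dict], None],
    should_stop: Callable[[], bool],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch and run jobs until should_stop() returns True.

    All four callables are assumptions supplied by the caller:
      fetch_job   -- returns a job description, or None if there is no work
      run_job     -- executes one job
      clean_up    -- does the per-job cleanup the starter would normally do
      should_stop -- e.g. inspects the machine classad for a shutdown request
    Returns the number of jobs that ran to completion.
    """
    jobs_run = 0
    # Check for a shutdown request before every fetch attempt.
    while not should_stop():
        job = fetch_job()
        if job is None:
            # No work available: wait before polling again.
            sleep(poll_seconds)
            continue
        try:
            run_job(job)
        finally:
            # Clean up even if the job raised an exception.
            clean_up(job)
        jobs_run += 1
    return jobs_run
```

Passing sleep and should_stop in as parameters is only there so the loop
can be exercised without real waiting or a real Condor machine.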

It also has the advantage of running my hook script as a domain user
instead of SYSTEM, so I can mount drives and use my shared tool drive
instead of having to push lots of stuff local to the machine (like a
perl distribution and modules and such) just to fetch new jobs.

Would be nice if you could run Windows hooks as a domain user...

- Ian