
Re: [Condor-users] startd hangs when using job hooks



Michael Moore wrote:
I am trying to implement a set of fetch and prepare hooks. However, when testing the hooks I experience hangs of condor_startd. When startd hangs, it stops responding to requests, including Condor shutdown requests. Only a process-level kill ends the process.

The host running the hooks is a Windows Vista host running Condor 7.4.1. The prepare hook does take some time to run (on the order of minutes). However, startd does not always hang during the prepare hook. Sometimes startd hangs after the job begins executing, sometimes it doesn't hang at all.

Has anyone else seen similar behavior? Was there a way to work around the problem? Apparently, there was a similar problem in 7.3.2 and prior where a very simple fetch hook would cause startd to hang. I haven't figured out what portion of the hook triggers this behavior; it's very intermittent.

Thanks,
Michael Moore

A few issues with hooks on Windows...

http://condor-wiki.cs.wisc.edu/index.cgi/search?s=hook+windows

Specifically...

http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=422
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=864

Does either of those sound like your problem?

I believe one of those is related to using Windows on a machine with many CPUs -- or at least it is more reproducible there.

Best,


matt