
Re: [Condor-users] Trivial jobs occasionally running for hours



(inline your trimmed top post)

On 10/13/2010 12:14 PM, Paul Haldane wrote:
[I wouldn't normally top-post but best I can do without losing
replies]

Thanks - Mark Calleja pointed me at the error that I was consistently
overlooking.  I'd also had one of those "doh" moments shortly
after sending the original message and had spotted the (now very
obvious) error in the logs - "Sock::bindWithin - failed to bind any
port within (9600 ~ 9700)".
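
That message typically means every port in the configured range was already taken when the daemon tried to bind. A minimal Python sketch of the same idea follows; it is not Condor's actual code, and bind_within is a hypothetical helper name used only for illustration:

```python
import socket


def bind_within(low, high, host="127.0.0.1"):
    """Try each port in [low, high] in turn.

    Returns a bound socket on success, or None if every port in the
    range is unavailable (the situation the log message reports).
    """
    for port in range(low, high + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            # Port already in use (or otherwise unavailable); try the next one.
            sock.close()
    return None
```

With a range of about a hundred ports shared by all inbound and outbound sockets, a burst of short jobs can plausibly use up the range, at which point a call like this returns None.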

Yeah, that's no fun. If you're using LOWPORT/HIGHPORT, you should consider using IN_LOWPORT/IN_HIGHPORT instead.

http://spinningmatt.wordpress.com/2010/08/08/firewalling-execute-nodes-avoid-lowporthighport-use-in_lowportin_highport/
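
As a rough sketch of what that looks like in the Condor configuration: LOWPORT/HIGHPORT limit the ports used for both incoming and outgoing connections, while IN_LOWPORT/IN_HIGHPORT limit only the ports daemons listen on, so outgoing connections no longer eat into the range. The numbers below are placeholders taken from the range mentioned in this thread, not a recommendation; match them to your firewall rules.

```
# Assumed example values; adjust to your site's firewall.
# Remove (or comment out) any existing LOWPORT/HIGHPORT settings.
# LOWPORT  = 9600
# HIGHPORT = 9700

# Restrict only the ports used for incoming (listening) sockets.
IN_LOWPORT  = 9600
IN_HIGHPORT = 9700
```

The daemons need to pick up the new configuration (a restart is the safe option) before the change applies.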


I've changed the range to 9600-19700 and restarted.  I don't get the
errors but first attempt only ran one job at a time and second isn't
running any (though they're all nicely queued).  I suspect this is a
completely unrelated problem.

I have wondered whether, as you suggest, using trivial jobs for testing
is unfair on the system.  I should probably set up a more substantial
test job (and be more patient).

Thanks Paul

Trivial jobs aren't really unfair, but if you want to stress the system with short jobs you should run the Condor 7.5 series. It has a number of optimizations for short jobs, including recycling shadows to avoid process management overhead.


Best,


matt