
Re: [Condor-users] [Filter Test: P272621] Re: [Filter Test:P272621] Startd's crashing with fatal error gettingprocess info for starter and descendants



On Wed, Jul 05, 2006 at 04:01:12PM -0400, Ian Chesal wrote:
>> 
>> Add a third request to this:
>> 
>> The startd shouldn't die like this when it can't gather process stats.
>> That doesn't really seem like an I-should-die-and-take-down-my-jobs kind
>> of situation. Warnings are fine.
>
> Really? What if the user job is violating the local policy expressions,
> say because it's driving load up very high and using all of the memory.
>
> If a daemon doesn't know what it's doing, and if there's a chance that
> what it is doing is causing harm, it seems like the safe thing to do is
> to shut down.

If there is a policy to enforce then asserting is a reasonable action,
but in this case there's no such policy. It doesn't bother me if the
daemons are getting starved because a job is running at really high
priority. It's a dedicated compute node. If it were controllable via
policy that'd be great.

- Ian