Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] [Filter Test: P272621] Re: [Filter Test: P272621] Startd's crashing with fatal error getting process info for starter and descendants

Date: Wed, 5 Jul 2006 16:10:55 -0400
From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
Subject: Re: [Condor-users] [Filter Test: P272621] Re: [Filter Test: P272621] Startd's crashing with fatal error getting process info for starter and descendants

On Wed, Jul 05, 2006 at 04:01:12PM -0400, Ian Chesal wrote:
>>
>> Add a third request to this:
>>
>> The startd shouldn't die like this when it can't gather process stats.
>> That doesn't really seem like an I-should-die-and-take-down-my-jobs kind
>> of situation. Warnings are fine.
>
> Really? What if the user job is violating the local policy expressions,
> say because it's driving load up very high and using all of the memory.
>
> If a daemon doesn't know what it's doing, and if there's a chance that
> what it is doing is causing harm, it seems like the safe thing to do is
> to shut down.

If there is a policy to enforce then asserting is a reasonable action,
but in this case there's no such policy. It doesn't bother me if the
daemons are getting starved because a job is running at really high
priority. It's a dedicated compute node. If it were controllable via
policy that'd be great.

- Ian

References:
- Re: [Condor-users] [Filter Test: P272621] Re: [Filter Test: P272621] Startd's crashing with fatal error getting process info for starter and descendants
  - From: Erik Paulson
